8.6 Development Release Series 7.1

This is the development release series of Condor. The details of each version are described below.


Version 7.1.4

Release Notes:

  • The owner of the log file for the condor_vm-gahp has changed to the condor user. In Condor 7.1.2 and previous versions, it was owned by the user that the virtual machine is started under. Therefore, the owner of and permissions on an existing log file are likely to be incorrect. To correct the problem, an administrator may modify file permissions such that the condor user may read and write the log file. Alternatively, an administrator may delete the file, and Condor will create a new file with the expected owner and permissions. In addition, the definition for VM_GAHP_LOG in the condor_config.generic file has changed for Condor 7.1.3.

  • The vm universe no longer supports the use of the xm command for running Xen virtual machines. The virsh tool should be used instead.

  • Condor no longer supports the standard universe feature in its ports to Solaris. We may resurrect this feature in the future if demand for it on this port grows again to sufficient levels.

New Features:

  • Local entries in the configuration file may now be specified by prepending a local name and a period to the normal name. Local settings take precedence over the other settings. The local name can be specified on the command line to all daemons via the new -local-name command line option.

    See section 3.3.1 for more details on how the local name will be used in the configuration, and section 3.9.2 for more details on the command line parameters.

  • Dynamic Startd Provisioning: New configuration options allow for slots to be broken into job-sized pieces. This feature is still under development and does not yet fulfill our complete vision, but it is useful enough in its present form to bring value to some installations.

  • condor_submit_dag is now automatically run recursively on nested DAGs (unless the new -no_recurse option is specified). See [*] for details.

  • Added the new SUBDAG EXTERNAL keyword (for specifying nested DAGs) to condor_dagman. See [*] for details.

  • It is now possible to have multiple rotations of the "event log" file, such as "EventLog", "EventLog.1", "EventLog.2", ...

  • The VM universe can now run VMware virtual machines on machines using privilege separation without requiring the condor_vm-gahp binary to be setuid root. Running the condor_vm-gahp as setuid root is no longer supported for VMware or Xen.

  • Condor now supports the ability for the condor_master to run a program as it shuts down. This can be particularly useful for doing a graceful shutdown followed by a reboot. This is accomplished through the new MASTER_SHUTDOWN_<Name> configuration variable. The configuration variable MASTER_SHUTDOWN_<Name> is defined on page [*], and the manual page for condor_set_shutdown is on page [*].

  • The condor_lease_manager is a new daemon. It provides a mechanism for managing leases to resources described by Condor's ClassAd mechanism. These resources and leases are managed to be persistent.

  • VM universe now works with privilege separation (PrivSep) for VMware jobs. Xen is still not supported in PrivSep mode.

  • Added the DIR directive for the SPLICE keyword in the DAGMan language. Please read section 2.10.6 on page [*] for more information.

  • For gt4 type grid jobs (i.e. WS GRAM), include a request to retry failed attempts at file clean-up in the RSL job description.

  • Improved the scalability of some algorithms used by the condor_schedd and condor_negotiator when dealing with large numbers of startds.

  • Added the ability for the condor_master (actually, any DaemonCore process with children) to kill child processes that have stopped responding using SIGABRT instead of SIGKILL. This is for debugging purposes, and is controlled by the new NOT_RESPONDING_WANT_CORE configuration parameter. If the child process is configured with CREATE_CORE_FILES enabled, the child process will then generate a core dump. This feature is currently implemented only on UNIX systems.

    See NOT_RESPONDING_WANT_CORE on page [*], NOT_RESPONDING_TIMEOUT on page [*], and CREATE_CORE_FILES on page [*] for more details.

  • Condor can now be configured to keep a backup of the job queue log on a local file system in case condor_schedd operations involving writes, flushes, or syncs to the job queue log fail. This is most likely to happen when the job queue log is stored on a network file system like NFS. Such a backup enables an administrator to see that a job failed to submit, but does not perform any automatic recovery. See below for the relevant configuration parameters.

  • Added preliminary support for "Green Computing". This is supported only on Linux and Windows. See section 3.16 on page [*] on "Power Management" for more details.
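
The local-name rule in the first item of this list (a local name and a period prepended to the normal name, with the local setting taking precedence) can be modeled in a few lines of Python. This is only a sketch of the precedence described above, not Condor's configuration code; the lookup helper, the local name test1, and the values are hypothetical.

```python
def lookup(config, name, local_name=None):
    """Return the value of `name`, preferring the local entry if one exists."""
    if local_name is not None:
        local_key = f"{local_name}.{name}"
        if local_key in config:
            return config[local_key]
    # No local name given, or no local entry defined: use the normal name.
    return config.get(name)


# Hypothetical settings: one normal entry and one override for "test1".
config = {
    "EVENT_LOG_MAX_ROTATIONS": "1",
    "test1.EVENT_LOG_MAX_ROTATIONS": "4",
}

normal_value = lookup(config, "EVENT_LOG_MAX_ROTATIONS")
local_value = lookup(config, "EVENT_LOG_MAX_ROTATIONS", local_name="test1")
```

A daemon started without a local name, or with a local name that has no matching entry, falls back to the normal setting in this model.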

Configuration Variable Additions and Changes:

  • Local versions of configuration parameters can now be specified via the use of the "-local-name" command line parameter (see the above "New Features" entry).

  • A new configuration parameter EVENT_LOG_MAX_ROTATIONS has been added to allow multiple rotations of the event log file. See [*] for details.

  • A new configuration parameter EVENT_LOG_ROTATION_LOCK has been added to allow configuration of an alternate file for Condor to use while rotating event log files. See [*] for details.

  • The configuration parameter MAX_EVENT_LOG has been renamed to EVENT_LOG_MAX_SIZE. For backward compatibility, if EVENT_LOG_MAX_SIZE is not defined, Condor will also try MAX_EVENT_LOG. See [*] for details.

  • The condor_vm-gahp no longer requires its own configuration file. It now uses the normal Condor configuration file. Parameters that used to reside in the condor_vm-gahp's file should now be placed in the Condor configuration file.

  • The following VM universe-related configuration parameters have been removed:
    • VM_GAHP_CONFIG
    • VM_MAX_MEMORY
    • XEN_CONTROLLER
    • XEN_VIF_PARAMETER
    • XEN_NAT_VIF_PARAMETER
    • XEN_BRIDGE_VIF_PARAMETER
    • XEN_IMAGE_IO_TYPE

    VMWARE_LOCAL_SETTINGS_FILE and XEN_LOCAL_SETTINGS_FILE have been added. They allow a machine administrator to add settings to the virtual machine configuration files written by Condor for VMware and Xen. See [*] and [*] for details.

  • The configuration parameter family MASTER_SHUTDOWN_<Name> can be used in conjunction with condor_set_shutdown to cause the condor_master to execute a specified program as it shuts down. See [*] and the condor_set_shutdown manual page for more details.

  • The configuration parameter NOT_RESPONDING_WANT_CORE controls the type of signal sent to child processes that DaemonCore has determined are no longer responding. See the above discussion of the addition of this feature and NOT_RESPONDING_WANT_CORE on page [*] for details.

  • The configuration parameter LOCAL_QUEUE_BACKUP_DIR should be set to the pathname of a directory that is writable by the Condor user and is located on a non-network file system. This is part of the "Job Queue Backup" feature, above.

  • The configuration parameter LOCAL_XACT_BACKUP_FILTER controls whether or not the condor_schedd will attempt to keep backups of transactions that were not written to the job queue log. If it is set to FAILED, the condor_schedd will attempt to keep a backup of the transaction in the local queue backup directory, defined by LOCAL_QUEUE_BACKUP_DIR, only if operations fail on the job queue log. If it is set to NONE, no backups will be performed, even in the event of failure. If it is set to ALL, then all transactions will be backed up. The ALL value will create quite a large number of files and slow the condor_schedd substantially; it is only likely to be useful for users who are developing or debugging Condor. This is part of the Job Queue Backup feature.
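
The three LOCAL_XACT_BACKUP_FILTER values described in the last item amount to a small decision table. The Python sketch below restates that table; the function name should_back_up and the write_failed flag are hypothetical, and exact upper-case spelling of the values is assumed.

```python
def should_back_up(filter_value, write_failed):
    """Decide whether a transaction is copied to the local backup directory."""
    if filter_value == "ALL":
        return True           # every transaction is backed up
    if filter_value == "FAILED":
        return write_failed   # only when the job queue log operation failed
    if filter_value == "NONE":
        return False          # never, even after a failure
    raise ValueError(f"unknown LOCAL_XACT_BACKUP_FILTER value: {filter_value!r}")


# Outcome for each value when the job queue log write failed.
after_failure = {v: should_back_up(v, True) for v in ("ALL", "FAILED", "NONE")}
```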

Bugs Fixed:

  • In some rare cases, the condor_startd failed to fully preempt jobs. The job itself was killed, but the condor_starter process watching over it would not be killed. The slot would then stay in the Preempting state indefinitely.

  • condor_q performed poorly when querying a remote pool, using -pool. It was using an older latency-bound protocol even when the remote condor_schedd was new enough to use the improved protocol that first appeared in version 6.9.3.

  • When using USE_VISIBLE_DESKTOP, the user's (slot or owner) access-control entry is now removed from the Desktop's access-control list. This fixes the previous behavior, where users were added and never removed, resulting in an overflow of the access-control list, which can only contain a fixed number of access-control entries.

  • Fixed a bug where if log line caching was enabled in condor_dagman and condor_dagman failed during the recovery process, the cache would stay active. Now the cache is disabled in all cases at the end of recovery.

  • Fixed a couple of bugs relevant only to the GLEXEC_STARTER mode of operation. One bug would result in the SPOOL directory being deleted if local universe jobs (which are not supported in GLEXEC_STARTER mode) were submitted. The other bug prevented COD jobs from running. Neither of these are problems for the newer recommended GLEXEC_JOB mode.

  • Fixed a bug that could cause the condor_procd to crash, depending on the timing of its process snapshots.

  • Fixed a bug that caused job status notifications from WS GRAM 4.2 servers to be lost.

  • Fixed a file descriptor leak in the condor_vm-gahp.

  • Jobs now go on hold with a clear hold reason if a path to a directory is put in the transfer files list. Previously, the attempt to run the job would simply fail and return to the idle state.

  • If MAX_EVENT_LOG is set to 0, the event log is now allowed to grow without bound. Previously this behavior was broken, and setting MAX_EVENT_LOG to 0 resulted in the log rotating with every event. Now it works as documented.
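
The event log size rules in this release (EVENT_LOG_MAX_SIZE preferred, MAX_EVENT_LOG tried as a fallback, and 0 meaning no limit) can be sketched as follows. This is a model of the documented behavior only; the helper names are hypothetical, and the exact comparison Condor uses to trigger a rotation is an assumption.

```python
def effective_max_size(config):
    """Prefer EVENT_LOG_MAX_SIZE; fall back to the older MAX_EVENT_LOG name."""
    for name in ("EVENT_LOG_MAX_SIZE", "MAX_EVENT_LOG"):
        if name in config:
            return int(config[name])
    return None  # neither name is defined; the built-in default is not modeled


def should_rotate(current_size, max_size):
    """A limit of 0 means the log may grow without bound."""
    if max_size == 0:
        return False
    # Assumption: rotation happens once the log reaches the limit.
    return current_size >= max_size


# An old-style configuration that disables rotation.
old_style = {"MAX_EVENT_LOG": "0"}
never_rotates = not should_rotate(10**9, effective_max_size(old_style))
```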

Known Bugs:

  • When fixing the USE_VISIBLE_DESKTOP bug, a new one was inadvertently introduced. The bug manifests irrespective of the definition of USE_VISIBLE_DESKTOP: the new code attempts to remove the current user's access-control entry from the Desktop's access-control list even when it was not added by Condor. This has the effect of inhibiting the creation of new processes for the logged-on user.

Additions and Changes to the Manual:

  • The extra space character injected into the names of Condor daemons and programs has been removed.

  • Previously undocumented Condor Perl module subroutines have been documented.


Version 7.1.3

Release Notes:

  • This developer release includes the majority of the bug fixes released in stable version 7.0.5, including the security patches documented in that release. See section 8.7 below.

  • Updated the version of Globus Toolkit: The Condor binaries are now linked against Globus v4.2.0.

  • Updated the version of OpenSSL: The Condor binaries are now linked against OpenSSL 0.9.8h.

  • Updated the version of GCB: The Condor binaries are now linked against GCB 1.5.6.

  • Changes to the ALLOW_* and DENY_* configuration variables no longer require the use of the -full option to condor_reconfig upon reconfiguration.

New Features:

  • Added a new mechanism termed Concurrency Limits. This mechanism allows the Condor pool administrator to define an arbitrary number of consumable resources in the configuration file of the matchmaker. The availability of these consumable resources will be taken into account during the matchmaking process. Individual jobs can specify how many of each type of consumable resource are required. Typical applications of Concurrency Limits could include management of software licenses, database connections, or any other consumable resource that is external to Condor. NOTE: Documentation is still being written for this feature. See section 3.13.14 for documentation.

  • Added support for Condor to manage serial high throughput computing workloads on the IBM Blue Gene supercomputer. The IBM Blue Gene/P is now a supported platform.

  • Extended Job Hooks (see section 4.4) to allow for alternate transformation and/or monitoring engines for the Job Router (see section 5.6). Routing is still controlled by the Job Router, but if Job Router Hooks are configured, then external programs or scripts can be used to transform and monitor the job instead of Condor's internal engine.

  • Added support for the new protocol for WS GRAM introduced in Globus 4.2. For each WS GRAM resource, Condor automatically determines whether it is speaking the 4.0 or 4.2 version of the protocol and responds appropriately. When setting grid_resource in the submit file, use gt4 for both WS GRAM 4.0 and 4.2.

  • Added the ability for Windows slot users to load and run their jobs within the context of their profile. This includes the My Documents directory hierarchy, its monikers, and the user's registry hive. To use the profile, add a load_profile command to the submit description file. A current restriction prevents the use of load_profile in conjunction with run_as_owner. Please refer to section 6.2.5 for further details.

  • The StarterLog file for local universe jobs now displays the job id in each line in the file, so that interleaved messages relevant to different jobs running concurrently can be identified.

  • Added the -AllowVersionMismatch command line option to condor_submit_dag and condor_dagman to (if absolutely necessary) allow a version mismatch between condor_dagman and the .condor.sub file used to submit it. This permits a Condor version mismatch between condor_submit_dag and condor_dagman.

  • Streamlined the protocol between submit and execute machines; in some instances, fewer messages will be exchanged over the network.

  • When network requests are denied because of the authorization policy, Condor now logs an explanation in the daemon log that denied the request. This helps the administrator understand why the policy denied the request, in case it is not obvious. A similar explanation may be logged for requests that are accepted. This is only generated if D_SECURITY is added to the daemon's debug options.
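
The Concurrency Limits idea from the first item of this list (pool-wide consumable resources that are checked at match time, with each job stating how many units it needs) can be illustrated with a toy model. This is not the negotiator's algorithm; the class, its method names, and the resource names are hypothetical, and treating an undefined resource as having zero capacity is an assumption.

```python
class ConcurrencyLimits:
    """Toy model of pool-wide consumable resources checked at match time."""

    def __init__(self, limits):
        self.limits = dict(limits)  # resource name -> total units available
        self.in_use = {}            # resource name -> units currently claimed

    def try_claim(self, request):
        """Claim every requested unit, or nothing if any resource is short."""
        for name, units in request.items():
            if self.in_use.get(name, 0) + units > self.limits.get(name, 0):
                return False
        for name, units in request.items():
            self.in_use[name] = self.in_use.get(name, 0) + units
        return True

    def release(self, request):
        """Return the units held by a finished job."""
        for name, units in request.items():
            self.in_use[name] = self.in_use.get(name, 0) - units


# Hypothetical pool with two software licenses and one database connection.
pool = ConcurrencyLimits({"license": 2, "db_conn": 1})
first = pool.try_claim({"license": 1})
second = pool.try_claim({"license": 1, "db_conn": 1})
third = pool.try_claim({"license": 1})  # no license left
```

Releasing the units held by a finished job makes them available to the next match in this model.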

Configuration Variable Additions and Changes:

  • Added the new configuration variable MAX_PENDING_STARTD_CONTACTS. This limits the number of simultaneous connection attempts by the condor_schedd when it is requesting claims from the condor_startds. The intention is to protect the condor_schedd from being overloaded by authentication operations. The default is 0, which indicates no limit.

  • Added the new configuration variable SEC_INVALIDATE_SESSIONS_VIA_TCP, which defaults to True. Previously, attempts to use an invalid security session resulted in a UDP rather than a TCP response. In networks with different firewall rules for UDP and TCP, the filtering of the session invalidation messages was easily overlooked, since it would not typically happen during the initial vetting of the pool. If these packets were filtered out, then at the subsequent condor_collector restart, no daemons would be able to advertise themselves to the pool until their existing security sessions expired. The old behavior can be achieved by setting this configuration parameter to False.

  • Added the new configuration variable SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION. This is a special authentication mechanism designed to minimize overhead in the condor_schedd when communicating with the execute machine. Essentially, matchmaking results in a secret being shared between the condor_schedd and condor_startd, and this is used to establish a strong security session between the execute and submit daemons without going through the usual security negotiation protocol. This is especially important when operating at large scale over high latency networks, as in a glidein pool with one submit machine and thousands of execute machines on a network with 0.1 second round trip times. See [*] for details.

  • Added configuration entry GLEXEC_JOB, which replaces the functionality previously encapsulated in GLEXEC_STARTER. Using GLEXEC_JOB enables privilege separation in Condor via glexec in a manner much more consistent with how Condor's own privilege separation mechanism works. Specifically, the user identity switching will now occur between the condor_starter and the actual user job.

  • Added configuration parameter AMAZON_GAHP_WORKER_MAX_NUM to specify a ceiling on the number of threads spawned on the submit machine to support jobs running on Amazon EC2. Defaults to 5.

Bugs Fixed:

  • Includes bug fixes from Condor v7.0.5, including the security fixes. See section 8.7.

  • Fixed a bug in the condor_schedd that would cause it to except if a crontab entry was incorrectly formatted.

  • Fixed a bug in the CondorView server (collector) that caused it to except (crash) when it received a machine ClassAd without a valid state. It now logs this under level D_ALWAYS and ignores the ClassAd.

  • Fixed a bug from Condor version 7.1.2 that would cause Condor daemons to start consuming a lot of CPU time after rare types of communication failures during security negotiation.

  • Fixed a bug from Condor version 7.1.2 that in rare cases could cause Condor to fail to recognize when a call to exec() fails on Unix platforms.

  • Fixed problems with configuration parameter JOB_INHERITS_STARTER_ENVIRONMENT when using PrivSep.

  • Improved the deletion of Amazon EC2 jobs when the server is unreachable.

  • Fixed problems with Condor parallel universe jobs when recovering from a reboot of the submit machine.

Known Bugs:

  • None.

Additions and Changes to the Manual:

  • None.


Version 7.1.2

Release Notes:

  • None.

New Features:

  • Added formatTime(), a built-in ClassAd function to create a formatted representation of the time. A detailed description of this function is available in section 4.1.2, which documents all of the available built-in ClassAd functions.

  • Improved Condor's authentication handshake, so that daemons such as the condor_schedd, which initiate connections to other daemons, spend less time waiting for responses. Authentication over high latency networks is still rather expensive in Condor, so it still may be necessary to scale up by running more condor_schedd and condor_collector daemons than one would need for equivalent workloads on a low latency network. Additional improvements in this area are planned.

Configuration Variable Additions and Changes:

  • None.

Bugs Fixed:

  • Fixed a memory leak, introduced in Condor version 7.1.1, which caused the condor_startd daemon to grow without bound.

  • Fixed a bug in condor_dagman that caused the user log file of the first node job in a DAG to get created with 0600 permissions, regardless of the user's umask. Note that this fix involved removing the -condorlog and -storklog command-line arguments from condor_submit_dag and condor_dagman.

  • Fixed a problem from Condor version 7.1.1 that in some cases caused the condor_starter to stop sending updates about the job status or to send updates too frequently.

Known Bugs:

  • None.

Additions and Changes to the Manual:

  • None.


Version 7.1.1

Release Notes:

  • None.

New Features:

  • Added a new feature to condor_dagman which caches the log lines emitted to the dagman.out file when in recovery mode and emits the cache as one call to the logging subsystem when the cache size limit is reached. Under NFS conditions, this prevents an open and close per line of the log and greatly improves performance. This feature is off by default and is controlled by DAGMAN_DEBUG_CACHE_ENABLE, which takes a boolean, and DAGMAN_DEBUG_CACHE_SIZE, which is an integer in bytes of how big the cache should be before flushing.

  • Included some Windows example jobs (submit files and binaries).

  • Added a new feature to the DAGMan language called splicing. Please read section 2.10.6 on page [*].

  • The Prepare Job Hook can now modify the job ClassAd before execution. For a complete description of the new hook system, read section 4.4 on page [*].

  • Condor now coerces the result of $$([]) expressions within submit description files to strings. This means that submit files can do simple arithmetic. For example, you can describe a command-line argument as:

    arguments = $$([$(PROCESS)+100])

    and condor_submit will expand the argument to be the expected value.

  • Condor daemons now periodically update the ctime of their log files, instead of the mtime, as they previously did. At start up, the daemons use this ctime to determine how long they may have been down.

  • Added the capability to the condor_startd to allow it to power down machines based on a user-specified policy. See section 3.16 on page [*] on Power Management for more details.

  • condor_off now supports the -peaceful option for the condor_schedd, in addition to the existing support for the condor_startd. When peacefully shut down, the condor_schedd stops starting new jobs and waits for all running jobs to finish before exiting. The default shut down behavior is still -graceful, which checkpoints and stops all running standard universe jobs and gracefully disconnects from other types of jobs in the hopes of later restarting and reconnecting to them without any disturbance to the running job.

  • The condor_job_router now supports deletion of attributes when transforming job ClassAds from vanilla to grid universe. It also behaves more deterministically when choosing from multiple possible routes. Rather than picking one at random, it uses a round-robin selection.

  • condor_dagman now checks that its submit file was generated by a condor_submit_dag with the same version as condor_dagman itself. It is a fatal error for the versions to differ.
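
The log line caching described in the first item of this list (buffer the lines, then emit the whole cache in one call to the logging subsystem once a size limit is reached) can be sketched as follows. This is an illustration of the technique, not condor_dagman's code; the class name, the sink callable, and counting the size in UTF-8 bytes are assumptions.

```python
class LogLineCache:
    """Buffer log lines and hand them to the sink in one call when full."""

    def __init__(self, sink, max_bytes):
        self.sink = sink            # callable that accepts one string
        self.max_bytes = max_bytes  # plays the role of DAGMAN_DEBUG_CACHE_SIZE
        self.lines = []
        self.size = 0

    def emit(self, line):
        """Cache one line; flush if the cache has reached its size limit."""
        self.lines.append(line)
        self.size += len(line.encode("utf-8"))
        if self.size >= self.max_bytes:
            self.flush()

    def flush(self):
        """Write all cached lines with a single call, then empty the cache."""
        if self.lines:
            self.sink("".join(self.lines))
        self.lines = []
        self.size = 0


# Collect the writes in a list instead of opening a real log file.
writes = []
cache = LogLineCache(writes.append, max_bytes=10)
cache.emit("abcd\n")   # 5 bytes cached, nothing written yet
cache.emit("efgh\n")   # limit reached: both lines go out in one write
cache.emit("x\n")
cache.flush()          # final flush, for example when recovery ends
```

With a sink that opens, appends to, and closes a file on each call, the number of open and close operations drops from one per line to one per flush, which is the saving the item above describes for NFS.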

Configuration Variable Additions and Changes:

  • Added DAGMAN_DEBUG_CACHE_ENABLE and DAGMAN_DEBUG_CACHE_SIZE which allow DAGMan to maintain a cache of log lines and write out the cache as one open/write/close sequence. DAGMAN_DEBUG_CACHE_ENABLE is a boolean which turns on the ability for caching and defaults to False. DAGMAN_DEBUG_CACHE_SIZE is a positive integer and represents the size of the cache in bytes and defaults to 5 Megabytes.

  • The existing BIND_ALL_INTERFACES configuration variable now defaults to True.

  • Added the HIBERNATE expression, which, when evaluated in the context of each slot, determines if a machine should enter a low power state. See page [*] for more information.

  • Added the HIBERNATE_CHECK_INTERVAL configuration variable, which, if set to a non-zero value, enables the condor_startd to place the machine in a low power state based on the evaluation of the HIBERNATE expression. See page [*] for more information.

  • The existing VALID_SPOOL_FILES configuration variable now automatically includes SCHEDD.lock, the lock file used for high availability condor_schedd fail over. Other high availability lock files are not currently included.

  • Added the SEC_DEFAULT_AUTHENTICATION_TIMEOUT configuration variable, where the definition DEFAULT may be replaced by the usual list of contexts for security settings (for example, CLIENT, READ, and WRITE). This specifies the number of seconds that Condor should allow for the authentication of network connections to complete. Previously, GSI authentication was hard-coded to allow 5 minutes for authentication. Now it uses the same default as all other methods: 20 seconds.

  • Added the STARTER_UPDATE_INTERVAL_TIMESLICE configuration variable, which specifies the highest fraction of time that the condor_starter should spend collecting monitoring information about the job, such as disk usage. It defaults to 0.1. If checking the disk usage of the job takes a long time, the condor_starter will monitor less frequently than specified by STARTER_UPDATE_INTERVAL.
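
One way to read the STARTER_UPDATE_INTERVAL_TIMESLICE rule is as a lower bound on the time between monitoring passes: if a pass takes d seconds and may use at most a fraction f of the time, passes must be at least d / f seconds apart. The sketch below encodes that reading. The formula is an assumption derived from the description above rather than taken from the condor_starter source, and the 300 second interval is a hypothetical value.

```python
def effective_update_interval(update_interval, timeslice, last_check_duration):
    """Seconds between monitoring passes under the timeslice rule."""
    if not 0 < timeslice <= 1:
        raise ValueError("timeslice must be a fraction in (0, 1]")
    # Keep the configured interval unless monitoring would then take more
    # than `timeslice` of the total time; in that case stretch the interval.
    return max(update_interval, last_check_duration / timeslice)


# Hypothetical interval of 300 seconds with the default timeslice of 0.1.
quick = effective_update_interval(300, 0.1, last_check_duration=2)
slow = effective_update_interval(300, 0.1, last_check_duration=60)
```

In this model a 2 second disk usage check leaves the interval at 300 seconds, while a 60 second check stretches it to about 600 seconds.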

Bugs Fixed:

  • Fixed a bug introduced in 7.1.0 affecting configurations in which authentication of all communication between the condor_shadow and condor_schedd is required. This caused failure in the final update after the job had finished running. The result was that the job would return to the idle state to run again.

  • Fixed a bug in Java universe where each slot would be told to potentially use all the memory on the machine. Now, each JVM receives the physical memory divided by the number of slots.

  • On Windows, slot users would sometimes show up in the Windows Welcome Screen. This has now been resolved. The slot users need to be manually removed for this to take effect and the machine may need to be rebooted for the setting to be honored.

  • Fixed a bug in the ClassAd string() function. The function now properly converts integers and floats to their string representation.

  • The Windows Installer is now completely internationalized: it will no longer fail to install because of a missing "Users" group; instead, it will use the regionally appropriate group.

  • Interoperability with Samba (as a PDC) has been improved. Condor uses a fast form of login during credential validation. Unfortunately, this login procedure fails under Samba, even if the credentials are valid. The new behavior is to attempt the fast login, and on failure, fall back to the slower form.

  • Windows slot users no longer have the Batch Privilege added, nor does Condor first attempt a Batch login for slot users. This was causing permission problems on hardened versions of Windows, such as Windows Server 2003, in that non-interactive users lacked the permission to run batch files (via the cmd.exe tool). This affected any user submitting jobs that used batch files as the executable.

  • If the IWD is not defined in a job ClassAd that was either fetched by the condor_startd via job hooks, or pushed to the condor_startd via COD, the condor_starter no longer treats this as a fatal error, and instead uses the temporary job execution sandbox as the initial working directory.

  • Made some fixes to the new-style rescue DAG feature:
    • condor_submit_dag no longer needs the -force flag if a rescue DAG will be run, even if the files generated by condor_submit_dag already exist.
    • condor_submit_dag with the -force flag now renames any existing new-style rescue DAG files, and therefore runs the original DAG.

  • Fixed a problem that caused new-style rescue DAGs to fail when condor_submit_dag is invoked with the -usedagdir flag.

Known Bugs:

  • None.

Additions and Changes to the Manual:

  • The manual now contains Windows installation instructions for controlling the configuration for the vm universe.


Version 7.1.0

Release Notes:

  • Upgrading to 7.1.0 from previous versions of Condor will make existing Standard Universe jobs that have already run fail to match to machines running Condor 7.1.0 unless the job previously ran on a machine using the Red Hat 5.0 release of Condor. This is because the value of the CheckpointPlatform attribute of the machine ClassAd has changed in order to better represent checkpoint compatibility. If this affects you, you can use condor_qedit to change the LastCheckpointPlatform attribute of existing Standard Universe jobs to match the new CheckpointPlatform advertised by the machine ClassAd where the job last ran.

  • Condor no longer supports root configuration files (for example, /etc/condor/condor_config.root, ~condor/condor_config.root, and the file defined by the configuration variable LOCAL_ROOT_CONFIG_FILE). This feature was intended to give limited powers to a Unix administrator to configure some aspects of Condor without gaining root powers. However, given the flexibility of the configuration system, we decided that this was not practical. As long as Condor is started up as root, it should be clearly understood that whoever has the ability to edit the Condor configuration files can effectively run arbitrary programs as root.

New Features:

  • In the past, Condor has always sent work to the execute machines by pushing jobs to the condor_startd, either from the condor_schedd or via condor_cod. As of version 7.1.0, the condor_startd now has the ability to pull work by fetching jobs via a system of plug-ins or hooks. Additional hooks are invoked by the condor_starter to help manage work (especially for fetched jobs, but the condor_starter hooks can be defined and invoked for other kinds of jobs as well). For a complete description of the new hook system, read section 4.4 on page [*].

  • Added the capability to insert commands into the .condor.sub file produced by condor_submit_dag with the -append and -insert_sub_file command-line arguments to condor_submit_dag and the DAGMAN_INSERT_SUB_FILE configuration variable. See the condor_submit_dag manual page on page [*] and the configuration variable definition on page [*] for more information.

  • For platforms running a Windows operating system, the Arch machine ClassAd attribute more correctly reflects the architectures supported. Instead of values "INTEL" and "UNDEFINED", the values will now be: "INTEL" for x86, "IA64" for Intel Itanium, and "X86_64" for both AMD and Intel 64-bit processors. These values are listed in the unnumbered subsection labeled Machine ClassAd Attributes on page [*].

  • The Windows MSI installer now supports extended vm universe options. These new options include: the ability to set the networking type, how much memory the vm universe can use on a host, and the ability to set the version of VMware installed on the host.

  • The condor_status and condor_q command line tools now have a version option which prints the version of those specific tools. This can be useful when multiple versions of Condor are installed on the same machine.

  • The configuration variable CONDOR_VIEW_HOST may now contain a port number and may (if desired) refer to a condor_collector daemon running on the same host as the condor_collector that is forwarding ads. It is also now possible to use the forwarded ads for matchmaking purposes. For example, several collectors could forward ads to a single aggregating collector which a condor_negotiator then uses as its source of information for matchmaking.

  • condor_dagman deals with rescue DAGs in a more sophisticated way; this is especially helpful for nested DAGs. See the rescue DAG subsection [*] of the condor_dagman manual section for more information.

  • Additional logging details for unusual error cases to help identify problems.

  • A new (optional) daemon named condor_job_router has been added, so far only on Unix. It may be configured to transform vanilla universe jobs into grid universe jobs, for example to send excess jobs to other sites via Condor-C or Condor-G. For details, see page [*].

  • Previously, condor_q -better-analyze was supported on most but not all versions of Linux. It is now supported on all Unix platforms but not yet on Windows.

Configuration Variable Additions and Changes:

  • Added new configuration variables ALLOW_CLIENT and DENY_CLIENT as client-side authorization controls. When using a mutual authentication method (such as GSI, SSL, or Kerberos), these variables allow the specification of which authenticated servers the Condor tools and daemons should trust when they form a connection to the server. Because of the addition of these variables, the GSI-specific, client-side authorization configuration variable GSI_DAEMON_NAME is retired, and no longer valid.

  • Added the DAGMAN_INSERT_SUB_FILE variable, which allows a file of commands to be inserted into .condor.sub files generated by condor_submit_dag. See page [*] for more information.

  • The semantics of CLAIM_WORKLIFE were previously not clearly defined before the start of the first job. A delay between the condor_schedd claiming a slot and the condor_shadow starting a job could be caused by the submit machine being very busy or by JOB_START_DELAY. Previously, such a delay would unpredictably result in the first job being rejected if CLAIM_WORKLIFE expired during that time. Now, CLAIM_WORKLIFE is defined to apply only after the first job has started. Therefore, setting it to zero has the effect of allowing exactly one job per claim to run. The default is still the special value -1, which places no limit on how long the slot may continue accepting new jobs from the condor_schedd that claimed it.

  • Added the DAGMAN_OLD_RESCUE variable, which controls whether condor_dagman writes rescue DAGs in the old way. See page [*] for more information.

  • Added the DAGMAN_AUTO_RESCUE variable, which controls whether condor_dagman automatically runs an existing rescue DAG. See page [*] for more information.

  • Added the DAGMAN_MAX_RESCUE_NUM variable, which controls the maximum "new-style" rescue DAG number written or automatically run by condor_dagman. See page [*] for more information.
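
The revised CLAIM_WORKLIFE semantics described in this list (the clock starts only when the first job starts, 0 allows exactly one job per claim, and -1 means no limit) can be captured in a short predicate. This is a sketch of the stated rules, not the condor_startd implementation; the function name and the use of plain numeric timestamps in seconds are assumptions.

```python
def accepts_new_job(claim_worklife, first_job_start, now):
    """Whether a claimed slot may still accept another job from its schedd."""
    if first_job_start is None:
        return True  # the worklife clock does not run before the first job
    if claim_worklife < 0:
        return True  # -1, the default, places no limit on the claim
    # Assumption: the claim stops accepting jobs once the worklife has elapsed.
    return now - first_job_start < claim_worklife


# With CLAIM_WORKLIFE = 0 the first job is accepted, and no job after it.
before_first_job = accepts_new_job(0, None, now=100)
after_first_job = accepts_new_job(0, 100, now=100)
```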

Bugs Fixed:

  • The Condor Build ID is now printed by condor_version and placed in the logs for machines running a Windows operating system.

  • condor_quill and the condor_dbmsd correctly register themselves with the Windows firewall.

  • condor_submit_dag now avoids possibly running off the end of the argument list if an argument requiring a value does not have one.

  • The condor_submit_dag -debug argument now must be specified with at least -de to avoid conflict with the -dagman argument.

  • Added missing information about the -config argument to condor_submit_dag's usage message.

  • condor_dagman no longer considers duplicate edges in a DAG a fatal error (it is now a warning).

Known Bugs:

  • No hook is invoked if a fetched job does not contain enough data to be spawned by a condor_starter or if other errors prevent the job from being run after the condor_startd agrees to accept the work. This limitation will be addressed in a future version of Condor, most likely via the addition of a new hook invoked whenever the condor_starter fails to spawn a job. For more information about the new hook system included in Condor version 7.1.0, read section 4.4 on page [*].

Additions and Changes to the Manual:

  • Added "WINNT60" for the Vista operating system to the documented list of possible values for the machine ClassAd attribute OpSys.
