8.2 Development Release Series 7.5
This is the development release series of Condor.
The details of each version are described below.
Version 7.5.6
Release Notes:
- Condor version 7.5.6 not yet released.
New Features:
- Condor no longer relies on DNS to determine its IP address.
Instead, it examines the list of system network devices.
Configuration Variable and ClassAd Attribute Additions and Changes:
- NETWORK_INTERFACE and
PRIVATE_NETWORK_INTERFACE may now specify a network
device name as well as an IP address. The * character may also
be used as a wildcard. This makes it easier to apply the same
configuration to a large number of machines, because the IP address
does not have to be customized for each host.
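As a rough illustration of why a wildcard lets one setting serve many hosts, the following Python sketch matches a single pattern against each host's interfaces. It uses Python's fnmatch as a stand-in; Condor's actual matching rules are not described here and may differ. The function name, interface names, and addresses are hypothetical.

```python
from fnmatch import fnmatch

def pick_interface(pattern, interfaces):
    """Return the first (device_name, ip) pair matching pattern, else None.

    `interfaces` is a list of (device_name, ip_address) tuples. Shell-style
    matching via fnmatch is an assumption, not Condor's implementation.
    """
    for name, ip in interfaces:
        if fnmatch(name, pattern) or fnmatch(ip, pattern):
            return name, ip
    return None

# Two hypothetical hosts that share the same configured value.
host_a = [("lo", "127.0.0.1"), ("eth0", "192.168.1.10")]
host_b = [("lo", "127.0.0.1"), ("eth1", "192.168.7.42")]
shared_setting = "192.168.*"

choice_a = pick_interface(shared_setting, host_a)
choice_b = pick_interface(shared_setting, host_b)
```

Both hosts resolve to their own private address from the same pattern, so the value does not need to be customized per machine.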
Bugs Fixed:
- A bug has been fixed that caused SOAP transactions in the
condor_schedd daemon to result in a log message of the form "Timer
X not found". This bug is not known to have produced any other
undesired behaviors.
Known Bugs:
Additions and Changes to the Manual:
Version 7.5.5
Release Notes:
- Condor version 7.5.5 not yet released.
- This version of Condor uses a different layout in the spool
directory for storing files belonging to jobs that are in the queue.
Conversion of the spool directory is automatic when upgrading, but
be aware that downgrading to a previous version of Condor
requires extra effort. The procedure for downgrading is either
to drain all jobs with spooled files from the queue, or to manually
convert the spool back to the older format. To manually convert
back to the older format, stop Condor and back up the spool directory
in case of problems. Then move all subdirectories matching the form
$(SPOOL)/<#>/<#>/cluster<#>.proc<#>.subproc<#> into
$(SPOOL). Also do this for any files of the form
$(SPOOL)/<#>/cluster<#>.ickpt.subproc<#>. Edit
$(SPOOL)/job_queue.log with a text editor, and change all
references to the old paths to the new paths. Then, remove
$(SPOOL)/spool_version. Finally, start up Condor.
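The move steps of the manual downgrade described above could be scripted roughly as follows. This is a minimal sketch, not a supported tool: the function name flatten_spool is hypothetical, and the sketch assumes that each <#> in the path forms is a purely numeric name. It covers only the two move steps. Stopping Condor, backing up the spool, editing job_queue.log, and removing spool_version still have to be done as described.

```python
import re
import shutil
from pathlib import Path

# Name patterns taken from the path forms in the release note.
JOB_DIR = re.compile(r"cluster\d+\.proc\d+\.subproc\d+")
ICKPT_FILE = re.compile(r"cluster\d+\.ickpt\.subproc\d+")

def flatten_spool(spool):
    """Move per-job directories and ickpt files to the top level of spool.

    Returns the sorted names of everything that was moved.
    """
    spool = Path(spool)
    moved = []

    # $(SPOOL)/<#>/<#>/cluster<#>.proc<#>.subproc<#>
    for path in list(spool.glob("*/*/cluster*")):
        numeric_parents = (path.parent.name.isdigit()
                           and path.parent.parent.name.isdigit())
        if path.is_dir() and numeric_parents and JOB_DIR.fullmatch(path.name):
            shutil.move(str(path), str(spool / path.name))
            moved.append(path.name)

    # $(SPOOL)/<#>/cluster<#>.ickpt.subproc<#>
    for path in list(spool.glob("*/cluster*")):
        if (path.is_file() and path.parent.name.isdigit()
                and ICKPT_FILE.fullmatch(path.name)):
            shutil.move(str(path), str(spool / path.name))
            moved.append(path.name)

    return sorted(moved)
```

Running it on a copy of the spool first, for example flatten_spool("/path/to/spool-copy"), is a cautious way to check the result before touching the real directory.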
New Features:
- Negotiation is now handled asynchronously in the condor_schedd daemon.
This means that the condor_schedd remains responsive during
negotiation and is less prone to falling behind on communication
with condor_shadow processes.
- Improved monitoring and avoidance of a lock convoy problem
observed when there were more than 30,000 condor_shadow processes.
At this scale on Linux platforms, locking the condor_shadow daemon's
log on each write to the log file has sometimes been observed to leave
the system doing very little productive work, because it is instead
consumed by rapid context switching between the condor_shadow daemons
that are waiting for the lock.
- On Linux platforms, if the condor_schedd daemon's spool directory is
on an ext3 file system, Condor can now scale to a larger number
of spooled jobs. Previously, Condor created two subdirectories
within the spool directory for each spooled job and for each running
job. The ext3 file system only supports 31,997 subdirectories. This
effectively limited the number of spooled jobs to less than 16,000.
Now, Condor creates a hierarchy of subdirectories within
the spool directory, to increase the limit on the number of spooled jobs
in ext3 to 320,000,000, which is likely to be larger than other limits
on the size of the job queue, such as memory.
- The condor_dagman and condor_submit_dag command-line flag
-DumpRescue causes the dump of an incomplete Rescue DAG,
when the parsing of the DAG input file fails.
This may help in figuring out what went wrong.
See section 2.10.7 for complete details on Rescue DAGs.
- The condor_shadow daemon uses less memory than it has since
Condor version 7.5.0.
Memory usage should now be similar to the 7.4 series.
- condor_dagman now has the capability to create the
jobstate.log file needed for the Pegasus workflow manager.
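The old ext3 limit on spooled jobs mentioned in the list above follows from simple arithmetic, reproduced here in Python using only the figures given in the text. How the new hierarchy reaches its larger limit is not detailed in these notes, so it is not modeled.

```python
EXT3_MAX_SUBDIRS = 31_997  # subdirectory limit cited in the release note
DIRS_PER_JOB = 2           # old layout: two subdirectories per spooled job

# Each job consumed two of the available subdirectory entries.
old_job_limit = EXT3_MAX_SUBDIRS // DIRS_PER_JOB
print(old_job_limit)  # prints 15998, i.e. "less than 16,000"
```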
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable LOCK_DEBUG_LOG_TO_APPEND
controls whether a daemon's debug lock is used when appending to the log.
With the default value of False,
the debug lock is only used when rotating the log file.
When True, the debug lock is used when writing to
the log as well as when rotating the log file.
See section 3.3.4 for the complete definition.
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
Version 7.5.4
Release Notes:
- Condor version 7.5.4 released on October 20, 2010.
- All of the bug fixes and features which are in
Condor version 7.4.4 are in this 7.5.4 release.
- The release now contains all header files necessary to compile
code that uses the job log reading and writing utilities contained
in libcondorapi. Some headers were missing starting in Condor 7.5.1.
New Features:
- Concurrency limits now work with parallel universe jobs
scheduled by the dedicated scheduler.
- Transfer of directories is now supported by
transfer_input_files and
transfer_output_files for non-grid universes and
Condor-C. The auto-selection of output files, however, remains the
same: new directories in the job's output sandbox are not
automatically selected as outputs to be transferred.
- Paths other than simple file names with no directory information
in transfer_output_files previously did not have well
defined behavior. Now, paths are supported for non-grid universes
and Condor-C. When a path to an output file or directory is
specified, this specifies the path to the file on the execute side.
On the submit side, the file is placed in the job's initial working
directory and it is named using the base name of the original path.
For example, path/to/output_file becomes output_file
in the job's initial working directory. The name and path of the
file that is written on the submit side may be modified by using
transfer_output_remaps.
- The condor_shared_port daemon is now supported on Windows platforms.
- Jobs can now be submitted to multiple EC2 servers via the amazon
grid type. The server's URL must be specified via the grid_resource
submit description file command for each job.
See section 5.3.7 for details.
- The grid universe's amazon grid type can now be used to submit
virtual machine jobs to Eucalyptus systems via the EC2 interface.
- condor_q now uses the queue-management API's projection feature when
used with -run, -hold, -goodput, -cputime,
-currentrun, and -io options when called with no display options
or with -format.
- Decreased the CPU utilization of condor_dagman when it is
submitting ready jobs into Condor.
- condor_dagman now logs the number of queued jobs in the DAG
that are on hold,
as part of the DAG status message in the dagman.out file.
- condor_dagman now logs a note in the dagman.out file
when the condor_submit_dag and condor_dagman versions differ,
even if the difference is permissible.
- Added the capability for condor_dagman to create and periodically
rewrite a file that lists the status of all nodes within a DAG.
Alternatively, the file may be continually updated as the DAG runs.
See section 2.10.10 for details.
- The condor_schedd daemon now uses a better algorithm for
determining which flocking level is being negotiated. No special
configuration is required for the new algorithm to work. In the
past, the algorithm depended on DNS and the
configuration variables NEGOTIATOR_HOST and
FLOCK_NEGOTIATOR_HOSTS. In some networking environments,
such as that of a multi-homed central manager, it was difficult to
configure things correctly. When wrongly configured, negotiation
would be aborted with the message, Unknown negotiator. The new
algorithm is only used when the condor_negotiator is version 7.5.4 or
newer. Of course, the condor_schedd daemon must still be configured to
authorize the condor_negotiator daemon at the NEGOTIATOR
authorization level.
- condor_advertise has a new option, -multiple, which
allows multiple ClassAds to be published. This is more efficient than
publishing each ClassAd in a separate invocation of condor_advertise.
- The condor_job_router is no longer restricted to routing only vanilla
universe jobs. It also now automatically avoids recursively routing jobs.
- The condor_schedd now writes the submit event to the user job log.
Previously, condor_submit wrote the event.
- The condor_schedd daemon now scales better when there are many
job auto clusters.
- The condor_q command with option -run, -hold,
-goodput, -cputime, -currentrun or -io
is now much more efficient in its communication with the condor_schedd.
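The naming rule for paths in transfer_output_files, described in the list above, amounts to taking the base name of the execute-side path. A small Python sketch of that rule follows. The function name is hypothetical, and the sketch ignores any renaming done through transfer_output_remaps.

```python
import posixpath

def submit_side_name(execute_side_path):
    """Name the output receives in the job's initial working directory."""
    return posixpath.basename(execute_side_path)

# The example from the text, plus a plain file name for comparison.
names = [submit_side_name(p) for p in ("path/to/output_file", "results.dat")]
print(names)  # prints ['output_file', 'results.dat']
```

One consequence worth noting is that two outputs with the same base name in different execute-side directories would map to the same submit-side name unless they are remapped.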
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable SOAP_SSL_SKIP_HOST_CHECK
can be used to disable the standard check that a SOAP server's host name
matches the host name in its X.509 certificate. This is useful when submitting
grid type amazon jobs to Eucalyptus servers, which often have certificates
with a host name of localhost.
- Added default values for <SUBSYS>_LOG configuration variables.
If a <SUBSYS>_LOG configuration variable is not set in
files condor_config or condor_config.local,
it will default to $(LOG)/<SUBSYS>LOG.
- The new job ClassAd attribute CommittedSuspensionTime
is a running total of the number of seconds the job has spent in
suspension during time in which the job was not evicted without a
checkpoint. This complements the existing attribute
CumulativeSuspensionTime, which includes all time spent in
suspension, regardless of job eviction.
- The new job ClassAd attributes CommittedSlotTime and
CumulativeSlotTime are just like CommittedTime and
RemoteWallClockTime respectively, except the new attributes
are weighted by the SlotWeight of the machine(s) that ran
the job.
- The new configuration variable
SYSTEM_JOB_MACHINE_ATTRS specifies a list of machine
attributes that should be recorded in the job ClassAd. The default
attributes are Cpus and SlotWeight. When there are
multiple run attempts, history of machine attributes from previous
run attempts may be kept. The number of run attempts to store is
specified by the new configuration variable
SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH, which defaults
to 1. A machine attribute named X will be inserted into the
job ClassAd as an attribute named MachineAttrX0. The previous
value of this attribute will be named MachineAttrX1, the
previous to that will be named MachineAttrX2, and so on, up to
the specified history length. Additional attributes to record may be
specified on a per-job basis by using the new job_machine_attrs
submit file command. The history length may also be extended on a
per-job basis by using the new submit file command
job_machine_attrs_history_length.
- The new configuration variable
NEGOTIATION_CYCLE_STATS_LENGTH specifies how many
recent negotiation cycles should be included in the history that is
published in the condor_negotiator's ClassAd. The default is 3. See
the definition of this configuration variable elsewhere in this manual,
together with the list of attributes that are published.
- The configuration variable FLOCK_NEGOTIATOR_HOSTS is now
optional. Previously, the condor_schedd daemon refused to flock without
this setting. When this is not set, the addresses of the flocked
condor_negotiator daemons are found by querying the flocked
condor_collector daemons.
Of course, the condor_schedd daemon must still be configured to
authorize the condor_negotiator daemon at the NEGOTIATOR
authorization level. Therefore, when using host-based security,
FLOCK_NEGOTIATOR_HOSTS may still be useful as a macro for inserting
the negotiator hosts into the relevant authorization lists.
- The configuration variable FLOCK_HOSTS is no longer used.
For backward compatibility, this setting used to be treated as a default
for FLOCK_COLLECTOR_HOSTS and FLOCK_NEGOTIATOR_HOSTS.
- The configuration variable AMAZON_EC2_URL is now only used
for previously-submitted jobs when upgrading Condor to version 7.5.4 or
beyond. New grid type amazon jobs must specify which EC2 service to use
by setting the grid_resource submit description file command.
- The new job ClassAd attribute NumPids is the total number of
child processes a running job has.
- The new configuration variable DAGMAN_MAX_JOB_HOLDS
specifies the maximum number of times a DAG node job is allowed to go
on hold. See section 3.3.26 for details.
- The configuration variable STARTD_SENDS_ALIVES now only
needs to be set for the condor_schedd daemon. Also, the default value has
changed to True.
- The job ClassAd attributes amazon_user_data and
amazon_user_data_file can now both be used for the same
job. When both are provided, the two blocks of data are concatenated,
with the value of the one specified by amazon_user_data
occurring first.
- The new configuration variable GRAM_VERSION_DETECTION
can be used to disable Condor's attempts to distinguish between gt2
(GRAM2) and gt5 (GRAM5) servers.
The default value is True.
If set to False, Condor trusts the gt2 or gt5 value
provided in the job's grid_resource attribute.
- The new job ClassAd attribute ResidentSetSize is an integer
measuring the amount of physical memory in use by the job on the execute
machine in kilobytes.
- The new job ClassAd attribute X509UserProxyExpiration is an
integer representing when the job's X.509 proxy credential will expire,
measured in the
number of seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
- The new configuration variable SCHEDD_CLUSTER_MAXIMUM_VALUE
is an upper bound on assigned job cluster ids. If set to
value M, the maximum job cluster id assigned to any job will be M-1.
When the maximum id is reached, assignment of cluster ids will wrap around
back to SCHEDD_CLUSTER_INITIAL_VALUE. The default value is zero,
which does not set a maximum cluster id.
- The default value of configuration variable
MAX_ACCEPTS_PER_CYCLE has been changed from 1 to 4.
- The configuration variable NEW_LOCKING, introduced in
Condor version 7.5.2, has been changed to
CREATE_LOCKS_ON_LOCAL_DISK and now defaults to True.
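The MachineAttrX0, MachineAttrX1, ... naming scheme described in the list above for SYSTEM_JOB_MACHINE_ATTRS can be pictured as a shift register. The Python sketch below models a job ClassAd as a plain dictionary. The function name is hypothetical, and it assumes that a history length of N keeps indices 0 through N-1, which the text implies but does not state outright.

```python
def record_machine_attr(job_ad, attr, value, history_length=1):
    """Store value as MachineAttr<attr>0, shifting older values up by one.

    Values that would move past index history_length - 1 are dropped.
    """
    # Work from the oldest slot toward the newest so nothing is overwritten
    # before it has been copied.
    for i in range(history_length - 1, 0, -1):
        newer = f"MachineAttr{attr}{i - 1}"
        if newer in job_ad:
            job_ad[f"MachineAttr{attr}{i}"] = job_ad[newer]
    job_ad[f"MachineAttr{attr}0"] = value
    return job_ad

# Four run attempts on machines with 1, 4, 2, and 8 Cpus, keeping 3 entries.
ad = {}
for cpus in (1, 4, 2, 8):
    record_machine_attr(ad, "Cpus", cpus, history_length=3)
print(ad)  # prints {'MachineAttrCpus0': 8, 'MachineAttrCpus1': 2, 'MachineAttrCpus2': 4}
```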
Bugs Fixed:
- Fixed a bug that occurred with x64 flavors of the Windows operating system.
Condor was setting the default value of Arch to INTEL when it
should have been X86_64. This was a consequence of the fact that
Condor runs in the WOW64 sandbox on 64-bit Windows. This was fixed so that
Arch would contain the value for the native architecture rather than
the WOW64 sandbox architecture.
- Fixed a bug in the user privilege switching code in Windows that
caused the condor_shadow daemon to except when the condor_schedd
daemon attempted to re-use it.
- Fixed the output in the condor_master daemon log file to be
clearer when an authorized user tries to use condor_config_val
-set and ENABLE_PERSISTENT_CONFIG is False.
The previous
output implied that the operation succeeded when, in fact, it did not.
- Since Condor version 7.5.2,
the following condor_job_router features were
effectively non-functional: UseSharedX509UserProxy,
JobShouldBeSandboxed, and JobFailureTest.
- The submit description file command copy_to_spool
did not work properly in Condor version 7.5.3.
When sending the executable to the execute machine, it was
transferred from the original path rather than from the spooled copy
of the file.
- When output files were auto-selected and spooled, Condor-C and
condor_transfer_data would copy back both the output files and
all other contents of the job's spool directory, which typically
included the spooled input and the user log.
Now, only the output files are retrieved.
To adjust which files are retrieved, the job
attribute SpooledOutputFiles can be manipulated, but this
typically should be managed by Condor.
- The condor_master daemon now invalidates its ClassAd,
as represented in the condor_collector daemon, before it shuts down.
- Fixed a bug that caused vm universe jobs to not run
if the VMware .vmx file contained a space.
- Fixed a bug introduced in Condor version 7.5.1 that caused integers
in ClassAd expressions that had leading zeros to be read as octal (base eight).
- Fixed a bug introduced in Condor version 7.5.1 that did not recognize
a semicolon as a separator of function arguments in ClassAds.
- Fixed a bug that caused integers too large to fit in 32 bits in a ClassAd
expression to be parsed incorrectly. Now, when these integers are
encountered, the largest 32-bit integer (with matching sign) is used.
- Fixed a bug that caused the condor_gridmanager to exit when
receiving badly-formatted error messages from the nordugrid_gahp.
- Fixed a problem affecting the use of version 7.5.3 condor_startd and
condor_master daemons in a pool with a condor_collector from before
version 7.5.2. On shutdown, the condor_startd and the condor_master
caused all condor_startd and condor_master ClassAds, respectively,
to be removed from the condor_collector.
- Fixed a bug that caused delegation of an X.509 RFC proxy between
two Condor processes to fail.
- Fixed a bug in condor_submit that would cause failures if a file
name containing a space was used with the submit description file commands
append_files, jar_files or
vmware_dir.
- Fixed a bug that could cause the condor_gridmanager to lock up if
a GAHP server it was using wrote a large amount of data to its stderr.
- Fixed a bug that could cause the condor_gridmanager to wrongly
conclude that a gt2 (that is, GRAM2) server was a gt5
(that is, GRAM5) server.
Such a conclusion can be disastrous, as Condor's mechanisms to
prevent overloading a gt2 server are then disabled. The new
configuration variable GRAM_VERSION_DETECTION can be used
to disable Condor's attempts to distinguish between the two.
- Fixed a bug introduced in Condor version 7.5.3.
When file transfer failed for a grid universe job of grid type
cream,
Condor would write a hold event to the job log,
but not actually put the job on hold.
- Fixed a bug in the condor_gridmanager that could cause it to crash
while handling cream grid type jobs destined for different resources.
- Fixed a bug that prevented the condor_shadow from managing
additional jobs after its first job completed when
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION was set to True.
- The timestamps in the log defined by PROCD_LOG
now print the real time.
- Fixed how some daemons advertise themselves to the condor_collector.
Now, all daemons set the attribute MyType to indicate what
type of daemon they are.
- condor_chirp no longer crashes on a put operation,
if the remote file name is omitted.
- Fixed the packaging of Hadoop File System support in Condor. This includes
updating to HDFS 0.20.2 and making the HDFS web interface work properly.
- Condor no longer tries to invoke glexec if the job's X.509 proxy
is expired.
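The leading-zero bug noted in the list above is the familiar C-style octal pitfall. The Python lines below only illustrate the difference between the two readings of the same digits; they are not Condor's ClassAd parser.

```python
text = "010"

as_decimal = int(text, 10)  # intended reading: ten
as_octal = int(text, 8)     # buggy reading: eight (base eight)

print(as_decimal, as_octal)  # prints 10 8
```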
Known Bugs:
- Using host names for host-based authentication,
such as in the definitions of configuration variables
ALLOW_* and DENY_*,
does not work on Mac OS X 10.4.
Later versions of the OS are not affected.
As a workaround, IP addresses can be used instead of host names.
Additions and Changes to the Manual:
Version 7.5.3
Release Notes:
- Condor version 7.5.3 released on June 29, 2010.
New Features:
- condor_q -analyze now notices the -l option, and if both
are given, then the analysis prints out the list of machines
in each analysis category.
- The behavior of macro expansion in the configuration file has
changed. Previously, most macros were effectively treated as
undefined unless explicitly assigned a value in the configuration
file. Only a small number of special macros had pre-defined values
that could be referred to via macro expansion. Examples include
FULL_HOSTNAME and DETECTED_MEMORY. Now, most
configuration settings that have default values can be referred to
via macro expansion. There are a small number of exceptions where
the default value is too complex to represent in the current
implementation of the configuration table. Examples include the
security authorization settings. All such configuration settings
will also be reported as undefined by condor_config_val unless
they are explicitly set in the configuration file.
- Unauthenticated connections are now identified as
unauthenticated@unmapped. Previously, unauthenticated
connections were not assigned a name, so some authorization policies
that needed to distinguish between authenticated and unauthenticated
connections were not expressible. Connections that are
authenticated but not mapped to a name by the mapfile used to be
given the name auth-method@unmappeduser, where
auth-method is the authentication method that was used. Such
connections are now given the name auth-method@unmapped.
Connections that match *@unmapped are now forbidden from
doing operations that require a user id, regardless of configuration
settings. Such operations include job submission, job removal, and
any other job management commands that modify jobs.
- There has been a change of behavior when authentication fails.
Previously, authentication failure always resulted in the command
being rejected, regardless of whether the ALLOW/DENY settings
permitted unauthenticated access or not. This is still true if either
the client or server specifies that authentication is required.
However, if both sides specify that authentication is not required
(i.e. preferred or optional), then authentication failure only results
in the command being rejected if the ALLOW/DENY settings reject
unauthenticated access. This change makes it possible to have some
commands accept unauthenticated users from some network addresses
while only allowing authenticated users from others.
- Improved log messages when failing to authenticate requests. At
least the IP address of the requester is identified in all cases.
- The new submit file command job_ad_information_attrs
may be used to specify attributes from the job ad that should be saved
in the user log whenever a new event is being written.
- Administrative commands now support the -constraint option, which
accepts a ClassAd expression. This applies to condor_checkpoint,
condor_off, condor_on, condor_reconfig, condor_reschedule,
condor_restart, condor_set_shutdown, and condor_vacate.
- File transfer plugins can be used for vm universe jobs. Notably,
file:// URLs can be used to allow VM image files to be pre-staged
on the execute machine. The submit description file command
vmware_dir is now optional.
If it is not given, then all relevant VMware image files
must be listed in transfer_input_files, possibly as URLs.
- File transfers for CREAM grid universe jobs are now initiated by
the condor_gridmanager. This removes the need for a GridFTP server
on the client machine.
- Improved the parallelism of file transfers for nordugrid jobs.
- Removed the distinction between regular and full reconfiguration
of Condor daemons. Now, all reconfigurations are full and require the
WRITE authorization level. condor_reconfig accepts but ignores the
-full command-line option.
- The batch_gahp, used for pbs and lsf grid universe jobs, has been
updated from version 1.12.2 to 1.16.0.
- condor_dagman now prints a message to the dagman.out file
when it truncates a node job user log file.
- condor_dagman now allows node categories to include
nodes from different splices. See section 2.10.6
for details.
- condor_dagman now allows category throttles in splices to
be overridden by higher levels in the DAG splicing structure.
See section 2.10.6 for details.
- Daemon logs can now be rotated several times instead of only once
into a single .old file. In order to do so, the newly introduced
configuration variable MAX_NUM_<SUBSYS>_LOG needs to be set
to a value greater than 1. The file endings will be ISO timestamps, and
the oldest rotated file will still have the ending .old.
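The change of behavior on authentication failure, described in the list above, can be summarized as a small decision function. This Python sketch simply restates the rule as written in these notes. The function and parameter names are hypothetical and do not correspond to any Condor API.

```python
def rejected_after_auth_failure(client_requires_auth,
                                server_requires_auth,
                                policy_allows_unauthenticated):
    """Whether a command is rejected once authentication has failed.

    policy_allows_unauthenticated stands for the outcome of the ALLOW/DENY
    settings when applied to an unauthenticated connection.
    """
    # If either side insists on authentication, failure is always fatal.
    if client_requires_auth or server_requires_auth:
        return True
    # Otherwise (preferred or optional on both sides) the ALLOW/DENY
    # settings for unauthenticated access decide.
    return not policy_allows_unauthenticated
```

Before this change, the function would have returned True in every case, regardless of the ALLOW/DENY settings.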
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable JOB_ROUTER_LOCK specifies a
lock file used to
ensure that multiple instances of the condor_job_router never run
with the same values of JOB_ROUTER_NAME.
Multiple instances running
with the same name could lead to mismanagement of routed jobs.
- The new configuration variable ROOSTER_MAX_UNHIBERNATE
is an integer
specifying the maximum number of machines to wake up per cycle.
The default value of 0 means no limit.
- The new configuration variable ROOSTER_UNHIBERNATE_RANK
is a ClassAd
expression specifying which machines should be woken up first in a
given cycle. Higher ranked machines are woken first.
If the number of machines to be woken up is limited by
ROOSTER_MAX_UNHIBERNATE, the rank may be used for
determining which machines are woken before reaching the limit.
- The new configuration variable CLASSAD_USER_LIBS
is a list of libraries
containing additional ClassAd functions to be used during ClassAd
evaluation.
- The new configuration variable SHADOW_WORKLIFE
specifies the number of seconds after which the condor_shadow will exit,
when the current job finishes, instead of fetching a new job to
manage. Having the condor_shadow continue managing jobs helps
reduce overhead and can allow the condor_schedd to achieve higher
job completion rates. The default is 3600, one hour. The value 0
causes condor_shadow to exit after running a single job.
- The new configuration variable MAX_NUM_<SUBSYS>_LOG
will determine how often the daemon log of SUBSYS will rotate.
The default value is 1 which leads to the old behavior of a single
rotation into a .old file.
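The interaction between ROOSTER_UNHIBERNATE_RANK and ROOSTER_MAX_UNHIBERNATE described in the list above can be sketched in Python as a sort followed by a cut-off. This only models the selection rule as stated. The function name, the machine records, and the use of Memory as a rank are hypothetical.

```python
def machines_to_wake(candidates, rank, max_unhibernate=0):
    """Choose machines to wake in one cycle: highest rank first.

    A max_unhibernate of 0 means no limit, matching the stated default.
    """
    ordered = sorted(candidates, key=rank, reverse=True)
    if max_unhibernate > 0:
        return ordered[:max_unhibernate]
    return ordered

# Hypothetical hibernating machines, ranked by their Memory attribute.
machines = [
    {"Name": "a", "Memory": 1024},
    {"Name": "b", "Memory": 4096},
    {"Name": "c", "Memory": 2048},
]
by_memory = lambda m: m["Memory"]

limited = [m["Name"] for m in machines_to_wake(machines, by_memory, 2)]
unlimited = [m["Name"] for m in machines_to_wake(machines, by_memory)]
print(limited, unlimited)  # prints ['b', 'c'] ['b', 'c', 'a']
```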
Bugs Fixed:
- Configuration variables with a default value of 0
that were not defined in the configuration file
were treated as though they were undefined by condor_config_val.
Now Condor treats this case like any other:
the default value is displayed.
- Starting in Condor version 7.5.1,
using literals with a logical operator
in a ClassAd expression (for example, 1 || 0) caused the expression
to evaluate to the value ERROR. The previous behavior has been
restored: zero values are treated as False,
and non-zero values are treated as True.
- Starting in Condor version 7.5.0,
the condor_schedd no longer supported queue
management commands when security negotiation was disabled,
for example, if SEC_DEFAULT_NEGOTIATION = NEVER.
- Fixed a bug introduced in Condor version 7.5.1.
ClassAd string literals containing
characters with negative ASCII values were not accepted.
- Fixed a bug introduced in Condor version 7.5.0,
which caused Condor to not renew
job leases for CREAM grid jobs in most situations.
- Question marks occurring in a ClassAd string are no longer preceded
by a backslash when the ClassAd is printed.
Known Bugs:
Additions and Changes to the Manual:
Version 7.5.2
Release Notes:
- Condor version 7.5.2 released on April 26, 2010.
- Condor no longer supports SuSE 8 Linux on the Itanium 64 architecture.
- The following submit description file commands are no longer recognized.
Their functionality is replaced by the command grid_resource.
- grid_type
- globusscheduler
- jobmanager_type
- remote_schedd
- remote_pool
- unicore_u_site
- unicore_v_site
New Features:
- The condor_schedd daemon uses less disk bandwidth when logging
updates to job ClassAds from running jobs and also when removing jobs
from the queue and handling job eviction and condor_shadow exceptions.
This should improve performance in situations where
disk bandwidth is a limiting factor.
Some cases of updates to the job user log
have also been optimized to be less disk intensive.
- The condor_schedd daemon uses less CPU when scheduling
some types of job queues. Most likely to benefit from this improvement is
a large queue of short-running, non-local, and non-scheduler universe jobs,
with at least one idle local or scheduler universe job.
- The condor_schedd automatically grants the condor_startd
authority to renew leases on claims and to evict claims.
Previously, this required that the condor_startd be trusted for
general DAEMON-level command access. Now this only
requires READ-level command access. The specific commands
that the condor_startd sends to the condor_schedd can
effectively only operate on the claims associated with that condor_startd,
so this change does not open up these operations to access by anyone
with READ access. It reduces the level of trust that
the condor_schedd must have in the condor_startd.
- The condor_procd's log now rotates if logging is activated.
The default maximum size is 10 Mbytes. To change the default,
use the configuration variable MAX_PROCD_LOG.
- For Unix systems only,
user job log and global job event log lock files can now optionally
be created in a directory on a
local drive by setting NEW_LOCKING to True.
See section 3.3.4 for
the details of this configuration variable.
- condor_dagman and condor_submit_dag now default to lazy
creation of the .condor.sub files for nested DAGs.
condor_submit_dag no longer creates them, and condor_dagman
itself creates the files as the DAG is run.
The previous "eager" behavior can
be obtained with a combination of command-line and configuration settings.
There are several advantages to the "lazy" submit file creation:
- The DAG file for a nested DAG does not have to exist until that node
is ready to run, so the DAG file can be dynamically created by earlier
parts of the top-level DAG (including by the PRE script of the nested
DAG node).
- It is now possible to have nested DAGs within splices, which is not
possible with "eager" submit file creation.
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable
DAGMAN_GENERATE_SUBDAG_SUBMITS controls whether
condor_dagman itself generates the .condor.sub files for
nested DAGs, rather than relying on condor_submit_dag "eagerly"
creating them. See section 3.3.26 for
more information.
- The new configuration variable NEW_LOCKING can specify that
lock files for job user logs and the global job event log be created
on a local drive, avoiding locking problems with NFS.
See section 3.3.4 for
the details of this configuration variable.
Bugs Fixed:
- The condor_job_router failed to work on SLES 9 PowerPC,
AIX 5.2 PowerPC,
and YDL 5 PowerPC due to a problem in how it detected EOF in the job queue log.
- When jobs are removed, the condor_schedd sometimes did not
quickly reschedule a different job to run on the slot to which the
removed job had been matched. Instead, it would take up to
SCHEDD_INTERVAL seconds to do so.
- Fixed a bug introduced in Condor version 7.5.1 that caused the
gahp_server to crash when
first communicating with most gt2 or gt5 GRAM servers.
Known Bugs:
Additions and Changes to the Manual:
Version 7.5.1
Release Notes:
- Condor version 7.5.1 released on March 2, 2010.
- Some, but not all, of the bug fixes and features which are in
Condor version 7.4.2 are in this 7.5.1 release.
- The Condor release is now available as a proper RPM or Debian
package.
- Condor now internally uses the version of New ClassAds provided
as a stand-alone library (http://www.cs.wisc.edu/condor/classad/).
Previously, Condor
used an older version of ClassAds that was heavily tied to the Condor
development libraries. This change should be transparent in the
current development series. In the next development series (7.7.x),
Condor will begin to use features of New ClassAds that were unavailable in
Old ClassAds.
Section 4.1.1 details the differences.
- HPUX 11.00 is no longer a supported platform.
New Features:
- A port number defined within CONDOR_VIEW_HOST may now use
a shared port.
- The condor_master no longer pauses for 3 seconds after starting
the condor_collector. However, if the configuration variable
COLLECTOR_ADDRESS_FILE defines a file,
the condor_master will wait for that file to be created
before starting other daemons.
- In the grid universe, Condor can now automatically distinguish
between GRAM2 and GRAM5 servers, that is, grid types gt2 and
gt5.
Users can submit jobs using a grid type of gt2 or gt5
for either type of server.
- Grid universe jobs using the CREAM grid system now batch up
common requests into larger single requests. This
reduces network traffic, increases the number of parallel tasks
Condor can handle at once, and reduces the load on the remote
gatekeeper.
- The new submit description file command cream_attributes
sets additional attribute/value pairs for the CREAM job description
that Condor creates when submitting a grid universe job
destined for the CREAM grid system.
- The condor_q command with option -analyze now performs
the same analysis as previously occurred with the -better-analyze option.
Therefore, the output of condor_q with the -analyze option
differs from that of earlier versions.
The -better-analyze option is still recognized and behaves the same
as before, though it may be removed from a future version.
- Security sessions that are not used for longer than an hour are
now removed from the security session cache to limit memory usage.
- The number of security sessions in the cache is now advertised in
the daemon ClassAd as MonitorSelfSecuritySessions.
- condor_dagman now has the capability to run DAGs containing nodes
that are declared to be NOOPs; for these nodes, a job is never actually
submitted. See section 2.10.2 for information.
- The submit file attribute vm_macaddr can now be used to set
the MAC address for vm universe jobs that use VMware. The range of valid
MAC addresses is constrained by limits imposed by VMware.
- The condor_q command with option -globus
is now much more efficient in its communication with the condor_schedd.
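As an illustration of the condor_master start-up change described above,
a configuration fragment along the following lines would make the
condor_master wait for the condor_collector's address file before starting
other daemons. The file location shown is an assumption chosen for
illustration, not a value taken from these release notes.

```
# Assumed location for the address file; adjust to the local installation.
COLLECTOR_ADDRESS_FILE = $(LOG)/.collector_address
```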
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable STRICT_CLASSAD_EVALUATION
controls whether new or old ClassAd expression evaluation semantics are
used. In new ClassAd semantics, an unscoped attribute reference is only
looked up in the local ad. The default is False (use old ClassAd semantics).
- The configuration variable
DELEGATE_FULL_JOB_GSI_CREDENTIALS now applies to all proxy
delegations done between Condor daemons and tools.
The value is a boolean and defaults to False,
which means that when doing delegation Condor will now create a limited proxy
instead of a full proxy.
- The new configuration variable
SEC_<access-level>_SESSION_LEASE specifies the maximum
number of seconds an unused security session will be kept in a daemon's
session cache before being removed to save memory. The default is 3600.
If the server and client have different configurations, the smaller
value will be used.
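A minimal configuration sketch of the three variables above, each set
explicitly to the default stated in these notes. Using DEFAULT as the
<access-level> in the session lease variable is an assumption made for
illustration; substitute the access level that applies locally.

```
# Keep old ClassAd evaluation semantics (the stated default).
STRICT_CLASSAD_EVALUATION = False

# Delegate limited rather than full proxies (the stated default).
DELEGATE_FULL_JOB_GSI_CREDENTIALS = False

# Remove unused security sessions after one hour (the stated default).
# DEFAULT is assumed here as the <access-level>.
SEC_DEFAULT_SESSION_LEASE = 3600
```

With this setting, a session shared with a peer configured for 1800 seconds
would be kept unused for at most 1800 seconds, since the smaller of the two
values applies.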
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
Version 7.5.0
Release Notes:
- All bug fixes and features which are in 7.4.1 are in this 7.5.0 release.
New Features:
- Added the new daemon condor_shared_port for Unix platforms
(except for HPUX).
It allows Condor daemons to share a
single network port. This makes opening access to Condor through a
firewall easier and safer. It also increases the scalability of a
submit node by decreasing port usage. See
section 3.3.36 for more information.
- Improved CCB's handling of rude NAT/firewalls that silently drop
TCP connections.
- Simplified the publication of daemon addresses.
PublicNetworkIpAddr and PrivateNetworkIpAddr have been removed.
MyAddress contains both public and private addresses. For now,
<Subsys>IpAddr contains the same information. In a future release,
the latter may be removed.
- Changes to TCP_FORWARDING_HOST,
PRIVATE_NETWORK_ADDRESS, and
PRIVATE_NETWORK_NAME can now be made without requiring a
full restart. It may take up to one condor_collector update interval
for the changes to become visible.
- Network compatibility with Condor prior to 6.3.3 is no longer
supported unless SEC_CLIENT_NEGOTIATION is set to
NEVER. This change removes the risk of communication errors
causing performance problems resulting from automatic fall-back to the
old protocol.
- For efficiency, authentication between the condor_shadow and
condor_schedd daemons is now able to be cached and reused in more
cases. Previously, authentication for updating job information was
only cached if read access was configured to require authentication.
- condor_config_val will now report the default value for
configuration variables that are not set in the configuration files.
- The condor_gridmanager now uses a single status call to obtain
the status of all CREAM grid universe jobs from the remote server.
- The condor_gridmanager will now retry CREAM commands that time out.
- Forwarding a renewed proxy for CREAM grid universe jobs to the
remote server is now much more efficient.
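For sites that still need to communicate with Condor versions prior to
6.3.3, the network compatibility item above amounts to a one-line
configuration change. This is a sketch of that single setting only; whether
it is appropriate depends on the versions present in the pool.

```
# Only needed when communicating with Condor versions older than 6.3.3.
SEC_CLIENT_NEGOTIATION = NEVER
```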
Configuration Variable and ClassAd Attribute Additions and Changes:
- Removed the configuration variable
COLLECTOR_SOCKET_CACHE_SIZE.
Configuration of this parameter used to be mandatory to enable TCP updates
to the condor_collector. Now no special configuration of the
condor_collector is required to allow TCP updates, but it is
important to ensure that there are sufficient file descriptors for
efficient operation. See section 3.7.6 for
more information.
- The new configuration variable USE_SHARED_PORT
is a boolean value that specifies
whether a Condor process should rely on the condor_shared_port daemon for
receiving incoming connections. Write access to
DAEMON_SOCKET_DIR is required for this to take effect.
The default is False. If set to True, SHARED_PORT
should be added to DAEMON_LIST. See
section 3.3.36 for more information.
- Added the new configuration variable CCB_HEARTBEAT_INTERVAL.
It is the maximum
number of seconds of silence on a daemon's connection to the CCB server
after which it will ping the server to verify that the connection still
works.
The default value is 1200 (20 minutes).
This feature serves both to speed
up detection of dead connections and to generate a guaranteed minimum
frequency of activity to attempt to prevent the connection from being
dropped.
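The shared port and CCB heartbeat variables above might be combined in a
configuration fragment such as the following. It is a sketch under the
assumption that DAEMON_LIST is already defined earlier in the configuration
and that the Condor processes have write access to DAEMON_SOCKET_DIR.

```
# Route incoming connections through the condor_shared_port daemon.
USE_SHARED_PORT = True
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT

# Ping the CCB server after 20 minutes of silence (the stated default).
CCB_HEARTBEAT_INTERVAL = 1200
```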
Bugs Fixed:
- Fixed a problem with a ClassAd debug function,
so it now properly emits debug information for ClassAd IfThenElse
clauses.
Known Bugs:
Additions and Changes to the Manual: