8.5 Stable Release Series 7.2
This is a stable release series of Condor.
As usual, only bug fixes (and potentially, ports to new platforms)
will be provided in future 7.2.x releases.
New features will be added in the 7.3.x development series.
The details of each version are described below.
Version 7.2.4
Release Notes:
New Features:
Configuration Variable Additions and Changes:
Bugs Fixed:
- Fixed a bug in the checkpoint server that caused failure of
checkpoint image storage and retrieval if the requesting submission
machine was running a 32-bit installation of Condor and the checkpoint
server was from a 64-bit installation, or vice versa. The checkpoint
image server, in both its 32-bit and 64-bit installations, now handles both
protocols. It is recommended that any checkpoint server installation which
may be used in a flocking situation or other federated joining of pools
use the 64-bit binary, because a checkpoint image may be larger than
can be represented in 32 bits. A 32-bit
checkpoint image server will now notice if this situation occurs and log
a message suggesting an upgrade to the 64-bit version.
- Fixed a bug that caused condor_procd to sometimes fail when monitoring
processes with environments larger than 1MB.
- Fixed a bug that caused condor_dagman to fail in recovery mode on
a DAG in which any nodes had been retried.
- Xen-based virtual machines now have the correct amount of memory.
Previously, the amount of memory was too small by a factor of 1024.
- Fixed a bug in the handling of $$(VARIABLE) submit
file expressions.
- Fixed a bug in the code related to USE_VISIBLE_DESKTOP
that was causing the windows created by the job to behave incorrectly.
- Fixed a bug that caused Stork to treat successful file transfers
as failed.
- Fixed several bugs in the user log reader in the handling of
files of size zero.
- Fixed a problem affecting parallel universe jobs with very short
tasks. If any of the parallel tasks exited before the first node
started, the entire job was prematurely treated as though it had
finished. If the job ClassAd attribute ParallelShutdownPolicy was
set to "WAIT_FOR_ALL", the job was prematurely treated as though it
had finished if all started tasks completed before the remaining tasks
started.
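The completion rule behind the parallel universe fix above can be pictured with a toy model. The sketch below is illustrative only and is not Condor's implementation: the task states, the function name job_is_finished, and the assumption that the default policy ends the job when the first node exits are all introduced here for the example. Only the attribute name ParallelShutdownPolicy and the value WAIT_FOR_ALL come from the text.

```python
# Toy model of deciding when a parallel universe job is finished.
# All names here are hypothetical; nothing is taken from the Condor source.

IDLE, RUNNING, EXITED = "idle", "running", "exited"


def job_is_finished(task_states, shutdown_policy=None):
    """Return True if the job as a whole should be treated as finished.

    task_states: list of per-task states; index 0 is the first node.
    shutdown_policy: "WAIT_FOR_ALL", or None for the assumed default
    of waiting only for the first node.
    """
    if shutdown_policy == "WAIT_FOR_ALL":
        # Every task must have exited, including tasks not yet started.
        return all(state == EXITED for state in task_states)
    # Assumed default: only the exit of the first node ends the job.
    return task_states[0] == EXITED


# A short task exits before the first node has started.
print(job_is_finished([IDLE, EXITED, RUNNING]))
# All started tasks have exited, but one task has not started yet.
print(job_is_finished([EXITED, EXITED, IDLE], "WAIT_FOR_ALL"))
```

In this model both calls report that the job is not finished, which is the behavior the fix is described as restoring.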
Known Bugs:
Additions and Changes to the Manual:
- Descriptions and definitions of all commands that may be placed within
the submit description file have been moved from the condor_submit
manual page to a separate section of the manual.
- Added a description of the configuration variable
NEGOTIATOR_MATCHLIST_CACHING.
See section 3.3.17 for the definition.
Version 7.2.3
Release Notes:
- The header files for ClassAds are now included within the release.
New Features:
- Enhanced the Debian 5.0 Condor port on the x86_64 platform to
include support for the standard universe.
Configuration Variable Additions and Changes:
- The new integer configuration variable
SEC_TCP_SESSION_DEADLINE specifies the
number of seconds after which the client should give up its attempt to
establish a security session with a daemon that it is connecting to.
The default value is 120 seconds.
- The new configuration variables SCHEDD_CLUSTER_INITIAL_VALUE
and SCHEDD_CLUSTER_INCREMENT_VALUE are integers that
specify the cluster number to use for the first job submission,
and the stride used to increment the cluster id upon successive submissions.
See section 3.3.11
for the complete definitions of these variables.
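The effect of SCHEDD_CLUSTER_INITIAL_VALUE and SCHEDD_CLUSTER_INCREMENT_VALUE can be sketched as a simple arithmetic sequence. The following is a minimal illustration, not Condor code: the function name cluster_ids is hypothetical, and the default of 1 for both parameters is an assumption made for the example.

```python
from itertools import count, islice


def cluster_ids(initial_value=1, increment_value=1):
    """Yield cluster numbers: initial, initial + stride, initial + 2*stride, ...

    The defaults of 1 are assumed for this sketch.
    """
    return count(start=initial_value, step=increment_value)


# First four cluster numbers with an initial value of 100 and a stride of 10.
print(list(islice(cluster_ids(100, 10), 4)))

# Two sequences with initial values 1 and 2 and a shared stride of 2
# never produce the same number.
odd = set(islice(cluster_ids(1, 2), 50))
even = set(islice(cluster_ids(2, 2), 50))
print(odd.isdisjoint(even))
```

In this model, choosing distinct initial values with a shared stride keeps the generated cluster numbers from colliding.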
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
Version 7.2.2
Release Notes:
New Features:
- Added a full port of Condor to Debian 5.0 on the x86 platform.
- Added a clipped port of Condor to Debian 5.0 on the x86_64 platform.
- Added the -DumpRescue command-line flag to condor_dagman
and condor_submit_dag. This flag is intended mainly for testing.
- Added support for the -debug option to condor_qedit.
- The Job Router now uses a time slice timer for periodic expression
evaluation, similar to the condor_schedd daemon.
The evaluation interval is controlled by
the configuration variable PERIODIC_EXPR_INTERVAL,
and defaults to 60 seconds, the same default value used by
the condor_schedd daemon.
- The Job Router now resets the source job if a failure occurs when
updating the condor_schedd daemon for a periodic expression that
evaluated to True. The job's periodic expressions will then be
evaluated again at a later time, when the update may succeed.
Configuration Variable Additions and Changes:
- The new boolean configuration variable
EVENT_LOG_FSYNC provides control of the behavior of
Condor when writing events to the event log. Previously,
the behavior was as if this parameter were set to False.
See section 3.3.4 for the complete definition of
this variable.
- The new boolean configuration variable
EVENT_LOG_LOCKING provides control of the behavior of
Condor when writing events to the event log. Previously,
the behavior was controlled by ENABLE_USERLOG_LOCKING.
See section 3.3.4 for the complete definition of
this variable.
- The new string configuration variable TRANSFERER
specifies the path to the condor_transferer program which is
invoked by the condor_replication daemon to perform the actual
transfer of the file set by STATE_FILE.
This is part of the high availability framework.
Prior to Condor 7.2.2, the value of TRANSFERER was hard coded to
$(RELEASE_DIR)/sbin/condor_transferer. Reliance on
this hard coded value should be considered obsolete, and the hard coded
value will be removed in a future version of Condor.
- The PREEMPTION_REQUIREMENTS and the RANK
expressions in the matchmaker can now reference many more ClassAd
attributes than just SubmittorPrio. The new attributes allow
these expressions to take into account resources currently in use, as
well as group usage and quota information. The new attributes are:
SubmitterUserResourcesInUse,
RemoteUserResourcesInUse,
RemoteGroupResourcesInUse, RemoteGroupQuota,
SubmitterGroupResourcesInUse,
SubmitterGroupQuota.
- Added the JOB_ROUTER_ATTRS_TO_COPY configuration
option. This is a comma separated list of attributes that the Job
Router should copy from the routed ad to the source ad, in addition
to the internally hard coded attributes that are copied.
- Added the JOB_ROUTER_RELEASE_ON_HOLD configuration
option, which controls whether the Job Router will reset the
source job to an untouched state if it needs to yield the job
because the routed job went on hold. The option defaults to
resetting the source job.
- The new configuration variables PREEMPTION_REQUIREMENTS_STABLE
and PREEMPTION_RANK_STABLE tell Condor whether all
attributes referenced in the variables PREEMPTION_REQUIREMENTS and
PREEMPTION_RANK remain unchanged within
a negotiation interval.
- The new configuration variables OFFLINE_LOG
and OFFLINE_EXPIRE_ADS_AFTER specify the location of
persistent machine ClassAds for hibernating machines,
as well as the lifetime of the persistent ClassAds.
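The copying step described for JOB_ROUTER_ATTRS_TO_COPY above can be sketched with plain dictionaries standing in for ClassAds. This is an illustrative model under stated assumptions, not the Job Router's code: the helper names parse_attr_list and copy_attrs and the attribute names in the example are hypothetical, and the sketch ignores details such as the attributes the Job Router copies internally.

```python
def parse_attr_list(value):
    """Split a comma separated attribute list, dropping blanks and spaces."""
    return [name.strip() for name in value.split(",") if name.strip()]


def copy_attrs(routed_ad, source_ad, attrs_to_copy):
    """Copy the listed attributes from the routed ad into the source ad.

    Ads are modeled as dicts; attributes missing from the routed ad
    are skipped. Real ClassAd attribute names are case-insensitive,
    which this simplified model does not reproduce.
    """
    for name in parse_attr_list(attrs_to_copy):
        if name in routed_ad:
            source_ad[name] = routed_ad[name]
    return source_ad


# Hypothetical attribute names, for illustration only.
routed = {"ExampleAttrA": 1, "ExampleAttrB": "x", "Unlisted": True}
source = {"ExampleAttrA": 0}
print(copy_attrs(routed, source, "ExampleAttrA, ExampleAttrB"))
```

Attributes not named in the list, such as Unlisted in this example, are left out of the source ad.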
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
- A manual page for condor_power now appears in the manual.
condor_power sends a packet to a machine in a low power state,
to cause the machine to wake from that state.
- Reorganized the user manual section that describes DAGMan.
- Added a note that environment values specified
with the environment submit description file command override values from
the submitter's environment, as imported with getenv = True.
- Added new information to the section on Power Management
pertaining to the handling of hibernating machines.
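The precedence rule in the note about the environment command and getenv = True can be sketched as a dictionary merge. This is a simplified model and not Condor's implementation; the function name effective_environment and the example values are assumptions made for illustration.

```python
def effective_environment(submitter_env, job_env, getenv=True):
    """Return the environment a job would see in this simplified model.

    submitter_env: variables from the submitter's environment.
    job_env: variables given by the environment submit command.
    getenv: whether the submitter's environment is imported at all.
    """
    env = dict(submitter_env) if getenv else {}
    # Values from the environment command win over imported ones.
    env.update(job_env)
    return env


submitter = {"PATH": "/usr/bin", "HOME": "/home/alice"}
job = {"PATH": "/opt/tools/bin"}
print(effective_environment(submitter, job))
```

Here PATH takes the value from the submit description file, while HOME is still imported from the submitter's environment.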
Version 7.2.1
Release Notes:
- This release addresses reported 7.2.0 problems with the
Windows distribution.
New Features:
- Condor now has a clipped port to i386 Debian 5.0 (Lenny).
- Added standard universe support for gfortran.
- Added support for standard output and standard error
larger than 2 Gbytes.
Configuration Variable Additions and Changes:
Bugs Fixed:
- Fixed a bug in the condor_collector which could cause it
to hang indefinitely while reading network input in rare conditions.
- Fixed a bug in condor_chirp for Windows which was causing it
to crash on invocation.
- Fixed a bug in the Windows condor_mail program, which was causing
it to become unresponsive when run. If left running, the application also
increased its memory consumption.
- Fixed a bug that could cause the condor_schedd to never
evaluate periodic expressions.
- Fixed a bug on Unix platforms where condor_configure would
provide incorrect defaults for the JAVA_MAXHEAP_ARGUMENT
attribute in the installed configuration files. The new
default for Sun Java JVMs is -Xmx1024m.
- Fixed a bug on Unix platforms where condor_configure would
imply that using the Unix user root or UID 0 for the
-owner option is a good thing, which it is not. condor_configure would
then complain that it could not find user root in the password file.
- Fixed a bug on Unix platforms where condor_configure would
emit errors about not being able to execute ldd when installing
Condor on the Mac OS X 10.5 platform. condor_configure now
correctly detects shared library requirements when installing the
Condor binaries on the Mac OS X 10.5 platform.
- Fixed a bug where execute-side daemons started before the
condor_credd would fail to match with Windows jobs with
run_as_owner set. This condition persisted until the
execute-side daemons were either restarted or reconfigured.
- Fixed a problem affecting the Job Router and Condor-C. When jobs
spool input files, they enter a temporary hold state, which could
trigger actions by a naive periodic remove or release expression.
Periodic expressions are no longer evaluated when in this temporary
hold state, which has the hold reason "Spooling input data files".
- The example init script condor.boot.generic erroneously claimed
that the condor_master would begin sending SIGKILL to child
processes after 20 seconds if SIGQUIT (the fast shutdown) failed. The
condor_master will actually wait $(SHUTDOWN_FAST_TIMEOUT)
seconds, a value that currently defaults to 300 seconds.
- Environment variable names are now properly treated as
case-insensitive on Windows. The most common symptom of this bug was
the inability to specify a custom PATH environment variable
for a job from its submit description file.
- Changed condor_submit -debug to issue a warning when ignoring
environment variables. This occurs with getenv = True set
in a submit description file.
- Fixed a long-standing memory leak in the SOAP interface.
Each connection leaked a few hundred bytes of memory,
which could eventually have caused the condor_schedd daemon to crash.
- Fixed Job Router hooks so that their output is properly
propagated where appropriate.
- Implemented a fix for the condor_startd that prevents it from
crashing if the user specified the configuration variable
NUM_SLOTS_TYPE_N, without also specifying SLOT_TYPE_N.
- The sample configuration files now correctly set the default
universe to vanilla. This default has been true since 7.2.0,
but was not reflected in the sample configuration files.
- Fixed a bug that incorrectly set the value of the
job ClassAd attribute RequestMemory to be 1024 times its
correct size due to a mismatch in units;
the attribute RequestMemory is given in Mbytes, while
the attribute ImageSize is given in Kbytes.
- Fixed a memory leak in condor_dagman that leaked a small
amount of memory for each job submitted.
- Fixed a bug that was causing the network mask to be advertised
as a Condor sinful string, rather than a dotted-quad.
- Fixed a handle leak in the condor_procd on Windows.
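The unit mismatch behind the RequestMemory fix above comes down to a factor of 1024 between Kbytes and Mbytes. The sketch below shows the conversion under stated assumptions: the function name request_memory_mbytes is hypothetical, and rounding up to a whole Mbyte is a choice made for this example rather than something the text specifies.

```python
import math

KBYTES_PER_MBYTE = 1024


def request_memory_mbytes(image_size_kbytes):
    """Convert an ImageSize-style value in Kbytes to whole Mbytes.

    Rounding up is an assumption of this sketch.
    """
    return math.ceil(image_size_kbytes / KBYTES_PER_MBYTE)


image_size = 2048  # Kbytes
correct = request_memory_mbytes(image_size)
# Reusing the Kbyte figure as if it were Mbytes overstates the request.
unconverted = image_size
print(correct, unconverted, unconverted // correct)
```

With an image size of 2048 Kbytes, the unconverted figure is 1024 times the converted one, matching the factor named in the fix.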
Known Bugs:
Additions and Changes to the Manual:
- Added a FAQ entry for Windows describing how machines
with misconfigured performance counters may cause the condor_procd
to crash.
- Added a manual page for the command condor_router_history.
Version 7.2.0
Release Notes:
- A bug in some older Xen kernels can result in Condor errors
due to a broken assumption in the condor_procd daemon.
See the FAQ entry at section 7.7 for details.
- A problem has been discovered when using snapshot disks with
vm universe VMware jobs,
if the path that the condor_vm-gahp uses to refer to the
virtual machine's VMX file contains a symbolic link.
See the FAQ entry at section 7.3 for details.
- The name of the Amazon EC2 GAHP binary has changed from
amazon-gahp to amazon_gahp. This makes it consistent
with the naming of other Condor binaries.
New Features:
- The default universe for jobs is now
vanilla, instead of standard.
The default can be changed using the configuration variable
DEFAULT_UNIVERSE.
- For VMware vm universe jobs, any BIOS settings saved in
an nvram file in the vmware_dir given in the
job's submit file are now transferred to the execute machine, so that they
apply to the job's execution.
- Daemons that become unresponsive are now killed using the
SIGABRT signal, which causes a core file to be dropped.
Setting the configuration variable NOT_RESPONDING_WANT_CORE
to False will revert to the previous behavior that used
the SIGKILL signal.
- The condor_job_router and the
condor_q command with the -better-analyze option now
support more ClassAd functions than they previously did. They now
support all ClassAd functions, except for those with names beginning
with the string stringList.
- condor_status given the options -submitters -xml
no longer emits a single blank line when there are no submitters;
instead, it prints valid XML output with an empty body.
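The signal choice described in the item on unresponsive daemons above can be sketched as a single conditional. This is an illustration only, not code from the condor_master: the function name unresponsive_daemon_signal is hypothetical, and the sketch only selects a signal without sending it.

```python
import signal


def unresponsive_daemon_signal(not_responding_want_core=True):
    """Pick the signal used on an unresponsive daemon in this model.

    True mirrors the new behavior (SIGABRT, which produces a core file);
    False mirrors the previous behavior (SIGKILL).
    signal.SIGKILL is only defined on Unix-like platforms.
    """
    if not_responding_want_core:
        return signal.SIGABRT
    return signal.SIGKILL


print(unresponsive_daemon_signal().name)
print(unresponsive_daemon_signal(False).name)
```

A caller could pass the result to os.kill together with the daemon's process id; that step is left out so the example has no side effects.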
Configuration Variable Additions and Changes:
- The HAD configuration variable NEGOTIATOR_STATE_FILE
has changed its name to STATE_FILE.
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
- Initial documentation for dynamic provisioning is available
in section 3.13.9.
- Documentation for Kerberos authentication
(see section 3.6.3)
and associated configuration variables has been updated.