Condor's Web Service (WS) API provides a way for application developers to interact with Condor, without needing to utilize Condor's command-line tools. In keeping with the Condor philosophy of reliability and fault-tolerance, this API is designed to provide a simple and powerful way to interact with Condor. Condor daemons understand and implement the SOAP (Simple Object Access Protocol) XML API to provide a web service interface for Condor job submission and management.
To address the issues of reliability and fault-tolerance, a two-phase commit mechanism provides a transaction-based protocol. The API description that follows covers interaction between a client using the API and both the condor_schedd and condor_collector daemons, illustrating transactions for use in job submission, queue management, and ClassAd management functions.
All applications using the API to interact with the condor_schedd will need to use transactions. A transaction is an ACID unit of work (atomic, consistent, isolated, and durable). The API limits the lifetime of a transaction, and both the client (application) and the server (the condor_schedd daemon) may place a limit on the lifetime. The server reserves the right to specify a maximum duration for a transaction.
The client initiates a transaction using the beginTransaction() method. It ends the transaction with either a commit (using commitTransaction()) or an abort (using abortTransaction()).
Not all operations in the API need to be performed within a transaction; some accept a null transaction. A null transaction is a SOAP message with

<transaction xsi:type="ns1:Transaction" xsi:nil="true"/>

Often this is achieved by passing the programming language's equivalent of null in place of a transaction identifier.
It is possible that some operations will have access to more information when they are used inside a transaction. For instance, a getJobAds() query would have access to the jobs that are pending in a transaction, which are not committed and therefore not visible outside of the transaction. Transactions are as ACID compliant as possible. Therefore, do not make a decision inside a transaction based on the results of a query made outside of that transaction.
A ClassAd is required to describe a job. The job ClassAd is submitted to the condor_schedd within a transaction using the submit() method. The complexity of job ClassAd creation may be reduced with the createJobTemplate() method, which returns an instance of a ClassAd structure that may be further modified. A necessary part of the job ClassAd is the pair of job attributes ClusterId and ProcId, which uniquely identify the cluster and the job within a cluster. Allocation and assignment of (monotonically increasing) ClusterId values is done with the newCluster() method. Jobs may be submitted within the assigned cluster only until the newCluster() method is invoked again. Each job is allocated and assigned a (monotonically increasing) ProcId within the current cluster using the newJob() method. Therefore, the sequence of method calls to submit a set of jobs first calls newCluster(), followed by calls to newJob() and then submit() for each job within the cluster.
As an example, here are sample cluster and job numbers that result from the ordered calls to submission methods:
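The allocation pattern behind those numbers can be sketched with a small mock. MockSchedd below is a hypothetical stand-in for the condor_schedd's numbering behavior only, and the starting values (ClusterId 1, ProcId 0) are illustrative assumptions; the real ClusterId depends on the schedd's history.

```cpp
// Hypothetical mock of the schedd's cluster/proc numbering (illustration only).
class MockSchedd {
    int lastCluster = 0;  // ClusterId values increase monotonically
    int nextProc = 0;     // ProcId values restart within each new cluster
public:
    // newCluster() begins a new cluster and resets the per-cluster job counter.
    int newCluster() { nextProc = 0; return ++lastCluster; }
    // newJob() hands out the next ProcId within the current cluster.
    int newJob() { return nextProc++; }
};
```

With this mock, submitting two jobs in one cluster and then one job in a second cluster yields ClusterId.ProcId pairs 1.0, 1.1, and 2.0.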
There is the potential that a call to submit() will fail. Failure means that the job is in the queue, and it typically indicates that something needed by the job has not been sent; as a result, the job has no hope of running successfully. It is possible to recover from such a failure by resending the information that the job needs. It is also completely acceptable to abort and make another attempt. To simplify the client's effort in determining what the job requires, the discoverJobRequirements() method accepts a job ClassAd and returns a list of things that should be sent along with the job.
A common job submission case requires the job's executable and input files to be transferred from the machine where the application is running to the machine where the condor_schedd daemon is running. This is analogous to running condor_submit with the -spool or -remote option. The executable and input files must be sent directly to the condor_schedd daemon, which places all files in a spool location.
The two methods declareFile() and sendFile() work in tandem to transfer files to the condor_schedd daemon. The declareFile() method causes the condor_schedd daemon to create the file in its spool location, or indicate in its return value that the file already exists. This increases efficiency, as resending an existing file is a waste of resources. The sendFile() method sends base64 encoded data. sendFile() may be used to send an entire file, or chunks of files as desired.
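As a sketch of how a client might prepare chunked payloads for repeated sendFile() calls, the following splits a file's contents into (offset, base64 data) pairs. The base64 helper and the chunkFile function are illustrative, not part of the Condor API; the real SOAP envelope is produced by the toolkit-generated stubs.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Minimal base64 encoder (RFC 4648), enough to illustrate sendFile()'s payload.
std::string base64(const std::string& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    size_t i = 0;
    while (i + 2 < in.size()) {          // full 3-byte groups
        uint32_t n = (uint8_t)in[i] << 16 | (uint8_t)in[i+1] << 8 | (uint8_t)in[i+2];
        out += tbl[n >> 18]; out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63]; out += tbl[n & 63];
        i += 3;
    }
    if (i + 1 == in.size()) {            // one trailing byte -> two pad chars
        uint32_t n = (uint8_t)in[i] << 16;
        out += tbl[n >> 18]; out += tbl[(n >> 12) & 63]; out += "==";
    } else if (i + 2 == in.size()) {     // two trailing bytes -> one pad char
        uint32_t n = (uint8_t)in[i] << 16 | (uint8_t)in[i+1] << 8;
        out += tbl[n >> 18]; out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63]; out += '=';
    }
    return out;
}

// One (offset, base64 data) pair per hypothetical sendFile() call.
struct Chunk { size_t offset; std::string data; };

std::vector<Chunk> chunkFile(const std::string& contents, size_t chunkSize) {
    std::vector<Chunk> chunks;
    for (size_t off = 0; off < contents.size(); off += chunkSize)
        chunks.push_back({off, base64(contents.substr(off, chunkSize))});
    return chunks;
}
```

Each chunk's offset is the byte position within the original (unencoded) file, matching the offset parameter of sendFile().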
The declareFile() method has both required and optional arguments. declareFile() requires the name of the file and its size in bytes. The optional arguments provide hash information. A hash type of NOHASH disables file verification; the condor_schedd daemon will not have a reliable way to determine the existence of the file being declared.
Methods for retrieving files are most useful when a job is completed. Consider the categorization of the typical life-cycle for a job:
Once the job enters Middle Age, the getFile() method retrieves a file. The listSpool() method assists by providing a list of all the job's files in the spool location.
The job enters Old Age by the application's use of the closeSpool() method. It causes the condor_schedd daemon to remove the job from the queue, and the job's spool files are no longer available. As there is no requirement for the application to invoke the closeSpool() method, jobs can potentially remain in the queue forever. The configuration variable SOAP_LEAVE_IN_QUEUE may mitigate this problem. When this boolean variable evaluates to False, a job enters Old Age.
A reasonable example for this configuration variable is
SOAP_LEAVE_IN_QUEUE = ((JobStatus==4) && ((ServerTime - CompletionDate) < (60 * 60 * 24)))
This expression results in Old Age for a job (removal from the queue) once the job has been in Middle Age (completed) for 24 hours.
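The policy that this example expression encodes can be sketched as follows. leaveInQueue is a hypothetical helper, not part of any Condor API; in Condor, a JobStatus of 4 means the job has completed.

```cpp
#include <ctime>

// Sketch of the example SOAP_LEAVE_IN_QUEUE policy: the job stays in the
// queue while it is completed (JobStatus == 4) and finished less than
// 24 hours ago; once this returns false, the job enters Old Age.
bool leaveInQueue(int jobStatus, std::time_t serverTime, std::time_t completionDate) {
    const long oneDay = 60L * 60L * 24L;
    return jobStatus == 4 && (serverTime - completionDate) < oneDay;
}
```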
Condor daemons understand and communicate using the SOAP XML protocol. An application seeking to use this protocol will require code that handles the communication. The XML WSDL (Web Services Description Language) that Condor implements is included with the Condor distribution. It is in $(RELEASE_DIR)/lib/webservice. The WSDL must be run through a toolkit to produce language-specific routines that do communication. The application is compiled with these routines.
Condor must be configured to enable responses to SOAP calls. Please see section 3.3.33 for definitions of the configuration variables related to the web services API. The WS interface listens on the condor_schedd daemon's command port. To obtain a list of all the condor_schedd daemons in the pool with a WS interface, issue the command:
% condor_status -schedd -constraint "HasSOAPInterface=?=TRUE"

With this information, a further command locates the port number to use:
% condor_status -schedd -constraint "HasSOAPInterface=?=TRUE" -l | grep MyAddress
Condor's security configuration must be set up such that access is authorized for the SOAP client. See Section 3.6.7 for information on how to set the ALLOW_SOAP and DENY_SOAP configuration variables.
The API's routines can be roughly categorized into ones that deal with transactions, job submission, file transfer, job management, ClassAd management, and version information.
StatusAndTransaction beginTransaction(int duration);
Status commitTransaction(Transaction transaction);
Status abortTransaction(Transaction transaction);
StatusAndTransaction extendTransaction( Transaction transaction, int duration);
StatusAndRequirements submit(Transaction transaction, int clusterId, int jobId, ClassAd jobAd);
StatusAndClassAd createJobTemplate(int clusterId, int jobId, String owner, UniverseType type, String command, String arguments, String requirements);
enum UniverseType { STANDARD = 1, VANILLA = 5, SCHEDULER = 7, MPI = 8, GRID = 9, JAVA = 10, PARALLEL = 11, LOCALUNIVERSE = 12, VM = 13 };
StatusAndRequirements discoverJobRequirements( ClassAd jobAd);
Status declareFile(Transaction transaction, int clusterId, int jobId, String name, int size, HashType hashType, String hash);
enum HashType { NOHASH, MD5HASH };
Status sendFile(Transaction transaction, int clusterId, int jobId, String name, int offset, Base64 data);
StatusAndBase64 getFile(Transaction transaction, int clusterId, int jobId, String name, int offset, int length);
Status closeSpool(Transaction transaction, int clusterId, int jobId);
StatusAndFileInfoArray listSpool(Transaction transaction, int clusterId, int jobId);
StatusAndInt newCluster(Transaction transaction);
Status removeCluster(Transaction transaction, int clusterId, String reason);
StatusAndInt newJob(Transaction transaction, int clusterId);
Status removeJob(Transaction transaction, int clusterId, int jobId, String reason, boolean forceRemoval);
Status holdJob(Transaction transaction, int clusterId, int jobId, String reason, boolean emailUser, boolean emailAdmin, boolean systemHold);
Status releaseJob(Transaction transaction, int clusterId, int jobId, String reason, boolean emailUser, boolean emailAdmin);
StatusAndClassAdArray getJobAds(Transaction transaction, String constraint);
This method returns much the same as the first element of the array returned by
getJobAds(transaction, "(ClusterId==clusterId && JobId==jobId)")
A prototype is
StatusAndClassAd getJobAd(Transaction transaction, int clusterId, int jobId);
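A client-side helper for building that constraint string might look like the following sketch; jobConstraint is a hypothetical name, not part of the API.

```cpp
#include <string>

// Build the ClassAd constraint used to select one job, as in the
// getJobAds() call shown above.
std::string jobConstraint(int clusterId, int jobId) {
    return "(ClusterId==" + std::to_string(clusterId) +
           " && JobId==" + std::to_string(jobId) + ")";
}
```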
Status requestReschedule();
Status insertAd(ClassAdType type, ClassAdStruct ad);
enum ClassAdType { STARTD_AD_TYPE, QUILL_AD_TYPE, SCHEDD_AD_TYPE, SUBMITTOR_AD_TYPE, LICENSE_AD_TYPE, MASTER_AD_TYPE, CKPTSRVR_AD_TYPE, COLLECTOR_AD_TYPE, STORAGE_AD_TYPE, NEGOTIATOR_AD_TYPE, HAD_AD_TYPE, GENERIC_AD_TYPE };
ClassAdArray queryStartdAds(String constraint);
ClassAdArray queryScheddAds(String constraint);
ClassAdArray queryMasterAds(String constraint);
ClassAdArray querySubmittorAds(String constraint);
ClassAdArray queryLicenseAds(String constraint);
ClassAdArray queryStorageAds(String constraint);
ClassAdArray queryAnyAds(String constraint);
StatusAndString getVersionString();
StatusAndString getPlatformString();
Many methods return a status. Table 4.5 lists and defines the StatusCode return values.
The following quote from the DRMAA Specification 1.0 abstract nicely describes the purpose of the API:
The Distributed Resource Management Application API (DRMAA), developed by a working group of the Global Grid Forum (GGF),
provides a generalized API to distributed resource management systems (DRMSs) in order to facilitate integration of application programs. The scope of DRMAA is limited to job submission, job monitoring and control, and the retrieval of the finished job status. DRMAA provides application developers and distributed resource management builders with a programming model that enables the development of distributed applications tightly coupled to an underlying DRMS. For deployers of such distributed applications, DRMAA preserves flexibility and choice in system design.
The API allows users who write programs using DRMAA functions and link to a DRMAA library to submit, control, and retrieve information about jobs to a Grid system. The Condor implementation of a portion of the API allows programs (applications) to use the library functions provided to submit, monitor and control Condor jobs.
See the DRMAA site (http://www.drmaa.org) for the DRMAA 1.0 API specification and further details on the API.
The library was developed from the DRMAA API Specification 1.0 of January 2004 and the DRMAA C Bindings v0.9 of September 2003. It is a static C library that expects a POSIX thread model on Unix systems and a Windows thread model on Windows systems. Unix systems that do not support POSIX threads are not guaranteed thread safety when calling the library's functions.
The object library file is called libcondordrmaa.a, and it is located within the <release>/lib directory in the Condor download. Its header file is called lib_condor_drmaa.h, and it is located within the <release>/include directory in the Condor download. Also within <release>/include is the file lib_condor_drmaa.README, which gives further details on the implementation.
Use of the library requires that a local condor_schedd daemon must be running, and the program linked to the library must have sufficient spool space. This space should be in /tmp or specified by the environment variables TEMP, TMP, or SPOOL. The program linked to the library and the local condor_schedd daemon must have read, write, and traverse rights to the spool space.
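The spool-location lookup described above might be sketched as follows. The precedence TEMP, then TMP, then SPOOL, then /tmp is an assumption for illustration only; consult lib_condor_drmaa.README for the actual rule.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <string>

// Resolve a spool directory for the DRMAA library.
// NOTE: this precedence order is an illustrative assumption.
std::string spoolDir() {
    for (const char* var : {"TEMP", "TMP", "SPOOL"})
        if (const char* value = std::getenv(var))
            return value;
    return "/tmp";
}
```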
The library currently supports the following specification-defined job attributes:
The attribute DRMAA_NATIVE_SPECIFICATION can be used to specify any of the commands supported in submit description files. See the condor_submit manual page in section 9 for a complete list. Multiple commands may be specified, separated by newlines.
As in the normal submit file, arbitrary attributes can be added to the job's ClassAd by prefixing the attribute with +. In this case, you will need to put string values in quotation marks, the same as in a submit file.
Thus, to tell Condor that the job will likely use 64 megabytes of memory (65536 kilobytes), to rank machines with more memory more highly, and to add the arbitrary attribute department set to chemistry, set the DRMAA_NATIVE_SPECIFICATION attribute to the C string:
drmaa_set_attribute(jobtemplate, DRMAA_NATIVE_SPECIFICATION,
"image_size=65536\nrank=Memory\n+department=\"chemistry\"",
err_buf, sizeof(err_buf)-1);
Condor has the ability to log a Condor job's significant events during its lifetime. This is enabled in the job's submit description file with the Log command.
This section describes the API defined by the C++ ReadUserLog class, which provides a programming interface for applications to read and parse events, poll for events, and save and restore reader state.
The following enumerated types are useful to the API.
All ReadUserLog constructors invoke one of the initialize() methods. Since C++ constructors cannot return errors, an application using any but the default constructor should call isInitialized() to verify that the object initialized correctly, for example, that it had permission to open required files.
Note that because the constructors cannot return status information, most of these constructors will be eliminated in the future. All constructors, except for the default constructor with no parameters, will be removed. The application will need to call the appropriate initialize() method.
These methods perform the initialization of ReadUserLog objects and are invoked by all constructors that do real work. Applications should not use those constructors; they should instead use the default constructor followed by one of these initializer methods.
All of these functions will return false if there are problems such as being unable to open the log file, or true if successful.
Synopsis: Initialize to read the EventLog file.
NOTE: This method will likely be eliminated in the future, and this
functionality will be moved to a new ReadEventLog class.
Returns: bool; true: success, false: failed
Method parameters:
Synopsis: Initialize to read a specific log file.
Returns: bool; true: success, false: failed
Method parameters:
Path to the log file to read
If true, enable the reader to handle rotating log files,
which is only useful for global user logs
If true, try to open the rotated files
(with file names appended with .old or .1, .2, $...$)
first.
If true, the reader will open the file read-only and
disable locking.
Synopsis: Initialize to read a specific log file.
Returns: bool; true: success, false: failed
Method parameters:
Path to the log file to read
Limits what previously rotated files will be considered by the number
given in the file name suffix.
A value of 0 disables looking for rotated files.
A value of 1 limits the rotated file to be that with the file name suffix
of .old.
As only event logs are rotated, this parameter is only useful for
event logs.
If true, try to open the rotated files
(with file names appended with .old or .1, .2, $...$)
first.
If true, the reader will open the file read-only and
disable locking.
Synopsis: Initialize to continue from a persisted reader state.
Returns: bool; true: success, false: failed
Method parameters:
Reference to the persisted state to restore from
If true, the reader will open the file read-only and
disable locking.
Synopsis: Initialize to continue from a persisted reader state and set the
rotation parameters.
Returns: bool; true: success, false: failed
Method parameters:
Reference to the persisted state to restore from
Limits what previously rotated files will be considered by the number
given in the file name suffix.
A value of 0 disables looking for rotated files.
A value of 1 limits the rotated file to be that with the file name suffix
of .old.
As only event logs are rotated, this parameter is only useful for
event logs.
If true, the reader will open the file read-only and
disable locking.
4.5.3.4 Primary Methods
ULogEventOutcome readEvent( ULogEvent *& event );
Synopsis: Read the next event from the log file.
Returns: ULogEventOutcome; Outcome of the log read attempt. ULogEventOutcome is an enumerated
type.
Method parameters:
event
Pointer to a ULogEvent that is allocated by this call to
ReadUserLog::readEvent().
If no event is allocated, this pointer is
set to NULL; otherwise, the application is responsible for releasing the event with delete.
Synopsis: Synchronize the log file if the last event read was an error. This
safeguard should be called if there is an error reading an
event, but there are events after it in the file.
It skips over the
bad event, reading up to and including the event separator,
so that the remaining events can be read.
Returns: bool; true: success, false: failed
Method parameters:
4.5.3.5 Accessors
Synopsis: Check the status of the file, and whether it has grown, shrunk, etc.
Returns: ReadUserLog::FileStatus; the status of the log file, an
enumerated type.
Method parameters:
Synopsis: Check the status of the file, and whether it has grown, shrunk, etc.
Returns: ReadUserLog::FileStatus; the status of the log file, an
enumerated type.
Method parameters:
Set to true if the file is empty, false otherwise.
4.5.3.6 Methods for saving and restoring persistent reader state
The ReadUserLog::FileState structure is used to save and restore the state of a ReadUserLog object for persistence. The application should always use InitFileState() to initialize this structure.
All of these methods take a reference to a state buffer as their only parameter.
All of these methods return true upon success.
4.5.3.7 Save state to persistent storage
To save the state, do something like this:
ReadUserLog reader;
ReadUserLog::FileState statebuf;
status = ReadUserLog::InitFileState( statebuf );
status = reader.GetFileState( statebuf );
write( fd, statebuf.buf, statebuf.size );
...
status = reader.GetFileState( statebuf );
write( fd, statebuf.buf, statebuf.size );
...
status = UninitFileState( statebuf );
To restore the state, do something like this:
ReadUserLog::FileState statebuf;
status = ReadUserLog::InitFileState( statebuf );
read( fd, statebuf.buf, statebuf.size );
ReadUserLog reader;
status = reader.initialize( statebuf );
status = UninitFileState( statebuf );
...
If the application needs access to the data elements in a persistent state, it should instantiate a ReadUserLogStateAccess object.
Synopsis: Constructor
Returns: None
Constructor parameters:
Reference to the persistent state data to initialize from.
Synopsis: Destructor
Returns: None
Destructor parameters:
None.
Synopsis: Checks if the buffer is initialized
Returns: bool; true if successfully initialized, false otherwise
Method parameters:
Synopsis: Checks if the buffer is valid for use by
ReadUserLog::initialize()
Returns: bool; true if successful, false otherwise
Method parameters:
Synopsis: Get position within individual file.
NOTE: Can return an error if the result is too large to be
stored in a long.
Returns: bool; true if successful, false otherwise
Method parameters:
Byte position within the current log file
Synopsis: Get event number in individual file.
NOTE: Can return an error if the result is too large to be
stored in a long.
Returns: bool; true if successful, false otherwise
Method parameters:
Event number of the current event in the current log file
Synopsis: Get the position of the start of the current file in the overall log.
NOTE: Can return an error if the result is too large
to be stored in a long.
Returns: bool; true if successful, false otherwise
Method parameters:
Byte offset of the start of the current file in the overall
logical log stream.
Synopsis: Get the event number of the first event in the current file
NOTE: Can return an error if the result is too large
to be stored in a long.
Returns: bool; true if successful, false otherwise
Method parameters:
This is the absolute event number of the first event in the
current file in the overall logical log stream.
Synopsis: Get the unique ID of the associated state file.
Returns: bool; true if successful, false otherwise
Method parameters:
buf
Buffer to fill with the unique ID of the current file.
Size in bytes of buf.
This is to prevent getUniqId()
from writing past the end of buf.
Synopsis: Get the sequence number of the associated state file.
Returns: bool; true if successful, false otherwise
Method parameters:
Sequence number of the current file
Synopsis: Get the position difference of two states given by this
and other.
NOTE: Can return an error if the result is too large to be
stored in a long.
Returns: bool; true if successful, false otherwise
Method parameters:
Reference to the state to compare to.
Difference in the positions
Synopsis: Get the difference between the event numbers of two states given by this and other.
NOTE: Can return an error if the result is too large to be
stored in a long.
Returns: bool; true if successful, false otherwise
Method parameters:
Reference to the state to compare to.
Difference in the event numbers between the two states
Synopsis: Get the position difference of two states given by this
and other.
NOTE: Can return an error if the result is too large
to be stored in a long.
Returns: bool; true if successful, false otherwise
Method parameters:
Reference to the state to compare to.
Difference between the byte offset of the start of the current
file in the overall logical log stream and that of other.
Synopsis: Get the difference between the event number of the first event in
two state buffers (this - other).
NOTE: Can return an error if the result is too large
to be stored in a long.
Returns: bool; true if successful, false otherwise
Method parameters:
Reference to the state to compare to.
Difference between the absolute event number of the first event in
the current file in the overall logical log stream and that of
other.
4.5.3.11 Future persistence API
The ReadUserLog::FileState will likely be replaced with a new
C++ ReadUserLog::NewFileState, or a similarly named class that
will self initialize.
Additionally, the functionality of ReadUserLogStateAccess will be integrated into this class.
This section has not yet been written.
4.5.4 The Command Line Interface
This section has not yet been written.
4.5.5 The Condor GAHP
4.5.6 The Condor Perl Module
The Condor Perl module facilitates automatic submitting and monitoring of Condor jobs, along with automated administration of Condor. The most common use of this module is the monitoring of Condor jobs. The Condor Perl module can be used as a meta scheduler for the submission of Condor jobs.
The Condor Perl module provides several subroutines. Some of the subroutines are used as callbacks; an event triggers the execution of a specific subroutine. Other subroutines denote actions to be taken by Perl. Some of these subroutines take other subroutines as arguments.
4.5.6.1 Subroutines
The following is an example that uses the Condor Perl module. The example uses the submit description file mycmdfile.cmd to specify the submission of a job. As the job is matched with a machine and begins to execute, a callback subroutine (called execute) sends a condor_vacate signal to the job, and it increments a counter which keeps track of the number of times this callback executes. A second callback keeps a count of the number of times that the job was evicted before the job completes. After the job completes, the termination callback (called normal) prints out a summary of what happened.
#!/usr/bin/perl
use Condor;
$CMD_FILE = 'mycmdfile.cmd';
$evicts = 0;
$vacates = 0;
# A subroutine that will be used as the normal execution callback
$normal = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "Job $cluster.$job exited normally without errors.\n";
print "Job was vacated $vacates times and evicted $evicts times\n";
exit(0);
};
$evicted = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "Job $cluster, $job was evicted.\n";
$evicts++;
&Condor::Reschedule();
};
$execute = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
$host = $parameters{'host'};
$sinful = $parameters{'sinful'};
print "Job running on $sinful, vacating...\n";
&Condor::Vacate($sinful);
$vacates++;
};
$cluster = Condor::Submit($CMD_FILE);
if (($cluster) == 0)
{
    printf("Could not open. Access Denied\n");
    exit(1);
}
&Condor::RegisterExitSuccess($normal);
&Condor::RegisterEvicted($evicted);
&Condor::RegisterExecute($execute);
&Condor::Monitor($cluster);
&Condor::Wait();
This example program will submit the command file 'mycmdfile.cmd' and attempt to vacate any machine that the job runs on. The termination handler then prints out a summary of what has happened.
A second example Perl script facilitates the meta-scheduling of two Condor jobs. It submits a second job if the first job completes successfully.
#!/s/std/bin/perl
# tell Perl where to find the Condor library
use lib '/unsup/condor/lib';
# tell Perl to use what it finds in the Condor library
use Condor;
$SUBMIT_FILE1 = 'Asubmit.cmd';
$SUBMIT_FILE2 = 'Bsubmit.cmd';
# Callback used when first job exits without errors.
$firstOK = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
$cluster = Condor::Submit($SUBMIT_FILE2);
if (($cluster) == 0)
{
printf("Could not open $SUBMIT_FILE2.\n");
}
&Condor::RegisterExitSuccess($secondOK);
&Condor::RegisterExitFailure($secondfails);
&Condor::Monitor($cluster);
};
$firstfails = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "The first job, $cluster.$job failed, exiting with an error. \n";
exit(0);
};
# Callback used when second job exits without errors.
$secondOK = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "The second job, $cluster.$job successfully completed. \n";
exit(0);
};
# Callback used when second job exits WITH an error.
$secondfails = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "The second job ($cluster.$job) failed. \n";
exit(0);
};
$cluster = Condor::Submit($SUBMIT_FILE1);
if (($cluster) == 0)
{
printf("Could not open $SUBMIT_FILE1. \n");
}
&Condor::RegisterExitSuccess($firstOK);
&Condor::RegisterExitFailure($firstfails);
&Condor::Monitor($cluster);
&Condor::Wait();
Some notes are in order about this example. The same task could be accomplished using the Condor DAGMan metascheduler. The first job is the parent, and the second job is the child. The input file to DAGMan is significantly simpler than this Perl script.
A third example using the Condor Perl module expands upon the second example. Whereas the second example could have been more easily implemented using DAGMan, this third example shows the versatility of using Perl as a metascheduler.
In this example, the result generated by the successful completion of the first job is used to decide which subsequent job should be submitted. This is a very simple example of a branch and bound technique to focus the search for a problem solution.
#!/s/std/bin/perl
# tell Perl where to find the Condor library
use lib '/unsup/condor/lib';
# tell Perl to use what it finds in the Condor library
use Condor;
$SUBMIT_FILE1 = 'Asubmit.cmd';
$SUBMIT_FILE2 = 'Bsubmit.cmd';
$SUBMIT_FILE3 = 'Csubmit.cmd';
# Callback used when first job exits without errors.
$firstOK = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
# open output file from first job, and read the result
if ( -f "A.output" )
{
open(RESULTFILE, "A.output") or die "Could not open result file.";
$result = <RESULTFILE>;
close(RESULTFILE);
# next job to submit is based on output from first job
if ($result < 100)
{
$cluster = Condor::Submit($SUBMIT_FILE2);
if (($cluster) == 0)
{
printf("Could not open $SUBMIT_FILE2.\n");
}
&Condor::RegisterExitSuccess($secondOK);
&Condor::RegisterExitFailure($secondfails);
&Condor::Monitor($cluster);
}
else
{
$cluster = Condor::Submit($SUBMIT_FILE3);
if (($cluster) == 0)
{
printf("Could not open $SUBMIT_FILE3.\n");
}
&Condor::RegisterExitSuccess($thirdOK);
&Condor::RegisterExitFailure($thirdfails);
&Condor::Monitor($cluster);
}
}
else
{
printf("Results file does not exist.\n");
}
};
$firstfails = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "The first job, $cluster.$job failed, exiting with an error. \n";
exit(0);
};
# Callback used when second job exits without errors.
$secondOK = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "The second job, $cluster.$job successfully completed. \n";
exit(0);
};
# Callback used when third job exits without errors.
$thirdOK = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "The third job, $cluster.$job successfully completed. \n";
exit(0);
};
# Callback used when second job exits WITH an error.
$secondfails = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "The second job ($cluster.$job) failed. \n";
exit(0);
};
# Callback used when third job exits WITH an error.
$thirdfails = sub
{
%parameters = @_;
$cluster = $parameters{'cluster'};
$job = $parameters{'job'};
print "The third job ($cluster.$job) failed. \n";
exit(0);
};
$cluster = Condor::Submit($SUBMIT_FILE1);
if (($cluster) == 0)
{
printf("Could not open $SUBMIT_FILE1. \n");
}
&Condor::RegisterExitSuccess($firstOK);
&Condor::RegisterExitFailure($firstfails);
&Condor::Monitor($cluster);
&Condor::Wait();