Condor uses priorities to determine machine allocation for jobs. This section details the priorities and the allocation of machines (negotiation).
For accounting purposes, each user is identified by username@uid_domain. Each user is assigned a single priority value, even when submitting jobs from different machines in the same domain, or from multiple machines in different domains.
The numerical priority value assigned to a user is inversely related to the goodness of the priority: the lower the number, the better the priority. A user with a numerical priority of 5 gets more resources than a user with a numerical priority of 50. There are two priority values assigned to Condor users: the real user priority (RUP) and the effective user priority (EUP).
3.4.1 Real User Priority (RUP)
A user's RUP measures the resource usage of the user through time. Every user begins with a RUP of one half (0.5), and at steady state, the RUP of a user equilibrates to the number of resources used by that user. Therefore, if a specific user continuously uses exactly ten resources for a long period of time, the RUP of that user stabilizes at ten.
However, if the user decreases the number of resources used, the RUP gets better. The rate at which the priority value decays can be set by the macro PRIORITY_HALFLIFE , a time period defined in seconds. Intuitively, if the PRIORITY_HALFLIFE in a pool is set to 86400 (one day), and if a user whose RUP was 10 removes all his jobs, the user's RUP would be 5 one day later, 2.5 two days later, and so on.
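The half-life behavior described above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not Condor code: the function name decayed_rup is hypothetical, and the sketch assumes the user holds no resources at all during the decay and ignores any lower bound that Condor may apply to the RUP.

```python
def decayed_rup(rup, elapsed_seconds, halflife_seconds):
    """RUP of a user who uses no resources, after elapsed_seconds of decay.

    Hypothetical helper: the value halves once per half-life period.
    """
    return rup * 0.5 ** (elapsed_seconds / halflife_seconds)


PRIORITY_HALFLIFE = 86400  # one day, in seconds

after_one_day = decayed_rup(10.0, 1 * 86400, PRIORITY_HALFLIFE)
after_two_days = decayed_rup(10.0, 2 * 86400, PRIORITY_HALFLIFE)
print(after_one_day, after_two_days)
```

With a half-life of one day, a RUP of 10 falls to 5 after one day and to 2.5 after two days, matching the description.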
3.4.2 Effective User Priority (EUP)
The effective user priority (EUP) of a user is used to determine how many resources that user may receive. The EUP is linearly related to the RUP by a priority factor which may be defined on a per-user basis. Unless otherwise configured, the priority factor for all users is 1.0, and so the EUP is the same as the RUP. However, if desired, the priority factors of specific users (such as remote submitters) can be increased so that others are served preferentially.
The number of resources that a user may receive is inversely related to the ratio between the EUPs of submitting users. Therefore user A with EUP=5 will receive twice as many resources as user B with EUP=10 and four times as many resources as user C with EUP=20. However, if A does not use the full number of allocated resources, the available resources are repartitioned and distributed among remaining users according to the inverse ratio rule.
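The inverse ratio rule can be illustrated with a short Python sketch. It takes three users with EUPs of 5, 10 and 20 (labelled A, B and C here) and a hypothetical pool of 70 resources; the function name fair_shares and the pool size are assumptions made for the illustration, and the sketch deliberately ignores repartitioning of unused resources.

```python
from typing import Dict


def fair_shares(eups: Dict[str, float], total_resources: float) -> Dict[str, float]:
    """Split total_resources in inverse proportion to each user's EUP."""
    weights = {user: 1.0 / eup for user, eup in eups.items()}
    total_weight = sum(weights.values())
    return {user: total_resources * w / total_weight for user, w in weights.items()}


# Hypothetical pool of 70 resources shared by three users.
shares = fair_shares({"A": 5.0, "B": 10.0, "C": 20.0}, total_resources=70)
for user, share in shares.items():
    print(user, round(share, 2))
```

Under these assumptions, A is entitled to twice the share of B and four times the share of C.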
Condor supplies mechanisms to directly support two policies in which EUP may be useful:
The priority boost factors for individual users can be set with the -setfactor option of condor_userprio. Details may be found in the condor_userprio manual page.
3.4.3 Priorities in Negotiation and Preemption
Priorities are used to ensure that users get their fair share of resources. The priority values are used at allocation time, meaning during negotiation and matchmaking. Therefore, there are ClassAd attributes that take on defined values only during negotiation, making them ephemeral. In addition to allocation, Condor may preempt a machine claim and reallocate it when conditions change.
Too many preemptions lead to thrashing, a condition in which negotiation for a machine identifies a new job with a better priority nearly every cycle. Each job is, in turn, preempted, and no job finishes. To avoid this situation, the PREEMPTION_REQUIREMENTS configuration variable is defined for and used only by the condor_negotiator daemon to specify the conditions that must be met for a preemption to occur. It is usually defined to deny preemption if the currently running job has been running for only a relatively short period of time. This effectively limits the number of preemptions per resource per time interval. Note that PREEMPTION_REQUIREMENTS applies only to preemptions due to user priority. It has no effect if the machine's RANK expression prefers a different job, or if the machine's policy causes the job to vacate due to other activity on the machine. See section 3.5.9 for a general discussion of limiting preemption.
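The kind of condition that PREEMPTION_REQUIREMENTS typically expresses can be sketched in Python. This is an illustration of the logic only, not a ClassAd expression and not Condor's implementation: the function name, the one-hour minimum run time, and the 1.2 margin (a 20% priority advantage) are assumptions chosen for the example. Recall that a numerically lower priority value is better.

```python
def may_preempt_for_priority(running_seconds, remote_user_prio, submitter_user_prio,
                             min_runtime_seconds=3600, required_margin=1.2):
    """Return True if a priority-based preemption would be allowed.

    Hypothetical policy: the running job must have run longer than
    min_runtime_seconds, and the user holding the machine must have a
    priority value worse (larger) than the candidate's by required_margin.
    """
    # Ephemeral values may be undefined; treat that as "do not preempt".
    if remote_user_prio is None or submitter_user_prio is None:
        return False
    ran_long_enough = running_seconds > min_runtime_seconds
    candidate_is_better = remote_user_prio > submitter_user_prio * required_margin
    return ran_long_enough and candidate_is_better


print(may_preempt_for_priority(7200, 12.0, 5.0))  # long-running job, much worse priority
print(may_preempt_for_priority(600, 12.0, 5.0))   # job started only ten minutes ago
```

Refusing to preempt recently started jobs is what bounds the number of preemptions per resource per time interval.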
The following ephemeral attributes may be used within policy definitions. Care should be taken when using these attributes, due to their ephemeral nature; they are not always defined, so the usage of an expression to check if defined such as
(RemoteUserPrio =?= UNDEFINED)
is likely necessary.
Within these attributes, those with names that contain the string Submitter refer to characteristics about the candidate job's user; those with names that contain the string Remote refer to characteristics about the user currently using the resource. Further, those with names that end with the string ResourcesInUse have values that may change within the time period associated with a single negotiation cycle. Therefore, the configuration variables PREEMPTION_REQUIREMENTS_STABLE and PREEMPTION_RANK_STABLE exist to inform the condor_negotiator daemon that values may change. See section 3.3.17 for definitions of these configuration variables.
3.4.4 Priority Calculation
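This section may be skipped if the reader so feels, but for the curious, here is Condor's priority calculation algorithm.
The RUP of a user $u$ at time $t$, $\pi_r(u,t)$, is calculated every time interval $\delta t$ using the formula
$$\pi_r(u,t) = \beta \times \pi_r(u,t-\delta t) + (1-\beta) \times \rho(u,t)$$
where $\rho(u,t)$ is the number of resources used by user $u$ at time $t$, and $\beta = 0.5^{\delta t/h}$, with $h$ the half-life period set by PRIORITY_HALFLIFE.
The EUP of user $u$ at time $t$, $\pi_e(u,t)$, is calculated by
$$\pi_e(u,t) = \pi_r(u,t) \times f(u,t)$$
where $f(u,t)$ is the priority factor for user $u$ at time $t$.
As mentioned previously, the RUP calculation is designed so that at steady state, each user's RUP stabilizes at the number of resources used by that user. The definition of $\beta$ ensures that $\pi_r(u,t)$ can be calculated over non-uniform time intervals $\delta t$ without affecting the calculation. The time interval $\delta t$ varies due to events internal to the system, but Condor guarantees that unless the central manager machine is down, no matches will be unaccounted for due to this variance.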
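The two properties claimed for the calculation (convergence to the number of resources in use, and insensitivity to how the elapsed time is divided into intervals) can be checked numerically. The sketch below assumes the update has the exponential-smoothing form new = beta * old + (1 - beta) * usage, with beta = 0.5 ** (dt / halflife); the function names are hypothetical, and this is not Condor's accounting code.

```python
def update_rup(previous_rup, resources_in_use, dt_seconds, halflife_seconds):
    """One accounting step of the assumed exponential-smoothing update."""
    beta = 0.5 ** (dt_seconds / halflife_seconds)
    return beta * previous_rup + (1.0 - beta) * resources_in_use


def effective_priority(rup, priority_factor=1.0):
    """EUP as the RUP scaled by a per-user priority factor."""
    return rup * priority_factor


HALFLIFE = 86400.0  # one day, in seconds

# Steady state: a user holding ten resources, updated hourly for 400 hours.
rup = 0.5
for _ in range(400):
    rup = update_rup(rup, 10, 3600.0, HALFLIFE)

# Non-uniform intervals: one two-hour step versus two one-hour steps.
one_step = update_rup(4.0, 10, 7200.0, HALFLIFE)
two_steps = update_rup(update_rup(4.0, 10, 3600.0, HALFLIFE), 10, 3600.0, HALFLIFE)

print(round(rup, 3), abs(one_step - two_steps) < 1e-9)
```

The interval independence follows because the factor for an interval of length 2*dt is the square of the factor for dt, so chaining two short steps at constant usage gives the same result as one long step.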
3.4.5 Negotiation
Negotiation is the method Condor undergoes periodically to match queued jobs with resources capable of running jobs. The condor_negotiator daemon is responsible for negotiation.
During a negotiation cycle, the condor_negotiator daemon accomplishes the following ordered list of items.
The condor_negotiator asks the condor_schedd for the "next job" from a given submitter/user. Typically, the condor_schedd returns jobs in the order of job priority. If priorities are the same, job submission time is used; older jobs go first. If a cluster has multiple procs in it and one of the jobs cannot be matched, the condor_schedd will not return any more jobs in that cluster on that negotiation pass. This is an optimization based on the theory that the cluster jobs are similar. The configuration variable NEGOTIATE_ALL_JOBS_IN_CLUSTER disables the cluster-skipping optimization. Use of the configuration variable SIGNIFICANT_ATTRIBUTES will change the definition of what the condor_schedd considers a cluster from the default definition of all jobs that share the same ClusterId.
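The ordering and the cluster-skipping optimization can be sketched as follows. This is an illustration, not the condor_schedd's implementation: the Job fields, the function names, and the can_match callback are hypothetical, and the sketch assumes that a numerically larger job priority is served first.

```python
from typing import Callable, List, NamedTuple


class Job(NamedTuple):
    cluster_id: int
    proc_id: int
    job_prio: int      # assumption: larger value means served earlier
    submit_time: int   # seconds since the epoch


def negotiation_order(jobs: List[Job]) -> List[Job]:
    """Order by job priority, then by submission time (older jobs first)."""
    return sorted(jobs, key=lambda j: (-j.job_prio, j.submit_time))


def jobs_to_offer(jobs: List[Job], can_match: Callable[[Job], bool],
                  negotiate_all_jobs_in_cluster: bool = False) -> List[Job]:
    """Jobs handed to the negotiator in one pass.

    Once a job fails to match, later jobs of the same cluster are skipped
    unless negotiate_all_jobs_in_cluster is set.
    """
    offered = []
    rejected_clusters = set()
    for job in negotiation_order(jobs):
        if not negotiate_all_jobs_in_cluster and job.cluster_id in rejected_clusters:
            continue
        offered.append(job)
        if not can_match(job):
            rejected_clusters.add(job.cluster_id)
    return offered


queue = [Job(1, 0, 0, 100), Job(1, 1, 0, 100), Job(2, 0, 5, 200), Job(3, 0, 0, 50)]
offered = jobs_to_offer(queue, can_match=lambda j: j.cluster_id != 1)
print([(j.cluster_id, j.proc_id) for j in offered])
```

In this example cluster 1 cannot be matched, so its second proc is never offered during the pass; with negotiate_all_jobs_in_cluster set, all four jobs would be offered.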
3.4.6 The Layperson's Description of the Pie Spin and Pie Slice
Condor schedules in a variety of ways. First, it takes all users who have submitted jobs and calculates their priority. Then, it totals the number of resources available at the moment, and using the ratios of the user priorities, it calculates the number of machines each user could get. This is their pie slice.
The Condor matchmaker goes in user priority order, contacts each user, and asks for job information. The condor_schedd daemon (on behalf of a user) tells the matchmaker about a job, and the matchmaker looks at available resources to create a list of resources that match the requirements expression. The matchmaker then sorts the list of matching resources according to the rank expressions within the ClassAds. If a machine prefers a job, the job is assigned to that machine, potentially preempting a job that might already be running on that machine. Otherwise, the job is given the machine that the job ranks highest. If the machine ranked highest is already running a job, the running job may be preempted for the new job. A default policy for preemption states that the user must have a 20% better priority in order for preemption to succeed. If the job has no preferences as to what sort of machine it gets, matchmaking gives it the first idle resource to meet its requirements.
This matchmaking cycle continues until the user has received all of the machines in their pie slice. The matchmaker then contacts the next highest priority user and offers that user their pie slice worth of machines. After contacting all users, the cycle is repeated with any still available resources and recomputed pie slices. The matchmaker continues spinning the pie until it runs out of machines or all the condor_schedd daemons say they have no more jobs.
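The spinning of the pie can be sketched as a loop. The sketch is a deliberate simplification and not the matchmaker's code: it treats all machines as interchangeable, ignores requirements, rank and preemption, and the names spin_pie, eups and demand are hypothetical. Each user's slice is proportional to the inverse of that user's priority value, and exact fractions are used to avoid rounding surprises.

```python
from fractions import Fraction
from math import floor


def spin_pie(eups, demand, machines):
    """Hand out machines in repeated spins until machines or demand run out.

    eups maps user -> priority value (lower is better); demand maps
    user -> number of idle jobs. Returns user -> machines allocated.
    """
    allocated = {user: 0 for user in eups}
    free = machines
    while free > 0:
        active = [u for u in eups if allocated[u] < demand[u]]
        if not active:
            break  # every user has run out of jobs
        total_weight = sum(1 / Fraction(eups[u]) for u in active)
        pie = free  # the pie is sized at the start of each spin
        for user in sorted(active, key=lambda u: eups[u]):  # best priority first
            if free == 0:
                break
            pie_slice = max(1, floor(pie * (1 / Fraction(eups[user])) / total_weight))
            take = min(pie_slice, demand[user] - allocated[user], free)
            allocated[user] += take
            free -= take
    return allocated


result = spin_pie({"A": 5, "B": 10, "C": 20},
                  demand={"A": 10, "B": 100, "C": 100},
                  machines=70)
print(result)
```

Here user A has only ten jobs, so the thirty machines left over from A's slice are divided between B and C on the second spin, again in inverse proportion to their priority values.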
By default, Condor does all accounting on a per-user basis, and this accounting is primarily used to compute priorities for Condor's fair-share scheduling algorithms. However, accounting can also be done on a per-group basis. Multiple users can all submit jobs into the same accounting group, and all of the jobs will be treated with the same priority.
To use an accounting group, each job inserts an attribute into the job ClassAd which defines the accounting group name for the job. A common name is decided upon and used for the group. The following line is an example that defines the attribute within the job's submit description file:
+AccountingGroup = "group_physics"
The AccountingGroup attribute is a string, and it therefore must be enclosed in double quote marks. The string may have a maximum length of 40 characters. The name should not be qualified with a domain. Certain parts of the Condor system do append the value $(UID_DOMAIN) (as specified in the configuration file on the submit machine) to this string for internal use. For example, if the value of UID_DOMAIN is example.com, and the accounting group name is as specified, condor_userprio will show statistics for this accounting group using the appended domain, for example
                               Effective
User Name                      Priority
------------------------------ ---------
group_physics@example.com           0.50
user@example.com                   23.11
heavyuser@example.com             111.13
...
Additionally, the condor_userprio command allows administrators to remove an entity from the accounting system in Condor. The -delete option to condor_userprio accomplishes this if all the jobs from a given accounting group are completed, and the administrator wishes to remove that group from the system. The -delete option identifies the accounting group with the fully-qualified name of the accounting group. For example
condor_userprio -delete group_physics@example.com
Condor removes entities itself as they are no longer relevant. Intervention by an administrator to delete entities can be beneficial when the use of thousands of short term accounting groups leads to scalability issues.
Note that the name of an accounting group may include a period (.). Inclusion of a period character in the accounting group name only has relevance if the portion of the name before the period matches a group name, as described in the next section on group quotas.
The use of group quotas modifies the negotiation for available resources (machines) within a Condor pool. This solves the difficulties inherent when priorities assigned on a per-user basis are insufficient. This may be the case when different groups (of varying size) own computers, and the groups choose to combine their computers to form a Condor pool. Consider an imaginary Condor pool example with thirty computers; twenty computers are owned by the physics group and ten computers are owned by the chemistry group. One notion of fair allocation could be implemented by configuring the twenty machines owned by the physics group to prefer (using the RANK configuration macro) jobs submitted by the users identified as associated with the physics group. Likewise, the ten machines owned by the chemistry group are configured to prefer jobs from users associated with the chemistry group. This routes jobs to execute on specific machines, perhaps causing more preemption than necessary. The (fair allocation) policy desired is likely somewhat different, if these thirty machines have been pooled. The desired policy does not tie users to specific sets of machines, but to numbers of machines (a quota). Given thirty similar machines, the desired policy allows users within the physics group to have preference on up to twenty of the machines within the pool, and the machines can be any of the machines that are available.
A quota for a set of users requires an identification of the set; members are called group users. Jobs under the group quota specify the group user with the AccountingGroup job ClassAd attribute. This is the same attribute as is used with group accounting.
The submit file syntax for specifying a group user includes both a group name and a user name. The syntax is
+AccountingGroup = "<group>.<user>"
The group is a name chosen for the group. Group names are case-insensitive for negotiation. Group names are not required to begin with the string "group_", as in the examples "group_physics.newton" and "group_chemistry.curie", but it is a useful convention, because group names must not conflict with user names. The period character between the group and the user name is a required part of the syntax. NOTE: An accounting group value lacking the period will cause the job to not be considered part of the group when negotiating, even if the group name has a quota. Furthermore, there will be no warnings that the group quota is not in effect for the job, as this syntax defines group accounting.
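Because a missing period fails silently, it can be useful to check AccountingGroup values before submitting. The following Python sketch classifies a value according to the rules above; it is a hypothetical helper, not part of Condor, and it assumes that only the text before the first period is compared, case-insensitively, against the configured group names.

```python
def parse_accounting_group(value, group_names):
    """Return (group, user) if value names a group user, else (None, value).

    A value is treated as a group user only when it contains a period and
    the part before the first period matches a configured group name.
    """
    group, separator, user = value.partition(".")
    known_groups = {name.lower() for name in group_names}
    if separator and user and group.lower() in known_groups:
        return group, user
    # No period, or an unknown group: plain group accounting, no quota applies.
    return None, value


groups = ["group_physics", "group_chemistry"]
print(parse_accounting_group("group_physics.newton", groups))
print(parse_accounting_group("group_physics", groups))
```

The second call returns no group, which corresponds to the silent case described in the NOTE: the job is accounted under the name group_physics, but the group quota is not in effect for it.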
Configuration controls the order of negotiation for groups and individual users, and sets quotas (preferentially allocated numbers of machines) for the groups. A declared number of slots specifies the quota for each group (see GROUP_QUOTA_<groupname> in section 3.3.17). The sum of the quotas for all groups should typically be less than or equal to the number of slots in the entire pool, but there are situations where it can make sense to have the sum be greater than the size of the pool. An example of this is where large quotas are used to give some groups a chance to claim all slots before other groups have a chance to claim any. If the sum is less than the number of slots in the entire pool, the remaining slots are allocated to the none group, composed of the general users not submitting jobs in a group.
Where group users are specified for jobs, accounting is done per group user. It is no longer done by group, or by individual user.
Negotiation is changed when group quotas are used.
Condor negotiates first for defined groups,
and then for independent job submitters.
Given jobs belonging to different groups,
Condor negotiates first for the group
currently utilizing the smallest percentage of machines
in its quota.
After this,
Condor negotiates for the group
currently utilizing the second smallest percentage of machines
in its quota.
The last group will be the one with the highest percentage
of machines in its quota.
As an example, again use the imaginary pool and groups
given above.
If various users within group_physics have
jobs running on 15 computers,
then the physics group has 75% of the
machines within its quota.
If various users within group_chemistry have
jobs running on 5 computers,
then the chemistry group has 50% of the
machines within its quota.
Negotiation will take place for the chemistry group first.
For independent job submissions (those not part of any group),
the classic Condor user fair share algorithm still applies.
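The group ordering can be expressed compactly. The sketch below is an illustration rather than the negotiator's code: the function name is hypothetical, and it assumes every quota is a positive number of machines.

```python
def group_negotiation_order(in_use, quota):
    """Order groups by the fraction of their quota currently in use, lowest first."""
    return sorted(quota, key=lambda group: in_use.get(group, 0) / quota[group])


quota = {"group_physics": 20, "group_chemistry": 10}
in_use = {"group_physics": 15, "group_chemistry": 5}

order = group_negotiation_order(in_use, quota)
print(order)
```

With 15 of 20 machines (75%) in use by the physics group and 5 of 10 (50%) in use by the chemistry group, the chemistry group is negotiated for first.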
Note that there is no verification that a user is a member of the group that he claims. We rely on societal pressure for enforcement.
Configuration variables affect group quotas. See section 3.3.17 for detailed descriptions of the variables mentioned. Group names that may be given quotas to be used in negotiation are listed in the GROUP_NAMES macro. The names chosen must not conflict with Condor user names. Quotas (by group) are defined in numbers of machine slots. Each group may be assigned an initial value for its user priority factor with the GROUP_PRIO_FACTOR_<groupname> macro. If a group is currently allocated its entire quota of machines, and a group user has a submitted job that is not running, the GROUP_AUTOREGROUP macro allows the job to be considered a second time within the negotiation cycle along with all other individual users' jobs. The user name that is used for accounting and prioritization purposes is still the group user as specified by AccountingGroup in the job ClassAd.
####################
#
# Example 1
# Configuration for group quotas
#
####################

GROUP_NAMES = group_physics, group_chemistry

GROUP_QUOTA_group_physics = 20
GROUP_QUOTA_group_chemistry = 10

GROUP_PRIO_FACTOR_group_physics = 1.0
GROUP_PRIO_FACTOR_group_chemistry = 3.0

GROUP_AUTOREGROUP_group_physics = FALSE
GROUP_AUTOREGROUP_group_chemistry = TRUE
This configuration specifies that the group_physics users will get 20 machines and the group_chemistry users will get ten machines. group_physics users will never get more than 20 machines; however, group_chemistry users can potentially get more than ten machines because GROUP_AUTOREGROUP_group_chemistry is true. This could happen, for example, if there are only 15 jobs submitted by group_physics users. Also, the default priority factor for the physics group is 1.0, and the default priority factor for the chemistry group is 3.0.
####################
#
# Submit description file for group quota user
#
####################
...
+AccountingGroup = "group_physics.newton"
...
This submit file specifies that this job is to be negotiated as part of
the group_physics group and that the user is newton.
Remember that
both the group name and the user name are required for the group quota
to take effect.