
sacct

NAME

sacct - displays accounting data for all jobs and job steps in the SLURM job accounting log

SYNOPSIS

sacct [options]

DESCRIPTION

Accounting information for jobs invoked with SLURM is logged in the job accounting log file.

The sacct command displays job accounting data stored in the job accounting log file in a variety of forms for your analysis. By default, it displays information on jobs, job steps, status, and exit codes. You can tailor the output with the --fields= option to specify the fields to be shown.

For the root user, the sacct command displays job accounting data for all users, although there are options to filter the output to report only the jobs from a specified user or group.

For the non-root user, the sacct command by default limits the display of job accounting data to jobs that were launched with their own user identifier (UID). Data for other users can be displayed with the --all, --user, or --uid options.

Note:

Much of the data reported by sacct has been generated by the wait3() and getrusage() system calls. Some systems gather and report incomplete information for these calls; sacct reports values of 0 for this missing data. See your system's getrusage(2) man page for information about which data are actually available on your system.
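To see which of these values a given system actually fills in, the following Python sketch calls getrusage() for terminated child processes through the standard resource module. The mapping from sacct field names to struct rusage members is an assumption made for illustration and is not taken from SLURM's source. On Linux, several members (for example ru_ixrss, ru_nswap, and ru_nsignals) are unmaintained and always read as 0.

```python
import resource

# Assumed correspondence between sacct field names and struct rusage
# members (illustrative only; SLURM's own collection code may differ).
RUSAGE_FIELDS = {
    "rss": "ru_maxrss",
    "ixrss": "ru_ixrss",
    "idrss": "ru_idrss",
    "isrss": "ru_isrss",
    "minflt": "ru_minflt",
    "majflt": "ru_majflt",
    "nswap": "ru_nswap",
    "inblocks": "ru_inblock",
    "outblocks": "ru_oublock",
    "msgsnd": "ru_msgsnd",
    "msgrcv": "ru_msgrcv",
    "nsignals": "ru_nsignals",
    "nvcsw": "ru_nvcsw",
    "nivcsw": "ru_nivcsw",
}


def child_usage() -> dict:
    """Return getrusage(RUSAGE_CHILDREN) values keyed by sacct field name.

    Members the kernel does not maintain come back as 0, which is the
    same value sacct shows for missing data.
    """
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {name: getattr(usage, attr) for name, attr in RUSAGE_FIELDS.items()}
```

Printing child_usage() after running a few subprocesses shows which counters stay at 0 on the local kernel.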

Options
-a, --all

Displays the job accounting data for all jobs in the job accounting log file.

This is the default behavior when the sacct command is executed by the root user.

-b, --brief

Displays a brief listing, which includes the following data:

jobid

status

exitcode

This option has no effect when the --dump option is also specified.

-d, --dump

Displays (dumps) the raw data records.

This option overrides the --brief and --fields= options.

The section titled "INTERPRETING THE --dump OPTION OUTPUT" describes the data output when this option is used.

-e time_spec, --expire=time_spec

Removes job data from SLURM's current accounting log file (or the file specified with --file) for jobs that completed more than time_spec ago and appends them to the expired log file.

If time_spec is an integer value only, it is interpreted as minutes. If time_spec is an integer followed by "h", it is interpreted as a number of hours. If time_spec is an integer followed by "d", it is interpreted as a number of days. For example, "--expire=14d" purges the job accounting log of all jobs that completed more than 14 days ago.

The expired log file is a file with the same name as the accounting log file, with ".expired" appended to the file name. For example, if the accounting log file is /var/log/slurmacct.log, the expired log file will be /var/log/slurmacct.log.expired.
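The time_spec rules and the expired-file naming convention can be expressed as a small Python sketch. The function names are hypothetical. The sketch only interprets the operand and does not read or modify any log file.

```python
import re
from datetime import datetime, timedelta

# An integer with an optional unit: none = minutes, "h" = hours, "d" = days.
_TIME_SPEC = re.compile(r"(\d+)([hd]?)", re.ASCII)


def parse_time_spec(spec: str) -> timedelta:
    """Convert an --expire time_spec such as "90", "12h", or "14d" to a timedelta."""
    match = _TIME_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"invalid time_spec: {spec!r}")
    value, unit = int(match.group(1)), match.group(2)
    if unit == "h":
        return timedelta(hours=value)
    if unit == "d":
        return timedelta(days=value)
    return timedelta(minutes=value)


def is_expired(end: datetime, spec: str, now: datetime) -> bool:
    """True if a job that ended at `end` completed more than time_spec before `now`."""
    return now - end > parse_time_spec(spec)


def expired_log_path(log_path: str) -> str:
    """The expired log has the accounting log's name with ".expired" appended."""
    return log_path + ".expired"
```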

-F field_list, --fields=field_list

Displays the job accounting data specified by the field_list operand, which is a comma-separated list of fields. Space characters are not allowed in the field_list.

See the --help-fields option for a list of the available fields. See the section titled "Job Accounting Fields" for a description of each field.

The job accounting data is displayed in the order specified by the field_list operand. Thus, the following two commands display the same data but in a different order:

# sacct --fields=jobid,status
Jobid Status
---------- ----------
3 COMPLETED
3.0 COMPLETED

# sacct --fields=status,jobid
Status Jobid
---------- ----------
COMPLETED 3
COMPLETED 3.0

The default value for the field_list operand is "jobid,jobname,partition,ncpus,status,exitcode".

This option has no effect when the --dump option is also specified.

-f file, --file=file

Causes the sacct command to read job accounting data from the named file instead of the current SLURM job accounting log file.

-g gid, --gid=gid

Displays the statistics only for the jobs started with GID gid.

-g group, --group=group

Displays the statistics only for the jobs started by users in the group group.

-h, --help

Displays a general help message.

--help-fields

Displays a list of fields that can be specified with the --fields option.

Fields available:
account blockid cpu cputime
elapsed end exitcode gid
group idrss inblock isrss
ixrss job jobid jobname
majflt minflt msgrcv msgsnd
ncpus nivcsw nodes nprocs
nsignals nswap ntasks nvcsw
outblocks pages partition rss
start status submit systemcpu
uid user usercpu vsize

The section titled "Job Accounting Fields" describes these fields.
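The field_list rules (comma-separated, no spaces, names from the list above) can be checked before sacct is invoked. This Python sketch builds the option string. fields_option and KNOWN_FIELDS are hypothetical names, and the field set is copied from the --help-fields listing above, which may differ between SLURM versions.

```python
# Field names copied from the --help-fields listing (may vary by version).
KNOWN_FIELDS = frozenset(
    """
    account blockid cpu cputime elapsed end exitcode gid
    group idrss inblock isrss ixrss job jobid jobname
    majflt minflt msgrcv msgsnd ncpus nivcsw nodes nprocs
    nsignals nswap ntasks nvcsw outblocks pages partition rss
    start status submit systemcpu uid user usercpu vsize
    """.split()
)


def fields_option(fields: list) -> str:
    """Build a --fields= argument that preserves the caller's column order.

    A name containing a space or comma can never be in KNOWN_FIELDS,
    so the membership test also enforces the operand's syntax rules.
    """
    if not fields:
        raise ValueError("at least one field is required")
    for name in fields:
        if name not in KNOWN_FIELDS:
            raise ValueError(f"unknown or malformed field: {name!r}")
    return "--fields=" + ",".join(fields)
```

Because the order of the list is kept, fields_option(["status", "jobid"]) yields the second command in the --fields example.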

-j job(.step), --jobs=job(.step)

Displays information about the specified job(.step) or list of job(.step)s.

The job(.step) parameter is a comma-separated list of jobs. Space characters are not permitted in this list.

The default is to display information on all jobs.
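A job(.step) list can be split into (job, step) pairs as in the following Python sketch. parse_job_list is a hypothetical helper. Its default_step parameter models the --stat behavior described under that option, where a job given without a step refers to step 0.

```python
from typing import List, Optional, Tuple


def parse_job_list(
    spec: str, default_step: Optional[int] = None
) -> List[Tuple[int, Optional[int]]]:
    """Split a --jobs operand such as "4,5.0" into (job, step) pairs.

    A job given without ".step" gets default_step (None unless the
    caller passes one, e.g. 0 to mimic --stat).
    """
    if not spec or " " in spec:
        raise ValueError("job list must be non-empty and contain no spaces")
    pairs = []
    for item in spec.split(","):
        job, sep, step = item.partition(".")
        pairs.append((int(job), int(step) if sep else default_step))
    return pairs
```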

-l, --long

Displays a long listing, which includes the following data:

jobid

jobname

partition

vsize

rss

pages

cputime

ntasks

ncpus

elapsed

status

exitcode

--noheader

Prevents the display of the heading over the output. The default action is to display a header.

This option has no effect when used with the --dump option.

-O, --formatted_dump

Dumps accounting records in an easy-to-read format.

This option is provided for debugging.

-p partition_list, --partition=partition_list

Displays information about jobs and job steps specified by the partition_list operand, which is a comma-separated list of partitions. Space characters are not allowed in the partition_list.

The default is to display information on jobs and job steps on all partitions.

-S, --stat

Queries the status of a running job and displays the following data:

jobid

vsize

rss

pages

cputime

ntasks

status

You must also include the --jobs=job(.step) option. If no .step is given, the job.0 step is reported.

-s state_list, --state=state_list

Selects jobs based on their current state, which can be designated with the following state designators:

r

running

s

suspended

ca

cancelled

cd

completed

pd

pending

f

failed

to

timed out

nf

node_fail

The state_list operand is a comma-separated list of these state designators. Space characters are not allowed in the state_list.
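The designator table and the list syntax can be captured in a short Python sketch. STATE_DESIGNATORS and parse_state_list are hypothetical names, and the designators are matched exactly as listed above, in lowercase.

```python
# State designators accepted by --state, mapped to the states they select.
STATE_DESIGNATORS = {
    "r": "running",
    "s": "suspended",
    "ca": "cancelled",
    "cd": "completed",
    "pd": "pending",
    "f": "failed",
    "to": "timed out",
    "nf": "node_fail",
}


def parse_state_list(state_list: str) -> list:
    """Expand a comma-separated --state operand into state names."""
    if " " in state_list:
        raise ValueError("spaces are not allowed in the state_list")
    try:
        return [STATE_DESIGNATORS[d] for d in state_list.split(",")]
    except KeyError as exc:
        raise ValueError(f"unknown state designator: {exc.args[0]!r}") from None
```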

-t, --total

Displays only the cumulative statistics for each job. Intermediate steps are displayed by default.

-u uid, --uid=uid

Displays the statistics only for the jobs started by the user whose UID is uid.

-u user, --user=user

Displays the statistics only for the jobs started by user user.

--usage

Displays a help message.

-v, --verbose

Reports the state of certain variables during processing. This option is primarily used for debugging.

Job Accounting Fields
The following describes each job accounting field:

account

User-supplied account number for the job

blockid

Block ID, applicable to BlueGene computers only

cpu

The sum of the system time (systemcpu) and user time (usercpu) in seconds

cputime

Minimum CPU time of any process followed by its task id along with the average of all processes running in the step.

elapsed

The job's elapsed time.

The format of this field's output is as follows:

[DD-[hh:]]mm:ss

as defined by the following:

DD

days

hh

hours

mm

minutes

ss

seconds
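The elapsed format can be reproduced from a duration in seconds, as in the Python sketch below. The bracketed notation leaves the exact rules open, so the sketch assumes three things: days are shown only when nonzero, hours are shown only when days or hours are nonzero, and every component is zero-padded to two digits. format_elapsed is a hypothetical name.

```python
def format_elapsed(seconds: int) -> str:
    """Render a duration as [DD-[hh:]]mm:ss under the assumptions stated above."""
    if seconds < 0:
        raise ValueError("elapsed time cannot be negative")
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    text = f"{minutes:02d}:{secs:02d}"
    if days or hours:
        text = f"{hours:02d}:" + text  # assumed: hours appear once they are needed
    if days:
        text = f"{days:02d}-" + text  # assumed: days appear only when nonzero
    return text
```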

end

Termination time of the job. Format output is as follows:

MM/DD-hh:mm:ss

as defined by the following:

MM

month

DD

day

hh

hours

mm

minutes

ss

seconds
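The MM/DD-hh:mm:ss form maps directly onto strftime and strptime directives. Because the format carries no year, parsing it back requires the caller to supply one. The following Python sketch assumes a 24-hour clock, and its function names are hypothetical.

```python
from datetime import datetime

# MM/DD-hh:mm:ss; hours assumed to be on a 24-hour clock.
_FORMAT = "%m/%d-%H:%M:%S"


def format_timestamp(moment: datetime) -> str:
    """Render a timestamp in the form used by the end, start, and submit fields."""
    return moment.strftime(_FORMAT)


def parse_timestamp(text: str, year: int) -> datetime:
    """Parse MM/DD-hh:mm:ss; the format omits the year, so the caller supplies it."""
    return datetime.strptime(f"{year}/{text}", "%Y/" + _FORMAT)
```

Supplying the year before parsing also lets dates such as 02/29 parse correctly in leap years.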

exitcode

The first non-zero error code returned by any job step.

gid

The group identifier of the user who ran the job.

group

The group name of the user who ran the job.

idrss

Maximum unshared data size (in KB) of any process.

inblocks

Total block input operations for all processes.

isrss

Maximum unshared stack space size (in KB) of any process.

ixrss

Maximum shared memory (in KB) of any process.

job

The SLURM job identifier of the job.

jobid

The number of the job or job step. It is in the form: job.jobstep.

jobname

The name of the job or job step.

majflt

Maximum number of major page faults for any process.

minflt

Maximum number of minor page faults (page reclaims) for any process.

msgrcv

Total number of messages received for all processes.

msgsnd

Total number of messages sent for all processes.

ncpus

Total number of CPUs allocated to the job.

nivcsw

Total number of involuntary context switches for all processes.

nodes

A list of nodes allocated to the job.

nprocs

Total number of tasks in job. Identical to ntasks.

nsignals

Total number of signals received for all processes.

nswap

Maximum number of swap operations of any process.

ntasks

Total number of tasks in job.

nvcsw

Total number of voluntary context switches for all processes.

outblocks

Total block output operations for all processes.

pages

Maximum page faults of any process followed by its task id along with the average of all processes running in the step.

partition

Identifies the partition on which the job ran.

rss

Maximum resident set size of any process followed by its task id along with the average of all processes running in the step.

start

Initiation time of the job in the same format as end.

status

Displays the job status, or state.

Output can be RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, or NODE_FAIL.

submit

The time and date stamp (in Coordinated Universal Time, UTC) at which the job was submitted. The format of the output is identical to that of the end field.

systemcpu

The amount of system CPU time. (If the job ran on multiple CPUs, this is the sum of the times on all of them, so it can be much larger than the elapsed time.) The format of the output is identical to that of the elapsed field.

uid

The user identifier of the user who ran the job.

uid.gid

The user and group identifiers of the user who ran the job. (This field is used in record headers, and simply concatenates the uid and gid fields.)

user

The user name of the user who ran the job.

usercpu

The amount of user CPU time. (If the job ran on multiple CPUs, this is the sum of the times on all of them, so it can be much larger than the elapsed time.) The format of the output is identical to that of the elapsed field.

vsize

Maximum virtual memory size of any process followed by its task id along with the average of all processes running in the step.

INTERPRETING THE --dump OPTION OUTPUT

The sacct command's --dump option displays data in a horizontal list of fields depending on the record type; there are three record types: JOB_START, JOB_STEP, and JOB_TERMINATED. The following subsections describe the output for each record type.

When the data output is a job accounting field, as described in the section titled "Job Accounting Fields", only the name of the job accounting field is listed. Otherwise, additional information is provided.

Note:

The output for the JOB_STEP and JOB_TERMINATED record types presents a pair of fields for each of the following: Total CPU time, Total User CPU time, and Total System CPU time. The first field of each pair is the time in seconds expressed as an integer. The second field of each pair is the fractional number of seconds multiplied by one million. Thus, a pair of fields output as "1 024315" means that the time is 1.024315 seconds. The least significant digits in the second field are truncated in formatted displays.
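Recombining such a pair is a single addition once the microsecond field is scaled down. This Python sketch (pair_to_seconds is a hypothetical name) uses Decimal rather than float so that the six fractional digits are kept exactly.

```python
from decimal import Decimal


def pair_to_seconds(whole: str, micros: str) -> Decimal:
    """Combine a --dump (seconds, microseconds) field pair into one value.

    For example, "1" and "024315" become Decimal("1.024315").
    """
    micro = int(micros)
    if not 0 <= micro < 1_000_000:
        raise ValueError("microsecond field out of range")
    return Decimal(int(whole)) + Decimal(micro) / Decimal(1_000_000)
```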

Output for the JOB_START Record Type
The following describes the horizontal fields output by the sacct --dump option for the JOB_START record type.

Field #   Field
1         job
2         partition
3         submitted
4         The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
5         uid.gid
6         (Reserved)
7         JOB_START (literal string)
8         Job Record Version (1)
9         The number of fields in the record (16)
10        uid
11        gid
12        The job name
13        Batch Flag (0=no batch)
14        Relative SLURM priority
15        ncpus
16        nodes
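A JOB_START line can be mapped onto the sixteen fields in the table above with a short Python sketch. It assumes that fields are separated by whitespace and that no field, in particular the job name, contains embedded spaces. The dictionary keys and function names are hypothetical labels for the table rows.

```python
from datetime import datetime, timezone

# Hypothetical key names for fields 1-16 of a JOB_START record, in table order.
JOB_START_FIELDS = (
    "job", "partition", "submitted", "start", "uid.gid", "reserved",
    "record_type", "version", "nfields", "uid", "gid", "jobname",
    "batch", "priority", "ncpus", "nodes",
)


def parse_job_start(line: str) -> dict:
    """Map one JOB_START line from sacct --dump onto named fields."""
    values = line.split()
    if len(values) != len(JOB_START_FIELDS) or values[6] != "JOB_START":
        raise ValueError("not a 16-field JOB_START record")
    return dict(zip(JOB_START_FIELDS, values))


def start_time(record: dict) -> datetime:
    """Field 4 counts seconds since the Epoch; return it as a UTC datetime."""
    return datetime.fromtimestamp(int(record["start"]), tz=timezone.utc)
```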

Output for the JOB_STEP Record Type
The following describes the horizontal fields output by the sacct --dump option for the JOB_STEP record type.

Field #   Field
1         job
2         partition
3         submitted
4         The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
5         uid.gid
6         (Reserved)
7         JOB_STEP (literal string)
8         Job Record Version (1)
9         The number of fields in the record (38)
10        jobid
11        end
12        Completion Status; the mnemonics, which may appear in uppercase or lowercase, are as follows:
            CA   Cancelled
            CD   Completed successfully
            F    Failed
            NF   Job terminated from node failure
            R    Running
            S    Suspended
            TO   Timed out
13        exitcode
14        ntasks
15        ncpus
16        elapsed time in seconds expressed as an integer
17        Integer portion of the Total CPU time in seconds for all processes
18        Fractional portion of the Total CPU time for all processes expressed in microseconds
19        Integer portion of the Total User CPU time in seconds for all processes
20        Fractional portion of the Total User CPU time for all processes expressed in microseconds
21        Integer portion of the Total System CPU time in seconds for all processes
22        Fractional portion of the Total System CPU time for all processes expressed in microseconds
23        rss
24        ixrss
25        idrss
26        isrss
27        minflt
28        majflt
29        nswap
30        inblocks
31        outblocks
32        msgsnd
33        msgrcv
34        nsignals
35        nvcsw
36        nivcsw
37        vsize
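The three CPU time pairs sit at fixed positions (fields 17 through 22) in a JOB_STEP record, so they can be extracted by field number. The following Python sketch assumes whitespace-separated fields with no embedded spaces, and step_cpu_times is a hypothetical name. According to the description of the cpu field, the first total is expected to equal the sum of the user and system totals.

```python
from decimal import Decimal

# 1-based field numbers of each (seconds, microseconds) pair in a JOB_STEP record.
CPU_PAIRS = {
    "cpu": (17, 18),
    "usercpu": (19, 20),
    "systemcpu": (21, 22),
}


def step_cpu_times(line: str) -> dict:
    """Return total, user, and system CPU seconds from one JOB_STEP line."""
    fields = line.split()
    if len(fields) < 22 or fields[6] != "JOB_STEP":
        raise ValueError("not a JOB_STEP record")
    result = {}
    for name, (whole, frac) in CPU_PAIRS.items():
        # Field numbers are 1-based; list indexes are 0-based.
        seconds = Decimal(int(fields[whole - 1]))
        micros = Decimal(int(fields[frac - 1])) / 1_000_000
        result[name] = seconds + micros
    return result
```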

Output for the JOB_TERMINATED Record Type
The following describes the horizontal fields output by the sacct --dump option for the JOB_TERMINATED record type.

Although thirty-eight fields are displayed by the sacct command for the JOB_TERMINATED record, only fields 1 through 12 are recorded in the actual data file; the sacct command aggregates the remainder.

Field #   Field
1         job
2         partition
3         submitted
4         The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
5         uid.gid
6         (Reserved)
7         JOB_TERMINATED (literal string)
8         Job Record Version (1)
9         The number of fields in the record (38)
10        The total elapsed time in seconds for the job
11        end
12        Completion Status; the mnemonics, which may appear in uppercase or lowercase, are as follows:
            CA   Cancelled
            CD   Completed successfully
            F    Failed
            NF   Job terminated from node failure
            R    Running
            TO   Timed out
13        exitcode
14        ntasks
15        ncpus
16        elapsed time in seconds expressed as an integer
17        Integer portion of the Total CPU time in seconds for all processes
18        Fractional portion of the Total CPU time for all processes expressed in microseconds
19        Integer portion of the Total User CPU time in seconds for all processes
20        Fractional portion of the Total User CPU time for all processes expressed in microseconds
21        Integer portion of the Total System CPU time in seconds for all processes
22        Fractional portion of the Total System CPU time for all processes expressed in microseconds
23        rss
24        ixrss
25        idrss
26        isrss
27        minflt
28        majflt
29        nswap
30        inblocks
31        outblocks
32        msgsnd
33        msgrcv
34        nsignals
35        nvcsw
36        nivcsw
37        vsize
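Because the completion status mnemonics in field 12 may appear in either case, a decoder should normalize case before looking them up. This Python sketch uses hypothetical names. It merges the JOB_STEP and JOB_TERMINATED tables, so it also accepts S (Suspended), which only the JOB_STEP table lists.

```python
# Completion status mnemonics from the JOB_STEP and JOB_TERMINATED tables.
COMPLETION_STATUS = {
    "CA": "Cancelled",
    "CD": "Completed successfully",
    "F": "Failed",
    "NF": "Job terminated from node failure",
    "R": "Running",
    "S": "Suspended",
    "TO": "Timed out",
}


def decode_status(mnemonic: str) -> str:
    """Decode field 12 of a --dump record; mnemonics may be upper- or lowercase."""
    try:
        return COMPLETION_STATUS[mnemonic.upper()]
    except KeyError:
        raise ValueError(f"unknown completion status: {mnemonic!r}") from None
```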

EXAMPLES

This example illustrates the default invocation of the sacct command:

# sacct
Jobid Jobname Partition Ncpus Status Exitcode
---------- ---------- ---------- ------- ---------- --------
2 script01 srun 1 RUNNING 0
3 script02 srun 1 RUNNING 0
4 endscript srun 1 RUNNING 0
4.0 srun 1 COMPLETED 0

This example shows the same job accounting information with the --brief option.

# sacct --brief
Jobid Status Exitcode
---------- ---------- --------
2 RUNNING 0
3 RUNNING 0
4 RUNNING 0
4.0 COMPLETED 0

This example uses the --total option to display only the cumulative statistics for each job, without its individual job steps.

# sacct --total
Jobid Jobname Partition Ncpus Status Exitcode
---------- ---------- ---------- ------- ---------- --------
3 sja_init andy 1 COMPLETED 0
4 sjaload andy 2 COMPLETED 0
5 sja_scr1 andy 1 COMPLETED 0
6 sja_scr2 andy 18 COMPLETED 2
7 sja_scr3 andy 18 COMPLETED 0
8 sja_scr5 andy 2 COMPLETED 0
9 sja_scr7 andy 90 COMPLETED 1
10 endscript andy 186 COMPLETED 0

This example demonstrates the ability to customize the output of the sacct command. The fields are displayed in the order designated on the command line.

# sacct --fields=jobid,ncpus,ntasks,nsignals,status
Jobid Ncpus Ntasks Nsignals Status
---------- ------- ------- --------- ----------
3 2 1 0 COMPLETED
3.0 2 1 0 COMPLETED
4 2 2 0 COMPLETED
4.0 2 2 0 COMPLETED
5 2 1 0 COMPLETED
5.0 2 1 0 COMPLETED
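sacct's tabular output can contain empty cells. In the default example, step 4.0 has no job name. For this reason, splitting rows on whitespace can misalign columns. The Python sketch below instead uses the runs of dashes under the header to locate column boundaries. It assumes that every header name and value lies within the span of its column's dashes. parse_columns is a hypothetical name.

```python
import re


def parse_columns(text: str) -> list:
    """Parse sacct tabular output into one dict per data row.

    Column boundaries come from the dash runs on the second line, so
    empty cells (such as a job step with no job name) are preserved.
    """
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected a header line and a dash rule")
    header, rule, rows = lines[0], lines[1], lines[2:]
    spans = [m.span() for m in re.finditer(r"-+", rule)]
    names = [header[a:b].strip() for a, b in spans]
    return [
        {name: row[a:b].strip() for name, (a, b) in zip(names, spans)}
        for row in rows
        if row.strip()
    ]
```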

COPYING

Copyright (C) 2005-2007 Hewlett-Packard Development Company L.P.

This file is part of SLURM, a resource management program. For details, see <https://computing.llnl.gov/linux/slurm/>.

SLURM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

SLURM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

FILES

/etc/slurm.conf

Entries in this file enable job accounting and designate the job accounting log file.

/var/log/slurm_accounting.log

The default job accounting log file. By default, this file is set to read and write permission for root only.

SEE ALSO

ps(1), srun(1), squeue(1), getrusage(2), time(2)
