
Slurm Diagnostics

Prerequisites
  • Access Level: Super-admin.
  • Permission Requirements (VM Administration):
    • View all VM configs in the system
    • Arbitrarily edit any VM config
    • Arbitrarily delete any VM config
    • View host machines and hardware information in any Libvirt realm
    • Register physical VM host machines in Libvirt realms
    • Edit VM host machine info in any Libvirt realm
    • Delete VM host machine info in any Libvirt realm
    • View external servers
    • Create external servers
    • Edit external servers
    • Arbitrarily view logs from any VM

Access Slurm Diagnostics from Management

  1. Click the Management icon in the top-left taskbar.
  2. Navigate to the Slurm section.
  3. Click Slurm Diagnostics in the left panel.
  4. Review the diagnostics output, which is current as of the date, time, and ID shown when you access the data.
Note

The date, time, and ID are displayed in the following format: Mon Aug 15 09:34:46 2025 (1890128339).

Diagnostics Output Timestamps

| Output timestamp | Example |
| --- | --- |
| sdiag output at | Mon Aug 15 09:34:46 2025 (1890128339) |
| Data since | Tue Sep 12 10:12:19 2025 (1890128339) |
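The two timestamps can be compared to see how long the statistics have been accumulating. The Python sketch below is a minimal example, assuming the raw values use the `Weekday Month Day HH:MM:SS Year (number)` layout shown above; `parse_timestamp` and `stats_window_seconds` are hypothetical helper names, and the number in parentheses is returned unchanged rather than interpreted.

```python
from __future__ import annotations

from datetime import datetime

# Month abbreviations as they appear in the example timestamps.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_timestamp(value: str) -> tuple[datetime, int]:
    """Split 'Mon Aug 15 09:34:46 2025 (1890128339)' into (datetime, number).

    The weekday token is ignored, and the time is treated as naive local time.
    """
    date_part, _, tail = value.strip().rpartition("(")
    if not date_part or not tail.endswith(")"):
        raise ValueError(f"expected 'date (number)', got {value!r}")
    number = int(tail[:-1].strip())
    _weekday, month, day, clock, year = date_part.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    stamp = datetime(int(year), MONTHS[month], int(day), hour, minute, second)
    return stamp, number


def stats_window_seconds(data_since: str, output_at: str) -> float:
    """Seconds between the 'Data since' and 'sdiag output at' timestamps."""
    start, _ = parse_timestamp(data_since)
    end, _ = parse_timestamp(output_at)
    if end < start:
        raise ValueError("'sdiag output at' is earlier than 'Data since'")
    return (end - start).total_seconds()
```

For example, `stats_window_seconds("Fri Aug 15 09:00:00 2025 (1)", "Fri Aug 15 10:00:00 2025 (2)")` returns `3600.0` (the numbers in parentheses are placeholders). Comparing the displayed dates assumes both timestamps were printed in the same time zone.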

Diagnostics Output Data

| Diagnostics output | Example |
| --- | --- |
| Server thread count | 1 |
| RPC queue enabled | 0 |
| Agent queue size | 0 |
| Agent count | 0 |
| Agent thread count | 0 |
| DBD Agent queue size | 0 |
| Jobs submitted | 0 |
| Jobs started | 0 |
| Jobs completed | 0 |
| Jobs canceled | 0 |
| Jobs failed | 0 |
| Job states ts | Mon Aug 15 09:34:46 2025 (1890128339) |
| Jobs pending | 0 |
| Jobs running | 0 |
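To track these counters outside the web interface, the raw text can be parsed into a dictionary. This Python sketch assumes each line has the form `Label: value`, which is how Slurm's `sdiag` command prints this section; `parse_counters` and `failure_share` are hypothetical helpers, and the failure share (failed jobs as a fraction of completed, canceled, and failed jobs) is an illustrative metric, not one reported by Slurm.

```python
from __future__ import annotations

import re

# One "Label: value" pair per line. Labels may contain only letters, spaces,
# underscores and parentheses, so a line such as
# "sdiag output at Mon Aug 15 09:34:46 2025 (...)" is not split on a clock colon.
LINE_RE = re.compile(r"^\s*([A-Za-z][A-Za-z _()]*?):\s*(\S.*?)\s*$")


def parse_counters(text: str) -> dict[str, int | str]:
    """Map each label to an int when the value is numeric, else to the raw string."""
    counters: dict[str, int | str] = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # headers, separators and blank lines
        label, value = match.groups()
        counters[label] = int(value) if value.isdigit() else value
    return counters


def failure_share(counters: dict[str, int | str]) -> float | None:
    """Failed jobs as a fraction of all finished jobs; None if none finished."""
    completed = int(counters.get("Jobs completed", 0))
    canceled = int(counters.get("Jobs canceled", 0))
    failed = int(counters.get("Jobs failed", 0))
    finished = completed + canceled + failed
    return failed / finished if finished else None
```

Labels such as Last cycle and Total cycles appear again in later sections, so pass one section at a time to avoid one value overwriting another.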

Main Schedule Statistics in Microseconds

| Main schedule statistics (microseconds) | Example |
| --- | --- |
| Last cycle | 15 |
| Max cycle | 23 |
| Total cycles | 100 |
| Mean cycle | 12 |
| Mean depth cycle | 0 |
| Cycles per minute | 1 |
| Last queue length | 0 |
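The cycle times in this table are in microseconds, so Mean cycle multiplied by Cycles per minute gives a rough estimate of how much scheduler time is spent per minute. With the example values (Mean cycle 12, Cycles per minute 1) that is 12 microseconds per minute. A minimal Python sketch of the arithmetic follows; `scheduler_busy_fraction` is a hypothetical name, and the result is only an average that hides spikes such as Max cycle.

```python
def scheduler_busy_fraction(mean_cycle_us: float, cycles_per_minute: float) -> float:
    """Rough fraction of each minute spent inside main scheduling cycles.

    mean_cycle_us is the 'Mean cycle' value (microseconds) and
    cycles_per_minute is the 'Cycles per minute' value.
    """
    microseconds_per_minute = 60 * 1_000_000
    return mean_cycle_us * cycles_per_minute / microseconds_per_minute
```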

Main Scheduler Exit Stats

| Main scheduler exit | Example |
| --- | --- |
| End of job queue | 100 |
| Hit default_queue_depth | 0 |
| Hit sched_max_job_start | 0 |
| Blocked on licenses | 0 |
| Hit max_rpc_cnt | 0 |
| Timeout (max_sched_time) | 0 |
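Assuming each main scheduling cycle is counted under exactly one exit reason (the example agrees: 100 total cycles, all 100 at End of job queue), the counts can be turned into proportions. The Python sketch below, with hypothetical helper names, computes those shares and lists any reason other than End of job queue that occurred; reading those as cycles that stopped early, for instance on a limit or a timeout, is an interpretation based on the reason names.

```python
from __future__ import annotations


def exit_reason_shares(exits: dict[str, int]) -> dict[str, float]:
    """Fraction of main scheduler cycles that ended for each reason."""
    total = sum(exits.values())
    if total == 0:
        return {reason: 0.0 for reason in exits}
    return {reason: count / total for reason, count in exits.items()}


def early_exit_reasons(exits: dict[str, int]) -> list[str]:
    """Reasons other than reaching the end of the job queue that occurred at least once."""
    return [reason for reason, count in exits.items()
            if reason != "End of job queue" and count > 0]
```

With the example values, `exit_reason_shares` gives End of job queue a share of 1.0 and `early_exit_reasons` returns an empty list.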

Backfilling Stats

| Backfilling stats | Example |
| --- | --- |
| Total backfilled jobs (since last slurm start) | 902 |
| Total backfilled jobs (since last stats cycle start) | 0 |
| Total backfilled heterogeneous job components | 0 |
| Total cycles | 0 |
| Last cycle when | Mon Aug 15 09:34:46 2025 (1890128339) |
| Last cycle | 0 |
| Max cycle | 0 |
| Last depth cycle | 0 |
| Last depth cycle (try sched) | 0 |
| Last queue length | 0 |
| Last table size | 1 |

Backfill Exit Stats

| Backfill exit | Example |
| --- | --- |
| End of job queue | 0 |
| Hit bf_max_job_start | 0 |
| Hit bf_max_job_test | 0 |
| System state changed | 0 |
| Hit table size limit (bf_node_space_size) | 0 |
| Timeout (bf_max_time) | 0 |
| Latency for 1000 calls to gettimeofday() | 29 microseconds |
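Several backfill exit reasons name the parameter whose limit ended the cycle: bf_max_job_start, bf_max_job_test, bf_node_space_size, and bf_max_time, which in Slurm are SchedulerParameters options in slurm.conf. The Python sketch below, with a hypothetical function name, extracts those names from the labels of any reason with a non-zero count, as a hint about which limit to review. With the example values, where every count is 0, it returns an empty list.

```python
from __future__ import annotations

import re

# A backfill parameter name as it appears inside an exit-reason label,
# e.g. "Hit bf_max_job_test" or "Timeout (bf_max_time)".
BF_PARAM_RE = re.compile(r"\bbf_\w+")


def backfill_limits_hit(exits: dict[str, int]) -> list[str]:
    """Names of bf_* limits that ended at least one backfill cycle."""
    hit = []
    for reason, count in exits.items():
        match = BF_PARAM_RE.search(reason)
        if match is not None and count > 0:
            hit.append(match.group(0))
    return hit
```

The last row is a separate timing measurement: 29 microseconds for 1000 calls works out to about 29 nanoseconds per gettimeofday() call.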

Remote Procedure Call Statistics by Message Type

| Message type (ID) | Count | Average time (microseconds) | Total time (microseconds) |
| --- | --- | --- | --- |
| REQUEST_FED_INFO (4021) | 12909 | 51 | 1235521 |
| REQUEST_KILL_JOB (6011) | 12398 | 9533 | 2134425 |
| REQUEST_JOB_INFO_SINGLE (5081) | 12309 | 128 | 1235521 |
| REQUEST_SUBMIT_BATCH_JOB (5089) | 87221 | 826 | 12345256 |
| REQUEST_CANCEL_JOB_STEP (4017) | 19812 | 3152 | 91237789 |
| REQUEST_COMPLETE_BATCH_SCRIPT (5901) | 82710 | 2425 | 9127816 |
| REQUEST_PARTITION_INFO (9281) | 12398 | 81 | 912382 |
| REQUEST_JOB_INFO (1008) | 89127 | 473 | 1238769 |
| REQUEST_STEP_COMPLETE (8019) | 2135 | 285 | 1091823 |
| REQUEST_HET_JOB_ALLOC_INFO (8192) | 9817 | 211 | 2398019 |
| REQUEST_JOB_STEP_CREATE (5910) | 6712 | 652 | 2918232 |
| REQUEST_AUTH_TOKEN (6015) | 9812 | 373 | 7216707 |
| MESSAGE_EPILOG_COMPLETE (9011) | 1236 | 221 | 12389712 |
| MESSAGE_NODE_REGISTRATION_STATUS (2901) | 9871 | 178 | 712778 |
| REQUEST_NODE_INFO (1029) | 176 | 122 | 12892 |
| REQUEST_BUILD_INFO (1727) | 198 | 225 | 512781 |
| REQUEST_PING (9826) | 15 | 88 | 9812 |
| REQUEST_STATS_INFO (5661) | 52 | 129 | 1236 |
| REQUEST_JOB_STEP_INFO (1921) | 73 | 125 | 19231 |
| ACCOUNTING_UPDATE_MSG (18832) | 12 | 61 | 801 |
| REQUEST_UPDATE_NODE (7312) | 5 | 322 | 18921 |
| REQUEST_SHARE_INFO (4401) | 2 | 152 | 8787 |
| REQUEST_TOPO_INFO (2542) | 7 | 152 | 1398 |
| REQUEST_PERSIST_INIT (7547) | 8 | 124 | 785 |
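When many message types are listed, sorting them by total time shows which RPCs account for most of the controller's RPC processing time. The Python sketch below assumes each raw line has the form `NAME( ID) count:N ave_time:N total_time:N`, with or without whitespace between the fields; `RpcStat`, `parse_rpc_lines`, and `top_by_total_time` are hypothetical names.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class RpcStat(NamedTuple):
    name: str
    rpc_id: int
    count: int
    ave_time_us: int
    total_time_us: int


# Tolerates both "NAME( 4021) count:1ave_time:2total_time:3" and padded columns.
RPC_LINE_RE = re.compile(
    r"^\s*(?P<name>\w+)\s*\(\s*(?P<id>\d+)\)\s*"
    r"count:\s*(?P<count>\d+)\s*"
    r"ave_time:\s*(?P<ave>\d+)\s*"
    r"total_time:\s*(?P<total>\d+)\s*$"
)


def parse_rpc_lines(text: str) -> list[RpcStat]:
    """Parse every matching line; other lines are skipped."""
    stats = []
    for line in text.splitlines():
        m = RPC_LINE_RE.match(line)
        if m is not None:
            stats.append(RpcStat(m["name"], int(m["id"]), int(m["count"]),
                                 int(m["ave"]), int(m["total"])))
    return stats


def top_by_total_time(stats: list[RpcStat], n: int = 5) -> list[RpcStat]:
    """The n message types with the largest total time, largest first."""
    return sorted(stats, key=lambda s: s.total_time_us, reverse=True)[:n]
```

Applied to the example rows, REQUEST_CANCEL_JOB_STEP (total time 91237789) comes first.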

Remote Procedure Call Statistics by User

| User (UID) | Count | Average time (microseconds) | Total time (microseconds) |
| --- | --- | --- | --- |
| ticrypt (1234) | 239657 | 1523 | 212387612 |
| root (0) | 15252 | 5423 | 352366690 |
| slurm (452) | 25 | 64 | 2456 |
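Comparing users by total RPC time can show which account generates most of the controller's load. In the example, root has the largest total time even though ticrypt sent far more RPCs. The Python sketch below, with a hypothetical function name, parses lines of the form `user(UID) count:N ave_time:N total_time:N` and returns each user's fraction of the summed total time.

```python
from __future__ import annotations

import re

# Matches e.g. "root(0) count:15252ave_time:5423total_time:352366690",
# with optional whitespace between fields.
USER_LINE_RE = re.compile(
    r"^\s*(?P<user>[\w.-]+)\s*\(\s*(?P<uid>\d+)\)\s*"
    r"count:\s*\d+\s*ave_time:\s*\d+\s*total_time:\s*(?P<total>\d+)\s*$"
)


def time_share_by_user(text: str) -> dict[str, float]:
    """Fraction of total RPC time attributed to each user."""
    totals: dict[str, int] = {}
    for line in text.splitlines():
        match = USER_LINE_RE.match(line)
        if match is not None:
            totals[match["user"]] = int(match["total"])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {user: 0.0 for user in totals}
    return {user: total / grand_total for user, total in totals.items()}
```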

Pending RPC Statistics

| Pending RPC statistics | Example |
| --- | --- |
| No pending RPCs | n/a |