Slurm Diagnostics
Prerequisites
Access Level: Super-admin
Permission Requirements
- VM Administration
- View all VM configs in the system
- Arbitrarily edit any VM config
- Arbitrarily delete any VM config
- View host machines and hardware information in any Libvirt realm
- Register physical VM host machines in Libvirt realms
- Edit VM host machine info in any Libvirt realm
- Delete VM host machine info in any Libvirt realm
- View external servers
- Create external servers
- Edit external servers
- Arbitrarily view logs from any VM
Access Slurm Diagnostics
- Go to the Management icon in the top-left taskbar.
- Navigate to the Slurm section.
- Click Slurm Diagnostics in the left panel.
- View the diagnostics output described below, reported as of the date, time, and ID at which you access the data.
note
The date, time, and ID at which you access the data are shown in a format such as Mon Aug 15 09:34:46 2025 (1890128339).
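If you copy a timestamp out of the diagnostics view for use in a script, it can help to split it into its date part and the number in parentheses. The following is a minimal sketch, assuming the value always follows the format shown in the note and that weekday and month names are in English; `parse_stamp` is a hypothetical helper, not part of the product.

```python
import re
from datetime import datetime

# Matches "<weekday> <month> <day> <HH:MM:SS> <year> (<number>)".
STAMP_RE = re.compile(
    r"^(?P<text>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) \((?P<num>\d+)\)$"
)

def parse_stamp(value):
    """Split a diagnostics timestamp into (datetime, integer in parentheses)."""
    match = STAMP_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    # Collapse repeated spaces (single-digit days may be space-padded).
    text = " ".join(match.group("text").split())
    when = datetime.strptime(text, "%a %b %d %H:%M:%S %Y")
    return when, int(match.group("num"))

when, number = parse_stamp("Mon Aug 15 09:34:46 2025 (1890128339)")
print(when.isoformat(), number)
```

The number in parentheses is kept as an opaque integer here, since this page does not define it beyond calling it an ID.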
Diagnostics Output Timestamps
| Output Timestamps | Example |
|---|---|
| sdiag output at | Mon Aug 15 09:34:46 2025 (1890128339) |
| Data since | Tue Sep 12 10:12:19 2025 (1890128339) |
Diagnostics Output Data
| Diagnostics Output | Example |
|---|---|
| Server thread count | 1 |
| RPC queue enabled | 0 |
| Agent queue size | 0 |
| Agent count | 0 |
| Agent thread count | 0 |
| DBD Agent queue size | 0 |
| Jobs submitted | 0 |
| Jobs started | 0 |
| Jobs completed | 0 |
| Jobs canceled | 0 |
| Jobs failed | 0 |
| Job states ts | Mon Aug 15 09:34:46 2025 (1890128339) |
| Jobs pending | 0 |
| Jobs running | 0 |
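For record keeping or comparison between two points in time, the counters above can be loaded into a dictionary. The sketch below assumes you have the values as plain text in a `Label: value` layout, one per line; that layout, the `SAMPLE` text, and the `parse_counters` helper are assumptions for illustration, not an export format documented here.

```python
def parse_counters(raw_text):
    """Turn 'Label: value' lines into a dict, converting plain integers."""
    counters = {}
    for line in raw_text.splitlines():
        # Split on the first colon only, so timestamp values stay intact.
        label, sep, value = line.partition(":")
        if not sep or not label.strip():
            continue  # skip blank lines and lines without a label
        value = value.strip()
        counters[label.strip()] = int(value) if value.isdigit() else value
    return counters

# Hypothetical plain-text copy of a few rows from the table above.
SAMPLE = """\
Server thread count: 1
Agent queue size: 0
Jobs submitted: 0
Job states ts: Mon Aug 15 09:34:46 2025 (1890128339)
Jobs pending: 0
"""

counters = parse_counters(SAMPLE)
print(counters["Server thread count"], counters["Job states ts"])
```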
Main Schedule Statistics in Microseconds
| Main schedule statistics (microseconds) | Example |
|---|---|
| Last cycle | 15 |
| Max cycle | 23 |
| Total cycles | 100 |
| Mean cycle | 12 |
| Mean depth cycle | 0 |
| Cycles per minute | 1 |
| Last queue length | 0 |
Main Scheduler Exit Stats
| Main scheduler exit | Example |
|---|---|
| End of job queue | 100 |
| Hit default_queue_depth | 0 |
| Hit sched_max_job_start | 0 |
| Blocked on licenses | 0 |
| Hit max_rpc_cnt | 0 |
| Timeout (max_sched_time) | 0 |
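One way to read the exit counters is as shares of all recorded exits: a high share for "End of job queue" suggests the scheduler usually works through the whole queue, while a high share for one of the limits suggests that limit often ends a cycle early. The following is a rough sketch using the example values from the table; `exit_shares` is a hypothetical helper.

```python
def exit_shares(exit_counts):
    """Return each exit reason's fraction of all recorded exits."""
    total = sum(exit_counts.values())
    if total == 0:
        # No cycles recorded yet, so every share is zero.
        return {reason: 0.0 for reason in exit_counts}
    return {reason: count / total for reason, count in exit_counts.items()}

# Example values copied from the table above.
exits = {
    "End of job queue": 100,
    "Hit default_queue_depth": 0,
    "Hit sched_max_job_start": 0,
    "Blocked on licenses": 0,
    "Hit max_rpc_cnt": 0,
    "Timeout (max_sched_time)": 0,
}

shares = exit_shares(exits)
for reason, share in shares.items():
    print(f"{reason}: {share:.0%}")
```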
Backfilling Stats
| Backfilling stats | Example |
|---|---|
| Total backfilled jobs (since last slurm start) | 902 |
| Total backfilled jobs (since last stats cycle start) | 0 |
| Total backfilled heterogeneous job components | 0 |
| Total cycles | 0 |
| Last cycle when | Mon Aug 15 09:34:46 2025 (1890128339) |
| Last cycle | 0 |
| Max cycle | 0 |
| Last depth cycle | 0 |
| Last depth cycle (try sched) | 0 |
| Last queue length | 0 |
| Last table size | 1 |
Backfill Exit Stats
| Backfill exit | Example |
|---|---|
| End of job queue | 0 |
| Hit bf_max_job_start | 0 |
| Hit bf_max_job_test | 0 |
| System state changed | 0 |
| Hit table size limit (bf_node_space_size) | 0 |
| Timeout (bf_max_time) | 0 |
| Latency for 1000 calls to gettimeofday() | 29 microseconds |
Remote Procedure Call Statistics by Message Type
| Remote Procedure Call statistics by message type | Count Example | Average Time Example | Total Time Example |
|---|---|---|---|
| REQUEST_FED_INFO | ( 4021) count:12909 | ave_time:51 | total_time:1235521 |
| REQUEST_KILL_JOB | ( 6011) count:12398 | ave_time:9533 | total_time:2134425 |
| REQUEST_JOB_INFO_SINGLE | ( 5081) count:12309 | ave_time:128 | total_time:1235521 |
| REQUEST_SUBMIT_BATCH_JOB | ( 5089) count:87221 | ave_time:826 | total_time:12345256 |
| REQUEST_CANCEL_JOB_STEP | ( 4017) count:19812 | ave_time:3152 | total_time:91237789 |
| REQUEST_COMPLETE_BATCH_SCRIPT | ( 5901) count:82710 | ave_time:2425 | total_time:09127816 |
| REQUEST_PARTITION_INFO | ( 9281) count:12398 | ave_time:81 | total_time:912382 |
| REQUEST_JOB_INFO | ( 1008) count:89127 | ave_time:473 | total_time:1238769 |
| REQUEST_STEP_COMPLETE | ( 8019) count:2135 | ave_time:285 | total_time:1091823 |
| REQUEST_HET_JOB_ALLOC_INFO | ( 8192) count:9817 | ave_time:211 | total_time:2398019 |
| REQUEST_JOB_STEP_CREATE | ( 5910) count:6712 | ave_time:652 | total_time:2918232 |
| REQUEST_AUTH_TOKEN | ( 6015) count:9812 | ave_time:373 | total_time:7216707 |
| MESSAGE_EPILOG_COMPLETE | ( 9011) count:1236 | ave_time:221 | total_time:12389712 |
| MESSAGE_NODE_REGISTRATION_STATUS | ( 2901) count:9871 | ave_time:178 | total_time:712778 |
| REQUEST_NODE_INFO | ( 1029) count:176 | ave_time:122 | total_time:12892 |
| REQUEST_BUILD_INFO | ( 1727) count:198 | ave_time:225 | total_time:512781 |
| REQUEST_PING | ( 9826) count:15 | ave_time:88 | total_time:9812 |
| REQUEST_STATS_INFO | ( 5661) count:52 | ave_time:129 | total_time:1236 |
| REQUEST_JOB_STEP_INFO | ( 1921) count:73 | ave_time:125 | total_time:19231 |
| ACCOUNTING_UPDATE_MSG | (18832) count:12 | ave_time:61 | total_time:801 |
| REQUEST_UPDATE_NODE | ( 7312) count:5 | ave_time:322 | total_time:18921 |
| REQUEST_SHARE_INFO | ( 4401) count:2 | ave_time:152 | total_time:8787 |
| REQUEST_TOPO_INFO | ( 2542) count:7 | ave_time:152 | total_time:1398 |
| REQUEST_PERSIST_INIT | ( 7547) count:8 | ave_time:124 | total_time:785 |
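To find which message types account for the most time, rows in the layout above can be parsed and sorted by `total_time`. The following is a minimal sketch, assuming each row carries the message name, a numeric ID in parentheses, and the `count`, `ave_time`, and `total_time` fields exactly as shown; `parse_rpc_row` is a hypothetical helper, and the three sample rows are copied from the table.

```python
import re

# Name, "(id)", then count / ave_time / total_time separated by whitespace.
RPC_RE = re.compile(
    r"^\s*(?P<name>\S+)\s*\(\s*(?P<id>\d+)\)\s*"
    r"count:(?P<count>\d+)\s+ave_time:(?P<ave>\d+)\s+total_time:(?P<total>\d+)\s*$"
)

def parse_rpc_row(row):
    """Parse one RPC statistics row; table pipes are treated as whitespace."""
    match = RPC_RE.match(row.replace("|", " "))
    if match is None:
        return None  # not an RPC statistics row
    return {
        "name": match.group("name"),
        "id": int(match.group("id")),
        "count": int(match.group("count")),
        "ave_time": int(match.group("ave")),
        "total_time": int(match.group("total")),
    }

rows = [
    "REQUEST_FED_INFO | ( 4021) count:12909 | ave_time:51 | total_time:1235521 |",
    "REQUEST_KILL_JOB | ( 6011) count:12398 | ave_time:9533 | total_time:2134425 |",
    "REQUEST_CANCEL_JOB_STEP | ( 4017) count:19812 | ave_time:3152 | total_time:91237789 |",
]

parsed = [entry for entry in map(parse_rpc_row, rows) if entry is not None]
by_total = sorted(parsed, key=lambda entry: entry["total_time"], reverse=True)
for entry in by_total:
    print(entry["name"], entry["total_time"])
```

Sorting by `ave_time` instead highlights message types that are slow per call rather than expensive in aggregate.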
Remote Procedure Call Statistics by User
| Remote Procedure Call statistics by user | Count Example | Average Time Example | Total Time Example |
|---|---|---|---|
| ticrypt | (1234) count:239657 | ave_time:1523 | total_time:212387612 |
| root | (0) count:15252 | ave_time:5423 | total_time:352366690 |
| slurm | (452) count:25 | ave_time:64 | total_time:2456 |
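The per-user figures can be ranked in two ways, and the two rankings need not agree: one user may issue the most calls while another accounts for the most total time. A small sketch using the example values from the table follows; the tuple layout and variable names are assumptions for illustration.

```python
# (user, uid, count, ave_time, total_time), copied from the table above.
user_stats = [
    ("ticrypt", 1234, 239657, 1523, 212387612),
    ("root", 0, 15252, 5423, 352366690),
    ("slurm", 452, 25, 64, 2456),
]

# User with the most calls versus user with the most accumulated time.
most_calls = max(user_stats, key=lambda row: row[2])
most_time = max(user_stats, key=lambda row: row[4])

# Each user's share of all calls.
total_calls = sum(row[2] for row in user_stats)
call_share = {row[0]: row[2] / total_calls for row in user_stats}

print("most calls:", most_calls[0])
print("most total time:", most_time[0])
for user, share in call_share.items():
    print(f"{user}: {share:.2%} of calls")
```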
Pending RPC Statistics
| Pending RPC statistics | Example |
|---|---|
| No pending RPCs | n/a |