Slurm Diagnostics
Prerequisites
- Access Level: Super-admin.
- Permission Requirements:
- VM Administration
- View all VM configs in the system
- Arbitrarily edit any VM config
- Arbitrarily delete any VM config
- View host machines and hardware information in any Libvirt realm
- Register physical VM host machines in Libvirt realms
- Edit VM host machine info in any Libvirt realm
- Delete VM host machine info in any Libvirt realm
- View external servers
- Create external servers
- Edit external servers
- Arbitrarily view logs from any VM
Access Slurm Diagnostics from Management
- Go to the Management icon in the top-left taskbar.
- Navigate to the Slurm section.
- Click Slurm Diagnostics in the left panel.
- View the diagnostics output described below, which reflects the date, time, and ID at which you accessed the data.
note
The date, time, and ID at which you access the data appear in the following format: Mon Aug 15 09:34:46 2025 (1890128339)
Diagnostics Output Timestamps
Output Timestamps | Example |
---|---|
sdiag output at | Mon Aug 15 09:34:46 2025 (1890128339) |
Data since | Tue Sep 12 10:12:19 2025 (1890128339) |
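If you copy these timestamps into a script, the format shown above can be split into a date-time and the parenthesized integer. The following is a minimal sketch, not part of the product: the helper name `parse_stamp` is hypothetical, and the parenthesized value is treated as an opaque integer because this page only describes it as an ID.

```python
import re
from datetime import datetime

# English month abbreviations, mapped by hand so parsing does not depend on the locale.
MONTHS = {name: index for index, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Weekday, month, day, HH:MM:SS, year, then an integer in parentheses.
STAMP_RE = re.compile(
    r"^\w{3} (\w{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (\d{4}) \((\d+)\)$")


def parse_stamp(text):
    """Return (datetime, parenthesized integer) for one timestamp string.

    The weekday name is matched but not checked against the date.
    """
    match = STAMP_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    month, day, hour, minute, second, year, number = match.groups()
    when = datetime(int(year), MONTHS[month], int(day),
                    int(hour), int(minute), int(second))
    return when, int(number)


when, number = parse_stamp("Mon Aug 15 09:34:46 2025 (1890128339)")
print(when.isoformat(), number)
```

Parsing with an explicit month table instead of `datetime.strptime` keeps the result independent of the machine's locale settings.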
Diagnostics Output Data
Diagnostics Output | Example |
---|---|
Server thread count | 1 |
RPC queue enabled | 0 |
Agent queue size | 0 |
Agent count | 0 |
Agent thread count | 0 |
DBD Agent queue size | 0 |
Jobs submitted | 0 |
Jobs started | 0 |
Jobs completed | 0 |
Jobs canceled | 0 |
Jobs failed | 0 |
Job states ts | Mon Aug 15 09:34:46 2025 (1890128339) |
Jobs pending | 0 |
Jobs running | 0 |
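For scripted checks, rows such as the ones above can be collected into a dictionary. This is a rough sketch that assumes the rows have been copied out as `name | value |` text exactly as they appear on this page; the function name `parse_rows` is hypothetical.

```python
def parse_rows(lines):
    """Turn 'name | value |' rows into a dict, converting whole numbers to int."""
    result = {}
    for line in lines:
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the '---|---|' separator and anything that is not a two-cell row.
        if len(cells) != 2 or set(cells[0]) <= {"-"}:
            continue
        name, value = cells
        result[name] = int(value) if value.isdigit() else value
    return result


rows = [
    "---|---|",
    "Server thread count | 1 |",
    "Jobs submitted | 0 |",
    "Job states ts | Mon Aug 15 09:34:46 2025 (1890128339) |",
]
stats = parse_rows(rows)
print(stats)
```

Values that are not whole numbers, such as the timestamp, are kept as strings so that nothing is lost in conversion.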
Main Schedule Statistics in Microseconds
Main schedule statistics (microseconds) | Example |
---|---|
Last cycle | 15 |
Max cycle | 23 |
Total cycles | 100 |
Mean cycle | 12 |
Mean depth cycle | 0 |
Cycles per minute | 1 |
Last queue length | 0 |
Main Scheduler Exit Stats
Main scheduler exit | Example |
---|---|
End of job queue | 100 |
Hit default_queue_depth | 0 |
Hit sched_max_job_start | 0 |
Blocked on licenses | 0 |
Hit max_rpc_cnt | 0 |
Timeout (max_sched_time) | 0 |
Backfilling Stats
Backfilling stats | Example |
---|---|
Total backfilled jobs (since last slurm start) | 902 |
Total backfilled jobs (since last stats cycle start) | 0 |
Total backfilled heterogeneous job components | 0 |
Total cycles | 0 |
Last cycle when | Mon Aug 15 09:34:46 2025 (1890128339) |
Last cycle | 0 |
Max cycle | 0 |
Last depth cycle | 0 |
Last depth cycle (try sched) | 0 |
Last queue length | 0 |
Last table size | 1 |
Backfill Exit Stats
Backfill exit | Example |
---|---|
End of job queue | 0 |
Hit bf_max_job_start | 0 |
Hit bf_max_job_test | 0 |
System state changed | 0 |
Hit table size limit (bf_node_space_size) | 0 |
Timeout (bf_max_time) | 0 |
Latency for 1000 calls to gettimeofday() | 29 microseconds |
Remote Procedure Call Statistics by Message Type
Remote Procedure Call statistics by message type | Count Example | Average Time Example | Total Time Example |
---|---|---|---|
REQUEST_FED_INFO | ( 4021) count:12909 | ave_time:51 | total_time:1235521 |
REQUEST_KILL_JOB | ( 6011) count:12398 | ave_time:9533 | total_time:2134425 |
REQUEST_JOB_INFO_SINGLE | ( 5081) count:12309 | ave_time:128 | total_time:1235521 |
REQUEST_SUBMIT_BATCH_JOB | ( 5089) count:87221 | ave_time:826 | total_time:12345256 |
REQUEST_CANCEL_JOB_STEP | ( 4017) count:19812 | ave_time:3152 | total_time:91237789 |
REQUEST_COMPLETE_BATCH_SCRIPT | ( 5901) count:82710 | ave_time:2425 | total_time:09127816 |
REQUEST_PARTITION_INFO | ( 9281) count:12398 | ave_time:81 | total_time:912382 |
REQUEST_JOB_INFO | ( 1008) count:89127 | ave_time:473 | total_time:1238769 |
REQUEST_STEP_COMPLETE | ( 8019) count:2135 | ave_time:285 | total_time:1091823 |
REQUEST_HET_JOB_ALLOC_INFO | ( 8192) count:9817 | ave_time:211 | total_time:2398019 |
REQUEST_JOB_STEP_CREATE | ( 5910) count:6712 | ave_time:652 | total_time:2918232 |
REQUEST_AUTH_TOKEN | ( 6015) count:9812 | ave_time:373 | total_time:7216707 |
MESSAGE_EPILOG_COMPLETE | ( 9011) count:1236 | ave_time:221 | total_time:12389712 |
MESSAGE_NODE_REGISTRATION_STATUS | ( 2901) count:9871 | ave_time:178 | total_time:712778 |
REQUEST_NODE_INFO | ( 1029) count:176 | ave_time:122 | total_time:12892 |
REQUEST_BUILD_INFO | ( 1727) count:198 | ave_time:225 | total_time:512781 |
REQUEST_PING | ( 9826) count:15 | ave_time:88 | total_time:9812 |
REQUEST_STATS_INFO | ( 5661) count:52 | ave_time:129 | total_time:1236 |
REQUEST_JOB_STEP_INFO | ( 1921) count:73 | ave_time:125 | total_time:19231 |
ACCOUNTING_UPDATE_MSG | (18832) count:12 | ave_time:61 | total_time:801 |
REQUEST_UPDATE_NODE | ( 7312) count:5 | ave_time:322 | total_time:18921 |
REQUEST_SHARE_INFO | ( 4401) count:2 | ave_time:152 | total_time:8787 |
REQUEST_TOPO_INFO | ( 2542) count:7 | ave_time:152 | total_time:1398 |
REQUEST_PERSIST_INIT | ( 7547) count:8 | ave_time:124 | total_time:785 |
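To find which message types dominate, the rows above can be parsed into records and ranked. The sketch below assumes each row is available as text in the layout shown in the table; `RpcStat` and `parse_rpc_row` are hypothetical names introduced for illustration, and the number in parentheses is stored as `code` without further interpretation.

```python
import re
from typing import NamedTuple


class RpcStat(NamedTuple):
    name: str
    code: int        # number shown in parentheses
    count: int
    ave_time: int
    total_time: int


# name | ( code) count:N | ave_time:N | total_time:N |
ROW_RE = re.compile(
    r"^(\S+)\s*\|\s*\(\s*(\d+)\)\s*count:(\d+)"
    r"\s*\|\s*ave_time:(\d+)\s*\|\s*total_time:(\d+)")


def parse_rpc_row(line):
    """Parse one table row into an RpcStat record."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized RPC row: {line!r}")
    name, *numbers = match.groups()
    return RpcStat(name, *map(int, numbers))


rows = [
    "REQUEST_FED_INFO | ( 4021) count:12909 | ave_time:51 | total_time:1235521 |",
    "REQUEST_KILL_JOB | ( 6011) count:12398 | ave_time:9533 | total_time:2134425 |",
    "REQUEST_CANCEL_JOB_STEP | ( 4017) count:19812 | ave_time:3152 | total_time:91237789 |",
]
stats = [parse_rpc_row(row) for row in rows]

slowest = max(stats, key=lambda s: s.ave_time)    # highest average time per call
busiest = max(stats, key=lambda s: s.total_time)  # highest cumulative time
print(slowest.name, busiest.name)
```

Ranking by `ave_time` highlights calls that are individually slow, while ranking by `total_time` highlights calls that consume the most time overall; the two can differ, as they do in this sample.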
Remote Procedure Call Statistics by User
Remote Procedure Call statistics by user | Count Example | Average Time Example | Total Time Example |
---|---|---|---|
ticrypt | (1234) count:239657 | ave_time:1523 | total_time:212387612 |
root | (0) count:15252 | ave_time:5423 | total_time:352366690 |
slurm | (452) count:25 | ave_time:64 | total_time:2456 |
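The per-user counts make it easy to see which account generates most of the RPC traffic. As a small worked example using the sample counts from the table above (the variable names are illustrative only):

```python
# Sample RPC counts per user, copied from the example table.
counts = {"ticrypt": 239657, "root": 15252, "slurm": 25}

total = sum(counts.values())
shares = {user: count / total for user, count in counts.items()}

# Print users from the largest share of RPC calls to the smallest.
for user, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{user}: {share:.2%}")
```

In this sample the ticrypt account issues the large majority of calls, which is what you would expect when most requests are made through the application's service account.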
Pending RPC Statistics
Pending RPC statistics | Example |
---|---|
No pending RPCs | n/a |