Marconi100: job-level GPU usage and accounting now available

07/07/2021

Dear Users,

During the last Marconi100 (M100) maintenance, we configured the SLURM resource manager to collect GPU usage and accounting statistics for each job. The service is based on the NVIDIA Data Center GPU Manager (DCGM) and, at the end of each job, produces one report per node covering all the GPUs requested on that node. The reports are saved in the job submit directory, in files named "dcgmi_stats_<nodename>_<jobid>.out".

Each report contains statistics on GPU usage for your run (power and memory usage, etc.) and an assessment of the overall health state of the GPUs.
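As an illustration only, the short Python sketch below shows one way to collect the report files left in a submit directory, relying solely on the file naming scheme given above. The function name find_dcgm_reports is hypothetical, and the sketch assumes that the job ID is purely numeric and is the last underscore-separated field before ".out". It does not parse the report contents, whose layout is not described here.

```python
import re
from pathlib import Path

# Naming scheme from the announcement: dcgmi_stats_<nodename>_<jobid>.out
# Assumption: <jobid> is purely numeric; <nodename> is everything before it.
REPORT_RE = re.compile(r"^dcgmi_stats_(?P<node>.+)_(?P<jobid>\d+)\.out$")


def find_dcgm_reports(submit_dir, jobid=None):
    """Return sorted (node, jobid, path) tuples for DCGM reports in submit_dir.

    If jobid is given, only the reports of that job are returned.
    """
    found = []
    for path in Path(submit_dir).glob("dcgmi_stats_*.out"):
        match = REPORT_RE.match(path.name)
        if match is None:
            continue  # file name does not follow the expected scheme
        if jobid is not None and match.group("jobid") != str(jobid):
            continue  # report belongs to a different job
        found.append((match.group("node"), match.group("jobid"), path))
    return sorted(found)
```

For a multi-node job, calling find_dcgm_reports(".", jobid=<your job ID>) from the submit directory should return one entry per node, which makes it easy to check that every node produced a report.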

Best regards,

HPC User Support @ CINECA