How to Calculate Effective SM Usage for SupremeRAID™ AE

Summary

This article explains how to calculate the effective Streaming Multiprocessor (SM) usage of SupremeRAID™ AE workloads on Linux systems using NVIDIA DCGM and OpenCL (clinfo) tools.

This method helps administrators understand the actual GPU compute resources consumed by SupremeRAID™ AE and is useful for:

  • GPU capacity planning

  • vGPU / shared GPU resource evaluation

  • Workload sizing and performance analysis


Environment

  • Product: SupremeRAID™ AE

  • Operating System: Linux

  • GPU Vendor: NVIDIA

  • GPU Type: Discrete NVIDIA GPUs (e.g., H100 / H200 / A100 / L40)

  • Privileges: Root or sudo access required


Prerequisites

Ensure the following requirements are met before proceeding:

  • NVIDIA GPU driver is properly installed

  • SupremeRAID™ AE workload is running

  • Root or sudo privileges are available

  • Internet access is available (for package installation)


Procedure


Step 1 – Install Required Tools

Install the tools required to query GPU hardware information and monitor real-time GPU metrics:

  • clinfo – Used to retrieve OpenCL platform and device information

  • datacenter-gpu-manager (DCGM) – NVIDIA tool for monitoring GPU utilization metrics

sudo apt update
sudo apt install clinfo datacenter-gpu-manager

Note: On RHEL / Rocky / Alma Linux, package names or repositories may differ.
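
For example, on RHEL-based distributions the installation typically looks like the following, assuming the NVIDIA repository (which provides datacenter-gpu-manager) and EPEL (which provides clinfo) are already configured:

# Example: RHEL / Rocky / Alma
sudo dnf install clinfo datacenter-gpu-manager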


Step 2 – Enable NVIDIA DCGM Service

Start and enable the NVIDIA DCGM service:

sudo systemctl --now enable nvidia-dcgm

Optional verification:

systemctl status nvidia-dcgm
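
Optionally, confirm that DCGM detects the GPUs before monitoring:

# List the GPUs visible to DCGM
dcgmi discovery -l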

Step 3 – Monitor SM Active Ratio

Use dcgmi to monitor the SM Active metric.

  • Metric ID 1002 (SM Active)

    • Indicates the ratio of time during which at least one warp was active on an SM

  • -i <GPU_ID> specifies the target GPU index

# Example: Monitor GPU ID 0
sudo dcgmi dmon -e 1002 -i 0

Record the reported value (typically between 0.0 and 1.0).

Example:

  • 0.25 → 25% SM active ratio
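
By default, dcgmi dmon keeps printing samples until interrupted with Ctrl+C. If your DCGM version supports the -d (delay in milliseconds) and -c (sample count) options, the run can be bounded, for example:

# Example: sample SM Active on GPU 0 once per second for 30 samples
sudo dcgmi dmon -e 1002 -i 0 -d 1000 -c 30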


Step 4 – Retrieve Total SM Count

Use clinfo to retrieve the total number of hardware Compute Units (SMs) for a specific GPU.

clinfo -d P:D | grep "Max compute units"

Parameter Explanation

  • P – OpenCL platform index

  • D – Device index within the selected platform

This allows you to query a specific GPU in multi-GPU or multi-platform systems.


Identify Platform and Device Index

To list all available OpenCL platforms and devices:

clinfo -l

Example Output

Platform #0: NVIDIA CUDA
+-- Device #0: NVIDIA H200
+-- Device #1: NVIDIA H200
+-- Device #2: NVIDIA H200
+-- Device #3: NVIDIA H200
+-- Device #4: NVIDIA H200
+-- Device #5: NVIDIA H200
+-- Device #6: NVIDIA H200
`-- Device #7: NVIDIA H200

In this example:

  • Platform index: 0 (NVIDIA CUDA)

  • Device indices: 0 through 7

Example Commands

Query SM count for Device #0:

clinfo -d 0:0 | grep "Max compute units"

Query SM count for Device #3:

clinfo -d 0:3 | grep "Max compute units"
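
To reuse the value in a script, the SM count can be captured into a shell variable. This is a minimal sketch that assumes the count is the last field of the matching clinfo line:

# Example: store the SM count of Platform #0 / Device #0 in a variable
SM_TOTAL=$(clinfo -d 0:0 | grep "Max compute units" | awk '{print $NF}')
echo "Total SM count: $SM_TOTAL"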

Step 5 – Calculate Effective SM Usage

Apply the following formula:

Effective SM Usage = Total SM Count × SM Active Ratio


Calculation Example

Assume the system reports the following values:

  • Total SM Count (from clinfo): 132

  • SM Active Ratio (from dcgmi, Metric ID 1002): 0.053

Calculation

132 × 0.053 = 6.996 ≈ 7 SMs
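
The same calculation can be performed on the command line. The figures below are the example values from Steps 3 and 4; substitute your own measurements:

# Example: multiply the total SM count by the observed SM Active ratio
awk -v total=132 -v ratio=0.053 'BEGIN { printf "Effective SM Usage: %.1f SMs\n", total * ratio }'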

Interpretation

This result indicates that the SupremeRAID™ AE workload is effectively utilizing compute resources equivalent to approximately 7 Streaming Multiprocessors, even though the physical GPU provides 132 SMs in total.

This typically suggests:

  • The workload is I/O-bound rather than compute-bound

  • GPU compute headroom remains available for:

    • Additional SupremeRAID™ AE workloads

    • Other GPU compute or AI workloads

  • The result is suitable for capacity planning and vGPU / GPU sharing scenarios


Additional Notes

  • An SM Active Ratio around 5% is common for storage-accelerated workloads

  • This does not indicate underperformance or misconfiguration

  • For accurate planning, observe SM Active over:

    • Sustained workload duration

    • Peak I/O scenarios
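
As a sketch, a sustained capture can be logged and averaged with standard tools. This assumes the -d / -c options shown in Step 3 and that each sample line contains "GPU" with the SM Active value in the last column; adjust the parsing to match your DCGM output:

# Example: capture 10 minutes of samples (1 per second) and average the SM Active column
sudo dcgmi dmon -e 1002 -i 0 -d 1000 -c 600 | tee smact.log
awk '/GPU/ { sum += $NF; n++ } END { if (n) printf "Average SM Active: %.3f\n", sum / n }' smact.log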