How to Calculate Effective SM Usage for SupremeRAID™ AE

Summary

This article explains how to calculate the effective Streaming Multiprocessor (SM) usage of SupremeRAID™ AE workloads on Linux systems using NVIDIA DCGM and OpenCL (clinfo) tools.

This method helps administrators understand the actual GPU compute resources consumed by SupremeRAID™ AE and is useful for:

  • GPU capacity planning

  • vGPU / shared GPU resource evaluation

  • Workload sizing and performance analysis


Environment

  • Product: SupremeRAID™ AE

  • Operating System: Linux

  • GPU Vendor: NVIDIA

  • GPU Type: Discrete NVIDIA GPUs (e.g., H100 / H200 / A100 / L40)

  • Privileges: Root or sudo access required


Prerequisites

Ensure the following requirements are met before proceeding:

  • NVIDIA GPU driver is properly installed

  • SupremeRAID™ AE workload is running

  • Root or sudo privileges are available

  • Internet access is available (for package installation)


Procedure


Step 1 – Install Required Tools

Install the tools required to query GPU hardware information and monitor real-time GPU metrics:

  • clinfo – Used to retrieve OpenCL platform and device information

  • datacenter-gpu-manager (DCGM) – NVIDIA tool for monitoring GPU utilization metrics

sudo apt update
sudo apt install clinfo datacenter-gpu-manager

Note: On RHEL / Rocky / Alma Linux, package names or repositories may differ.
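
For example, on RHEL-based distributions the installation typically looks like the following, assuming the NVIDIA repository (which provides datacenter-gpu-manager) and EPEL (which provides clinfo) are already configured:

# Example: RHEL / Rocky / Alma
sudo dnf install clinfo datacenter-gpu-manager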


Step 2 – Enable NVIDIA DCGM Service

Start and enable the NVIDIA DCGM service:

sudo systemctl --now enable nvidia-dcgm

Optional verification:

systemctl status nvidia-dcgm
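
Optionally, confirm that DCGM detects the GPUs before monitoring:

# List the GPUs visible to DCGM
dcgmi discovery -l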

Step 3 – Monitor SM Active Ratio

Use dcgmi to monitor the SM Active metric.

  • Metric ID 1002 (SM Active)

    • Indicates the ratio of time during which at least one warp was active on an SM

  • -i <GPU_ID> specifies the target GPU index

# Example: Monitor GPU ID 0
sudo dcgmi dmon -e 1002 -i 0

Record the reported value (typically between 0.0 and 1.0).

Example:

  • 0.25 → 25% SM active ratio
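
By default, dcgmi dmon keeps printing samples until interrupted with Ctrl+C. If your DCGM version supports the -d (delay in milliseconds) and -c (sample count) options, the run can be bounded, for example:

# Example: sample SM Active on GPU 0 once per second for 30 samples
sudo dcgmi dmon -e 1002 -i 0 -d 1000 -c 30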


Step 4 – Retrieve Total SM Count

Use clinfo to retrieve the total number of hardware Compute Units (SMs) for a specific GPU.

clinfo -d P:D | grep "Max compute units"

Parameter Explanation

  • P – OpenCL platform index

  • D – Device index within the selected platform

This allows you to query a specific GPU in multi-GPU or multi-platform systems.


Identify Platform and Device Index

To list all available OpenCL platforms and devices:

clinfo -l

Example Output

Platform #0: NVIDIA CUDA
+-- Device #0: NVIDIA H200
+-- Device #1: NVIDIA H200
+-- Device #2: NVIDIA H200
+-- Device #3: NVIDIA H200
+-- Device #4: NVIDIA H200
+-- Device #5: NVIDIA H200
+-- Device #6: NVIDIA H200
`-- Device #7: NVIDIA H200

In this example:

  • Platform index: 0 (NVIDIA CUDA)

  • Device indices: 0 through 7

Example Commands

Query SM count for Device #0:

clinfo -d 0:0 | grep "Max compute units"

Query SM count for Device #3:

clinfo -d 0:3 | grep "Max compute units"
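
To reuse the value in a script, the SM count can be captured into a shell variable. This is a minimal sketch that assumes the count is the last field of the matching clinfo line:

# Example: store the SM count of Platform #0 / Device #0 in a variable
SM_TOTAL=$(clinfo -d 0:0 | grep "Max compute units" | awk '{print $NF}')
echo "Total SM count: $SM_TOTAL"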

Step 5 – Calculate Effective SM Usage

Apply the following formula:

Effective SM Usage = Total SM Count × SM Active Ratio


Calculation Example

Assume the system reports the following values:

  • Total SM Count (from clinfo): 132

  • SM Active Ratio (from dcgmi, Metric ID 1002): 0.053

Calculation

132 × 0.053 = 6.996 ≈ 7 SMs
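
The same calculation can be performed on the command line. The figures below are the example values from Steps 3 and 4; substitute your own measurements:

# Example: multiply the total SM count by the observed SM Active ratio
awk -v total=132 -v ratio=0.053 'BEGIN { printf "Effective SM Usage: %.1f SMs\n", total * ratio }'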

Interpretation

This result indicates that the SupremeRAID™ AE workload is effectively utilizing compute resources equivalent to approximately 7 Streaming Multiprocessors, even though the physical GPU provides 132 SMs in total.

This typically suggests:

  • The workload is I/O-bound rather than compute-bound

  • GPU compute headroom remains available for:

    • Additional SupremeRAID™ AE workloads

    • Other GPU compute or AI workloads

  • The result is suitable for capacity planning and vGPU / GPU sharing scenarios


Additional Notes

  • An SM Active Ratio around 5% is common for storage-accelerated workloads

  • This does not indicate underperformance or misconfiguration

  • For accurate planning, observe SM Active over:

    • Sustained workload duration

    • Peak I/O scenarios
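
As a sketch, a sustained capture can be logged and averaged with standard tools. This assumes the -d / -c options shown in Step 3 and that each sample line contains "GPU" with the SM Active value in the last column; adjust the parsing to match your DCGM output:

# Example: capture 10 minutes of samples (1 per second) and average the SM Active column
sudo dcgmi dmon -e 1002 -i 0 -d 1000 -c 600 | tee smact.log
awk '/GPU/ { sum += $NF; n++ } END { if (n) printf "Average SM Active: %.3f\n", sum / n }' smact.log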