- NVIDIA B200 & H100: Fully support hardware-level MIG partitioning for strict isolation.
- NVIDIA L40S & L4: Do not support MIG, but can be partitioned using GPU Time-Slicing. See our dedicated guide for configuring Time-Slicing with NVIDIA GPUs.
Multi-Instance GPU (MIG) Configuration on NVIDIA B200 and H100
Multi-Instance GPU (MIG) allows you to partition an NVIDIA GPU into multiple isolated GPU instances. Each instance behaves like a standalone GPU with its own dedicated memory and compute resources. This provides high utilization and guaranteed Quality of Service (QoS) for smaller AI/ML inference workloads, development, and parallel data processing.
Using the NVIDIA GPU Operator, you can dynamically apply custom MIG partitioning configurations across your Kubernetes cluster by leveraging node labels and ConfigMaps.
Read more about GPU Operator MIG Configuration and
Configuring MIG slices on an NVIDIA B200 or H100 node requires the following steps:
- Creating a MIG partitioning ConfigMap containing your desired slicing profiles
- Upgrading the NVIDIA GPU Operator so that it manages MIG
- Labeling the node with your chosen profile to trigger the partitioning mechanism
- Verifying the MIG resources that the node reports
Prerequisites
- kubectl and helm installed
- cluster config (kubeconfig)
- A node group equipped with a MIG-capable NVIDIA GPU (the node is referred to as GPU-NODE in this guide)
1. Create the MIG partitioning profiles
First you need to define how the GPU's physical resources should be carved up. Create a ConfigMap called mig-config inside the gpu-operator namespace.
This example defines three layouts: all-disabled, which turns MIG off; b200-split-small, which provides seven 1g.23gb instances per GPU; and b200-split-medium, which provides three 2g.45gb instances per GPU.
You can review the official list of Supported MIG Profiles in the NVIDIA documentation. A layout can also mix slice sizes on one GPU and apply different rules to individual devices.
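To illustrate per-device rules with mixed slices, the sketch below writes one additional, hypothetical layout named b200-mixed-example to a local file. It assumes a node with at least two GPUs and assumes that the 3g.90gb profile is offered on your GPU model. Neither the layout name nor the file name mig-mixed-entry.yaml comes from the manifest in this guide, so check the profile names against the supported list before using them. A MIG-capable GPU exposes seven compute slices, so the block first checks that the mixed device does not request more than that.

```shell
# Compute slices requested on GPU 0: two 1g, one 2g, and one 3g instance.
used=$(( 2 * 1 + 1 * 2 + 1 * 3 ))
echo "compute slices used on GPU 0: ${used} of 7"

# Hypothetical extra layout; it would sit under mig-configs in config.yaml,
# next to b200-split-small and b200-split-medium.
cat > mig-mixed-entry.yaml <<'EOF'
b200-mixed-example:
  # GPU 0: mixed slice sizes on a single device.
  - devices: [0]
    mig-enabled: true
    mig-devices:
      "1g.23gb": 2
      "2g.45gb": 1
      "3g.90gb": 1
  # GPU 1: a different rule for a different device.
  - devices: [1]
    mig-enabled: true
    mig-devices:
      "1g.23gb": 7
EOF
```

To use such a layout, you would add the entry under mig-configs in the ConfigMap, indented to match the other layouts, and later label the node with its name.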
Apply the following manifest to your cluster:
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: mig-config
  namespace: gpu-operator
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-disabled:
        - devices: all
          mig-enabled: false
      b200-split-small:
        - devices: all
          mig-enabled: true
          mig-devices:
            "1g.23gb": 7
      b200-split-medium:
        - devices: all
          mig-enabled: true
          mig-devices:
            "2g.45gb": 3
EOF
2. Upgrade the GPU Operator for MIG Management
The NVIDIA GPU Operator manages the full lifecycle of GPU components in your cluster. Before proceeding, ensure you have completed the base installation by following the NVIDIA Operator Installation Instructions.
Once the base operator is running, execute the following command to upgrade it with the specific parameters required for Multi-Instance GPU (MIG) slicing:
$ helm upgrade gpu-operator nvidia/gpu-operator \
    -n gpu-operator \
    --reuse-values \
    --set mig.strategy=mixed \
    --set migManager.config.name=mig-config
- --reuse-values ensures that UpCloud's pre-configured driver and runtime settings are preserved.
- --set mig.strategy=mixed allows individual nodes to run a combination of different MIG slice sizes simultaneously, with each profile advertised as its own resource type (for example nvidia.com/mig-1g.23gb).
- --set migManager.config.name=mig-config tells the operator's MIG manager to use the custom ConfigMap named mig-config, which you created in step 1.
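To confirm that the upgrade took effect, you can inspect the release values and check that the MIG manager is running. This is only a sketch: the function name show_mig_settings is made up for this example, and the pod label app=nvidia-mig-manager is an assumption based on the upstream GPU Operator's defaults, so adjust it if your pods are labeled differently.

```shell
# Print the user-supplied Helm values and the MIG manager pods.
show_mig_settings() {
  # The output should include mig.strategy and migManager.config.name.
  helm get values gpu-operator -n gpu-operator
  # Assumed label; list all pods in the namespace if this matches nothing.
  kubectl get pods -n gpu-operator -l app=nvidia-mig-manager
}
```

Run show_mig_settings after the upgrade and look for strategy: mixed and the mig-config name in the printed values.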
3. Apply MIG slicing to the Node
Once the configurations are available to the cluster, you can explicitly instruct the GPU operator to apply a desired layout to your physical machine. Label your target node using the configuration key specified inside the ConfigMap.
In this step, we apply the b200-split-small profile to split each GPU on the node into seven isolated instances:
$ kubectl label node GPU-NODE nvidia.com/mig.config=b200-split-small --overwrite
node/GPU-NODE labeled
The GPU Operator's MIG manager detects this label change, resets the GPUs on GPU-NODE, and applies the new partitioning without requiring a full machine reboot.
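Repartitioning takes a little while, so it can help to wait for it to finish before verifying. The sketch below polls the node until the MIG manager reports a result. It assumes the upstream GPU Operator behavior of recording progress in the nvidia.com/mig.config.state node label, with success and failed as final values. The function name wait_for_mig and the 10-second polling interval are choices made for this example.

```shell
# Usage: wait_for_mig NODE [TIMEOUT_SECONDS]
wait_for_mig() {
  node="$1"
  timeout="${2:-600}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Dots inside the label key must be escaped in the jsonpath expression.
    state=$(kubectl get node "$node" \
      -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}')
    case "$state" in
      success) echo "MIG configuration applied on $node"; return 0 ;;
      failed)  echo "MIG configuration failed on $node" >&2; return 1 ;;
    esac
    sleep 10
    elapsed=$((elapsed + 10))
  done
  echo "Timed out waiting for MIG configuration on $node" >&2
  return 1
}
```

For example, wait_for_mig GPU-NODE 900 waits up to 15 minutes and returns a non-zero status if the layout could not be applied.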
4. Verify the allocated MIG resources
To verify that your GPU has successfully split into the intended micro-profiles, inspect the capacity metrics reported back to the Kubernetes scheduler by running:
$ kubectl get node GPU-NODE -o jsonpath='{.status.allocatable}' | grep nvidia.com/mig
You should see an allocatable capacity representing seven distinct instances available for scheduling:
... "nvidia.com/mig-1g.23gb": "7" ...
Your applications can now target these independent slices by adding specific resource allocations to their deployment specifications (e.g., limits: nvidia.com/mig-1g.23gb: 1).
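As a sketch of such a specification, the block below writes a minimal Pod manifest that requests one 1g.23gb slice and lists the GPU devices visible to the container. The Pod name, the file name, and the container image tag are assumptions for illustration; substitute any CUDA base image available to your cluster.

```shell
cat > mig-test-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: mig-test            # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      # Assumed image tag; replace it with a CUDA image you have access to.
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/mig-1g.23gb: 1
EOF
```

Apply it with kubectl apply -f mig-test-pod.yaml. Once the Pod completes, kubectl logs mig-test should list the single MIG device that was assigned to it.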
Looking ahead: Dynamic Resource Allocation (DRA)
As of Kubernetes 1.35, Dynamic Resource Allocation (DRA) has graduated to GA as an advancement beyond traditional static device plugins. DRA allows workloads to request and reconfigure specific hardware resources, such as MIG slices. A dedicated guide for setting up DRA is coming soon! You can read about it here.
