Multi-Instance GPU (MIG) Configuration on NVIDIA B200 and H100

Last updated on: June 1, 2026

Multi-Instance GPU (MIG) allows you to partition an NVIDIA GPU into multiple isolated GPU instances. Each instance behaves like a standalone GPU with its own dedicated memory and compute resources. This provides high utilization and guaranteed Quality of Service (QoS) for smaller AI/ML inference workloads, development, and parallel data processing.

Using the NVIDIA GPU Operator, you can dynamically apply custom MIG partitioning configurations across your Kubernetes cluster by leveraging node labels and ConfigMaps.

Supported GPU Architectures

NVIDIA B200 & H100: Fully support hardware-level MIG partitioning for strict isolation.
NVIDIA L40S & L4: Do not support MIG, but can be shared between workloads using GPU Time-Slicing. See our dedicated guide for configuring Time-Slicing with NVIDIA GPUs.

Read more about GPU Operator MIG Configuration and

Configuring MIG slices on an NVIDIA B200 or H100 node requires the following steps:

Creating a MIG partitioning ConfigMap containing your desired slicing profiles
Upgrading the NVIDIA GPU Operator so that it manages MIG using that ConfigMap
Labeling the node with your chosen profile to trigger the partitioning mechanism
Verifying the MIG resources that the node reports to the scheduler

Prerequisites

kubectl and helm installed
A kubeconfig file for your cluster
A node group equipped with a MIG-capable NVIDIA GPU (the node is referred to as GPU-NODE in this guide)

1. Create the MIG partitioning profiles

First, define how the GPU's physical resources should be divided. Create a ConfigMap called mig-config inside the gpu-operator namespace.

This example defines two MIG layouts alongside an all-disabled entry that turns MIG off: a small split providing seven 1g.23gb instances per GPU, or a medium split providing three 2g.45gb instances per GPU. These profile names are specific to the B200; an H100 uses different profile names, so adjust them to match your hardware.

You can review the official list of Supported MIG Profiles in the NVIDIA documentation. A layout can also mix slice sizes on one GPU and apply different rules to individual devices.

Apply the following manifest to your cluster:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: mig-config
  namespace: gpu-operator
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-disabled:
        - devices: all
          mig-enabled: false

      b200-split-small:
        - devices: all
          mig-enabled: true
          mig-devices:
            "1g.23gb": 7
      b200-split-medium:
        - devices: all
          mig-enabled: true
          mig-devices:
            "2g.45gb": 3
EOF
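
The mixed-slice and per-device options mentioned above can be sketched as two additional entries under mig-configs. This is an illustrative sketch rather than a tested layout: the entry names b200-split-balanced and b200-per-device are hypothetical, the per-device entry assumes a node with at least two GPUs, and the profile mix is an assumption that should be checked against NVIDIA's supported-profile list for your GPU. The block only writes the fragment to a local scratch file (mig-config-extra.yaml, also a hypothetical name) so that you can review it:

```shell
# Write two example layouts to a scratch file for review.
# Entry names are hypothetical; profile counts are assumptions to verify.
cat > mig-config-extra.yaml <<'EOF'
      # Mixed slices on every GPU: 2x1g + 1x2g + 1x3g = 7 compute slices.
      b200-split-balanced:
        - devices: all
          mig-enabled: true
          mig-devices:
            "1g.23gb": 2
            "2g.45gb": 1
            "3g.90gb": 1

      # Per-device rules: GPU 0 is sliced, GPU 1 stays a whole GPU.
      b200-per-device:
        - devices: [0]
          mig-enabled: true
          mig-devices:
            "1g.23gb": 7
        - devices: [1]
          mig-enabled: false
EOF
```

To try one of these, paste the entries under mig-configs in the manifest above (the indentation already matches), re-apply the manifest, and label the node with the new entry name.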

2. Upgrade the GPU Operator for MIG Management

The NVIDIA GPU Operator manages the full lifecycle of GPU components in your cluster. Before proceeding, ensure you have completed the base installation by following the NVIDIA Operator Installation Instructions.

Once the base operator is running, execute the following command to upgrade it with the specific parameters required for Multi-Instance GPU (MIG) slicing:

$ helm upgrade gpu-operator nvidia/gpu-operator \
    -n gpu-operator \
    --reuse-values \
    --set mig.strategy=mixed \
    --set migManager.config.name=mig-config

What each flag does

--reuse-values ensures that UpCloud's pre-configured driver and runtime settings are preserved.
--set mig.strategy=mixed exposes each MIG profile as its own resource type (for example nvidia.com/mig-1g.23gb), so a node can offer slices of different sizes at the same time.
--set migManager.config.name=mig-config tells the operator's MIG manager to read its layouts from the ConfigMap named mig-config, which you created in step 1.
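
The two --set flags can equivalently be kept in a Helm values file, which may be easier to review and keep under version control. A minimal sketch, assuming a scratch file named mig-values.yaml (a hypothetical name); the keys are the same ones used in the command above:

```shell
# Same settings as the two --set flags, expressed as a Helm values file.
cat > mig-values.yaml <<'EOF'
mig:
  strategy: mixed
migManager:
  config:
    name: mig-config
EOF

# Then pass the file instead of the --set flags (requires cluster access):
#   helm upgrade gpu-operator nvidia/gpu-operator \
#     -n gpu-operator --reuse-values -f mig-values.yaml
```

With --reuse-values, Helm merges the values from the file over the values of the existing release, so the preserved settings stay in place.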

3. Apply MIG slicing to the Node

Once the configurations are available to the cluster, you can explicitly instruct the GPU operator to apply a desired layout to your physical machine. Label your target node using the configuration key specified inside the ConfigMap.

In this step, we apply the b200-split-small profile to partition each GPU on the node into seven isolated instances:

$ kubectl label node GPU-NODE nvidia.com/mig-config=b200-split-small
node/GPU-NODE labeled

The GPU Operator's MIG manager detects this label change, resets the GPUs on GPU-NODE, and applies the new partition layout without requiring a full machine reboot.
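
Repartitioning is not instantaneous, so it can help to wait for it to finish before checking capacity. The sketch below polls a node label until the layout is applied. It assumes the MIG manager reports progress in the nvidia.com/mig-config.state label with values such as pending, success, and failed; wait_for_mig_state is a hypothetical helper name, and the retry count and delay are arbitrary defaults:

```shell
# Poll the MIG state label on a node until it reports success or failed.
# Usage: wait_for_mig_state NODE [ATTEMPTS] [DELAY_SECONDS]
wait_for_mig_state() {
  node="$1"; attempts="${2:-60}"; delay="${3:-5}"
  i=0
  state=""
  while [ "$i" -lt "$attempts" ]; do
    # Dots inside the label key must be escaped in a jsonpath expression.
    state="$(kubectl get node "$node" \
      -o jsonpath='{.metadata.labels.nvidia\.com/mig-config\.state}')"
    case "$state" in
      success) echo "MIG layout applied on $node"; return 0 ;;
      failed)  echo "MIG manager reported failure on $node" >&2; return 1 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Timed out waiting for $node (last state: ${state:-none})" >&2
  return 2
}

# Example (requires cluster access):
#   wait_for_mig_state GPU-NODE
```

The function returns 0 on success, 1 if the label reports failed, and 2 on timeout, so it can gate later steps in a script.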

4. Verify the allocated MIG resources

To verify that your GPU has been split into the intended profiles, inspect the allocatable resources that the node reports to the Kubernetes scheduler:

$ kubectl get node GPU-NODE -o jsonpath='{.status.allocatable}' | grep nvidia.com/mig

On a node with a single GPU, you should see seven instances available for scheduling:

... "nvidia.com/mig-1g.23gb": "7" ...
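
Because the jsonpath query prints the whole allocatable map on a single line, grep returns that entire line. A small filter can pull out only the MIG entries. This is a sketch that assumes kubectl prints the map as JSON; mig_resources is a hypothetical helper name:

```shell
# Print only the nvidia.com/mig-* entries from a JSON map read on stdin.
mig_resources() {
  grep -o '"nvidia\.com/mig-[^"]*": *"[0-9]*"'
}

# Example (requires cluster access):
#   kubectl get node GPU-NODE -o jsonpath='{.status.allocatable}' | mig_resources
```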

Your applications can now target these slices by requesting them in the container resource limits of their Pod or Deployment specifications (e.g., resources.limits with nvidia.com/mig-1g.23gb: 1).
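
As an illustration of such a request, the sketch below writes a minimal test Pod manifest that asks for one 1g.23gb slice and lists the devices it can see. The Pod name, file name, and container image are assumptions chosen for the example; substitute your own. Depending on how the container runtime is configured on your nodes, the Pod may also need a runtimeClassName.

```shell
# Write a minimal test Pod that requests one 1g.23gb slice.
# The Pod name and image are assumptions; substitute your own.
cat > mig-test-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: mig-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/mig-1g.23gb: 1
EOF

# Then (requires cluster access):
#   kubectl apply -f mig-test-pod.yaml
#   kubectl logs mig-smoke-test
```

If the slice is allocated as intended, the Pod log should list the MIG device assigned to the container.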

Looking ahead: Dynamic Resource Allocation (DRA)

🚀 Coming soon: Dynamic Resource Allocation

As of Kubernetes 1.35, Dynamic Resource Allocation (DRA) is generally available (its core APIs graduated to GA in Kubernetes 1.34) as an advancement beyond traditional static device plugins. DRA allows workloads to request and reconfigure specific hardware resources, such as MIG slices. A dedicated guide for setting up DRA is coming soon. In the meantime, you can read about it in the Kubernetes documentation on Dynamic Resource Allocation.

Contributed by: Onni Pylvänen
