{"id":1849,"date":"2025-07-14T10:33:51","date_gmt":"2025-07-14T07:33:51","guid":{"rendered":"https:\/\/upcloud.com\/global\/us\/resources\/tutorials\/running-llms-on-upcloud-gpus-with-ollama\/"},"modified":"2026-02-17T09:35:32","modified_gmt":"2026-02-17T09:35:32","slug":"running-llms-on-upcloud-gpus-with-ollama","status":"publish","type":"tutorial","link":"https:\/\/upcloud.com\/global\/resources\/tutorials\/running-llms-on-upcloud-gpus-with-ollama\/","title":{"rendered":"Running LLMs on UpCloud GPUs with Ollama"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">This tutorial guides you through setting up Ollama and running the Mistral 7B model on your UpCloud GPU instance. We&#8217;ll focus on a straightforward approach to get you up and running quickly.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Background: Sovereignty, Openness, and AI<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">&#8220;Open weights&#8221; in the context of AI refers to large language models (LLMs) whose trained parameters (the &#8220;weights&#8221;) are publicly released and accessible. This is distinct from &#8220;open source&#8221; in software, where the source code is open. 
While the training code for an LLM might be proprietary, open weights allow researchers, developers, and businesses to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Innovate Freely:<\/strong> Build upon, fine-tune, and adapt models for specific use cases without vendor lock-in.<\/li>\n\n\n\n<li><strong>Promote Transparency:<\/strong> Enable inspection and understanding of how models behave, contributing to explainable AI and responsible development.<\/li>\n\n\n\n<li><strong>Foster Competition:<\/strong> Democratize access to powerful AI capabilities, preventing a few large corporations from monopolizing the field.<\/li>\n\n\n\n<li><strong>Ensure Data Privacy:<\/strong> Run models locally or on private cloud infrastructure without sending sensitive data to external APIs.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Mistral AI, a prominent European AI company, has been a key player in advocating for and releasing models with open weights, such as the Mistral 7B and Mixtral models. 
This approach empowers a broader community to use and improve AI technologies.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Why This Matters for Your UpCloud GPU Instance<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">By combining an UpCloud GPU instance with open-weights models such as Mistral, run via Ollama, you gain:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control and Privacy:<\/strong> You retain full control over your data and the inference process, aligning with data sovereignty requirements.<\/li>\n\n\n\n<li><strong>Cost-Efficiency:<\/strong> Running models on your own infrastructure can be more cost-effective for sustained use than API-based services.<\/li>\n\n\n\n<li><strong>Customization:<\/strong> You can fine-tune and adapt open-weights models to your specific needs.<\/li>\n\n\n\n<li><strong>Performance:<\/strong> Direct access to dedicated GPU resources on your instance provides optimal inference performance.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Step-by-Step Tutorial:<\/strong><\/h2>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>1.<\/strong> <strong>Deploy an UpCloud GPU instance<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Log in to <a href=\"https:\/\/hub.upcloud.com\">https:\/\/hub.upcloud.com<\/a> and select GPU servers from the left-side panel, then click \u201cDeploy server\u201d from the right side.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/upcloud.com\/media\/gpu1-1024x215.png\" alt=\"-\" class=\"wp-image-56837\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Select a GPU plan that fits your use case. 
Each Cloud Server can have up to 3 GPUs, 20 cores, and 256 GB of RAM.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/upcloud.com\/media\/gpu2-1024x758.png\" alt=\"-\" class=\"wp-image-56838\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Select <strong>Ubuntu 24.04 (with NVIDIA drivers &amp; CUDA)<\/strong> as the Operating System. This template comes with the NVIDIA drivers and CUDA preinstalled, so you can start running GPU workloads immediately.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Remember to add your SSH keys during this process, so you can access the server in the next step.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, press deploy!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>2. Connect to Your UpCloud Instance via SSH<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Open your terminal and connect to your UpCloud instance using its IP address and your SSH key:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">ssh ubuntu@your_instance_ip_address<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Replace <code>your_instance_ip_address<\/code> with the instance&#8217;s public IP address.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>3. Pull the Ollama Docker Image<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Once logged in, pull the official Ollama Docker image. Ollama is a popular inference runtime for LLMs.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">docker pull ollama\/ollama<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>4. Run the Ollama Container with GPU Support<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Now, run the Ollama container, ensuring it has access to your GPU resources. The <code>--gpus=all<\/code> flag is crucial for this. 
We&#8217;ll also publish Ollama&#8217;s default port, <code>11434<\/code>, on the host&#8217;s loopback interface.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">docker run -d --gpus=all -v ollama:\/root\/.ollama -p 127.0.0.1:11434:11434 --name ollama ollama\/ollama<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">About the parameters:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>-d<\/code>: Runs the container in detached mode (in the background).<\/li>\n\n\n\n<li><code>--gpus=all<\/code>: Provides the container access to all available GPUs.<\/li>\n\n\n\n<li><code>-v ollama:\/root\/.ollama<\/code>: Mounts a named Docker volume to persist Ollama&#8217;s data (models, etc.) outside the container. This means your downloaded models won&#8217;t be lost if you restart or remove the container.<\/li>\n\n\n\n<li><code>-p 127.0.0.1:11434:11434<\/code>: Publishes port 11434 inside the container on port 11434 of the host, bound to <code>127.0.0.1<\/code>, so the Ollama API is reachable only from the server itself.<\/li>\n\n\n\n<li><code>--name ollama<\/code>: Assigns a convenient name to your container.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>5. Pull a Mistral Model (e.g., Mistral 7B)<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Now that Ollama is running, you can interact with it to pull models. You&#8217;ll execute commands <em>inside<\/em> the running Ollama container.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To pull the popular Mistral 7B model:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">docker exec -it ollama ollama pull mistral:7b<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The download progress will be shown in your terminal.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>6. 
Run the Mistral Model and Interact<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Once the model is downloaded, you can start an interactive session with it:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">docker exec -it ollama ollama run mistral:7b<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">You will now be in an interactive prompt where you can type your questions or prompts to the Mistral model. For example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\">&gt;&gt;&gt; If UpCloud were an animal, what animal would it be?\n\nIf UpCloud were an animal, it might be a Gray Wolf (Canis lupus). Just like the wolf is known for its adaptability and pack mentality in the wild, UpCloud demonstrates adaptability in providing high-performance cloud services tailored to its clients' needs, while maintaining a strong community spirit within its user base. Additionally, both UpCloud and wolves are renowned for their intelligence and speed.<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">To exit the interactive session, type <code>\/bye<\/code> and press Enter.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You have successfully set up Ollama and run a Mistral model on your UpCloud GPU instance. This basic setup provides a powerful local inference environment for experimenting with large language models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can now explore other models available on <a href=\"https:\/\/ollama.com\/search\" target=\"_blank\" rel=\"noopener\">Ollama&#8217;s library<\/a> or integrate Ollama with applications via its API. 
For example, integrate it with <a href=\"https:\/\/docs.openwebui.com\/getting-started\/quick-start\/connect-a-provider\/starting-with-ollama\" target=\"_blank\" rel=\"noopener\">Open WebUI<\/a>.<\/p>\n","protected":false},"author":69,"featured_media":0,"comment_status":"open","ping_status":"closed","template":"","community-category":[244,247],"class_list":["post-1849","tutorial","type-tutorial","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/tutorial\/1849","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/tutorial"}],"about":[{"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/types\/tutorial"}],"author":[{"embeddable":true,"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/users\/69"}],"replies":[{"embeddable":true,"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/comments?post=1849"}],"version-history":[{"count":1,"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/tutorial\/1849\/revisions"}],"predecessor-version":[{"id":3988,"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/tutorial\/1849\/revisions\/3988"}],"wp:attachment":[{"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/media?parent=1849"}],"wp:term":[{"taxonomy":"community-category","embeddable":true,"href":"https:\/\/upcloud.com\/global\/wp-json\/wp\/v2\/community-category?post=1849"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}