In today’s AI-powered software landscape, developers increasingly want to run LLMs (Large Language Models) locally for privacy, performance, and offline capabilities. With .NET 9, Microsoft continues to enhance its ecosystem, making integration with local AI models more streamlined. In this blog post, we’ll walk through how to run an LLM locally (using models like LLaMA, Mistral, or TinyLlama) and build .NET 9 clients that consume them.
Prerequisites
To follow along, you should have:
- .NET 9 SDK (Preview or Release Candidate)
- A local LLM runtime (e.g., llama.cpp, Ollama)
- Linux/macOS/WSL2 on Windows
- Basic knowledge of C# and REST APIs
Step 1: Running the LLM Locally
We will use Ollama for its simplicity. Install it:
curl -fsSL https://ollama.com/install.sh | sh
Then pull and run a model:
ollama pull mistral
ollama run mistral
Ollama will now serve the LLM at http://localhost:11434.
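If you want to confirm from .NET that the server is reachable before building the full client, the short check below (you can drop it into the console app we create in Step 2) calls Ollama's /api/tags endpoint, which lists the models installed locally:
// Quick connectivity check: prints the raw JSON list of locally installed models.
using var probe = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var installedModels = await probe.GetStringAsync("/api/tags");
Console.WriteLine(installedModels);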
Step 2: Create a .NET 9 Client Project
Create a new console app using the .NET 9 SDK:
dotnet new console -n LlmClient
cd LlmClient
Update your project to target .NET 9 (edit LlmClient.csproj):
<TargetFramework>net9.0</TargetFramework>
Step 3: Define a Data Contract
Ollama's /api/generate endpoint accepts a simple JSON payload. It streams its reply by default, so include "stream": false when you want a single JSON response:
{
  "model": "mistral",
  "prompt": "What is the capital of France?",
  "stream": false
}
Create the following C# classes:
public class LlmRequest
{
    public string Model { get; set; } = "mistral";
    public string Prompt { get; set; } = string.Empty;
    public bool Stream { get; set; } = false; // request a single JSON response instead of the default stream
}

public class LlmResponse
{
    public string Response { get; set; } = string.Empty;
}
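The System.Net.Http.Json helpers used in Step 4 apply camelCase naming by default, which is how these PascalCase properties line up with Ollama's lowercase JSON fields. If you'd rather pin the wire names explicitly instead of relying on that convention, an optional variant (illustrative only):
using System.Text.Json.Serialization;

public class LlmRequest
{
    [JsonPropertyName("model")]
    public string Model { get; set; } = "mistral";

    [JsonPropertyName("prompt")]
    public string Prompt { get; set; } = string.Empty;

    [JsonPropertyName("stream")]
    public bool Stream { get; set; } = false;
}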
Step 4: Build the HTTP Client Logic
Using HttpClient and the System.Net.Http.Json extension methods (which default to camelCase, case-insensitive JSON handling), we'll send the request and parse the response:
using System.Net.Http.Json;
var client = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var request = new LlmRequest
{
    Prompt = "Explain quantum computing in simple terms."
};
var response = await client.PostAsJsonAsync("/api/generate", request);
var json = await response.Content.ReadFromJsonAsync<LlmResponse>();
Console.WriteLine($"AI: {json?.Response}");Step 5: Add Streaming Support (Advanced)
Step 5: Add Streaming Support (Advanced)
When "stream" is left at its default of true, Ollama streams the reply as newline-delimited JSON: each line is a complete JSON object carrying the next fragment of text. Here's how to consume that stream with HttpClient in .NET 9 (add using System.Text.Json; to the top of Program.cs):
var streamingRequest = new LlmRequest
{
    Prompt = "Explain quantum computing in simple terms.",
    Stream = true
};

using var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/api/generate")
{
    Content = JsonContent.Create(streamingRequest)
};
using var streamingResponse = await client.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead);
using var stream = await streamingResponse.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

string? line;
while ((line = await reader.ReadLineAsync()) != null)
{
    if (string.IsNullOrWhiteSpace(line)) continue;

    // Each line is one JSON object; JsonSerializerOptions.Web (new in .NET 9) matches Ollama's lowercase fields.
    var chunk = JsonSerializer.Deserialize<LlmResponse>(line, JsonSerializerOptions.Web);
    Console.Write(chunk?.Response);
}
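If you plan to push tokens into a UI, it can be cleaner to wrap this loop in an async iterator so callers simply await foreach over the fragments. A sketch under the same assumptions (the StreamAsync name is illustrative):
static async IAsyncEnumerable<string> StreamAsync(HttpClient client, LlmRequest request)
{
    using var message = new HttpRequestMessage(HttpMethod.Post, "/api/generate")
    {
        Content = JsonContent.Create(request)
    };
    using var response = await client.SendAsync(message, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();

    using var stream = await response.Content.ReadAsStreamAsync();
    using var reader = new StreamReader(stream);

    string? line;
    while ((line = await reader.ReadLineAsync()) != null)
    {
        if (string.IsNullOrWhiteSpace(line)) continue;

        var chunk = JsonSerializer.Deserialize<LlmResponse>(line, JsonSerializerOptions.Web);
        if (!string.IsNullOrEmpty(chunk?.Response))
            yield return chunk.Response;
    }
}

// Usage:
// await foreach (var fragment in StreamAsync(client, new LlmRequest { Prompt = "Hello!", Stream = true }))
//     Console.Write(fragment);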
Pros and Cons
✅ Pros
- Privacy: All prompts stay local
- Performance: Low-latency inference without internet
- Cost: No cloud usage fees
- Customizability: You control model choice and deployment
❌ Cons
- Hardware Requirements: Needs RAM and CPU/GPU resources
- Model Size: Can consume several GBs of disk
- Limited Model Selection: Smaller open models may not match GPT-4/Claude performance
Summary
Running LLMs locally and integrating them with modern .NET 9 clients is easier than ever. Whether you want to build a privacy-focused chatbot, a local code assistant, or simply prototype AI features, this architecture offers a great foundation:
- Use Ollama to simplify local LLM serving
- Use HttpClient and System.Net.Http.Json for async interactions
- Leverage streaming for better UX
With .NET 9’s improved performance and language features, integrating AI into native apps or microservices is not only possible, but productive.
🧠 Bonus Tip: Wrap your LlmClient into a minimal API or a Blazor app to bring AI to your UI!
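As a rough sketch of that minimal API idea (assuming a separate project created with dotnet new web, the LlmRequest/LlmResponse types from Step 3, and Ollama running on its default port), the endpoint could look like this:
using System.Net.Http.Json;

var builder = WebApplication.CreateBuilder(args);

// A named HttpClient pointed at the local Ollama server.
builder.Services.AddHttpClient("ollama", c => c.BaseAddress = new Uri("http://localhost:11434"));

var app = builder.Build();

// POST /ask with a JSON body like { "prompt": "..." }; relays Ollama's single-shot answer.
app.MapPost("/ask", async (IHttpClientFactory factory, LlmRequest request) =>
{
    var client = factory.CreateClient("ollama");
    var response = await client.PostAsJsonAsync("/api/generate", request);
    var result = await response.Content.ReadFromJsonAsync<LlmResponse>();
    return Results.Ok(result);
});

app.Run();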
Stay tuned for follow-up posts on embedding these clients into MAUI, Blazor, and ASP.NET 9 applications.
