In today’s AI-powered software landscape, developers increasingly want to run LLMs (Large Language Models) locally for privacy, performance, and offline capabilities. With .NET 9, Microsoft continues to enhance its ecosystem, making integration with local AI models more streamlined. In this blog post, we’ll walk through how to run an LLM locally (using models like LLaMA, Mistral, or TinyLlama) and build .NET 9 clients that consume them.
Prerequisites
To follow along, you should have:
- .NET 9 SDK (Preview or Release Candidate)
- A local LLM runtime (e.g., llama.cpp, Ollama)
- Linux/macOS/WSL2 on Windows
- Basic knowledge of C# and REST APIs
Step 1: Running the LLM Locally
We will use Ollama for its simplicity. Install it:
curl -fsSL https://ollama.com/install.sh | sh
Then pull and run a model:
ollama pull mistral
ollama run mistral
Ollama will now serve the LLM at http://localhost:11434.
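If you want to confirm from .NET that the server is reachable before building the full client, the short check below (you can drop it into the console app we create in Step 2) calls Ollama's /api/tags endpoint, which lists the models installed locally:
// Quick connectivity check: prints the raw JSON list of locally installed models.
using var probe = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var installedModels = await probe.GetStringAsync("/api/tags");
Console.WriteLine(installedModels);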
Step 2: Create a .NET 9 Client Project
Create a new console app using the .NET 9 SDK:
dotnet new console -n LlmClient
cd LlmClient
Update your project to target .NET 9 (edit LlmClient.csproj):
<TargetFramework>net9.0</TargetFramework>
Step 3: Define a Data Contract
Ollama's /api/generate endpoint accepts a simple JSON payload. It streams its reply by default, so include "stream": false when you want a single JSON response:
{
  "model": "mistral",
  "prompt": "What is the capital of France?",
  "stream": false
}
Create the following C# classes:
public class LlmRequest
{
    public string Model { get; set; } = "mistral";
    public string Prompt { get; set; } = string.Empty;
    public bool Stream { get; set; } = false; // request a single JSON response instead of the default stream
}

public class LlmResponse
{
    public string Response { get; set; } = string.Empty;
}
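The System.Net.Http.Json helpers used in Step 4 apply camelCase naming by default, which is how these PascalCase properties line up with Ollama's lowercase JSON fields. If you'd rather pin the wire names explicitly instead of relying on that convention, an optional variant (illustrative only):
using System.Text.Json.Serialization;

public class LlmRequest
{
    [JsonPropertyName("model")]
    public string Model { get; set; } = "mistral";

    [JsonPropertyName("prompt")]
    public string Prompt { get; set; } = string.Empty;

    [JsonPropertyName("stream")]
    public bool Stream { get; set; } = false;
}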
Step 4: Build the HTTP Client Logic
Using HttpClient and the System.Net.Http.Json extension methods (which default to camelCase, case-insensitive JSON handling), we'll send the request and parse the response:
using System.Net.Http.Json;
var client = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var request = new LlmRequest
{
    Prompt = "Explain quantum computing in simple terms."
};
var response = await client.PostAsJsonAsync("/api/generate", request);
var json = await response.Content.ReadFromJsonAsync<LlmResponse>();
Console.WriteLine($"AI: {json?.Response}");Step 5: Add Streaming Support (Advanced)
Step 5: Add Streaming Support (Advanced)
When "stream" is left at its default of true, Ollama streams the reply as newline-delimited JSON: each line is a complete JSON object carrying the next fragment of text. Here's how to consume that stream with HttpClient in .NET 9 (add using System.Text.Json; to the top of Program.cs):
var streamingRequest = new LlmRequest
{
    Prompt = "Explain quantum computing in simple terms.",
    Stream = true
};

using var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/api/generate")
{
    Content = JsonContent.Create(streamingRequest)
};
using var streamingResponse = await client.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead);
using var stream = await streamingResponse.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

string? line;
while ((line = await reader.ReadLineAsync()) != null)
{
    if (string.IsNullOrWhiteSpace(line)) continue;

    // Each line is one JSON object; JsonSerializerOptions.Web (new in .NET 9) matches Ollama's lowercase fields.
    var chunk = JsonSerializer.Deserialize<LlmResponse>(line, JsonSerializerOptions.Web);
    Console.Write(chunk?.Response);
}
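If you plan to push tokens into a UI, it can be cleaner to wrap this loop in an async iterator so callers simply await foreach over the fragments. A sketch under the same assumptions (the StreamAsync name is illustrative):
static async IAsyncEnumerable<string> StreamAsync(HttpClient client, LlmRequest request)
{
    using var message = new HttpRequestMessage(HttpMethod.Post, "/api/generate")
    {
        Content = JsonContent.Create(request)
    };
    using var response = await client.SendAsync(message, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();

    using var stream = await response.Content.ReadAsStreamAsync();
    using var reader = new StreamReader(stream);

    string? line;
    while ((line = await reader.ReadLineAsync()) != null)
    {
        if (string.IsNullOrWhiteSpace(line)) continue;

        var chunk = JsonSerializer.Deserialize<LlmResponse>(line, JsonSerializerOptions.Web);
        if (!string.IsNullOrEmpty(chunk?.Response))
            yield return chunk.Response;
    }
}

// Usage:
// await foreach (var fragment in StreamAsync(client, new LlmRequest { Prompt = "Hello!", Stream = true }))
//     Console.Write(fragment);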
Pros and Cons
✅ Pros
- Privacy: All prompts stay local
- Performance: Low-latency inference without internet
- Cost: No cloud usage fees
- Customizability: You control model choice and deployment
❌ Cons
- Hardware Requirements: Needs RAM and CPU/GPU resources
- Model Size: Can consume several GBs of disk
- Limited Model Selection: Smaller open models may not match GPT-4/Claude performance
Summary
Running LLMs locally and integrating them with modern .NET 9 clients is easier than ever. Whether you want to build a privacy-focused chatbot, a local code assistant, or simply prototype AI features, this architecture offers a great foundation:
- Use Ollama to simplify local LLM serving
- Use HttpClient and System.Net.Http.Json for async interactions
- Leverage streaming for better UX
With .NET 9’s improved performance and language features, integrating AI into native apps or microservices is not only possible, but productive.
🧠 Bonus Tip: Wrap your LlmClient into a minimal API or a Blazor app to bring AI to your UI!
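As a rough sketch of that minimal API idea (assuming a separate project created with dotnet new web, the LlmRequest/LlmResponse types from Step 3, and Ollama running on its default port), the endpoint could look like this:
using System.Net.Http.Json;

var builder = WebApplication.CreateBuilder(args);

// A named HttpClient pointed at the local Ollama server.
builder.Services.AddHttpClient("ollama", c => c.BaseAddress = new Uri("http://localhost:11434"));

var app = builder.Build();

// POST /ask with a JSON body like { "prompt": "..." }; relays Ollama's single-shot answer.
app.MapPost("/ask", async (IHttpClientFactory factory, LlmRequest request) =>
{
    var client = factory.CreateClient("ollama");
    var response = await client.PostAsJsonAsync("/api/generate", request);
    var result = await response.Content.ReadFromJsonAsync<LlmResponse>();
    return Results.Ok(result);
});

app.Run();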
Stay tuned for follow-up posts on embedding these clients into MAUI, Blazor, and ASP.NET 9 applications.
