Using Microsoft.Windows.AI.Generative to run language models on a Copilot+ PC

Introduction

Microsoft is introducing a new set of APIs aimed at running AI models locally on a Copilot+ PC. These APIs live in the Microsoft.Windows.AI namespace and are designed to work with an upcoming version of Windows 11 on Copilot+ hardware.

Let's take a look at how to use Generative AI APIs to run language models locally on a Copilot+ PC.

For the rest of this article, I will assume that you are familiar with C#, and have a basic command line application set up in Visual Studio.

Add the NuGet package

Since we are deep in bleeding-edge territory, we will need a beta version of Windows 11 and a special version of the Microsoft.WindowsAppSdk NuGet package. At the time of writing, only the 1.7.250127003-experimental3 version has the right components - even newer, non-experimental versions won't work.

To add the package, right-click on your project in Visual Studio and select "Manage NuGet Packages". In the "Browse" tab, search for "Microsoft.WindowsAppSdk" and select the experimental version. If you don't see this version, make sure "Include prerelease" is checked. Click "Install" to add it to your project.
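If you prefer editing the project file directly, the package reference looks like this (a sketch; the version string is the experimental one mentioned above, and your project may already have an ItemGroup to put it in):

```xml
<ItemGroup>
  <!-- Experimental Windows App SDK build that contains the Microsoft.Windows.AI.Generative APIs -->
  <PackageReference Include="Microsoft.WindowsAppSDK" Version="1.7.250127003-experimental3" />
</ItemGroup>
```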

You can verify that the package was added by adding "using Microsoft.Windows.AI.Generative;" to the top of your code file and doing a build. If you don't see any errors, you are good to go.

Creating the LanguageModel

To use the language model functionality, we first need to create a LanguageModel instance. This involves checking if the model is available on the system, making it available if necessary, and then creating the model instance.

Here's how you can implement the CreateLanguageModel method:

async Task<LanguageModel> CreateLanguageModel()
{
    Console.Write("Making Language model available...");

    if (!LanguageModel.IsAvailable())
        await LanguageModel.MakeAvailableAsync();
 
    var languageModel = await LanguageModel.CreateAsync();
    Console.WriteLine(" Done.\n\n");
    return languageModel;
}
        

The method is straightforward - it checks if we already have the model available locally, downloads it if needed, and creates a new instance. The whole process happens with just a few lines of code, making it incredibly easy to get started with AI on your Copilot+ PC.
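In a real application, you may want to guard these calls, since making the model available can fail - for example on a machine that is not a Copilot+ PC, on an older OS build, or when the download fails. Here is a minimal sketch using only the calls shown above, assuming failures surface as exceptions (the exact exception types are not documented for the experimental release):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Windows.AI.Generative;

async Task<LanguageModel?> TryCreateLanguageModel()
{
    try
    {
        // Download the model first if this machine doesn't have it yet
        if (!LanguageModel.IsAvailable())
            await LanguageModel.MakeAvailableAsync();

        return await LanguageModel.CreateAsync();
    }
    catch (Exception ex)
    {
        // Typically: unsupported hardware, OS build too old, or the download failed
        Console.WriteLine($"Could not create the language model: {ex.Message}");
        return null;
    }
}
```

Callers can then check for null and fall back gracefully instead of crashing on unsupported machines.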

In your main program, you would call this method like this:

LanguageModel languageModel = await CreateLanguageModel();
        

The simplest example

Let's start with the simplest way to generate a response from the language model. For this example, we'll create a method that takes a model instance and a prompt, then returns the generated response.

Here's how you define the prompt and call the method:

var prompt = "Write a simple hello world program in C#. No need for comments or explanation, just the code.";
await GenerateSimpleResponse(languageModel, prompt);
        

And here's the implementation of the GenerateSimpleResponse method:

async Task<string> GenerateSimpleResponse(LanguageModel model, string prompt)
{
    Console.WriteLine("Generating simple response...\n\n");

    // Generate the response
    var result = await model.GenerateResponseAsync(prompt);
    
    // Format and display the response
    var response = result.Response.Replace("\n\n", "\n");
    Console.WriteLine(response);
    
    return response;
}
        

That's it! Simply call the API with your prompt, wait for the magic to happen, and display the result. The API handles all the complexity of running the language model and generating a coherent response. We just do a bit of formatting to make it look nice in the console.

When run with our example prompt, the output looks like this:

using System;
namespace HelloWorld
{
    class Program
    //  Main()
    {
        static void Main()
        {
            Console.WriteLine("Hello World!");
        }
    }
}
        

Streaming the response

While getting the complete response at once is convenient, there are scenarios where you might want to stream the response as it's being generated. This gives your application a more responsive feel, as users can see the response building in real-time rather than waiting for the entire generation to complete.

The API provides a way to receive the response in chunks as they're generated. Here's how to implement it:

async Task<string> GenerateResponseWithProgress(LanguageModel model, string prompt)
{
    Console.WriteLine("Generating response with progress... Press any key to stop.\n\n");

    StringBuilder sb = new();
    var asyncOp = model.GenerateResponseWithProgressAsync(prompt);

    // Set up the progress handler
    asyncOp.Progress = (_, delta) =>
    {
        sb.Append(delta);
        Console.Write(delta);

        // Allow cancellation by pressing any key
        if (Console.KeyAvailable)
            asyncOp.Cancel();
    };

    try
    {
        await asyncOp;
    }
    catch (TaskCanceledException) { }

    return sb.ToString();
}
        

The streaming API is almost as simple as the basic version - most of the "complexity" (if you can even call it that) comes from the cancellation logic we've added to let users stop generation by pressing a key. We need to handle the TaskCanceledException that gets thrown when generation is canceled, but that's just a single try/catch block.

The core of the implementation is still wonderfully straightforward - call the API with your prompt, set up a handler to receive text as it's generated, and display it to the console.
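If your app already works with CancellationToken - say, an ASP.NET request or a UI with a cancel button - you can bridge it to the operation's Cancel method instead of polling the keyboard. A sketch under the same API assumptions as above; the method and parameter names are illustrative:

```csharp
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Windows.AI.Generative;

async Task<string> GenerateWithCancellationToken(LanguageModel model, string prompt, CancellationToken ct)
{
    StringBuilder sb = new();
    var asyncOp = model.GenerateResponseWithProgressAsync(prompt);

    // Append each chunk as it arrives, same as before
    asyncOp.Progress = (_, delta) => sb.Append(delta);

    // Forward external cancellation to the WinRT async operation
    using var registration = ct.Register(() => asyncOp.Cancel());

    try
    {
        await asyncOp;
    }
    catch (TaskCanceledException) { /* partial text is still in sb */ }

    return sb.ToString();
}
```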

Call this method the same way as the simple version:

await GenerateResponseWithProgress(languageModel, prompt);
        

The result will be displayed incrementally as it's generated, with the same final output as the simple method, but with a more interactive experience for the user.

Statistics and performance

While the current API doesn't provide token counts (a common metric in large language models), we can still measure performance using word counts, character counts, and generation speed. Adding performance tracking to both methods is straightforward with the Stopwatch class:

void CalculateAndDisplayStatistics(string text, Stopwatch sw)
{
    var wordCount = text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries).Length;
    var charCount = text.Length;

    Console.WriteLine($"\nGenerated {wordCount} words, {charCount} characters in {sw.ElapsedMilliseconds} ms. That is {wordCount / sw.Elapsed.TotalSeconds:F2} words/sec and {charCount / sw.Elapsed.TotalSeconds:F2} chars/sec.\n");
}
        

To use this in our methods, we simply add a Stopwatch at the beginning and call this utility function at the end:

var timer = Stopwatch.StartNew();
// ... generation code ...
timer.Stop();
CalculateAndDisplayStatistics(response, timer);
        
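Putting the pieces together, an instrumented version of the simple method might look like this - a sketch that reuses the CalculateAndDisplayStatistics helper from above:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Windows.AI.Generative;

async Task<string> GenerateAndMeasure(LanguageModel model, string prompt)
{
    var timer = Stopwatch.StartNew();

    // Generate and format the response as in GenerateSimpleResponse
    var result = await model.GenerateResponseAsync(prompt);
    var response = result.Response.Replace("\n\n", "\n");

    timer.Stop();

    Console.WriteLine(response);
    CalculateAndDisplayStatistics(response, timer);
    return response;
}
```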

In our test runs, we see performance results like:

Generated 72 words, 177 characters in 5537 ms. That is 13.00 words/sec and 31.96 chars/sec.
        

for the simple response method, and similar results for the streaming method. The performance will vary depending on your hardware, especially the NPU capabilities of your Copilot+ PC.

These metrics give you a quick way to see how fast your AI is thinking. Whether you're showing off your fancy new Copilot+ PC to friends or doing serious development work, it's always fun to see the numbers and compare different approaches. I've seen widely varying numbers before, even up to 25 words/sec. These APIs and the underlying OS components are still in development, so I'm excited to see how they will evolve over time.

I hope this article has given you a good introduction to the new Microsoft.Windows.AI.Generative APIs for running language models on a Copilot+ PC. Let me know what you have built with them!
