
AI & LLM

Feb 28, 2025

Building a fully coded and operational copilot: a hands-on guide with Pieces

Learn how to build your own AI copilot with Pieces, explore LLMs, context handling, and system prompts in this hands-on guide

In the tech world, very few people haven’t heard the term ‘copilot’.

It was a term first coined by GitHub for their AI-powered developer assistant but has now become the term-du-jour for any software tool that is designed to assist you in your day-to-day tasks, from writing code to accessing corporate documents, to managing accounts, across the whole spectrum of knowledge work.

So what really is a copilot? And how do you build one?

In this post, I will cover not only what a copilot is but also show you hands-on how you can build a Star Wars-themed copilot (or any theme of your choice) using the LLM or SLM of your choice.


What is a copilot?

A copilot is an AI assistant that is designed to ‘sit’ with you, and provide you with help and guidance, leaving you still in control.

The name makes a lot of sense – it’s loosely based on the idea that a plane has a pilot and copilot, with the pilot in charge and the copilot in the next seat to assist. You are still in charge, the copilot is there to provide help and advice but not take control.

Scott Hanselman from Microsoft describes a copilot as:

A very eager intern, in that it is helpful, has some relevant background knowledge, but has limited domain knowledge, and despite making convincing-sounding suggestions, it can be wrong, and sometimes very wrong.

This is why you are in the pilot seat — your role is to gather information from the copilot and other sources including your own skills and experience, then act on it.

The UI for a copilot is a conversation. You interact with the copilot using human language, adding multi-modal information such as images or videos, and get a response back – typically as a textual chat, but also as images, videos, audio, or more.

This text can be plain text in the language of your choice, or rich text with features like code blocks if your copilot is focused on software development.

The first AI copilot was focused on software engineering, but there are now copilots for all knowledge workers.

For example, Microsoft 365 Copilot not only helps with general Microsoft Office tasks (“it looks like you are writing a letter” 📎) but can also access your corporate knowledge from SharePoint and other sources and bring it into your copilot conversations.


What are the components of a copilot?

There are 3 main components of a copilot – the AI, the UI, and additional context.

AI using an LLM or SLM

The core brain behind the copilot is an AI. Copilots are natural language tools, so typically the core AI component is a generative AI language model.

These can be large language models, or LLMs, such as OpenAI's GPT-4o or Google Gemini, or small language models (SLMs), like Llama from Meta or Phi from Microsoft.

Model choice

Model choice can be important – the different models each have different strengths and weaknesses.

If you are using a model in the cloud, is it the optimal model for both performance and price? If you are using an SLM running on the user's device, is it powerful enough for your needs, whilst also being small enough to run on whatever hardware your user has?

💡 Ideally, you want to use an abstraction layer so that you can quickly switch out the model, either during the phases of development, or allow the user to switch at run time based on their needs.

AI abstraction layers

AI abstraction layers provide a common API surface across a range of LLMs and SLMs. The goal with these is to allow you to relatively quickly change from one model to another. There are a range of abstractions, each with different capabilities. Some examples are:

  • Pieces – Yes, Pieces has an API and SDKs for C# and Python. Just as you can switch the model you use in the Pieces copilot, you can build your own copilot using the Pieces API or SDKs, and switch the model with one line of code. Pieces supports both cloud LLMs and on-device SLMs.

  • Ollama – Ollama is a powerful abstraction and management layer for local models running on your device. You can use Ollama to download and run a huge range of models, and it is Ollama that Pieces uses under the hood to interact with on-device SLMs.

  • Vercel AI SDK – The Vercel AI SDK is a TypeScript SDK that allows you to build LLM-based web applications, with a single API for a huge range of LLMs.

By having an identical API surface, you only need to write the core code once to interact with the abstraction layer. You can then customize which LLM or SLM your app uses by managing this choice in configuration.
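
To make this concrete, here is a minimal sketch of that idea using the Microsoft.Extensions.AI abstraction that the Pieces C# SDK builds on (and which we'll use later in this post). The COPILOT_MODEL environment variable is a made-up configuration key for illustration, and whether ModelId is honoured depends on the IChatClient implementation sitting behind the abstraction:

using Microsoft.Extensions.AI;

// Ask a question through whatever IChatClient implementation is plugged in behind the
// abstraction. The model name comes from configuration, so it can be swapped without
// changing this code. COPILOT_MODEL is a hypothetical configuration key for this sketch.
static async Task<string> AskAsync(IChatClient chatClient, string prompt)
{
    ChatOptions options = new()
    {
        ModelId = Environment.GetEnvironmentVariable("COPILOT_MODEL") ?? "gpt-4o"
    };

    List<ChatMessage> messages = [new(ChatRole.User, prompt)];
    var response = await chatClient.GetResponseAsync(messages, options);

    return response.ToString();
}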

Chat-oriented user interface

Satya Nadella, the CEO of Microsoft, loves to say “the chat is the UI”, and the chat interface is an important part of building a copilot. This UI needs a way for you to provide your side of the chat – either as text, or through a multi-modal interface that allows things like dropping in images – and it also needs to show the output of the AI.

Most LLMs and SLMs can take input and provide output using Markdown, a simple text-based format where formatting is indicated using special characters.

As you think about the chat UI, you may need a way to add Markdown features to the input and render them in the output.

Taking a simple example, if you are dealing with code, you would indicate this with 3 backticks to show the start and end of a code block:

```
print("Hello world")
```

You may need a way to easily set this in the UI when entering code (for example, the Pieces copilot has an “Insert code block” button). You may also need to render this in the chat output.

As you think about your user interface, consider the different types of input and output you will have, and what is the best way to render these.
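
As a tiny sketch of that decision, here is one way a console UI might spot fenced code blocks in a response and treat them differently. A real chat UI would use a proper Markdown renderer rather than this hand-rolled scan:

// Minimal illustration only: split a response into prose and fenced code blocks by
// scanning for lines that start with three backticks.
string response = "Here you go:\n```\nprint(\"Hello world\")\n```\nHope that helps!";

bool inCodeBlock = false;
foreach (var line in response.Split('\n'))
{
    if (line.TrimStart().StartsWith("```"))
    {
        inCodeBlock = !inCodeBlock;
        continue;
    }

    // A real UI would syntax highlight code blocks; here we just prefix code lines
    // so the difference is visible.
    Console.WriteLine(inCodeBlock ? $"[code] {line}" : line);
}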

Context

The LLM only knows what it knows. This may sound like an obvious statement, but the reality is that this limits the usefulness of an LLM. 

For example, GPT-4o was trained on a large portion of the public internet, with a knowledge cutoff of October 2023.

This is great if you need your copilot to be able to answer questions on information that was on the internet back then, but not so helpful for more up-to-date information, or information that is not on the internet, such as your codebase, or corporate documentation.

What is context?

You can pass extra information to the LLM, and this information is called context.

This can be documents, code files, images, and more. If you are a regular Pieces user, then you probably have used the Pieces Long-Term Memory, and this is another type of context that can be passed to the LLM. 

You need to consider how to store and access any context that is relevant to your copilot and make this information available to the LLM in your copilot chats.

📌 Context is important to ensure that your copilot is providing the right answers.

 If you have a copilot that answers questions on your internal documentation, there is no way an LLM trained on the public internet will have access to this information, so you have to provide this as relevant context.

Context windows

Each model has a context window – a limit on the maximum amount of context that can be passed to the model in each call. You need to know this limit when passing context to the model.

Context size is measured in tokens, which represent whole words or parts of words. For example, if you have a small model with a 4k token context window, you can’t pass in an entire company's documentation archive and expect it to work.

Model context sizes vary, and as models get more powerful, these are getting larger.

32,000 or 128,000 token context windows are not unheard of, with the online versions of Google’s Gemini supporting up to 2 million tokens. 

The downside to using larger context windows is that it increases the memory requirements for on-device SLMs, and the cost for cloud LLMs, which usually charge per token (or more accurately per million tokens).
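
As a rough sketch, assuming the common approximation of about 4 characters per token for English text (real tokenizers vary by model), you could sanity-check a piece of context against a model's window before sending it:

// Very rough heuristic: roughly 4 characters per token for English text.
// Treat this as an estimate only – use the model's actual tokenizer for precise counts.
static int EstimateTokens(string text) => text.Length / 4;

const int contextWindowTokens = 4_096; // e.g. a small on-device model

string context = File.ReadAllText("docs/handbook.txt"); // hypothetical context file

if (EstimateTokens(context) > contextWindowTokens)
{
    Console.WriteLine("This context probably won't fit – trim it, or use RAG to send only what is relevant.");
}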

The way to get around these context limits is a technique called Retrieval Augmented Generation, or RAG.

RAG

RAG is the process of retrieving information relevant to a query, then using this to augment the generation of a response by sending this information to the LLM.

Essentially, it’s a smart lookup across vast amounts of data to pull out just the information that is needed for a particular prompt, and this can then be sent in the context of the conversation.

For example, imagine you are building a copilot for call center workers at a retail business. If someone phones up about an order, you need insight into not only the details of the order – item, quantity, and so on – but also any previous conversations that have happened around this order.

If you ask the copilot to “Summarize order #66”, then the AI needs to be sent information about the order with Id 66 to come up with a response. 

You could send all the order information for every order to the AI, but this could exceed the context window, and provide potentially too much information for the AI to generate an accurate response.

💡 Instead, you could use RAG to detect that this is order number 66, then retrieve the relevant order details for just that order, then send that to the AI.

This way you have a smaller amount of context and a higher potential accuracy in the response.
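
To make the retrieval step concrete, here is a minimal, hypothetical sketch – the in-memory order store and the regex are purely illustrative, and a real system would query a database, an API, or a vector store instead:

using System.Text.RegularExpressions;

// Hypothetical order store – in a real system this would be a database or an API call.
Dictionary<int, string> orders = new()
{
    [66] = "Order #66: 2x lightsaber (red), status: shipped. Previous call: complaint about a delivery delay."
};

string prompt = "Summarize order #66";

// Retrieval: pull the order number out of the prompt...
var match = Regex.Match(prompt, @"#(\d+)");
string retrievedContext =
    match.Success && orders.TryGetValue(int.Parse(match.Groups[1].Value), out var order)
        ? order
        : "No matching order found.";

// ...then augment generation by sending just that order as context, rather than every order.
Console.WriteLine($"Context to send to the LLM:\n{retrievedContext}");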


Let’s build a copilot with Pieces!

Now that we’ve been through some of the theory, let’s get hands-on! For the rest of this post, I’ll be using C# with the Pieces C# SDK.

If you want a video version of this, check out my session at .NET Conf 2024 ⬇️

For the sake of simplicity, we will be building a simple console-based copilot, using the command line as the UI.

This means you won’t get markdown rendered correctly, instead, you will just get the raw textual representation, but this is enough to get started. Consider this a future exercise for the reader to improve the UI!


Create the project

As this is a .NET project using C#, you will need the .NET SDK installed. This project uses features added in .NET 9, so you will need the .NET 9 SDK, along with an IDE of your choice such as Visual Studio, VS Code, or JetBrains Rider – all of which have Pieces extensions available, so you can leverage Pieces to help as you build your copilot, such as asking the Long-Term Memory to recall sections from this post.

Create a folder called StarWarsCopilot, and from inside that folder in your terminal or command prompt create a new .NET console app:

dotnet new console

Open the project in your IDE. It will be a basic “Hello World” application.

Once you have your project, you will need to install the Pieces NuGet package. Run the following command:

dotnet add package Pieces.Extensions.AI --prerelease

This package is a pre-release as it depends on the Microsoft.Extensions.AI package which is also currently a pre-release package.


Interact with the LLM through Pieces

You can interact with an LLM using the IChatClient interface. In your Program.cs file, remove the existing code and add the following:

using Microsoft.Extensions.AI;
using Pieces.Extensions.AI;
using Pieces.OS.Client;
// Create a connection to PiecesOS
PiecesClient client = new();
// Create a chat client
IChatClient chatClient = new PiecesChatClient(client, "My Star Wars copilot");

You’ll need to ensure PiecesOS is running for this connection to be made.

Now you can test it out with a simple completion. Add the following code:

// Create a list of chat messages
List<ChatMessage> messages = [
   new(ChatRole.User, "What is the story of Darth Plagueis the wise?")
];
// Get the response and display it
var response = await chatClient.GetResponseAsync(messages);
Console.WriteLine(response);

Run this with the command:

dotnet run

You will see an output telling the tale of Darth Plagueis the wise.

➜  StarWarsCopilot dotnet run
The story of Darth Plagueis the Wise is a fictional tale from the "Star Wars" universe, specifically mentioned in "Star Wars: Episode III – Revenge of the Sith." It is recounted by Chancellor Palpatine (Darth Sidious) to Anakin Skywalker as a way to tempt him to the dark side of the Force.
Darth Plagueis was a Sith Lord who was said to be so powerful and wise that he could manipulate the Force to influence the midi-chlorians to create life. He had such mastery over the dark side that he could even prevent those he cared about from dying. However, despite his immense power, he was unable to prevent his own demise. According to the tale, Plagueis was betrayed and killed by his own apprentice while he was asleep. This story is used by Palpatine to suggest to Anakin that the dark side holds the secret to saving his loved ones from death, thus enticing him to join the Sith.
The story of Darth Plagueis serves as a cautionary tale about the pursuit of power and the inevitable betrayal that often accompanies the Sith's quest for dominance.

One upside of Pieces is that conversations are shared between every tool you use, so you can start a conversation in VS Code, and continue it in Visual Studio.

Or for example, start a conversation using code and see it in the Pieces desktop app.


Responses

When you run this code, you will see nothing for a while, then suddenly the whole response appears. This is different from how you probably normally see a response from an LLM, which streams back a few words or parts of words at a time.

LLMs stream back responses because they generate one token at a time, and you normally see these appearing as they are generated in the output. 

The reason it is different here is that you used a call that waits until it gets the entire response before returning, just to keep the code simpler.

When you work with streaming responses, you make a call to the LLM, and it returns you the response token by token, along with a flag that indicates if there is more data to come, or if the stream is finished. 

Typically, the last response that indicates that the stream has finished contains the complete text response. 

This means you typically call the streaming endpoint and write out each response as it arrives, skipping the final response that marks the end of the stream.

Replace the call to GetResponseAsync and the following Console.WriteLine with this code:

// Helper function to run and stream the response
async Task AskQuestionAndStreamAnswer(List<ChatMessage> chatMessages)
{
   // Get each response from the streaming completion
   await foreach (var r in chatClient.GetStreamingResponseAsync(chatMessages))
   {
       // check we are not the final response, if not write the token to the console
       if (r.FinishReason != ChatFinishReason.Stop)
       {
           Console.Write(r.Text);
       }
   }
   Console.WriteLine();
}
await AskQuestionAndStreamAnswer(messages);

This code wraps up the call to get a streaming response in a helper function, adding each response token to the console output as it is returned.

If you run this code now, you will see the response stream back token by token.


Get a conversation going

So far we have a hardcoded prompt streaming a response.

For a copilot to be useful, we need to have an interactive tool where we can ask a question, get a response, then ask another question.

For simplicity's sake, we’ll do that using a simple console interface.

Delete the hardcoded message from the messages list:

List<ChatMessage> messages = [];

Replace the call to AskQuestionAndStreamAnswer with some basic code to get text from the console and send it to the LLM:

string? prompt;
while (!string.IsNullOrWhiteSpace(prompt = Console.ReadLine()))
{
   // Add the user message to the list of messages
   messages.Add(new(ChatRole.User, prompt));
   // Ask the question and stream the response
   await AskQuestionAndStreamAnswer(messages);
}

This will read from the console, add whatever you type to the messages list, then send this to the LLM and stream back the response. If you press return on an empty console, the program will end.

If you test this out, you will be able to get a conversation going. However, you may notice some interesting things.

Firstly, in the code, you probably saw the messages being added to a list with a role of User, and the entire list being sent to the LLM. This means the full history is being sent for every call.

Secondly, as you chat with the LLM, you may notice that the responses are based on the questions you ask, but not on the LLM's earlier responses.

For example, if you asked:

Who was Darth Plagueis the wise?

The answer might be:

Darth Plagueis the Wise is a fictional character from the "Star Wars" universe. He is mentioned in "Star Wars: Episode III – Revenge of the Sith" by Chancellor Palpatine (Darth Sidious) in a conversation with Anakin Skywalker. According to the story, Darth Plagueis was a Sith Lord who was so powerful and knowledgeable in the dark side of the Force that he could influence the midi-chlorians to create life and even prevent death. However, he was ultimately betrayed and killed by his own apprentice, who is implied to be Palpatine himself. The tale of Darth Plagueis serves as a cautionary story about the pursuit of power and the inevitable betrayal that often accompanies it.

If you then ask:

Who are all the characters you mentioned in your last reply?

The response will be something like:

It seems there was a misunderstanding. In the previous interaction, you asked about "Darth Plagueis the Wise," but I didn't provide a response or mention any characters. Darth Plagueis is a character from the Star Wars universe, known for being a Sith Lord who was mentioned in "Star Wars: Episode III – Revenge of the Sith." If you have any specific questions about him or other characters, feel free to ask!

The LLM knows you asked about Darth Plagueis, but has ‘forgotten’ the characters in its response. So what is going on here?


Chat messages and the conversation history

We’ve all got used to having a 2-way conversation with an LLM. We ask a question, we get a response, we ask a follow-up question. It seems so far, our copilot doesn’t quite do this.

❔ So what are these User messages, and how can we fix the weird one-sided conversation?

The reason for this is that LLMs are stateless. When we have a conversation with an LLM, we are not having a human-like conversation where each side remembers both what was said and their response.

Instead, every interaction with the LLM is like a new, fresh conversation. 

You have to tell the LLM the entire conversation history every time for it to generate a response based on that history.

We often don’t realize this, as it is abstracted away from us by copilots or chat interfaces like ChatGPT.

📌 When you send a question to the LLM, you need to send the list of your questions and the responses from the LLM, and with each one tell the LLM where it came from – did it come from the user, or was it a response from the LLM? These are defined using the ChatRole enum.

We’re already adding all the prompts from the user as user messages – messages.Add(new(ChatRole.User, prompt)). What’s missing are the responses from the LLM.

These need to be added as Assistant messages, and we can do that when we receive the last streaming response from the LLM.

Change the contents of the AskQuestionAndStreamAnswer function to this:

// Get each response from the streaming completion
await foreach (var r in chatClient.GetStreamingResponseAsync(chatMessages))
{
   // check we are not the final response, if not write the token to the console
   if (r.FinishReason != ChatFinishReason.Stop)
   {
       Console.Write(r.Text);
   }
   else
   {
       chatMessages.Add(new(ChatRole.Assistant, r.Text));
   }
}

This code saves the final complete response from the LLM into the chat messages collection, so that it is sent on every call.

👉 Give this a spin, and you will now be able to have a complete 2-sided conversation.

Set a system prompt

Your messages currently have 2 roles – User and Assistant: User for the questions you ask, Assistant for the responses from the LLM. There is a third role, System, used for the system prompt.

This is a special prompt that sits once at the start of the conversation and provides the LLM with guidance to use when responding.

To make your copilot into a Star Wars copilot, change the code that creates the messages collection to the following:

List<ChatMessage> messages = [
   new(ChatRole.System, "You are a helpful developer assistant called Yoda. You will always answer in the style of Yoda from Star Wars.")
];

Now when you run this, the answers, like Yoda, they will sound. Fun, this copilot is.

System prompts are used to set context, tone, boundaries, and other rules for the LLM. They provide guidance on how to respond to tailor the responses to the users' needs. 

For example, if you are building a copilot for developers, your system prompt may contain something like “You are a senior developer with 15 years' experience, skilled in software engineering, architecture, testing, and accessibility”.

These system prompts can be very powerful as you build copilots for different use cases. By defining the audience, the responses can be tailored to the needs of your users.

If you are building a copilot to help 4th graders program in Scratch, you would want different responses than a copilot for experienced professional developers, and you can define this in the system prompt.

“You are an assistant for 4th graders who are learning to code in Scratch, ensure all responses are appropriate for 4th graders and focus only on the Scratch block-based programming tool” would be an ideal system prompt.

There are many things you can do in the system prompt, like defining that you only ever want data as JSON in a specific format, or asking for code-only responses with no other text, or you can have some fun!
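
For example, a system prompt that constrains the output format might look something like this – the exact wording is just an illustration, and you can drop it in as the messages list in Program.cs to try it out:

List<ChatMessage> messages = [
   new(ChatRole.System,
       "You are an assistant that only ever replies with valid JSON in the form " +
       "{ \"answer\": \"...\", \"confidence\": 0.0 }. Do not include any text outside the JSON.")
];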

Obviously, feel free to experiment with the system prompt and change it to match the Jedi (or Sith Lord) of your choice.


Add context

The system prompt has another use – adding context to the conversation.

When you use the Pieces Long-Term Memory, or add a folder of code, a snippet, or a code file, the relevant context is added to the system prompt, either in full, or using RAG to extract just the bits that are relevant.

Depending on the type of context you want to add, there are different ways to add it. 

For example, if you want your copilot chat to be based on just the script of Return of the Jedi, you can add all of this to the system prompt, assuming you are using a model with a large enough context window. 

If you are adding a large code base, you can create a RAG system to extract just the code you need or leverage one built into whatever abstraction layer you are using over LLMs.

You need to balance providing the right amount of context. Too little and you may miss out on important information. Too much and you can increase cost, decrease speed, and even confuse the LLM.

For example, thinking about adding the script of Return of the Jedi as context, if you wanted to ask the LLM “Who destroyed the Death Star?” and get answers about Return of the Jedi, you would need to provide enough of the Return of the Jedi script to include the scene where Lando and Wedge fly into the superstructure and blow it up –  without this part, the LLM may not have the answer. 

If you go the other way and provide too much information, such as the scripts for all the Star Wars Skywalker saga movies, then the LLM would be confused as there are now 2 answers – Luke in Episode 4, and Lando and Wedge in Episode 6.

Adding the entire context to the system prompt

Adding the entire context to the system prompt is the quickest and easiest way to add context in your copilot. 

For example, you can add the script of Return of the Jedi by changing the creation of the list of messages and the system prompt to this:

// Load the script of Return of the Jedi
string script = await new HttpClient()
       .GetStringAsync("https://raw.githubusercontent.com/jimbobbennett/star-wars-copilot/refs/heads/main/return-of-the-jedi-script.txt");

// Create a list of chat messages
List<ChatMessage> messages = [
   new(ChatRole.System,
   $"""
   You are a helpful assistant called Yoda. You will always answer in the style of Yoda from Star Wars.
   Answer all questions using the script of the movie Return of the Jedi, which is below between blocks marked with three backticks.

   ```
   {script}
   ```
   """)
];

This loads the script and then injects it into the system prompt. Now you can ask questions about Return of the Jedi, and the copilot has the context. 

For example, if you ask “Who blew up the Death Star”, the copilot knows you are referring to the second Death Star, and will respond that Lando Calrissian and Wedge Antilles destroyed it, as opposed to Luke who destroyed the first one.

The downside to using the system prompt to send all the context is that you are limited by the context window size and the potential cost.

The default model in the Pieces C# SDK is Google Gemini, which has a very large context window, but if you switched to a smaller model, like a variant of Phi-3 running locally with a 4k context window, the response would be garbage as the context window would be too small to fit the entire system prompt, let alone the user prompts.
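
Switching models is designed to be a small change. As a rough sketch of what it might look like – note that the exact method and parameter names here are assumptions, so check the Pieces C# SDK documentation for the current API:

// Assumed API for illustration: list the models PiecesOS knows about, pick a local one,
// and pass it when creating the chat client.
var models = await client.GetModelsAsync();
var phi3 = models.First(m => m.Name.Contains("Phi-3", StringComparison.OrdinalIgnoreCase));

IChatClient chatClient = new PiecesChatClient(client, "My Star Wars copilot", model: phi3);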

If you are using a cloud-based LLM, you are probably paying by the token, and sending tokens that are not needed will cost you more. Sending all the context in the system prompt only makes sense if you know you need everything.
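
As a hypothetical worked example: if a cloud LLM charges $5 per million input tokens and the Return of the Jedi script is around 50,000 tokens, sending the whole script in the system prompt costs roughly 50,000 ÷ 1,000,000 × $5 ≈ $0.25 in input tokens on every single call, before you even count the conversation history.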

If you need something smarter to extract just what is needed for your prompt, then consider a RAG solution.

Adding context using RAG

Using RAG you can extract information from the user prompt and use this to determine what context you need to add, and this then gets injected into the system prompt, reducing the amount of context that you provide. There are many tools and techniques for doing this, with vector databases a popular tool.

One advantage of using an abstraction layer over the LLM is that a lot of these have RAG capabilities built in. 

For example, you may have used the Pieces Long-Term Memory, or added files or folders of code as context to your copilot chats. You can also do this from the Pieces SDK.

Rather than cover how to implement RAG yourself in this post, we will use the capabilities of the Pieces SDK to add the Long-Term Memory as context and use the internal RAG system to extract relevant information based on your prompt.

Start by reverting the system prompt:

List<ChatMessage> messages = [
   new(ChatRole.System, "You are a helpful developer assistant called Yoda. You will always answer in the style of Yoda from Star Wars.")
];

The Pieces Long-Term Memory can be configured via options that are passed to the GetStreamingResponseAsync call inside the AskQuestionAndStreamAnswer function:

   // Enable the Pieces Long-Term Memory for this chat, covering the last hour of activity
   ChatOptions options = new()
   {
       AdditionalProperties = new AdditionalPropertiesDictionary
       {
           { PiecesChatClient.LongTermMemoryPropertyName, true },
           { PiecesChatClient.LongTermMemoryTimeSpanPropertyName, TimeSpan.FromHours(1) },
       }
   };

   // Get each response from the streaming completion
   await foreach (var r in chatClient.GetStreamingResponseAsync(chatMessages, options))

This tells Pieces to access the Long-Term Memory for the past hour, use RAG to extract what is relevant based on the user's prompt, then internally add this to the system prompt that gets sent.

This is in addition to the system prompt that is set in code, so the response will still be in the style of Yoda.

Give this a try by asking something like “Summarize the now witness the power of this fully coded and operational copilot blog post I was just reading”. 

Internally, Pieces will query the Long-Term Memory vector database, find the relevant details about this post, and send that to the LLM.


Next steps

In this post so far you have built a Star Wars-themed copilot in 50 or so lines of code, using Pieces as an abstraction layer over the LLM. 

You’ve seen how LLMs are stateless, so you need to send the entire chat history every time, and learned how to guide the model using a system prompt. Finally, you learned about ways to add extra context to help the LLM give you a relevant answer.

Now go and build your own copilot with Pieces! I’m excited to see what you can create. 

We’ve already seen a Star Trek copilot, so where will your imagination take you? Please share what you build with us at Pieces on X, Bluesky, LinkedIn, or our Discord.
