How to Build a Copilot Using Local Large Language Models with Pieces Client
Learn how Pieces Copilot uses local LLMs and switches between them effortlessly, and how to build a copilot with the Pieces Client.
In this article, we’ll walk you through how to switch between cloud-hosted Large Language Models (LLMs) such as GPT 3.5 and GPT-4, and how to properly download Local Large Language Models like Mistral and Llama2 7B and use them entirely on-device, all through the Pieces OS Client. By the end of this series, you’ll know how to build a copilot using Open Source by Pieces.
You can download the Pieces Vanilla Typescript Project repo to take a look at the example code.
Prerequisites
We suggest reading the first article in this series if you haven’t already, and you will need to install Pieces OS.
Understanding the Models
When chatting with generative AI tools like ChatGPT or Pieces Copilot, you are interacting with a specific large language model (LLM): it could be Mistral, Llama2, GPT-4, or any of the other models introduced to the market nearly daily, each bringing a new perspective to the LLM landscape. With the Pieces OS Client, you gain access to many of these models and can switch between them as you want or need to while moving between tasks throughout your day. If you're wondering how to run an LLM locally, Pieces is a great place to start.
NOTE: If you plan on cloning the Typescript Project, we recommend starting with a clean slate and deleting any Local LLMs (LLLMs) you may have already downloaded in your Pieces Suite. If you do not, you will notice that some of the radio buttons on the Example Copilot page are inactive, and the app will show delete buttons for each of those models as soon as you open it. We currently support local instances of the Mistral 7B and Llama2 7B parameter models, and plan to add support for other models soon.
Adding Radio Buttons for Model Selection
Picking up from the previous article in this series, we have our copilot chats added and can get an entire copilot conversation back in the chat. Next, we will add visual radio buttons so you can simply select each model. Here are the models we will be working with:
Mistral CPU (local)
Mistral GPU (local)
Llama2 7B CPU (local)
Llama2 7B GPU (local)
GPT 3.5 (cloud)
GPT 4 (cloud)
You'll notice that these models are hosted either in the cloud or locally on your machine. We can use any of them from the same place, all by pressing one button. There is one additional step to download the model itself, but we will get to that later in this article.
First, we will add in the radio buttons. You can take the code snippet below and use it to add radio buttons to the HTML of your project if needed; it also serves as the controls for switching between the LLLMs and the cloud models provided.
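Here is a minimal sketch of that markup, injected from TypeScript to keep everything in one file; the element IDs are illustrative placeholders, so match them to whatever your project already uses:

```typescript
// A sketch of the model-selection markup, injected from TypeScript.
// Element IDs here are illustrative; match them to your own project.
const modelSelection = document.createElement('div');
modelSelection.innerHTML = `
  <form id="model-select-form">
    <label><input type="radio" name="model" id="gpt35-radio" checked /> GPT 3.5 (cloud)</label>
    <label><input type="radio" name="model" id="gpt4-radio" /> GPT 4 (cloud)</label>
    <label><input type="radio" name="model" id="mistral-cpu-radio" /> Mistral CPU (local)</label>
    <label><input type="radio" name="model" id="mistral-gpu-radio" /> Mistral GPU (local)</label>
    <label><input type="radio" name="model" id="llama2-cpu-radio" /> Llama2 7B CPU (local)</label>
    <label><input type="radio" name="model" id="llama2-gpu-radio" /> Llama2 7B GPU (local)</label>
  </form>
  <div id="model-downloads">
    <!-- download / delete buttons for the local models are appended here -->
  </div>
`;
document.body.appendChild(modelSelection);
```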
You can see that the form above contains all of the radios, followed by the model downloads container, which will hold the buttons connected to the downloadable chat models.
Each of these buttons represents a single selectable model. As you adjust these values, whichever model is selected becomes the active model, except on first page load, when we default to the GPT 3.5 model and use it to perform our first request. When you add in these inputs, they are each connected to model download logic that lives in the new ModelProgressController.ts file.
This is where all of the magic lives! The Pieces endpoints handle the heavy lifting with our LLLMs and cloud LLMs, which allows us to build with them and switch between them quickly. Why build your own copilot if it’s only powered by a single LLM?
Setting up the Models for Selection
In ModelProgressController.ts, we first set a public variable to store our list of models. We can iterate through this list later, as we get the proper enum and value for each model returned from the Pieces.ModelsApi:
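As a rough sketch, assuming the singleton pattern the example project uses elsewhere and the @pieces.app/pieces-os-client package, the controller might start like this:

```typescript
import * as Pieces from '@pieces.app/pieces-os-client';

export default class ModelProgressController {
  private static instance: ModelProgressController | undefined;

  // Public promise resolving to the list of every model Pieces OS exposes.
  public models: Promise<Pieces.Models>;

  private constructor() {
    // Snapshot all available models (cloud and local) from Pieces OS.
    this.models = new Pieces.ModelsApi().modelsSnapshot();
  }

  public static getInstance(): ModelProgressController {
    return (ModelProgressController.instance ??= new ModelProgressController());
  }
}
```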
Using Pieces.ModelsApi().modelsSnapshot(), you can get a list of all the available models. Then, el.foundation === Pieces.ModelFoundationEnum.Llama27B in combination with el.unique is used to filter the iterable list down to the default model. This is the first time this enum has been used so far, but it certainly won’t be the last. We use that in combination with a few variables to create our onClick() functions in src/index.ts.
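In practice, that filtering can look something like this; the unique string below is a placeholder, so log the snapshot to see the exact values your Pieces OS install returns:

```typescript
// Resolve the snapshot, then narrow the iterable list down to one model.
const models = await new Pieces.ModelsApi().modelsSnapshot();

const llama2Cpu = models.iterable.find(
  (el) =>
    el.foundation === Pieces.ModelFoundationEnum.Llama27B &&
    el.unique === 'llama-2-7b-chat.ggmlv3.q4_K_M' // placeholder unique value
);
```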
Moving over to the main() function, we can use .getInstance() to retrieve and set our modelsProgressController and our list of models, then create variables to store each model's value. We also set the initially selected model to GPT 3.5:
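A sketch of that setup follows; the Mistral7B and Gpt35 enum members and the cpu flag are assumptions alongside the Llama27B member used above, so verify them against your SDK version:

```typescript
// Inside main(): grab the controller instance and resolve the model list.
const modelProgressController = ModelProgressController.getInstance();
const models = await modelProgressController.models;

// Stash the Pieces.Model for each radio dial. The enum members and the
// cpu flag are assumptions; verify them against your SDK version.
const mistralCpu = models.iterable.find(
  (el) => el.foundation === Pieces.ModelFoundationEnum.Mistral7B && el.cpu
)!;
const gpt35 = models.iterable.find(
  (el) => el.foundation === Pieces.ModelFoundationEnum.Gpt35
)!;

// Default to the cloud GPT 3.5 model until the user selects another one.
CopilotStreamController.selectedModelId = gpt35.id;
```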
Then we can wire up each radio using a similar pattern: take the input element and set its onClick function to assign the appropriate CopilotStreamController.selectedModelId as each dial is selected. Here we use the Mistral button as an example:
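Assuming the radio IDs from the markup sketched earlier, the wiring might look like this:

```typescript
// Point the copilot at the local Mistral CPU model when its dial is picked.
const mistralCpuRadio = document.getElementById(
  'mistral-cpu-radio'
) as HTMLInputElement;

mistralCpuRadio.onclick = () => {
  CopilotStreamController.selectedModelId = mistralCpu.id;
};
```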
The selectedModelId is used whenever a message is sent from the copilot chat input; it is passed in as the model parameter on Pieces.QGPTStreamInput.question.model, where the id from the Mistral model has been stored:
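Inside CopilotStreamController, that looks roughly like the sketch below; the QGPTStreamInputToJSON serializer name follows the SDK's generated to-JSON convention, so double-check it in your version:

```typescript
// Send a question over the QGPT stream, tagging it with whichever model
// the user last selected via the radio dials.
public askQGPT(query: string): void {
  const input: Pieces.QGPTStreamInput = {
    question: {
      query,
      relevant: { iterable: [] }, // no extra context in this sketch
      model: CopilotStreamController.selectedModelId,
    },
  };
  this.ws?.send(JSON.stringify(Pieces.QGPTStreamInputToJSON(input)));
}
```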
Then we need to repeat this for each of the radio dials that we are creating, or rather for each model we may download. We ensure that each button is present and throw an error if it is not found with if (!downloadMistralCpuButton) throw new Error('expected id mistral-cpu-button'). This gets more interesting for the Mistral CPU and Mistral GPU models, since we download those files locally. Remember: since you have Pieces OS installed and it acts as the database and storage location for the models, it handles storing the models and providing access to them via these endpoints.
Once each of the radio buttons is created, the end result shows six radio dials set up for swapping between the models, with the proper value set each time one is selected.
Downloading the Mistral CPU Model
Now we can create the logic for downloading the CPU model when we click a button below the radio dials. This logic detects whether the model is already downloaded and, if it is, offers the option to delete it.
Below where the mistralCpuButton is created, we can add a check to see whether the CPU model is present in the Client. Because we create the buttons with JavaScript, we can conditionally add the 'Download Mistral CPU' text to the button only when the model is not downloaded. When it is downloaded, we can provide the option to delete it instead.
After we create the buttons, we can set the specific model to download when that button is clicked. We then use ModelApi().modelSpecificModelDownload() and pass in the .id value from the Pieces.Model object stored in mistralCpu:
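For example (the request shape here mirrors the cancel call shown later, so treat it as an assumption):

```typescript
// Kick off the on-device download for the Mistral CPU model.
downloadMistralCpuButton.onclick = () => {
  new Pieces.ModelApi().modelSpecificModelDownload({ model: mistralCpu.id });
};
```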
Showing Download Progress
We also want to show our download progress. Let’s create a unique .id value for the const mistralCpuDownloadProgress element that we created:
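A sketch, assuming the model-downloads container from the earlier markup:

```typescript
// A per-model progress readout the WebSocket handler can find by id later.
const mistralCpuDownloadProgress = document.createElement('p');
mistralCpuDownloadProgress.id = `download-progress-${mistralCpu.id}`;
document.getElementById('model-downloads')?.appendChild(mistralCpuDownloadProgress);
```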
Over in the WebSocket found in ModelProgressController, you can see how we use the model.id to connect and set up the appropriate WebSocket for each model as its download emits values. When the event data comes back, the download progress element is selected based on the progress ID we created above, and we can use the values found on the model, such as Pieces.Model.name.
Then, using the event data that is emitted (either event.percentage or event.status, depending on whether the model download has started), we surface the percentage numbers. You get a number back for each percentage update, and we use it to set the downloadProgressElement.innerText value.
Here is all of that together:
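The sketch below assembles those pieces; the WebSocket route and the ModelDownloadProgressFromJSON deserializer are assumptions based on the Pieces OS download-progress stream, so verify both against your SDK:

```typescript
// Inside ModelProgressController: one progress socket per local model.
private connect(model: Pieces.Model): void {
  // Route is an assumption; check the Pieces OS API for the exact path.
  const ws = new WebSocket(
    `ws://localhost:1000/models/${model.id}/download/progress`
  );

  ws.onmessage = (msg) => {
    const event = Pieces.ModelDownloadProgressFromJSON(JSON.parse(msg.data));
    const progressElement = document.getElementById(
      `download-progress-${model.id}`
    );
    if (!progressElement) return;

    if (event.percentage != null) {
      // Mid-download: we get a fresh percentage on every tick.
      progressElement.innerText = `${model.name}: ${event.percentage}%`;
    } else if (event.status) {
      // Before (or after) the download runs we only get a status.
      progressElement.innerText = `${model.name}: ${event.status}`;
    }
  };
}
```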
Showing Individual Model Download Progress or Status
Now we can set the appropriate status message based on the event.status that is returned. We can compare it to the ModelsDownloadProgressStatusEnum.Initialized value, which indicates that the button has been pressed on the page but the download has not officially begun yet.
The model download itself can be canceled once it has begun, so when the status is initialized, we create and append the cancel download button. This button uses ModelApi().modelSpecificDownloadCancel({ model: model.id }) inside of its onClick function.
If event.status is either ModelsDownloadProgressStatusEnum.Failed or ModelsDownloadProgressStatusEnum.Unknown, we want to remove the cancel button, since there is no longer a download to cancel. If the event status is ModelsDownloadProgressStatusEnum.Completed, we can refresh the page to ensure that the new download is registered and the radio dials are in sync:
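Still inside the handler, the terminal states can be handled with a simple switch:

```typescript
switch (event.status) {
  case Pieces.ModelsDownloadProgressStatusEnum.Failed:
  case Pieces.ModelsDownloadProgressStatusEnum.Unknown:
    // Nothing left to cancel, so drop the cancel button.
    cancelDownloadButton.remove();
    break;
  case Pieces.ModelsDownloadProgressStatusEnum.Completed:
    // Reload so the new download and the radio dials stay in sync.
    window.location.reload();
    break;
}
```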
The result is a download flow with live progress, a cancel option while the download runs, and a page refresh once the model completes.
Deleting a Downloaded Model
Now, if a model is already downloaded, we want to make it easy to remove. This is useful if we no longer need the model, or if we want to re-download it from scratch. We set up the delete function using ModelsApi().modelsDeleteSpecificModelCache() with the model ID for each corresponding model back in index.ts to remove it from its downloaded location:
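A sketch of that handler follows; the exact request shape is an assumption, so check the generated ModelsApi types in your SDK:

```typescript
// Remove the downloaded Mistral CPU model from the local cache.
deleteMistralCpuButton.onclick = async () => {
  await new Pieces.ModelsApi().modelsDeleteSpecificModelCache({
    model: mistralCpu.id,
  });
  window.location.reload();
};
```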
Remember that we are using the mistralCpu model as an example; examples for each CPU and GPU model are included in the full repo.
The model should then be deleted, and the buttons and radio dials will return to normal upon page refresh. This function is built into the deleteMistralCpuButton.onclick function above.
Upon pressing delete, the selected model is removed and the buttons return to their original state, allowing you to download the models again!
Ready to Build Your Own Copilot?
This guide introduces downloading LLLMs in an easy-to-use environment where you can set them up on your machine, observe how they work, and see how simple it is to track their status with the Pieces OS Client. With that foundation, you can build your own copilot for your application, project, or any other solution you need throughout your workflow.
To get the entire project, you can visit the repo and clone it to get started. All of the above code can be found there and used as copy-and-paste examples for whatever use case you may have.
The next article in this series on how to build a copilot will cover setting up context in each conversation to show the different ways you can combine specific models with particular contexts to get more pointed and useful responses.
If you are interested in contributing to Open Source by Pieces, join our community on Discord or check out our OpenSource repository on GitHub for projects and other open-source initiatives. If you haven't checked out the Pieces for Developers Desktop App and Pieces OS before now, go learn about all of the functionality that is available to help you power your workflow and enhance your work as a developer!