How to Build a Copilot Using Local Large Language Models with Pieces Client
Learn how Pieces Copilot uses local LLMs and switches between them effortlessly, and how to build a copilot with the Pieces Client.
In this article, we’ll walk you through how to switch between cloud-hosted Large Language Models (LLMs) such as GPT 3.5 and GPT-4, and how to properly download Local Large Language Models like Mistral and Llama2 7B and use them entirely on-device, all through the Pieces OS Client. By the end of this series, you’ll know how to build a copilot using Open Source by Pieces.
You can download the Pieces Vanilla Typescript Project repo to take a look at the example code.
Prerequisites
We suggest reading the first article in this series if you haven’t already, and you will need to install Pieces OS.
Understanding the Models
When chatting with generative AI tools like ChatGPT or Pieces Copilot, you are interacting with a specific large language model (LLM): it could be Mistral, Llama2, GPT-4, or any of the other models introduced to the market nearly daily, each bringing a new perspective to the LLM landscape. With the Pieces OS Client, you gain access to many of these models and can switch between them as you want or need to while moving between tasks throughout your day. If you're wondering how to run an LLM locally, Pieces is a great place to start.
NOTE: If you plan on cloning the Typescript Project, we recommend starting with a clean slate and deleting any Local LLMs (LLLMs) you may have already downloaded in your Pieces Suite. If you do not, you will notice that some of the radio buttons on the Example Copilot page are inactive, and the app will show delete buttons for each of those models as soon as you open it. We currently support local instances of the Mistral 7B and Llama2 7B parameter models, and plan to add support for other models soon.
Adding Radio Buttons for Model Selection
Picking up from the previous article in this series, we have our copilot chats added and can get an entire copilot conversation back in the chat. Next, we will add visual radio buttons so you can simply select each model. Here are the models we will be working with:
Mistral CPU (local)
Mistral GPU (local)
Llama2 7B CPU (local)
Llama2 7B GPU (local)
GPT 3.5 (cloud)
GPT 4 (cloud)
You'll notice that these models are hosted either in the cloud or locally on your machine. We can use any of them from the same place, all by pressing one button. There is one additional step to download the model itself, but we will get to that later in this article.
First, we will add in the radio buttons. You can take the code snippet below and use it to add radio buttons to the HTML of your project if needed; it also serves as the controls for switching between the LLLMs and the cloud models provided.
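Here is a minimal sketch of that markup, injected from TypeScript to keep everything in one file; the element IDs are illustrative placeholders, so match them to whatever your project already uses:

```typescript
// A sketch of the model-selection markup, injected from TypeScript.
// Element IDs here are illustrative; match them to your own project.
const modelSelection = document.createElement('div');
modelSelection.innerHTML = `
  <form id="model-select-form">
    <label><input type="radio" name="model" id="gpt35-radio" checked /> GPT 3.5 (cloud)</label>
    <label><input type="radio" name="model" id="gpt4-radio" /> GPT 4 (cloud)</label>
    <label><input type="radio" name="model" id="mistral-cpu-radio" /> Mistral CPU (local)</label>
    <label><input type="radio" name="model" id="mistral-gpu-radio" /> Mistral GPU (local)</label>
    <label><input type="radio" name="model" id="llama2-cpu-radio" /> Llama2 7B CPU (local)</label>
    <label><input type="radio" name="model" id="llama2-gpu-radio" /> Llama2 7B GPU (local)</label>
  </form>
  <div id="model-downloads">
    <!-- download / delete buttons for the local models are appended here -->
  </div>
`;
document.body.appendChild(modelSelection);
```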
You can see that the form above contains all of the radios, followed by the model downloads container, which will hold the buttons connected to the downloadable chat models.
Each of these buttons represents a single selectable model. As you adjust these values, whichever model is selected becomes the active model, except on first page load, when we default to the GPT 3.5 model and use it to perform our first request. When you add in these inputs, they are each connected to model download logic that lives in the new ModelProgressController.ts file.
This is where all of the magic lives! The Pieces endpoints handle the heavy lifting with our LLLMs and cloud LLMs, which allows us to build with them and switch between them quickly. Why build your own copilot if it’s only powered by a single LLM?
Setting up the Models for Selection
In ModelProgressController.ts, we first set a public variable to store our list of models. We can iterate through this list later, as we get the proper enum and value for each model returned from the Pieces.ModelsApi:
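As a rough sketch, assuming the singleton pattern the example project uses elsewhere and the @pieces.app/pieces-os-client package, the controller might start like this:

```typescript
import * as Pieces from '@pieces.app/pieces-os-client';

export default class ModelProgressController {
  private static instance: ModelProgressController | undefined;

  // Public promise resolving to the list of every model Pieces OS exposes.
  public models: Promise<Pieces.Models>;

  private constructor() {
    // Snapshot all available models (cloud and local) from Pieces OS.
    this.models = new Pieces.ModelsApi().modelsSnapshot();
  }

  public static getInstance(): ModelProgressController {
    return (ModelProgressController.instance ??= new ModelProgressController());
  }
}
```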
Using Pieces.ModelsApi().modelsSnapshot(), you can get a list of all the available models. Then, el.foundation === Pieces.ModelFoundationEnum.Llama27B in combination with el.unique is used to filter the iterable list down to the default model. This is the first time this enum has been used so far, but it certainly won’t be the last. We use that in combination with a few variables to create our onClick() functions in src/index.ts.
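In practice, that filtering can look something like this; the unique string below is a placeholder, so log the snapshot to see the exact values your Pieces OS install returns:

```typescript
// Resolve the snapshot, then narrow the iterable list down to one model.
const models = await new Pieces.ModelsApi().modelsSnapshot();

const llama2Cpu = models.iterable.find(
  (el) =>
    el.foundation === Pieces.ModelFoundationEnum.Llama27B &&
    el.unique === 'llama-2-7b-chat.ggmlv3.q4_K_M' // placeholder unique value
);
```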
Moving over to the main() function, we can use .getInstance() to retrieve and set our modelsProgressController and our list of models, then create variables to store each model's value. We also set the initially selected model to GPT 3.5:
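A sketch of that setup follows; the Mistral7B and Gpt35 enum members and the cpu flag are assumptions alongside the Llama27B member used above, so verify them against your SDK version:

```typescript
// Inside main(): grab the controller instance and resolve the model list.
const modelProgressController = ModelProgressController.getInstance();
const models = await modelProgressController.models;

// Stash the Pieces.Model for each radio dial. The enum members and the
// cpu flag are assumptions; verify them against your SDK version.
const mistralCpu = models.iterable.find(
  (el) => el.foundation === Pieces.ModelFoundationEnum.Mistral7B && el.cpu
)!;
const gpt35 = models.iterable.find(
  (el) => el.foundation === Pieces.ModelFoundationEnum.Gpt35
)!;

// Default to the cloud GPT 3.5 model until the user selects another one.
CopilotStreamController.selectedModelId = gpt35.id;
```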
Then we can wire up each radio using a similar pattern: take the input element and set its onClick function to assign the appropriate CopilotStreamController.selectedModelId as each dial is selected. Here we use the Mistral button as an example:
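Assuming the radio IDs from the markup sketched earlier, the wiring might look like this:

```typescript
// Point the copilot at the local Mistral CPU model when its dial is picked.
const mistralCpuRadio = document.getElementById(
  'mistral-cpu-radio'
) as HTMLInputElement;

mistralCpuRadio.onclick = () => {
  CopilotStreamController.selectedModelId = mistralCpu.id;
};
```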
The selectedModelId is used whenever a message is sent from the copilot chat input; it is passed in as the model parameter on Pieces.QGPTStreamInput.question.model, where the id from the Mistral model has been stored:
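Inside CopilotStreamController, that looks roughly like the sketch below; the QGPTStreamInputToJSON serializer name follows the SDK's generated to-JSON convention, so double-check it in your version:

```typescript
// Send a question over the QGPT stream, tagging it with whichever model
// the user last selected via the radio dials.
public askQGPT(query: string): void {
  const input: Pieces.QGPTStreamInput = {
    question: {
      query,
      relevant: { iterable: [] }, // no extra context in this sketch
      model: CopilotStreamController.selectedModelId,
    },
  };
  this.ws?.send(JSON.stringify(Pieces.QGPTStreamInputToJSON(input)));
}
```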
Then we need to repeat this for each of the radio dials that we are creating, or rather for each model we may download. We ensure that each button is present and throw an error if it is not found with if (!downloadMistralCpuButton) throw new Error('expected id mistral-cpu-button'). This gets more interesting for the Mistral CPU and Mistral GPU models, since we download those files locally. Remember: since you have Pieces OS installed and it acts as the database and storage location for the models, it handles storing the models and providing access to them via these endpoints.
Once each of the radio buttons is created, the end result shows six radio dials set up for swapping between the models, with the proper value set each time one is selected.
Downloading the Mistral CPU Model
Now we can create the logic for downloading the CPU model when we click a button below the radio dials. This logic detects whether the model is already downloaded and, if it is, offers the option to delete it.
Below where the mistralCpuButton is created, we can add a check to see whether the CPU model is present in the Client. Because we create the buttons with JavaScript, we can conditionally add the 'Download Mistral CPU' text to the button only when the model is not downloaded. When it is downloaded, we can provide the option to delete it instead.
After we create the buttons, we can set the specific model to download when that button is clicked. We then use ModelApi().modelSpecificModelDownload() and pass in the .id value from the Pieces.Model object stored in mistralCpu:
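For example (the request shape here mirrors the cancel call shown later, so treat it as an assumption):

```typescript
// Kick off the on-device download for the Mistral CPU model.
downloadMistralCpuButton.onclick = () => {
  new Pieces.ModelApi().modelSpecificModelDownload({ model: mistralCpu.id });
};
```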
Showing Download Progress
We also want to show our download progress. Let’s create a unique .id value for the const mistralCpuDownloadProgress element that we created:
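A sketch, assuming the model-downloads container from the earlier markup:

```typescript
// A per-model progress readout the WebSocket handler can find by id later.
const mistralCpuDownloadProgress = document.createElement('p');
mistralCpuDownloadProgress.id = `download-progress-${mistralCpu.id}`;
document.getElementById('model-downloads')?.appendChild(mistralCpuDownloadProgress);
```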
Over in the WebSocket found in ModelProgressController, you can see how we use the model.id to connect and set up the appropriate WebSocket for each model as its download emits values. When the event data comes back, the download progress element is selected based on the progress ID we created above, and we can use the values found on the model, such as Pieces.Model.name.
Then, using the event data that is emitted (either event.percentage or event.status, depending on whether the model download has started), we surface the percentage numbers. You get a number back for each percentage update, and we use it to set the downloadProgressElement.innerText value.
Here is all of that together:
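The sketch below assembles those pieces; the WebSocket route and the ModelDownloadProgressFromJSON deserializer are assumptions based on the Pieces OS download-progress stream, so verify both against your SDK:

```typescript
// Inside ModelProgressController: one progress socket per local model.
private connect(model: Pieces.Model): void {
  // Route is an assumption; check the Pieces OS API for the exact path.
  const ws = new WebSocket(
    `ws://localhost:1000/models/${model.id}/download/progress`
  );

  ws.onmessage = (msg) => {
    const event = Pieces.ModelDownloadProgressFromJSON(JSON.parse(msg.data));
    const progressElement = document.getElementById(
      `download-progress-${model.id}`
    );
    if (!progressElement) return;

    if (event.percentage != null) {
      // Mid-download: we get a fresh percentage on every tick.
      progressElement.innerText = `${model.name}: ${event.percentage}%`;
    } else if (event.status) {
      // Before (or after) the download runs we only get a status.
      progressElement.innerText = `${model.name}: ${event.status}`;
    }
  };
}
```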
Showing Individual Model Download Progress or Status
Now we can set the appropriate status message based on the event.status that is returned. We can compare it to the ModelsDownloadProgressStatusEnum.Initialized value, which indicates that the button has been pressed on the page but the download has not officially begun yet.
The model download itself can be canceled once it has begun, so when the status is initialized, we create and append the cancel download button. This button uses ModelApi().modelSpecificDownloadCancel({ model: model.id }) inside of its onClick function.
If event.status is either ModelsDownloadProgressStatusEnum.Failed or ModelsDownloadProgressStatusEnum.Unknown, we want to remove the cancel button, since there is no longer a download to cancel. If the event status is ModelsDownloadProgressStatusEnum.Completed, we can refresh the page to ensure that the new download is registered and the radio dials are in sync:
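Still inside the handler, the terminal states can be handled with a simple switch:

```typescript
switch (event.status) {
  case Pieces.ModelsDownloadProgressStatusEnum.Failed:
  case Pieces.ModelsDownloadProgressStatusEnum.Unknown:
    // Nothing left to cancel, so drop the cancel button.
    cancelDownloadButton.remove();
    break;
  case Pieces.ModelsDownloadProgressStatusEnum.Completed:
    // Reload so the new download and the radio dials stay in sync.
    window.location.reload();
    break;
}
```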
The result is a download flow with live progress, a cancel option while the download runs, and a page refresh once the model completes.
Deleting a Downloaded Model
Now, if a model is already downloaded, we want to make it easy to remove. This is useful if we no longer need the model, or if we want to re-download it from scratch. We set up the delete function using ModelsApi().modelsDeleteSpecificModelCache() with the model ID for each corresponding model back in index.ts to remove it from its downloaded location:
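A sketch of that handler follows; the exact request shape is an assumption, so check the generated ModelsApi types in your SDK:

```typescript
// Remove the downloaded Mistral CPU model from the local cache.
deleteMistralCpuButton.onclick = async () => {
  await new Pieces.ModelsApi().modelsDeleteSpecificModelCache({
    model: mistralCpu.id,
  });
  window.location.reload();
};
```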
Remember that we are using the mistralCpu model as an example; examples for each CPU and GPU model are included in the full repo.
The model should then be deleted, and the buttons and radio dials will return to normal upon page refresh. This function is built into the deleteMistralCpuButton.onclick function above.
Upon pressing delete, the selected model is removed and the buttons return to their original state, allowing you to download the models again!
Ready to Build Your Own Copilot?
This guide introduces downloading LLLMs in an easy-to-use environment where you can set them up on your machine, observe how they work, and see how simple it is to track their status with the Pieces OS Client. With that foundation, you can build your own copilot for your application, project, or any other solution you need throughout your workflow.
To get the entire project, you can visit the repo and clone it to get started. All of the above code can be found there and used as copy-and-paste examples for whatever use case you may have.
The next article in this series on how to build a copilot will cover setting up context in each conversation to show the different ways you can combine specific models with particular contexts to get more pointed and useful responses.
If you are interested in contributing to Open Source by Pieces, join our community on Discord or check out our OpenSource repository on GitHub for projects and other open-source initiatives. If you haven't checked out the Pieces for Developers Desktop App and Pieces OS before now, go learn about all of the functionality that is available to help you power your workflow and enhance your work as a developer!