New for 2025 – more local models in Pieces, including Qwen Coder and Phi-4
Pieces has added more local models to power the copilot and long-term memory, including Qwen Coder and Phi-4.
I wrote recently about our internal upgrade, changing how Pieces handles local models to use Ollama.
One reason for this change was to make it easier and quicker for us to bring new local models to Pieces.
I’m excited to announce that we’ve just released our first batch of new models, including Qwen 2.5 Coder, a model that a lot of folks have asked for.
The local models available through Pieces
We’ve extended the model catalog to include new and updated models from Google, IBM, Meta, Microsoft, and Mistral, as well as adding models from Qwen and StarCoder.
Here’s our complete set.
Google
Ollama model pages: Gemma 2 / Gemma / CodeGemma
IBM
Ollama model pages: Granite Code / Granite 3.1 Dense / Granite 3 Dense / Granite 3 MoE
Meta
Ollama model pages: Llama 3.2 / Llama 3 / Llama 2 / CodeLlama
Microsoft
Ollama model pages: Phi-4 / Phi-3.5 / Phi-3 / Phi-2
Mistral
Ollama model pages: Mixtral 8 / Mistral
Qwen
Ollama model pages: QwQ / Qwen 2.5 Coder
StarCoder
Ollama model page: StarCoder 2
Why use local models?
If you’ve not come across local models before, let’s take a moment to dive into them. These are models that, put simply, run locally – they are LLMs that run on your device instead of in the cloud.
I’m a huge fan of local models. It feels like they get more powerful every week, and the gap in quality between a model you can run on your local machine and a model that runs in the cloud on racks of hardware worth hundreds of thousands keeps getting smaller.
What are the benefits of running AI locally?
To me, there are 3 big advantages to local models.
AI Governance – Companies are putting restrictions on AI usage to stop private data being sent to cloud-based LLM providers. Some companies allow only local models, so that no private data leaves your device.
Environmental impact – AI has a huge power need, leading to large amounts of carbon emissions that contribute to climate change. On-device AI has a much lower power need, making it greener.
What are the disadvantages of local models?
Local models are great but not perfect. The disadvantages are:
You need a powerful computer – To run a local model you need a reasonably beefy machine. As a rule of thumb, you will need about 1 GB of RAM for every billion parameters, on top of what your system is already using. For example, to run a 7B parameter model you will need at least 8 GB of RAM, and possibly more depending on what else you are running. Ollama takes advantage of GPUs, such as those from NVIDIA, or the GPU cores built into Apple Silicon processors.
These models run slower – Local models run slower than cloud models. This is down to the hardware – no matter what GPU you are running, it will be less powerful than a rack of dedicated, server-grade AI GPUs.
The results may not be as good – In general, the results from a local model will not be as good as those from a cloud model. Local models are smaller, meaning they have encoded less information. They are getting better all the time though, especially local models trained for a specific task: because the corpus of information they need is smaller, the model itself does not need to be as large. For example, models that are trained to code need little knowledge of tourist attractions in Paris!
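To make the RAM rule of thumb above concrete, here is a minimal sketch of the arithmetic in Python. The function names and the example machine (16 GB installed, 6 GB already in use) are hypothetical and only for illustration. The 1 GB per billion parameters figure is the rough guide from this post, not an exact requirement, and real usage also depends on things like how the model is quantized.

```python
# Rule of thumb from this post: roughly 1 GB of RAM per billion parameters,
# on top of whatever the rest of the system is already using.
GB_PER_BILLION_PARAMS = 1.0


def estimate_required_ram_gb(params_billions: float, system_usage_gb: float) -> float:
    """Return a rough total RAM figure (in GB) needed to run a local model."""
    if params_billions <= 0 or system_usage_gb < 0:
        raise ValueError("parameter count must be positive and system usage non-negative")
    return params_billions * GB_PER_BILLION_PARAMS + system_usage_gb


def fits_in_ram(params_billions: float, system_usage_gb: float, installed_ram_gb: float) -> bool:
    """Check whether the rough estimate fits within the installed RAM."""
    return estimate_required_ram_gb(params_billions, system_usage_gb) <= installed_ram_gb


if __name__ == "__main__":
    # Hypothetical machine: 16 GB installed, 6 GB already used by other apps.
    for name, size in [("7B model", 7), ("14B model", 14)]:
        needed = estimate_required_ram_gb(size, system_usage_gb=6)
        verdict = "fits" if fits_in_ram(size, 6, 16) else "does not fit"
        print(f"{name}: roughly {needed:.0f} GB needed, {verdict} in 16 GB")
```

Treat the result as a first check before downloading a model rather than a guarantee. If the estimate is close to your installed RAM, a smaller model is probably the safer choice.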
Take these models for a spin!
Pieces supports local models the same way it supports cloud models – everything you can do in Pieces with Claude, for example, you can do locally using Llama, Qwen, or Phi-4. Pieces manages everything for you, so you can enable a local model with a couple of clicks. Once you have a local model downloaded and activated, you can use it offline.
Give these new models a spin, and let me know your thoughts on X, Bluesky, LinkedIn, or our Discord. And if you are not using Pieces yet, give it a try for free!