
Product Updates

Jan 23, 2025


New for 2025 – more local models in Pieces, including Qwen Coder and Phi-4

Pieces has added more local models to power the copilot and long-term memory, including Qwen 2.5 Coder and Phi-4.

I wrote recently about our internal upgrade, changing how Pieces handles local models to use Ollama.

One reason for this change was to make it easier and quicker for us to bring new local models to Pieces. 

I’m excited to announce that we’ve just released our first batch of new models, including Qwen 2.5 Coder, a model that a lot of folks have asked for.
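Pieces manages downloading and running these models for you, but if you want a feel for what a model like Qwen 2.5 Coder can do on its own, here's a minimal sketch using Ollama's Python client (the `ollama` package). This is not how Pieces talks to Ollama internally – the model tag and prompt are purely illustrative, and it assumes you have Ollama installed and the model already pulled.

```python
# Minimal sketch, assuming the Ollama app is installed and running locally,
# and the model has already been pulled (e.g. `ollama pull qwen2.5-coder:7b`).
import ollama

response = ollama.chat(
    model="qwen2.5-coder:7b",  # illustrative tag; pick a size your machine can handle
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
)

print(response["message"]["content"])
```

Inside Pieces you don't need any of this – the point is simply that these are ordinary Ollama models, so anything in the catalog below can also be explored directly.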


The local models available through Pieces

We’ve extended the model catalog to include new and updated models from Google, IBM, Meta, Microsoft, and Mistral, as well as adding models from Qwen and StarCoder.

Here’s our complete set.


Google

| Model Name | Parameters | Description | Ollama Model Page |
| --- | --- | --- | --- |
| Gemma 2 | 2B, 9B, 27B | Google Gemma 2 is a high-performing and efficient model, featuring a brand new architecture designed for class-leading performance and efficiency. | Gemma 2 |
| Gemma 1.1 | 2B, 7B | Gemma 1.1 is a new open model developed by Google and its DeepMind team. It's inspired by Gemini models at Google. | Gemma |
| CodeGemma | 7B | CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following. | CodeGemma |


IBM

| Model Name | Parameters | Description | Ollama Model Page |
| --- | --- | --- | --- |
| Granite Code | 3B (2K context window), 3B (128K context window), 8B, 20B, 34B | Granite Code is a family of decoder-only code models designed for code generation tasks. | Granite Code |
| Granite 3.1 Dense | 2B, 8B | The IBM Granite 3.1 dense models are text-only dense LLMs trained on over 12 trillion tokens of data, demonstrating significant improvements over their predecessors in performance and speed in IBM's initial testing. They are designed to support tool-based use cases and retrieval augmented generation (RAG), streamlining code generation, translation, and bug fixing. | Granite 3.1 Dense |
| Granite 3 Dense | 2B, 8B | The IBM Granite 3 dense models are text-only dense LLMs trained on over 12 trillion tokens of data, demonstrating significant improvements over their predecessors in performance and speed in IBM's initial testing. Granite-8B-Instruct now rivals Llama 3.1 8B-Instruct across both the OpenLLM Leaderboard v1 and v2 benchmarks. | Granite 3 Dense |
| Granite 3 MoE | 1B, 3B | The IBM Granite 3 MoE models are long-context mixture of experts (MoE) models designed for low-latency usage. Trained on over 10 trillion tokens of data, they are ideal for deployment in on-device applications or situations requiring instantaneous inference. | Granite 3 MoE |


Meta

| Model Name | Parameters | Description | Ollama Model Page |
| --- | --- | --- | --- |
| Llama 3.2 | 1B, 3B | The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models (text in/text out). The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and outperform many of the available open source and closed chat models on common industry benchmarks. | Llama 3.2 |
| Llama 3 | 8B | Meta Llama 3 is a family of state-of-the-art models developed by Meta. The Llama 3 instruction-tuned models are fine-tuned and optimized for dialogue/chat use cases and outperform many of the available open-source chat models on common benchmarks. | Llama 3 |
| Llama 2 | 7B, 13B | Llama 2 is released by Meta Platforms, Inc. The model is trained on 2 trillion tokens and supports a context length of 4,096 by default. The Llama 2 Chat models are fine-tuned on over 1 million human annotations and are made for chat. | Llama 2 |
| CodeLlama | 7B, 13B, 34B | Code Llama is a model for generating and discussing code, built on top of Llama 2. It's designed to make workflows faster and more efficient for developers and to make it easier for people to learn how to code. It can generate both code and natural language about code, and supports many of the most popular programming languages used today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, Bash, and more. | CodeLlama |


Microsoft

| Model Name | Parameters | Description | Ollama Model Page |
| --- | --- | --- | --- |
| Phi-4 | 14B | Phi-4 is a 14B parameter, state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. | Phi-4 |
| Phi-3.5 Mini | 3.8B | Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3: synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data. | Phi-3.5 |
| Phi-3 | 3.8B (Mini), 14B (Medium) | Phi-3 is a family of open AI models developed by Microsoft, available with 4K and 128K token context windows. | Phi-3 |
| Phi-2 | 2.7B | Phi-2 is a small language model capable of common-sense reasoning and language understanding. It showcases “state-of-the-art performance” among language models with fewer than 13 billion parameters. | Phi-2 |


Mistral

| Model Name | Parameters | Description | Ollama Model Page |
| --- | --- | --- | --- |
| Mixtral 8x7B | 8x7B | The Mixtral large language models (LLMs) are a set of pretrained generative sparse mixture-of-experts models. | Mixtral |
| Mistral | 7B | Mistral is a 7B parameter model, distributed under the Apache license. It is available in both instruct (instruction following) and text completion variants. | Mistral |


Qwen

| Model Name | Parameters | Description | Ollama Model Page |
| --- | --- | --- | --- |
| QwQ Preview | 32B | QwQ is an experimental research model focused on advancing AI reasoning capabilities. | QwQ |
| Qwen 2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | The latest series of Code-Specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing. | Qwen 2.5 Coder |


StarCoder

| Model Name | Parameters | Description | Ollama Model Page |
| --- | --- | --- | --- |
| StarCoder 2 | 15B | StarCoder2 is the next generation of transparently trained open code LLMs. | StarCoder 2 |


Why use local models?

If you’ve not come across local models before, let’s take a moment to dive into them. Put simply, these are LLMs that run on your device instead of in the cloud.

I’m a huge fan of local models. It feels like they are getting more powerful every week, and the gap between the quality of results from a model you can run on your local machine and a model that runs in the cloud on racks of hardware worth hundreds of thousands of dollars is getting smaller and smaller.

What are the benefits of running AI locally?

To me, there are three big advantages to local models.

  • AI Governance – Companies are putting restrictions on AI usage to stop private data being sent to cloud-based LLM providers. Some companies enforce local models only, so no private data leaves your device.

  • Offline use – Local models run entirely on your device, so once a model is downloaded you can keep using it without an internet connection.

  • Environmental impact – AI has a huge power need, leading to large amounts of carbon emissions contributing to climate change. On-device AI has a much lower power need, making it greener.


What are the disadvantages of local models?

Local models are great but not perfect. The disadvantages are:

  • You need a powerful computer – To run a local model you need a reasonably beefy machine. For every billion parameters you will need a GB of RAM on top of what your system is already using. For example, if you want to run a 7B parameter model then you will need at least 8GB of RAM, if not more depending on what else you are running (there's a rough sizing sketch after this list). Ollama takes advantage of GPUs, such as those from NVIDIA, or the GPU cores built into Apple Silicon processors.

  • These models run slower – Local models run slower than cloud models. This is down to the hardware – no matter what GPU you are running, it will be less powerful than a rack of dedicated AI server-grade GPUs.

  • The results may not be as good – In general, the results from a local model will not be as good as those from a cloud model. Local models are smaller, meaning they have encoded less information, although they are getting better all the time. This is especially true of local models trained for a specific task: because their corpus of information is smaller, they don't need to be as large. For example, models that are trained to code need little knowledge of tourist attractions in Paris!
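If you want to sanity-check whether a model will fit before you download it, here's a rough back-of-the-envelope sketch using the one-GB-per-billion-parameters rule of thumb above. The model list and the system overhead figure are just illustrative assumptions; real usage depends on how the model is packaged and what else your machine is running.

```python
# Rough back-of-the-envelope RAM check, using the "about 1 GB of RAM per
# billion parameters, on top of what your system is already using" rule of
# thumb from above. These are estimates, not measurements.

models_in_billions = {
    "Qwen 2.5 Coder 7B": 7,
    "Phi-4 14B": 14,
    "Gemma 2 27B": 27,
}

system_usage_gb = 8  # assumed RAM already in use by the OS and other apps

for name, params_b in models_in_billions.items():
    model_gb = params_b  # ~1 GB per billion parameters
    print(f"{name}: ~{model_gb} GB for the model, ~{model_gb + system_usage_gb} GB total")
```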


Take these models for a spin!

Pieces supports local models the same way it supports cloud models – everything you can do in Pieces with Claude, for example, you can do locally using Llama, Qwen, or Phi-4. Pieces manages everything for you, so you can enable a local model with a couple of clicks. Once you have a local model downloaded and activated, you can use it offline.

Give these new models a spin, and let me know your thoughts on X, Bluesky, LinkedIn, or our Discord. And if you are not using Pieces yet, give it a try for free!
