You can do what with Pieces AI when offline?
Enable offline AI with on-device LLMs, enhancing your productivity in privacy-focused environments.
This is the second post in my “You can do what with Pieces?” series. Last time I covered 5 copilot prompts that I really like; today is all about going offline. In this post, I share 5 cool things you can do with Pieces when you are not connected to the internet.
Chat with the copilot offline
I’m a big AI copilot user, and when writing code I spend a substantial amount of time chatting with the copilot. I use the copilot to write boilerplate code, investigate new libraries that I don’t know about, or do chores like cleaning up code after I have spent time researching and building it out.
We’ve all gotten used to doing these chats online, leveraging LLMs like GPT, Gemini, or Claude, but did you know you can run LLMs on your local device?
Assuming you have a reasonably modern and powerful machine with at least 8GB of RAM – such as a Windows device with an NVIDIA GPU, or any Apple Silicon-based Mac – you can run LLMs locally, giving you fully offline generative AI.
Admittedly they won’t be as fast as an LLM running on a rack full of $17,000 AI GPU boards, but the results are pretty impressive, with some recent LLMs scoring on par with GPT-4o and other cloud-based models across a range of benchmarks.
Pieces currently supports a range of on-device LLMs that you can interact with through the copilot, all powered by Ollama, an open source framework for running local models.
You can literally download the local LLM of your choice from Pieces, turn off your WiFi or unplug your network cable, and have a conversation with the copilot as if you were using a cloud model.
All the features of the Pieces copilot just work with these offline AI models, from leveraging the Long-Term Memory to chatting about your current project.
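If you are curious what chatting with a local model looks like under the hood, here is a minimal sketch using Ollama’s Python client. The model name is just an example, and Pieces handles downloading and switching models for you through its UI, so this is purely illustrative.

```python
# A minimal sketch of chatting with a local model through Ollama's Python client.
# Assumes `pip install ollama`, a running Ollama service, and a model that has
# already been downloaded (the model name below is just an example).
import ollama

MODEL = "llama3.2"  # example model name; use whichever local model you have pulled

response = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
)

# The reply comes back even with WiFi turned off, because everything
# runs on the local machine.
print(response["message"]["content"])
```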
Have your chats powered by your GPU
When you run an LLM locally, it can take advantage of your GPU to run faster. GPUs, or graphics processing units, are specialist hardware designed to process graphics, as the name suggests.
The term was first used in 1994 by Sony to describe the graphics hardware in the first-generation PlayStation.
GPUs are great at floating-point maths, which means doing calculations on decimal numbers very, very fast.
And as it turns out, AI is all about doing calculations on decimal numbers. The hardware that gives your PS5 or Xbox such nice fancy graphics can also be used to run AI very quickly.
Ollama natively supports a range of GPUs, from NVIDIA and AMD cards to Apple silicon via Metal.
Just by having this hardware, LLMs running locally will be faster – both in time to first token (how long until the response starts streaming back) and in tokens per second (the speed at which the answer streams to you).
This video shows the speed comparison.
In this video, the left view shows the Pieces Desktop app using just the CPU, and the right view shows it using an NVIDIA GeForce RTX 2050 Ti Laptop GPU. The question is asked, and at about 4 seconds in, the copilot starts work.
For the GPU, the response starts streaming back after about 1 second. For the CPU, the response takes over 11 seconds to start streaming, and the tokens arrive noticeably more slowly.
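If you want a feel for those two numbers on your own hardware, here is a rough, illustrative sketch that streams a response from a local Ollama model and times it. This is not how Pieces measures performance; the model name is an example, and counting streamed chunks is only an approximation of the token count.

```python
# A rough sketch of measuring time to first token and tokens per second when
# streaming from a local Ollama model. Counting streamed chunks approximates
# token count; this is illustrative, not how Pieces measures performance.
import time
import ollama

MODEL = "llama3.2"  # example model name; use whichever local model you have pulled

start = time.perf_counter()
first_token_at = None
chunks = 0

for chunk in ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Explain what a GPU is in one paragraph."}],
    stream=True,
):
    if first_token_at is None:
        first_token_at = time.perf_counter()  # first piece of the answer arrived
    chunks += 1

total = time.perf_counter() - start
ttft = first_token_at - start
print(f"Time to first token: {ttft:.2f}s")
if total > ttft:
    print(f"Approx. tokens per second: {chunks / (total - ttft):.1f}")
```

Run it once on the CPU and once with GPU acceleration enabled and you should see the same gap the video demonstrates.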
Switch chats to an offline LLM mid-conversation
For developers using tools like ChatGPT and Gemini directly, one of the frustrations has been having these conversations spread across multiple places.
If you are chatting with ChatGPT and want to try another tool like Gemini, you can’t simply switch mid-conversation; instead, you have to copy the entire chat history and context over.
Pieces solves this problem by allowing you to choose the LLM for your copilot chats, and then change mid-conversation. And this includes changing to an offline LLM.
This is something I’ve used a lot when travelling.
Airplane WiFi can be questionable, and sometimes unavailable, so it’s nice to be able to start working on something using a copilot chat powered by Claude or Gemini, then hop on the plane, switch my LLM to an offline model, and carry on the same conversation.
All my existing context including the chat history, and any files or folders I’ve added, will still be available as I chat with the offline model. Then when I land, I switch back to the cloud model and continue on.
Chat with an offline copilot when online in a privacy or security-focused environment
On-device LLMs are not only about using LLMs when you are offline.
They also allow you to use an LLM in an environment where you can’t use a cloud-based LLM, such as when you need privacy around customer data or corporate IP.
I’ve chatted with loads of developers at large organizations, including financial services companies, who have corporate AI governance rules that simply won’t let them use cloud-based LLMs; they insist on local LLMs only.
As more and more companies see developers using LLMs, they are starting to focus heavily on good governance.
Central IT is setting rules in partnership with developers, ensuring they can use the power of LLMs without resorting to shadow IT (using systems or applications that the organization doesn’t know about to get around known or perceived central IT limitations).
Pieces is the perfect solution for them, providing an integrated copilot in the developer tools they already use, combined with the peace of mind that their AI governance rules are being followed.
Enrich saved snippets
The Pieces Drive is how Pieces stores, enriches, and manages materials that might be relevant to you as a developer, or to your copilot chats.
When you save a code snippet to the Pieces Drive, it is enriched using AI. This enrichment includes adding descriptions, tags, links, suggested copilot queries, and more.
These enrichments are generated using an LLM, and you can choose how they are created – either using a local LLM, a cloud LLM, or blended mode (quickly enriched using a local model, with more detail added by a cloud model later). If you want to use a local model, you can configure this in your Pieces settings.
Whether you are offline or working in a security or privacy-focused environment, you can be sure your saved snippets will still be enriched.
The flip side – what you can’t do when offline
Despite living in a world where we assume internet connectivity is ubiquitous, we are often in situations where this is not actually the case. In the developed world we might be on a plane, in the middle of nowhere, or dealing with a terrible ISP that keeps dropping our connection. In the developing world, students might have decent internet access in college, but not at home.
Or if they do have access, it might be very expensive, or unavailable due to load shedding (a fancy term for rolling power cuts).
It is worth finishing this post off by saying what you can’t do with Pieces when offline.
And it’s pretty much one thing – you can’t download an offline model.
So make sure you download your favourite model(s) before boarding the plane, heading home from college, or driving out to the middle of nowhere.
If you have a favourite feature of Pieces, let me know on X, Bluesky, LinkedIn, or our Discord.