AI & LLM

May 23, 2025

What do nano models and penguins have in common?

Why small, specialized nano models are making a big impact in AI, and what they have in common with penguins.

Recently, I sat down with Sam Jones, the Chief AI Officer at Pieces, to talk about something we’re both pretty excited about: nano models.

If you’ve never heard of them, you’re not alone. 

Most developers know about large language models (LLMs) like ChatGPT, Claude, or Gemini. 

Or maybe you’ve heard about small language models (SLMs) and are still figuring out the difference.

And honestly, in the middle of the AI boom, where every scroll on Twitter or LinkedIn means yet another model, another AI startup, another AI-generated post with 1,000 comments, it’s hard to tell what’s actually meaningful. 

Finding real ideas from real people can feel like a breath of fresh air. 

So here we are, dropping one more blog post into the mix. 

But we hope this one sticks. 

Why? 

Because it’s about something real: how nano models work, why they matter, what they unlock in our LTM-2.5 release, and yes… how all of this somehow ties back to penguins and the environment. 🐧

In this conversation (check out the video above!), Sam tells me all about what nano models are, why they matter, and how we’re using them in our new LTM-2.5 release to make Pieces smarter, faster, and way more efficient.

What makes an AI model "nano"?

We didn't want an AI-generated, overly polished definition; we wanted the real thing. So here's how Sam breaks it down:

"In my mind, a nano model is a very good single-task learner. It's doing one thing, it's doing it very, very fast".

That means instead of being a general-purpose model like GPT-4o, a nano model is laser-focused. It’s built for speed, reliability, and efficiency on one specific task.

I like to think about it like this: you could use a nano model to find all the photos of cats on your device and put them in a folder, to automatically add tags to your notes (like “travel”, “code snippet”, “ideas”), or even to filter out spam messages that you definitely don’t want to read.

These are just a few examples of small tasks that you can train these models to do. 
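
To make that concrete, here’s a minimal sketch of the idea in Python, using a toy scikit-learn spam filter. It’s purely illustrative (the data and model choice are not how Pieces builds its nano models), but it captures the shape of a single-task learner:

```python
# A toy "nano" spam filter: one narrow task, tiny footprint, CPU-only.
# Purely illustrative; not Pieces' actual models or training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "WIN a FREE iPhone now, click here!!!",        # spam
    "Limited offer: cheap meds, no prescription",  # spam
    "Can we move our standup to 10am?",            # not spam
    "Here's the code snippet from yesterday",      # not spam
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# One model, one job: the whole "nano" idea in miniature.
spam_filter = make_pipeline(TfidfVectorizer(), LogisticRegression())
spam_filter.fit(messages, labels)

print(spam_filter.predict(["Claim your FREE prize today"]))  # likely [1] on this toy data
```

A real nano model would be trained on far more data (and often distilled from a larger model), but the contract is the same: one input, one narrow decision, done in milliseconds or less.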

Of course, you could just make API calls to an LLM like GPT-4o, but at the end of the day, you don't need a model that large to do something that simple. 

By using nano models, you’ll end up saving money and time in the long run. 

On the time savings, Sam mentioned:

"Some of our fastest models execute in three or four milliseconds and it’s doing it on the edge in low resource environments, by which I mean these things should be able to work on your phone from 3 years ago".

That’s a key part of our latest release: not only are these nano models quicker, but they’re also practical. 

You don’t need a powerful server, GPU, or the latest MacBook M4. Your Windows laptop from 10 years ago should do just fine.
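
You can get a feel for that kind of latency yourself. Here’s a rough benchmark sketch, again with a toy scikit-learn model standing in for a real nano model; exact numbers depend entirely on your hardware, but single-digit milliseconds (or less) on a plain CPU is very achievable at this scale:

```python
# Rough CPU latency check for a tiny single-task model.
# Numbers are illustrative and hardware-dependent.
import time

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)
clf = SGDClassifier(loss="log_loss")
clf.fit(vectorizer.transform(["free prize now", "meeting at 10am"]), [1, 0])

features = vectorizer.transform(["claim your free prize"])

n = 1000
start = time.perf_counter()
for _ in range(n):
    clf.predict(features)
per_call_ms = (time.perf_counter() - start) * 1000 / n
print(f"~{per_call_ms:.3f} ms per prediction")  # typically well under a millisecond here
```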


Why developers should use nano models instead of LLMs

Speed is one thing, but the broader benefit is about using the right tool for the job.

As Sam put it:

"There's no reason to use a massive cloud model to do a very simple task".

Imagine having to call a full LLM to label a calendar link or identify a code snippet. 

It’s overkill. 

Nano models are built to solve these types of repetitive, low-complexity tasks quickly and locally.

That doesn’t just make your app faster. It makes it cheaper to run and less resource-intensive.

As someone who cares about the environment (and cute animals), I asked Sam whether these nano models could help reduce the environmental impact of running large language models 24/7. He said:

"Absolutely, there's an ethical case to be made there to move to smaller models, I mean there's no point killing penguins because you want Chat GPT 4o to tell you a joke".

And I don’t know about you, but if using nano models means saving penguins, then I’m all for it. I mean, look how cute they are!

This moment in the interview made me laugh, but it’s also very real. 

Every API call to a huge model carries environmental and operational costs. Nano models help reduce that impact.

He added:

"I think 90% of people working in software, if you tell them look, this will do the same job, it's cheaper, it's faster, and you don't have to ping an API. I think everyone wants that. I think they're (nano models) going to be huge. I think we'll see fewer and fewer calls to these large, massive language models and more offloading tasks to the smaller micro (nano) models".

And he’s right. Most of the time, you don’t need a massive model to figure out if something is spam or to tag a note. 

Nano models handle those simple tasks quickly, without using much power or calling a cloud API. 

They save time, cut costs, and yeah – might even save a few penguins while they’re at it.

What’s the difference between SLMs and nano models?

Earlier, we wrote about SLMs (mostly why companies switch to SLMs) and raised a similar environmental question. 

There’s a growing selection of smaller, more eco-friendly models like Phi-2, Mistral 7B, TinyLlama, and DistilBERT that require significantly less compute for both training and inference, while still delivering impressive performance.

On-device models (such as GGUF versions of LLaMA or Mistral) take this even further by eliminating constant cloud calls, reducing environmental impact. 

Meanwhile, techniques like LoRA fine-tuning lower training energy usage by updating only small portions of a model instead of retraining the entire thing.

We often overlook just how much this matters until we realize the broader footprint of every model call, every GPU cycle. 

That’s why choosing quantized, locally run, or LoRA-adapted models can offer major sustainability benefits without compromising on capability.
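
If you’re curious what the LoRA side of that looks like in practice, here’s a minimal sketch using Hugging Face’s peft library. The base model and target module names are illustrative choices, not what Pieces ships:

```python
# Minimal LoRA sketch with Hugging Face peft: only the small adapter
# matrices are trained, so the base weights (and most of the training
# energy cost) stay frozen. Model and module names are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

config = LoraConfig(
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_lin", "v_lin"],  # DistilBERT's attention projections
    lora_dropout=0.05,
    task_type="SEQ_CLS",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically around 1% of the full model
```

Training then proceeds as usual, but gradient updates only touch the adapter weights, which is where the energy savings come from.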

But what about nano models, and how do they differ from Small Language Models (SLMs)?

While SLMs like Mistral 7B or Phi-2 are general-purpose and capable of handling diverse tasks (e.g., summarization, coding assistance, chat), nano models are purpose-built for specific, repetitive micro-tasks. 

They’re ultra-lightweight, require minimal power, run instantly on-device, and often operate quietly in the background: tagging notes, filtering spam, or surfacing memory cues.

If SLMs are the “brains” of an AI system, then nano models are the “reflexes”: quick, targeted, and efficient. 

Together, they can result in sustainable, private, and performance-optimized AI experiences.


Why Pieces built nano models for LTM-2.5

When I asked why Pieces decided to go all in on nano models this release, Sam pointed to a few good reasons:

"We needed to go faster than hitting the cloud models would allow us… plus, everybody wants to save cash right? The more resources you have the further they go and we thought that was best spent on model development as opposed to cloud LLM costs, and finally we wanted autonomy".

It’s not just a speed thing. 

Every time you rely on a cloud model, you’re at the mercy of response times, API changes, usage costs, and someone else’s roadmap.

Not only that, but the timing also lined up with a pivotal hire:

"We brought on some incredible talent... someone from Google DeepMind (shoutout to our awesome ML Research Scientist Antreas), he's deeply involved in training these super small models and has this world-class research viewpoint. And he was able to share a lot of really incredible techniques with the team".

I may even get Antreas on for his own interview if he agrees to it!

Another big reason for the switch was privacy: 

"The more you've got executing on these tiny local things, the less data you need to share... and the better the privacy".

Ultimately, it came down to this:

"You're just going to grow very tired of paying the wire time of hitting large cloud models... and we just want a bit of autonomy from that".


What’s new in the Pieces LTM-2.5 release

With this release, nano models are working across multiple touchpoints in Pieces: primarily within Long-Term Memory, as our CEO and Sam pointed out earlier in the interview, but also in Pieces Copilot.

"These are mainly models that sit within the LTM and the Copilot... extracting metadata, extracting links, trying to work out your intention".

These new nano models help with things like tagging saved content, ranking it, connecting past actions to current context, and surfacing relevant memories.

I also asked if we made these models ourselves:

"Bit of both... We take open source models... then in-house we fine-tune them. I’d say they're like 90% created by Pieces".

Instead of one big model trying to do everything, we’ve got a network of specialist models. It’s faster, more precise, and easier to iterate on.

Sam described it like this:

"With the LTM-2.5 what we're doing is getting all of these small models to play together. So when you interact with our app you're going to kick off this chain or network effect amongst all these tiny tiny models and the result is going to be your your experience interacting with your pieces memories and everything you've got stored there.. is going to be much better than if we were doing the same thing with large language models.".

From a product point of view, we are so excited for all of our users to get deep into using Pieces Long-Term Memory, whether it’s through the Copilot chat, the Pieces MCP Server, or the new Workstream Activity view.

How to train and deploy nano models for production

So, how do you actually build one of these tiny models?

Sam broke it down pretty simply:

"You find a task. Make it a simple one and a repetitive one... get a very large open source model, generate some data... take your tiny model, push all the data through it, do your ML tricks... and put it into production".

It’s a focused, repeatable workflow. Choose a task. Create high-quality training data. Fine-tune a small model. Ship it.
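
Here’s that workflow compressed into a runnable sketch. The label_with_llm function is a hypothetical stand-in for whatever large open-source model you’d actually use to generate labels; everything downstream of it is the “tiny model” half of the loop:

```python
# Sketch of the distill-to-a-tiny-model workflow Sam describes.
# label_with_llm is a hypothetical stand-in for a large labeling model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def label_with_llm(text: str) -> int:
    """Stand-in for a large-model labeling call. A trivial keyword
    rule here, so the sketch runs end to end without an API key."""
    return 1 if "meeting" in text.lower() else 0

raw_corpus = [
    "Reschedule the meeting to Friday",
    "def parse(): return 42",
    "Meeting notes from the weekly sync",
    "git rebase -i HEAD~3",
]

# Steps 1 and 2: pick a narrow task, let the big model generate labels.
labels = [label_with_llm(text) for text in raw_corpus]

# Step 3: push all the data through a tiny model.
nano_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
nano_model.fit(raw_corpus, labels)

# Step 4: ship it. From here on, no large-model calls are needed.
print(nano_model.predict(["standup meeting moved to 3pm"]))
```

Once the tiny model is trained, the large model drops out of the serving path entirely; you only go back to it when you need fresh training data.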

And thanks to the way we’ve structured our pipeline, it doesn’t take weeks of coordination or a huge team to do it.

"In 24 hours... we can repopulate all of our models... what would take a 20-person team maybe three months, our team of five can do in a couple of days".

That speed makes a big difference. We’re able to test ideas quickly, adjust when needed, and keep improving without waiting on long training cycles.

For devs experimenting with nano models or trying to get into ML engineering, Sam shared this:

"Any NLP task on a model less than 80 million parameters... if you can get it working better than a cloud model - that's impressive".

It’s a good reminder that there’s a lot of room to build here. You don’t need the biggest model. You just need the right one for the job.
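
As a quick sanity check against that 80-million-parameter budget, you can count a candidate model’s parameters before committing to it. The model name here is just an illustration, one that happens to fit under the bar:

```python
# Check a candidate model against the "under 80M parameters" budget.
# The model name is illustrative; swap in whatever you're evaluating.
from transformers import AutoModel

model = AutoModel.from_pretrained("distilbert-base-uncased")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~66M for DistilBERT base
```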


Real-world impact of using nano models

Nano models didn’t just give us a performance boost. They changed how we approach our product entirely:

"Pretty much everything that we can, we switch to a nano model... that would be my ideal world".

Because the performance difference is real:

"If you've got things executing in like a couple of milliseconds on a CPU, it suddenly becomes possible to do a lot of filtering of that data".

From ranking context to surfacing saved snippets, that filtering power makes everything feel snappier and more personal.

And yes, the Workstream Activity view is going fully local too. A lot of users have asked for this specifically, and Sam confirmed it's already underway:

"We’re working in partnership with Microsoft to run our entire stack locally on their new NPU laptops... Give us three months and we’ll have the whole thing running locally".

That’ll be a huge milestone for users who care about both privacy and speed.


Lessons learned from building nano models at Pieces

Toward the end of our conversation, I asked Sam if he’d do anything differently when building Pieces now that we are moving in the direction of nano models. His response:

"There's always small things right, there's always “I wish I'd find that sooner” or “I wish I'd done better”, or you know maybe we didn't need to focus so much on that feature and do this instead. But I think those kind of comments sort of ignore the journey and I think we wouldn't be here now at the forefront of what we're doing without all of those twists and turns. So I would be exceedingly hesitant to change anything in my past at Pieces, because I think we're really at the start of something really cool here".


See nano models in action

Nano models may be small, but they’ve changed the way we think about product design, performance, and privacy. 

They give us faster responses, lower costs, and more control over where and how your data is processed.

We’ve only scratched the surface of what they can do, but it’s already been a big step forward for Pieces – and for our users.

If you're curious about how this actually works under the hood, check out Nano Models Explained: How Pieces Uses Local Small AI Models in the Latest LTM-2.5 Release with Sam Jones on our YouTube channel.

Want to see these nano models in action? You can download Pieces and try it yourself, or reach out to book a demo with our team.

We’d love to hear what you think. In the meantime, happy coding! 
