Long-Term Memory is all you need
Learn how long-term memory enhances your workflow by providing lasting context, improving recall, and boosting productivity.
AI is fast becoming a powerful (though sometimes ill-informed or even downright wrong) assistant that is helping all kinds of knowledge workers be more productive.
It does this using a mixture of models trained on knowledge from sources like the internet or textbooks, potentially combined with context from extra sources such as databases of corporate information or organizational document stores.
👉 We treat the AI as an assistant, a work companion, or a copilot, leveraging its capabilities to help us with daily tasks. But where it misses out in assisting us is knowledge of our other actions as a user, outside the bounds of the applications we use to access these AI tools. It’s missing a long-term memory fed by our actions that can be accessed by the LLM and used for reasoning.
In this post, we dive deeper into long-term memory, draw parallels between the way the human mind works and the way we’d like AI to work, and look at ways we can augment AI with a long-term memory using the second-generation Long-Term Memory capabilities of the Pieces LTM-2 engine.
TL;DR
AI isn’t truly intelligent – it’s just advanced pattern recognition.
Pieces goes beyond search by retrieving your own knowledge, finding patterns, and recalling context when needed.
Instead of being just a smart search tool, we act as a human memory assistant, keeping you at the center.
What is long-term memory?
In biological terms, referring to the human or animal brain, long-term memory is one of three types of memory:

Short-term memory, or working memory – these are memories that you need to access right now, such as remembering the first part of this blog post, or the conversation you are having.
Long-term memory – these are short-term memories that your brain has decided to store for long-term access. This is episodic memory: stored experiences, combined with emotions and feelings.
Implicit memory – these are the memories that you don’t need to explicitly remember, such as muscle-memory associated with movement. For example, knowing how to drive, catch a ball, or walk, are all implicit memories.
Combining long- and short-term memory
With long-term memory, you have additional context to apply to the information being gathered by your short-term memory, to allow you to respond appropriately.
Your brain is able to cross-reference these different types of memories. When you see an actor in a TV show, you remember what movie you’ve also seen them in; when you get a weird error compiling or running your code, you quickly remember that you’ve seen it before and what the fix is.
By having a companion with a similar long-term memory, you are able to use this shared context.
For example, my wife and I were watching a TV show called Grand Designs, about folks who are building houses, and I said to my wife, “do you remember the house from the people we saw at that place?” A very generic statement, yet she instantly understood I was referring to an episode about a couple who built a lovely house partly underground, whom we had seen at the bar at Glyndebourne when we were there one season to see Carmen.
We had that shared context in our long-term memories, allowing us to cross-reference with our short-term memory.
How AI aligns with human memory concepts
When I think of these in terms of interacting with an AI, I personally like to map them as follows (there’s a toy sketch in code after the list):
Implicit memories are what the LLM is trained on – the raw data scraped from the internet.
Short-term memories are the chat history, the questions you’ve asked before and their responses.
Long-term memories – well, this bit is missing. The AI has no episodic memory to call on based on your human experiences.
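To make that mapping concrete, here is a toy sketch in Python. Everything here is illustrative: the names and structure are mine, not any real tool’s API.

```python
# A toy mapping of the three human memory types onto an AI chat system.
# Purely conceptual: the names and structure are illustrative, not a real API.
from dataclasses import dataclass, field

@dataclass
class Companion:
    # Implicit memory: fixed at training time, baked into the model weights.
    model: str = "some-pretrained-llm"
    # Short-term memory: the running chat history for this conversation.
    chat_history: list[str] = field(default_factory=list)
    # Long-term memory: episodic records of what *you* did - the missing piece.
    episodic_memory: list[str] = field(default_factory=list)

    def ask(self, question: str) -> str:
        self.chat_history.append(question)
        # Today's tools answer from `model` plus `chat_history` only;
        # `episodic_memory` is what a long-term memory engine would add.
        return "..."
```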
Without long-term memory, AI is missing a huge amount of power. Imagine the equivalent of asking an AI “do you remember the house from the people we saw at that place” and having it respond correctly.
To put it in the context of a developer’s workflow, imagine being able to ask the AI “What was the package that Ellie recommended to me in chat last week, and how do I install it based on the documentation I was reading?” and get an answer based on an episodic memory that includes both your chat and browsing history. This would be very powerful!
📌 As a reminder, AI is not actually intelligent, and when we think of memory or other AI processes, these are merely code that follows mathematical processes to extract and create information from stored data.
We anthropomorphize them, or liken them to humans, because it’s an easier way for us to comprehend the way they work, but it’s literally just spicy auto-complete. 🌶️
What is long-term memory in the context of an AI companion app?
When I think of an AI companion to help me with knowledge work, I want it to be a companion with an eidetic memory that is sitting next to me, knows what I know, and can recall this information when requested.
The focal point here is “knows what I know”. I want this companion to know about the conversations I’ve had, the documentation I was reading, the code I was compiling, and so on, but nothing that might be irrelevant or sway the responses away from being helpful.
❕ This way, when I ask it a question, it has the context that is relevant to me and my question, not knowledge of everything.
For example, if I am reading the Ollama CLI documentation and I ask the companion “based on the documentation I was reading, how do I interact with Ollama?”, I want it to tell me how to use the CLI. Not the API, not any of the SDKs. I want the answer based just on what I was looking at, not on all the information out there on the wider internet.
Although it’s helpful to have an AI trained on this wider body of knowledge, I need one that knows what context is relevant, and uses just that to generate a response.
These companions should have long-term memory that is application-agnostic, not tied to a particular system or product.
It should cover the code I write, the communications I make in chat or email tools, the research I do in my browser, the data from reporting tools, and so on, rather than being tied to knowledge mining in one specific system.
This is not to say there isn’t a need for product- or system-specific copilots, such as Microsoft 365 Copilot mining a SharePoint setup, but these tools do a different job.
One is an assistant that helps you remember what you are doing and connect it all together; the other is a tool for accessing information you may not have seen before.
Examples of why long-term memory is needed for AI tools
Every day when I interact with different AI tools, I’m constantly reminded of the need for long-term memory inside of them to bring additional context.
Here are a couple of scenarios where existing tools keep the context way too limited:
Switching AI tools
With the regular releases of new AI tools – from ChatGPT, to Claude, to Qwen and DeepSeek – I often switch from one to another to try them out. The problem is, my chats don’t come with me from one to another.
Using Pieces helps, as I can switch LLMs mid-conversation, but the tools themselves just don’t have that feature, I guess to try to keep you in their walled garden.
It would be incredibly powerful to fire up a conversation inside whatever is the latest flavor of AI tool and ask something like “Based on my previous conversation with the other LLM, what should I do next?”
Accessing multiple projects from the same IDE chat
Tools like GitHub Copilot or Cursor are very powerful when interacting with a project. I can open the project, and ask questions about the codebase. Where they fall down is asking questions about a different project.
I often have multiple projects open, and use one as a reference for another – how I used a particular component or package, how some boilerplate code is implemented, things like that.
Limiting what I can ask the AI in the IDE to just the open project means I am often context switching between instances of the IDE.
It would be more helpful if the IDE was able to remember the other code bases I was working on.
Examples of how long-term memory can enable new AI-powered experiences
As well as solving those two cases above, long-term memory opens up a whole new range of AI-powered scenarios that you simply cannot achieve at the moment with existing tools.
Create a report for a standup
A standup is one of the rituals for agile software development. Every day developers join a meeting and say what they did yesterday, what they are going to do today, and any blockers.
Done right, it can be very powerful, giving the entire team visibility into the progress of the project, allowing developers to quickly find ways to unblock each other, check progress on dependent tasks, and spot opportunities to collaborate.
These are typically done in a group meeting, but this model falls down with developers in different time zones, remote work, and so on.
A lot of teams are performing these asynchronously, with developers providing their updates as bullet points in a team chat, their collaboration tool, or shared document, with everyone reviewing it each day to look for those all-important collaboration opportunities.
The downside to these meetings is coming into work and immediately having to think “What was I doing yesterday?”, then potentially spending time looking it up.
As we close tasks, our brain likes to file them away so we forget – they are no longer in our working memory, so it takes effort to recall.
By having access to an AI with long-term memory, you can ask the AI to create these updates for you. It can ‘remember’ the GitHub issues you closed, or code you were working on.

After reading the team’s updates, those are then available to the long-term memory. “Did Brian finish the backend refactoring task yesterday?” or “What was the ticket number that Leo closed on Friday?” become the types of questions you can ask without having to go back to your chat tool to look them up.
Implement an issue in code based on a recommendation in chat
This is one of my personal favorite scenarios that long-term memory can unlock.
I can use a tool like Cursor or GitHub Copilot to make changes to my codebase, but I have to explain those changes myself – providing details in the Cursor composer or GitHub Copilot chat.
These details can come from GitHub issues, email, chats, research in my browser, and so on.
To implement a code change, I may want to refer to the GitHub issue, then to a chat conversation where someone recommended a solution, then to documentation I was reading about the solution.
For example, if I’m working on adding a widget to a UI, I may have been discussing it with a colleague who recommends a particular widget, then reading more about how to make it work the way I want in the online documentation.
I could provide all this context to the AI, but I’m lazy – I’d rather just say “Hey AI, that issue I was just reading. Help me implement it using the widget Leo recommended, with the initialization method I was just reading about in my browser” and have the AI bring all that context together with my codebase and suggest the right changes.

How can AI implement a long-term memory?
As an AI engineer, I can think of multiple ways to implement long-term memory. The two most popular approaches are fine-tuning and RAG.
Fine-tuning
When you train a model, you run millions or billions of training cycles using a large data set. This gives you a generic model that anyone can use.
To give a model access to more specific information, you can fine-tune it by running additional training cycles using that specific information.
This way the model will be better at returning results relevant to the information used in the fine-tuning.
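As a rough sketch of what this looks like in practice, here’s a minimal fine-tuning example using the Hugging Face transformers library. The tooling choice is my assumption, and the base model, data file, and hyperparameters are all illustrative placeholders, not a recommended setup.

```python
# Minimal fine-tuning sketch with Hugging Face transformers.
# The base model ("gpt2") and data file ("my_notes.txt") are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Tokenize your domain-specific text so the extra training cycles see it.
dataset = load_dataset("text", data_files={"train": "my_notes.txt"})
train_data = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fine-tuned", num_train_epochs=3),
    train_dataset=train_data,
    # For causal LMs, the collator copies input tokens to labels (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()  # the expensive part: more training cycles on your data
```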

This is an interesting technique, and one that is being suggested as a way to have self-evolving AIs, constantly learning and improving.
The upsides to fine-tuning are:
🟢 Once the model is fine-tuned, the same prompt can bring back accurate answers without additional context being added, so there is no increase in the number of tokens to send, and potentially no increase in cost for cloud models.
🟢 As the new information is part of the model, there’s no increase in the time to get a response
The downsides are:
🔴 Fine-tuning is expensive – you need powerful GPUs, and to run thousands of training cycles, leading to a massive hardware or cloud cost.
🔴 The fine-tuning process can take days. This means there is a massive lag between the AI ‘learning’ something, and it being available to you.
🔴 You cannot change the model without fine-tuning a new model.
🔴 There are limits to the models you can fine-tune. If you are using a cloud-based model there may not be any fine-tuning capabilities, or you may need to spin up cloud infrastructure to run your own deployments.
RAG
RAG is a much more efficient way to implement memory in an AI.
Retrieval-augmented generation, or RAG, is the process where you take a prompt, use it to look up relevant context, then send that with the prompt to the LLM.
The classic example of this is if you were building a chatbot to answer questions on customer orders in a retail system.
If you ask about order 66, the RAG system would look up this order, and send all the information about this order, such as item details, quantity, price, customer, and previous conversations to the LLM along with the prompt.
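Stripped to its bones, that flow is easy to sketch. Here’s a deliberately toy version in Python, where the order database and the llm() function are stand-ins for a real retail system and a real model API.

```python
import re

# Toy stand-in for the retail system's order database (illustrative data only).
ORDERS = {
    "66": {"items": "2x walking boots", "price": 179.98,
           "customer": "Jim", "status": "shipped"},
}

def llm(prompt: str) -> str:
    """Stand-in for a real LLM call; a real system would hit a model API here."""
    return f"(model response based on: {prompt[:80]}...)"

def answer(question: str) -> str:
    # 1. Retrieval: parse the order number from the prompt and fetch the record.
    match = re.search(r"order\s+(\d+)", question, re.IGNORECASE)
    context = ORDERS.get(match.group(1)) if match else None

    # 2. Augmentation: prepend the retrieved record to the user's question.
    augmented = f"Context: {context}\n\nQuestion: {question}"

    # 3. Generation: send the augmented prompt to the LLM.
    return llm(augmented)

print(answer("What is the status of order 66?"))
```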
The upsides to RAG are:
🟢 There are no model changes, so no costs associated with hardware or training time.
🟢 You can save information to the RAG system in milliseconds and it becomes instantly available for the next prompt with no training lag.
🟢 You can iterate faster over changing what information is stored or found based on the prompt.
🟢 You can switch models at any time, usually with a small change to how you prompt.
There are downsides to RAG:
🔴 You need to build a RAG system that knows how to interpret the prompt and pull out the right data. You need to capture and index this information.
🔴 There is a performance hit with each prompt as you have to look up the relevant data.
🔴 Your prompts become much larger, meaning a slower response from the LLM, and potentially higher cost if you are using a cloud LLM with a price per token.
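To make that last point concrete, here’s some back-of-the-envelope arithmetic. Every number is made up for illustration, not any vendor’s real pricing.

```python
# Illustrative token-cost arithmetic for RAG; every number here is made up.
PRICE_PER_1M_INPUT_TOKENS = 3.00   # hypothetical cloud LLM price, in dollars

bare_prompt = 50           # the question on its own
retrieved_context = 4_000  # order details, docs, chat history, and so on

bare_cost = bare_prompt / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS
rag_cost = (bare_prompt + retrieved_context) / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS

print(f"bare prompt: ${bare_cost:.5f}, with RAG context: ${rag_cost:.5f}")
# The augmented prompt is ~80x larger, so per-prompt input cost scales with it.
```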
Introducing LTM-2, the second generation Pieces Long-Term Memory engine
At Pieces, we’ve spent years focusing on this problem.
We want to ensure you have a long-term memory that is available to the AI that assists you.
We announced LTM-1, the first generation of this engine, last year, and we are excited to release LTM-2, our second-generation Long-Term Memory agent.
👉 What makes our approach so interesting is that we are not knowledge mining a specific system, like querying across a corporate documentation store or customer data. We are mining the knowledge that the human has collected, and potentially forgotten, finding patterns and links in that data, then recalling it in a way that hopefully gives the human the information they need.
We are very much that human memory assistant, instead of a smart search tool. We keep the human central to what we are doing – we capture context from the active windows, so we are focusing on what you are doing and reading.
This human-centric Long-Term Memory opens up the ability to do a huge range of novel prompts that are just not possible with any other tool.
Check out my recent post on my top prompts that you can do with Pieces, most of which are only possible thanks to the Pieces Long-Term Memory.
Pieces has a free tier available for everyone, with a wide range of LLMs to choose from, both in the cloud and offline.
Download Pieces and give this a try, either in our flagship desktop app, or inside the IDE, browser, or productivity tools you use on a day-to-day basis.
How does LTM-2 work?
LTM-2 works by capturing text from your active windows using OCR amongst a range of other techniques, indexing this locally, and storing it in a local encrypted database.
This data is all processed securely and privately on your device, rather than being sent to the cloud.
We also scan the data for PII, and other secrets like API keys, and do our best to eliminate these. All this in exchange for only a few percent of your CPU.
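As a conceptual illustration of that kind of scrubbing (this is not Pieces’ actual implementation, just a pattern-matching sketch), secret redaction can be as simple as:

```python
import re

# Toy redaction pass: the patterns here are simplistic examples; real
# scrubbing, like the kind described above, is far more thorough.
PATTERNS = {
    "api_key": re.compile(r"(?:sk|pk)-[A-Za-z0-9]{20,}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

print(redact("Mail jim@example.com, key sk-abc123def456ghi789jklm"))
# -> Mail [REDACTED email], key [REDACTED api_key]
```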
Once we have this data, we have a RAG system powered by some custom AI that can connect your prompts to the information we have stored. We use what we call temporal grounding along with a load of indexing smarts to find the relevant information.
If your prompt contains enough information for a specific lookup, we can extract the right result. For example, if you ask “What package did Ellie recommend I use in my app?” and there’s only one package that Ellie has recommended before, then we can find it.
If there might be confusion in the results, you can add time-based phrasing. For example, if Ellie recommended a package last month, then another one yesterday, you can ask “What package did Ellie recommend yesterday?” to get the relevant result.
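Conceptually, that time-based narrowing looks something like the sketch below. To be clear, this is my own illustration of the idea, not how the LTM-2 engine is actually built.

```python
# Conceptual sketch of time-based filtering over captured memories.
# Not Pieces' actual implementation; the data and structure are illustrative.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Memory:
    text: str
    captured_at: datetime

memories = [
    Memory("Ellie recommended the 'requests' package",
           datetime.now() - timedelta(days=30)),
    Memory("Ellie recommended the 'httpx' package",
           datetime.now() - timedelta(days=1)),
]

def recall(keyword: str, since: datetime) -> list[Memory]:
    """Return memories mentioning the keyword that were captured after `since`."""
    return [m for m in memories if keyword in m.text and m.captured_at >= since]

# "What package did Ellie recommend yesterday?" narrows the window to ~1 day.
for memory in recall("Ellie", since=datetime.now() - timedelta(days=2)):
    print(memory.text)
```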
When you prompt, we extract the relevant context, then send that to the LLM of your choice along with your prompt.
By using RAG instead of fine-tuning, you can use the LLM of your choice and have responses use the Long-Term Memory instantly.
You can use cloud models, in which case the long-term memory context is sent to a cloud LLM, or you can use local models, where everything works offline with nothing leaving your machine, for complete privacy.
We also strike a good balance between storage space and the ability to extract information.
📌 We store 9 months of data, meaning we won’t eat too much of your hard drive space, but provide you a large enough window.
Privacy
At every step of our journey with LTM, we have considered privacy. From local processing to encrypted data, we want your data to be safe and private.
After all, the concept of an app watching everything you do is scary. This is why we always talk about Pieces as a tool for your work machine.
With LTM-1, we added the ability to pause or stop Long-Term Memory collection, so if you are doing activities you want to keep private, you can quickly switch it off, then back on when you need it. For example, you might be a manager working on a performance review, or involved in client projects where you must not capture information for specific clients.
With LTM-2 we take this privacy further, with the ability to turn off capture on an app by app basis.
Never want to capture context from an HR system?
Once one capture has happened, the app will appear in the PiecesOS Long-Term Memory access control list, and you can disable capture for it.
You can also delete captured context on an app-by-app basis if you forget to pause the Long-Term Memory engine when you access systems you don’t want to capture from.
As always, the only time your private memories are sent to the cloud is if you are using a cloud LLM. If you are using an offline LLM, you can even turn your WiFi off or unplug your ethernet cable, and everything will still work, completely air gapped.
We are working hard to add more privacy features. Keep an eye out for more features coming soon.
Workstream activity
As well as capturing this information and surfacing it through a copilot chat, we also surface your activities every 10 minutes in our new Workstream Activity view.
This contains a rollup summary of all the activities in that 10-minute period – great for refreshing your memory after a meeting, grabbing links to web pages you were reading without having to work through the hundreds of tabs you have open, or checking your to-dos for the day.
Each summary item has some tags for quick filtering, the apps that context was captured from, and more!
This is a searchable list, and you can download the contents as markdown or plain text if you want to share them with anyone – useful for sharing a set of tabs you were reading for research, or the summary from a standup.

The future of Long-Term Memory
So what is the future for long-term memory?
Predicting the future of AI is a big area, with AI engineers looking at use cases for memory, mainly using a knowledge base for that ‘memory’, such as corporate document stores or even blockchains.
At the moment, Pieces is the only Long-Term Memory system designed to be human-centric, surfacing the information that a real user has come across and augmenting our own memory.
The future is exciting. At Pieces, we are already talking about proactive memory – that is, having an agent spot information that you are likely to want to act upon, and surfacing memories that might be relevant to you.
For example, if you get an error when running your code that you have seen before, the memory agent can detect this, and proactively send you information from the last time you researched and fixed the same error.
But that’s the future. Today, you can install Pieces, take advantage of our free tier, and get access to our Long-Term Memory.
If you haven’t already installed it, download Pieces for free now.
Once you have Pieces installed and the Long-Term Memory turned on, come back here, read this post again (or just skim through it), and ask Pieces to “summarize the long-term memory is all you need blog post I was just reading” to see it in action.
What do you see as the future of Long-Term Memory? Please share what you build with us at Pieces on X, Bluesky, LinkedIn, or our Discord.
