
AI memory explained: what Perplexity, ChatGPT, Pieces, and Claude remember (and forget)
Discover the different types of AI memory, how they work, key use cases, and the best prompting approaches to get accurate, context-aware responses
Think about how you use your own memory. Some things live in your short-term recall, like the phone number you just looked up or what you had for lunch yesterday. Other things live deeper: your favorite meal, the name of your first teacher, the stories you tell at family dinners.
Now imagine if you only had the Post-it note version of memory. You'd remember fleeting details, but the moment you left the room, it would all be gone.
That's how most AI assistants worked until recently: stateless. Each chat was a blank slate. Helpful in the moment, but forgetful the next.
Then engineers started noticing something was missing in their AI assistants. These systems could impress with their intelligence, drafting essays, writing code, and summarizing research, but they couldn't remember who you were or what you cared about. In the excitement of generative AI's breakthrough moment in 2023, few questioned why every new chat started from scratch.
But as people began using assistants daily, for studying, managing projects, even running businesses, the lack of memory became glaring. Why should we have to re-teach our AI every single time?
What AI memory really means
In plain terms, AI memory is the ability of an assistant to retain information across interactions.
Unlike "stateless" AI where every session resets memory allows the system to remember who you are, what you've asked before, and the ongoing context of your work.
Why does this matter?
Because repetition is inefficient. Without memory, users constantly re-explain their goals, preferences, or projects. With memory, AI becomes less like a calculator and more like a colleague who remembers what's been said. But as with human memory, the power comes with risks: bias, distortion, forgetting, or remembering too much. Which brings us to the different kinds of memory emerging in AI systems.
The architecture of AI memory
Short-term (session) memory
Like a mental scratchpad, short-term memory allows AI to remember the flow of a single conversation until you close it. When you ask ChatGPT to summarize a meeting, then immediately follow up with "now make it bullet points," it understands "it" because the session is still active.
This provides continuity within conversations but disappears once you leave. No long-term personalization, but also no long-term privacy concerns.
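To make this concrete, here is a minimal sketch of how session memory typically works under the hood: the model itself is stateless, so the client re-sends the whole conversation with every request. It assumes the openai Python package, and the model name is illustrative.

```python
# Minimal sketch of session ("short-term") memory: the assistant has no
# built-in recall, so the client re-sends the running conversation each turn.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a concise assistant."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model name
        messages=history,      # the whole scratchpad, every turn
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Summarize this meeting: we agreed to ship the beta on Friday.")
ask("Now make it bullet points.")  # "it" resolves because history was re-sent
# When the process exits, `history` is gone: that's statelessness.
```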
But what if we go further?
At Pieces, we think about what happens when this scratchpad evolves into a continuous long-term memory. The LLM is still the reasoning engine doing the heavy thinking, but Pieces adds the memory and grounding layer to it, which makes that reasoning actually useful for real work.
It’s the difference between:
Reasoning from nothing vs. reasoning from everything you’ve actually done.
Generic best practices vs. your team’s specific decisions and context.
Starting over every conversation vs. building on yesterday’s progress.
Instead of being just a smart conversation partner, Pieces becomes a smart teammate who was actually there for all your previous work. In fact, there are useful things you can ask Pieces that you simply can't ask any other AI tool.
And that’s how we move from short-term recall to the real breakthrough: long-term memory.
Long-term memory
Short-term memory only follows you as long as a single conversation window is open. Once you leave, that scratchpad is erased: helpful for continuity in the moment, but frustrating when you need to build on work over days or weeks.
Long-term memory is where the revolution truly begins. Instead of resetting at the end of every session, the AI carries forward what it has learned about you: your projects, your preferences, your style. This means you don’t have to re-teach it every time you start a new chat.
For example, imagine telling your assistant once about your team’s goals or your preference for concise answers. With persistent memory, that context doesn’t vanish; it’s there tomorrow, next week, and beyond.
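As a toy illustration (not how any particular vendor implements it), persistence can be as simple as writing facts to disk and re-injecting them at the start of the next session; the file name and fact wording here are made up:

```python
# Toy persistence: facts survive the session because they are written to
# disk and re-injected into the prompt next time the assistant starts.
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # hypothetical store

def load_memories() -> list[str]:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def remember(fact: str) -> None:
    memories = load_memories()
    if fact not in memories:
        memories.append(fact)
        MEMORY_FILE.write_text(json.dumps(memories, indent=2))

def system_prompt() -> str:
    facts = "\n".join(f"- {m}" for m in load_memories())
    return f"You are a helpful assistant. Known about this user:\n{facts}"

remember("Prefers concise, bullet-point answers")
remember("Team goal: ship the onboarding revamp by Q3")
print(system_prompt())  # tomorrow's session starts with today's context
```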
Take a look at Jack, who gives a great demo of how this may work in action.
Vector-based contextual memory
Instead of storing raw text, advanced systems break content into numerical "embeddings", like fingerprints of meaning, and save them in searchable databases. This enables fast search and retrieval of relevant context, though embeddings can be noisy, potentially retrieving irrelevant information if not carefully filtered.
Many open-source projects like LangChain and MemGPT store conversations this way.
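Here is a minimal sketch of that retrieval loop, assuming the sentence-transformers package; the model choice, the notes, and the similarity threshold are all illustrative:

```python
# Sketch of vector-based memory: embed past notes, then retrieve by meaning.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model

notes = [
    "A/B test: subject line B lifted open rates 12%",
    "Positioning: lead with time saved, not feature count",
    "Fixed the Node.js memory leak by clearing the job queue listener",
]
note_vectors = model.encode(notes, normalize_embeddings=True)

def recall(query: str, k: int = 2, min_score: float = 0.3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = note_vectors @ q  # cosine similarity (vectors are normalized)
    ranked = np.argsort(scores)[::-1][:k]
    # Filter weak matches: unfiltered embeddings can surface noise.
    return [notes[i] for i in ranked if scores[i] >= min_score]

print(recall("what did we learn about email subject lines?"))
```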
But what does that mean in practice?
For marketing, sales, and customer support teams, vector-based memory means the assistant can instantly pull in relevant campaign details, customer feedback, or content drafts without you having to dig through old documents. If you’re writing a new launch email, it can surface the exact positioning from last month’s brainstorming session, or remind you of A/B test results you discussed weeks ago. Instead of repeating research or rewriting strategy notes, you get immediate access to the context that matters most.
For engineers, the value is just as clear. Instead of re-explaining a bug or re-documenting an architectural decision, the assistant can surface your own past debugging notes, design discussions, or code snippets related to the problem at hand. When you hit a memory leak or error, it can retrieve the last time you solved something similar, cutting down on repetitive problem-solving and speeding up resolution.
In other words, vector-based memory turns an AI from a clever responder into a context librarian: it doesn’t just know “how” to answer, it knows what past work to bring forward so you can move faster without redoing yesterday’s effort.
Episodic vs. semantic memory
Borrowed from cognitive science, this distinction separates event-based memories ("we discussed X last week") from factual knowledge ("the capital of France is Paris"). An AI might recall you asked for book recommendations (episodic) and that you like science fiction (semantic), then combine both to recommend a new Neal Stephenson novel.
This approach offers more nuanced, human-like interaction but increases system complexity and the risk of confusion between different types of stored information. Today, several tools are experimenting with this dual-memory model:
OpenAI’s ChatGPT blends episodic and semantic memory when it recalls both what you asked in a prior session and your general preferences. This enables experiences like “remembering” that you asked for research help last week (episodic) while also tailoring answers to your known communication style (semantic).
Anthropic’s Claude handles episodic recall through targeted search, while semantic patterns emerge when you consistently provide preferences (like tone or structure). Its design keeps episodic memory user-triggered, giving you more agency over when context is applied.
Pieces scopes episodic memory to project-specific histories (e.g., your debugging sessions in a repo) while semantic memory captures broader facts about your workflows or preferences. This prevents cross-contamination between unrelated contexts.
Academic research projects like MemGPT and frameworks like LangChain Memory explicitly separate episodic memory (conversation logs, events) from semantic memory (knowledge embeddings), allowing developers to decide which to prioritize in different applications.
In productivity tools, episodic memory lets an AI recall that “you outlined a pitch deck last Thursday,” while semantic memory reminds it that “your audience prefers concise, bullet-point slides.” Together, the assistant can generate a draft that aligns with both the event and the pattern.
In developer workflows, episodic memory brings up the exact conversation where you discussed a tricky bug, while semantic memory recalls the general pattern that you prefer minimal logging in production code.
In personal knowledge management, episodic recall could resurface a conversation about a book club meeting, while semantic memory ensures recommendations align with your consistent taste in science fiction.
In short, episodic + semantic memory is about blending contextual events with enduring knowledge. When built well, it turns assistants from note-takers into collaborators who can link what happened with what matters.
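A minimal sketch of what such a dual store might look like, with timestamped events on one side and durable facts on the other; the schema and the keyword matching are simplifications for illustration:

```python
# Sketch of a dual-memory store: episodic entries are timestamped events,
# semantic entries are durable facts about the user.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Memory:
    episodic: list[tuple[datetime, str]] = field(default_factory=list)
    semantic: set[str] = field(default_factory=set)

    def log_event(self, event: str) -> None:
        self.episodic.append((datetime.now(), event))

    def learn_fact(self, fact: str) -> None:
        self.semantic.add(fact)

    def context_for(self, topic: str) -> str:
        events = [f"{ts:%Y-%m-%d}: {e}" for ts, e in self.episodic
                  if topic in e.lower()]
        facts = [f for f in self.semantic if topic in f.lower()]
        return ("Events:\n" + "\n".join(events) +
                "\nFacts:\n" + "\n".join(facts))

mem = Memory()
mem.log_event("asked for book recommendations")            # episodic
mem.learn_fact("likes science fiction (book preference)")  # semantic
print(mem.context_for("book"))  # both combine into one recommendation prompt
```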
What we get today: four distinct approaches
#1 ChatGPT
ChatGPT operates with two complementary memory layers. The system maintains explicit "saved memories": facts and preferences you've shared that it considers important. More significantly, since April 2025, it automatically references your entire chat history across all sessions, creating a comprehensive knowledge base of your interactions.

Consider debugging a complex function over several days. Day one, you discuss common errors when processing large JSON files. Day two, encountering a memory error, ChatGPT recalls the previous discussion and connects it to your specific function. Day three, when JSON parsing issues arise, it remembers both contexts and provides targeted recommendations.
And it’s not just for developers.
So let’s say you’re in marketing and you draft messaging for a product launch. A week later, when you’re revising campaign assets, ChatGPT can resurface the exact positioning language you worked on earlier and connect it to new audience insights you’ve shared since. Or if you’re in engineering, you might troubleshoot a build pipeline failure today and, weeks later, encounter a related CI/CD issue; ChatGPT can recall both the error patterns and your previous fixes to recommend a shortcut.
#2 Claude
This architecture prioritizes user agency over convenience. The system maintains comprehensive chat logs but accesses them only when explicitly requested, with complete user control through opt-in settings and genuine deletion capabilities.
So let’s say you’re preparing for a client pitch. You might have encountered a similar request weeks earlier, and Claude can pull up exactly what you discussed about that deck. Or if you’re in engineering, you can ask it to recall the architectural trade-offs you debated before a deployment freeze.
The strength here is that Claude only brings this context forward when you explicitly ask for it, giving you confidence it won’t clutter the conversation with irrelevant details.
Compared to Pieces, though, this approach can sometimes be pricier. With Pieces, you can switch mid-conversation between models depending on the task — not everything requires heavy reasoning or the burning of expensive tokens. By scoping context and flexibly choosing the reasoning engine, Pieces reduces unnecessary cost while keeping your workflow grounded in the right memory at the right time.
#3 Perplexity
Perplexity's beta system combines user-managed explicit memories with automatic interaction logging. You can save specific facts and preferences while the system maintains a searchable library of queries and conversations.
The distinguishing feature is citation-based transparency: when memory influences responses, Perplexity explicitly shows which memories or past interactions contributed to the answer, alongside traditional web sources.
When I tested this myself, Perplexity recalled a past book recommendation and, weeks later, surprised me with an even more relevant follow-up suggestion. The memory setting UI makes it easy to manage what’s stored and what gets removed.
#4 Pieces
At Pieces, we're pioneering a different approach, one grounded in the principle of context with boundaries. Instead of a single memory pool, we scope memories to specific projects or workstreams.
Head-to-head comparisons
To demonstrate how artificial long-term memory can revolutionize AI agents, we chose two tasks that touch on much of the nuance we believe users value.
Tasks:
Provide a summary of my work week: A key task that shows how much of the week's work each agent covers, and how close the narrative is to what actually happened.
Give me a list of all the anime shows I recently watched and how they may relate philosophically to my work and values: This task tests automatic memory formation, i.e. whether the agent captures key events you’d never think to document in the moment, because the effort of note-taking would outweigh any future value unless done religiously.
Task 1: Provide a summary of my work week
1. ChatGPT

Coverage: Provides a structured, categorical reconstruction (Research, Writing, Web, Admin, Creative). It leans on general inference patterns from your recent chats rather than direct document retrieval.
Strengths: Clear, readable narrative that feels like a management-friendly weekly report. Balances technical detail (architectures, meta-learning) with softer categories (blogposts, proposals, design).
Weaknesses: Since it lacks true long-term memory of your actual week, it generates a plausible but approximate report. Risks hallucinating emphasis areas that may not reflect your real workload.
2. Claude

Coverage: Much more document-grounded. It references specific proposals, deliverables, and even hours invested. Breaks down time allocation (technical vs. legal), statuses, and milestones.
Strengths: Evidence-based — pulls directly from located files and aligns closely with what you truly did. Provides structured project context (major deliverable + side work).
Weaknesses: Narrower scope; it focuses heavily on the main retrieved project (content moderation proposal) and less on peripheral tasks. Reads like a client deliverable log more than a holistic weekly journal.
3. Perplexity

Coverage: Sparse. It highlights the limitations of available data and doesn’t attempt to over-fill the gaps. Offers recommendations for how to improve tracking in the future.
Strengths: Transparency — avoids hallucination, sets boundaries clearly. Provides “next steps” (e.g., connect calendars, share summaries) that could make future reports more detailed.
Weaknesses: Low immediate utility — the user doesn’t get a detailed report, just a disclaimer and suggestions. Reads more like a diagnostic on system limits than an actual weekly summary.
Best suited for: Honest, minimal reporting when source data is incomplete — but not ideal if you need a usable work summary right away.
4. Pieces

Coverage: Highly detailed, almost like a project management log. Breaks down work into Executive Summary, R&D, Systems/Infrastructure, Team Collaboration, Admin. Includes specific technical troubleshooting (e.g., Kubernetes, GPU node debugging, repo fixes), collaborations, and even personal tasks (car repair, memorial writing).
Strengths: Rich coverage and granularity, tying together technical, professional, and personal spheres. Feels closest to an “AI with long-term memory” — because it recalls context across weeks, tools, and conversations.
Weaknesses: The density can overwhelm, though it can be refined with further prompt engineering.
Overall:
ChatGPT grounds itself in retrieved conversations, which gives a polished narrative but often overgeneralises and fills gaps with approximations. Claude relies on retrieved documents, making its account accurate for a single project but far too narrow to capture the whole week. Perplexity plays it safe by disclaiming what it can’t infer, leaving me with little more than suggestions. All three fail to form the right overall arc of my week because of their limited memory and scope. Pieces, by contrast, pulls together technical debugging, research, collaborations, and even personal context into a coherent story that feels strikingly close to how I actually lived it, demonstrating what it means to act like a true second brain.
Task 2: Anime shows and their philosophical connections
1. ChatGPT

Approach: Checks retrieved conversations but finds no record of anime. Asks the user to supply titles, offering to help draw connections afterward.
Strengths: Transparent, doesn’t hallucinate titles, offers a clear path forward (user lists shows, model interprets themes).
Weaknesses: Passive; puts all the burden on the user. No narrative or proactive recall.
Net result: Behaves like a competent assistant but not a memory system.
2. Claude

Approach: Searches across past conversations, notes no specific anime titles were mentioned, then pivots to general comments on AI/ML themes in anime.
Strengths: More proactive than ChatGPT; it ties general anime culture to philosophical aspects of your research. Shows some effort at extrapolation.
Weaknesses: Still lacks specificity; without remembering your actual viewing, the analysis feels generic.
Net result: Slightly richer than ChatGPT, but still memory-blind and context-poor.
3. Perplexity

Approach: Explicitly states no records exist in its memory. Provides a “Current Anime List Status: none” followed by a generic framework (“if a list is given, here’s how it could map philosophically”).
Strengths: Extreme honesty; avoids fabrications.
Weaknesses: Almost useless without external input. Offers no narrative or connection-building.
Net result: A diagnostic on system limits rather than an answer.
4. Pieces

Approach: Recalls your actual recent anime activity (Summer Hikaru, SKAMATO, Invincible, Dr. Stone, Death Note, Kaiji, etc.) with dates, platforms, and watch history. Then it goes further, linking specific shows and even your own logged notes/posts to philosophical themes in your work (innovation, resilience, research monoculture, human condition, identity).
Strengths: Rich, precise, and contextual. Doesn’t just remember titles, it weaves them into your intellectual landscape, showing how media consumption relates directly to your professional philosophy.
Weaknesses: Dense; may overdeliver for a casual query. Requires careful reading.
Net result: Acts like a true “second brain,” surfacing and connecting details you wouldn’t manually log.
Key contrasts
ChatGPT: Needs the user to fill in memory gaps.
Claude: Tries to connect themes, but generic without concrete recall.
Perplexity: Admits limits, contributes little.
Pieces: Rich, contextual recall that links your actual viewing to your intellectual framework — something only long-term memory can achieve.
Punchline
The other platforms lack the memory to capture even something as simple as what anime I watched recently, so they either deflect, generalise, or disclaim. Pieces alone remembers the shows, when I watched them, and how they intersect with my thinking—delivering the kind of insight you’d expect from a true second brain.
This is what project scoping buys you: a marketing campaign doesn't get mixed up with a product launch. A research project remains distinct from casual brainstorming. This project-scoped approach addresses a fundamental problem: context pollution. When everything is remembered equally, nothing is remembered well. Our Long-Term Memory engine (LTM-2.5) operates locally through PiecesOS, ensuring sensitive code and proprietary information remain under developer control.
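Conceptually, project scoping is just a hard boundary on retrieval. The sketch below mirrors the idea, not Pieces' internals; the project keys and notes are invented:

```python
# Sketch of project-scoped memory: every note carries a project key, and
# recall never crosses project boundaries.
from collections import defaultdict

memories: dict[str, list[str]] = defaultdict(list)

def remember(project: str, note: str) -> None:
    memories[project].append(note)

def recall(project: str, keyword: str) -> list[str]:
    # Only this project's notes are searched: no context pollution.
    return [n for n in memories[project] if keyword.lower() in n.lower()]

remember("spring-campaign", "Launch email leads with time-saved messaging")
remember("payments-api", "Retry logic: exponential backoff, max 5 attempts")

print(recall("spring-campaign", "email"))  # campaign context only
print(recall("payments-api", "email"))     # [] -- unrelated project stays clean
```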
For developers and vibe coders building internal processes, the Model Context Protocol (MCP) connects PiecesOS with AI tools like GitHub Copilot and IDEs, enabling context-aware queries without complex custom integrations. When debugging a memory leak in Node.js, developers can query: "Have I encountered similar memory leaks in Node.js background tasks before?" The system searches project-specific context, returning relevant debugging notes and past solutions.
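For a sense of what such a query might look like from the client side, here's a sketch using the official mcp Python SDK; the endpoint URL and tool name below are assumptions, so check the PiecesOS MCP documentation for the real values:

```python
# Sketch of querying a memory-backed MCP server with the `mcp` SDK.
# The endpoint URL and tool name are assumptions for illustration.
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

PIECES_MCP_URL = "http://localhost:39300/mcp/sse"  # hypothetical endpoint

async def main() -> None:
    async with sse_client(PIECES_MCP_URL) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # discover what's exposed
            result = await session.call_tool(
                "ask_pieces_ltm",  # hypothetical tool name
                arguments={"question": "Have I encountered similar memory "
                                       "leaks in Node.js background tasks?"},
            )
            print(result.content)

asyncio.run(main())
```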
Our Workstream Activity feature continuously captures tasks, generating searchable summaries every 20 minutes, which is useful for any team. These rollups can be exported for documentation, used in standup reports, or referenced to maintain context across complex debugging sessions.
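A toy version of that rollup loop, with a stub in place of a real summarizer; the buffering scheme is an assumption for illustration:

```python
# Toy rollup loop: buffer activity events, then flush them into a
# timestamped, searchable summary at a fixed cadence.
from datetime import datetime

ROLLUP_INTERVAL_SECONDS = 20 * 60  # the 20-minute cadence from the feature
buffer: list[str] = []
rollups: list[tuple[datetime, str]] = []

def capture(event: str) -> None:
    buffer.append(event)

def flush_rollup() -> None:
    if not buffer:
        return
    # Stand-in for an LLM-generated summary of the buffered events.
    summary = f"{len(buffer)} events: " + "; ".join(buffer)
    rollups.append((datetime.now(), summary))
    buffer.clear()

capture("Debugged GPU node scheduling in Kubernetes")
capture("Reviewed teammate's repo fix")
flush_rollup()  # in production this would run every ROLLUP_INTERVAL_SECONDS
print(rollups[-1][1])
```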
If you’re a nerd like me: recent performance improvements reduced average query times from 2.5 seconds to 0.8 seconds, a 68% improvement measured across datasets of 50,000 code snippets.
The ethics of AI memory
If all your context is stored in the cloud under a single vendor’s control, switching tools is expensive and risky. On-device memory flips that dynamic: your history, preferences, and work context live with you, not in a provider’s data center. That means you can change reasoning engines or assistants without starting over.
Cloud memory is a larger attack surface, exposed to prompt injection or data poisoning attacks at scale. On-device systems minimize that exposure by keeping sensitive context local, making it harder for malicious prompts to embed themselves into persistent knowledge stores.
In practice, this is why platforms like PiecesOS push for a local-first memory architecture. You still benefit from long-term, multimodal context, but you retain control over what’s stored, when it’s shared, and how it connects to external reasoning engines.
The road ahead: augmentation, not dependence
The next generation of AI memory needs to move beyond simple storage and retrieval toward contextual understanding.
This requires combining the best of current approaches:
The continuity of ChatGPT's seamless experience.
The transparency and control of Claude's explicit model.
The modular flexibility of Perplexity's hybrid system.
The contextual grounding of Pieces' project-scoped architecture.
As a researcher, I believe the ultimate goal isn't memory that replaces human judgment, but memory that augments human capability while preserving human agency. The systems that succeed will help us think sharper, work smarter, and live more creatively without making us dependent.
The choice is what kind of memory we want to build and what we want it to mean.
We're moving from Post-it notes to diaries, from ephemeral interactions to persistent partnerships. The memory revolution in AI represents one of the most significant developments since the graphical user interface. As these systems mature, they'll fundamentally change how we interact with information and digital tools. The most successful memory systems will be those that genuinely augment human intelligence rather than attempting to replace human judgment.
Choose your AI memory partner wisely. The conversation you're having today might be the foundation for tomorrow's breakthrough.