At Pieces we’ve just announced our new MCP server, allowing you to interact with your Pieces Long-Term Memory from any MCP client, such as Cursor or GitHub Copilot.
This unlocks interesting and novel prompts to drive agentic developer workflows, like “Based on the discussions I had yesterday with Ellie, make the relevant updates to my package manifest to use the newly released versions”. A prompt like this will contact the Pieces MCP server, which will return the relevant memories containing the package versions discussed in a conversation with Ellie; the agent in your IDE then uses those memories to make the updates to your manifest files.
In this post I take a small look under the hood at how we implemented this, particularly the prompt engineering that went into making this server as simple to use as possible.
What does the Pieces MCP server do?
I assume by now you are already up on MCP. It’s been going viral all over the tech space, but if you need to learn more, check out our blog post on what it is and why everyone is talking about it.
The goal of the Pieces MCP server is to help drive deeper integrations of the Pieces Long-Term Memory with the tools you use every day. It’s important to us that you can access your Pieces Long-Term Memories from the IDEs, browsers, and productivity apps that you are already using, to avoid context switching and the corresponding productivity drop that comes with it.
By making your memories available over MCP, you can use the agent chat that is built into tools like Cursor, GitHub Copilot, Windsurf, or Cline to interact with Pieces, instead of having to switch to a different extension. This is very powerful: the memories returned by Pieces can feed directly into your agentic workflows, with prompts that use them to drive code changes.
The Pieces MCP server retrieves memories without processing them, sending them back to the client for processing using the client’s LLM. This is similar to how the Pieces copilot works, with Pieces retrieving memories as context for your copilot chat, then sending these to your selected LLM.

The advantage of this method is you can create prompts that retrieve memories and process them in their entirety using other context that the MCP client has, such as calling other MCP servers or interacting with code.
The technical implementation
Now for the more interesting part, let’s dive into the technical implementation of the Pieces MCP server.
Transport
For the transport layer, we decided to use the SSE (server-sent events) transport. The reason for this is that PiecesOS is already running, which makes it easy to expose the relevant endpoint over HTTP.
If we used stdio, we’d have to ship another component that you would need to manually install and manage, with all the relevant dependencies. For example, there are a number of MCP servers out there that are run using NPM, which means the user has to install and manage Node to use the server. We already have PiecesOS running to capture your memories, so it made sense to connect directly to it over SSE.
The downside of SSE is that, weirdly, at the moment it is not supported in Claude Desktop. I say weirdly because Anthropic, the makers of Claude, invented MCP, so you would have thought they would have the best support 🤷. If you want to use Pieces with Claude, then there are many open source MCP gateways that convert stdio to SSE, such as github.com/lightconetech/mcp-gateway.
SSE has excellent support in IDEs and tools like Cursor, GitHub Copilot, Windsurf, Cline, and Goose.
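To make the transport choice a little more concrete, here is a minimal sketch of what exposing a tool over SSE looks like using the FastMCP helper from the official MCP Python SDK. This is purely illustrative: PiecesOS is not implemented this way, and the tool body is a placeholder for the real tool described in the next section.

```python
# Minimal sketch of an SSE-transport MCP server using the official MCP Python SDK.
# Illustrative only: PiecesOS is not built like this, and the tool body below is
# a stand-in rather than the real ask_pieces_ltm implementation.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("pieces-style-server")

@mcp.tool()
def ask_pieces_ltm(question: str) -> str:
    """Ask Pieces a question to retrieve historical/contextual information
    from the user's environment."""
    # A real implementation would query the Long-Term Memory engine here.
    return "{ ...retrieved memories as JSON... }"

if __name__ == "__main__":
    # Serve over HTTP using server-sent events instead of stdio, so clients
    # connect to a long-running process rather than spawning a new one.
    mcp.run(transport="sse")
```

Because the server is just an HTTP endpoint on an already-running process, the client only needs a URL, with no extra runtime or package manager to install.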
Tools
Currently the Pieces MCP server has one tool - ask_pieces_ltm. You can see the details of this tool by making a tools/list request, or by using your MCP client. This is the tool that calls Pieces, returning a JSON object with details of the retrieved Long-Term Memories. You may be able to see the output of this tool in your MCP client when it is called. This output isn’t really designed to be human readable; instead it is designed to be read and processed by an LLM.
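If you are curious what that tools/list exchange looks like on the wire, here is a sketch of the JSON-RPC messages written as Python dicts. The envelope follows the MCP specification; the Pieces input schema shown is abridged and partly assumed.

```python
# Sketch of the JSON-RPC exchange behind a tools/list request.
# The response shape follows the MCP specification; the exact schema Pieces
# advertises is abridged and illustrative here.
tools_list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "ask_pieces_ltm",
                "description": (
                    "Ask Pieces a question to retrieve historical/contextual "
                    "information from the user's environment."
                ),
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "question": {"type": "string"},
                        # time_ranges and application_sources are covered
                        # later in this post; details omitted here.
                    },
                    "required": ["question"],  # assumption for illustration
                },
            }
        ]
    },
}
```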
For example, if you ask “I need a status report for a stand up meeting covering the work I was doing yesterday. Create a report with 5 bullet points of the main tasks I was working on”, the ask_pieces_ltm tool would return summaries of the work you were doing yesterday, and the LLM that calls the tool would then process these to create the report with the 5 bullet points.
Tool description
When you add an MCP server to a client application, the tool details are sent to any LLM that supports tool calling. This includes a tool description in natural language that defines the tool, and it is this description that the LLM uses to determine which tool to call.
At the time of writing, the description for the ask_pieces_ltm tool is:
Ask Pieces a question to retrieve historical/contextual information from the user’s environment.
This in itself is enough to help the LLM know when to call the tool. If you ask a question that the LLM perceives as needing historical or contextual information from the user's environment, then it will call the tool. Prompts like “What was I doing yesterday”, “Summarize my conversations with Mark last week”, or “What code change did Ellie ask me to make?” are all related to the user's environment.
Parameters
When you define a tool, you also need to define the parameters that get passed to the tool by the MCP client LLM. Pieces has a number of parameters that the LLM extracts from the prompt and passes over.
Like tools, the parameters also have descriptions to help guide the LLM to provide the correct values.
question
This is the question that is passed to Pieces to use to extract Long-Term Memories. The description is:
The user’s direct question for the Pieces LTM. Always include the exact user query if they request historical or contextual information.
The LLM will extract the question from the prompt, and pass this over. It will only pass over the part of the prompt that is relevant to calling the Pieces LTM tool.
For example, if you ask “What was I doing yesterday?”, then the question will be the full “What was I doing yesterday?”.
If you ask “What was the database package and version Sam asked me to use? Update my project file to install it”, then the LLM will probably determine that the first sentence is relevant to the Pieces tool, and the second sentence is relevant to driving an agentic workflow, so the question passed to Pieces would just be “What was the database package and version Sam asked me to use?”.
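As a sketch of what the resulting tool call might look like for that two-sentence prompt (the argument values are assumed here, not captured from a real client):

```python
# Hypothetical tools/call request the client could issue after the LLM splits
# the prompt. Only the Pieces-relevant sentence becomes the question argument;
# the agentic "update my project file" part stays with the client.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "ask_pieces_ltm",
        "arguments": {
            "question": "What was the database package and version Sam asked me to use?",
        },
    },
}
```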
time_ranges
Pieces has temporal grounding, allowing you to ask questions with time ranges. From the Pieces copilot in our desktop app or extensions, you can either ask a question with a time range, such as “What was I doing last week”, or specify a time range with the time picker.
MCP clients don’t have a time picker we can use, so we rely on the calling LLM to extract any time range information from the prompt and send that to us. The description for this parameter is:
The time is <current time> in Local time. This is an array of json objects with 2 required properties `from`, `to`, and `phrase` that provides a time range if the user asks about context at a specific time.
For instance: `what was I doing yesterday` would include `from` which would be a timestamp in utc that would be the starting place on the time line and span the full 24 hours of yesterday, the ending place would be the `to` value. and the `phrase` would be yesterday.
Return your answer in Local time.
The <current time> value is replaced with the current time, and Pieces sends a notifications/tools/list_changed message every hour to tell the MCP client to reload this tool to get an updated time.
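On the wire, that notification is a JSON-RPC message with no id (it expects no response), sketched here as a Python dict:

```python
# MCP notification telling the client to re-fetch the tool list, which picks up
# the refreshed <current time> in the parameter description.
tools_changed_notification = {
    "jsonrpc": "2.0",
    "method": "notifications/tools/list_changed",
}
```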
So if you asked “What was I doing yesterday” and the current date was April 8th 2025, the parameter would be:
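(A sketch of that value, assuming ISO 8601 UTC strings; the real schema may encode timestamps differently, for example as epoch seconds.)

```python
# Illustrative time_ranges value for "yesterday" relative to April 8th 2025.
# The timestamp encoding is an assumption made for readability.
time_ranges = [
    {
        "from": "2025-04-07T00:00:00Z",  # start of yesterday, UTC
        "to": "2025-04-07T23:59:59Z",    # end of yesterday, UTC
        "phrase": "yesterday",
    }
]
```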
application_sources
When Pieces captures memories, it tracks the application that the memory was captured from. This supports enhanced privacy by allowing you to block or delete memories from certain applications. This also allows you to be specific, asking things like “Summarise my conversation with Ali in Teams” to just get a conversation from Microsoft Teams, rather than conversations in GitHub issues in your browser or over email.
Like time ranges, this is available either in your prompt, or as a selector in the Pieces copilot. As the selector is not available in your MCP client, Pieces will use the client’s LLM to help get the sources to drive the memory retrieval. The description for this is:
You will provide us with any application sources mentioned in the user query if applicable. IE if a user asks about what I was doing yesterday within Chrome, you should return chrome as one of the sources.
If the user does NOT specifically ask a question about an application specific source then do NOT provide a source here.
If the user asks about a website or web application that could be found in either a browser or in a web application then please provide all possible sources. For instance, if I mention Notion, I could be referring to the browser or the Web application so include all browsers and the notion sources if it is included in the sources.
Here is a set of the sources that you should return <sources>
The <sources> is updated to contain a list of sources that Pieces has captured memories from. This way the sources line up with the expected sources, the same as you would see in the source picker in the Pieces copilot. Just like with the time, this list is updated regularly, and the client detects the updates by listening for a notifications/tools/list_changed message.
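So for a prompt like “Summarise my conversation with Ali in Teams”, the arguments the client LLM passes might look something like this. The source identifier is an assumption; the real values come from the <sources> list Pieces injects into the description.

```python
# Hypothetical arguments for the ask_pieces_ltm call. The exact source names
# depend on what PiecesOS has captured and listed in <sources>.
arguments = {
    "question": "Summarise my conversation with Ali in Teams",
    "application_sources": ["Microsoft Teams"],
}
```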
Call the MCP server
Once you have the Pieces MCP server connected to your client, you should be able to call it by using any prompt that refers to needing historical or contextual information from your environment. Phrases like “what was I doing” or “summarize my conversation with” help guide the LLM to choose the Pieces tool. Prompts like “How do I install a NuGet package” will not, whereas “How do I install the NuGet package Susan recommended to me in Superhuman yesterday” will.
You can also be more explicit and use prompts like “Ask Pieces to summarize…”. Having “Ask Pieces” explicitly in the prompt will guide the LLM to use the ask_pieces_ltm tool, so if the prompts you are using don’t call Pieces, this is a way to ‘force’ it to happen. Be aware, though, that LLMs are non-deterministic, so there are no guarantees the tool will be called.
Depending on the tool you are using, you may be able to see the parameters sent to Pieces, as well as the raw response.

Cost of calling the MCP server
When you use an MCP client to call an MCP server there is a potential cost associated with the extra tokens and extra LLM calls. When you add an MCP server to your client, the client needs to send the details of all the tools to the LLM with every LLM call. This means the descriptions, tool and parameter names, and so on all get passed to the LLM, increasing the token cost. Once the LLM decides to call a tool, the response it gets back is sent in the next LLM call to be processed. This means that even a ‘simple’ query like “What did I do yesterday” involves several steps, sketched in code after this list:
The client makes an initial LLM call, passing the prompt as well as the details of the Pieces tool
The LLM returns to the client, saying it needs to call the tool
The client calls Pieces and gets a detailed response with the relevant memories
The client calls the LLM again, passing the response from Pieces along with the original prompt, to get the final answer.
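Here is a rough sketch of that loop from the client's point of view. The helper functions are placeholders, not a real client API; the point is simply that one user prompt turns into two LLM calls plus one tool call, each carrying the tool definitions or the tool output as extra tokens.

```python
# Pseudocode-style sketch of the MCP client loop for one prompt.
# call_llm and call_mcp_tool are hypothetical helpers, not a real client API.

def answer(prompt: str, tool_definitions: list[dict]) -> str:
    # 1. First LLM call: the prompt plus every tool definition (extra tokens).
    first = call_llm(
        messages=[{"role": "user", "content": prompt}],
        tools=tool_definitions,
    )

    if not first.tool_calls:
        return first.content  # no tool needed, single LLM call

    # 2. The LLM asked for ask_pieces_ltm: call the Pieces MCP server over SSE.
    call = first.tool_calls[0]
    memories = call_mcp_tool(name=call.name, arguments=call.arguments)

    # 3. Second LLM call: prompt plus the tool result (more tokens) to get
    #    the final answer.
    final = call_llm(
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "tool_calls": first.tool_calls},
            {"role": "tool", "content": memories},
        ],
        tools=tool_definitions,
    )
    return final.content
```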
If your client charges by the token, then every use of an MCP server will cost you money. This is something to be aware of; consider disabling tools when you are not using them. For example, if you are using Cursor to make some code changes that don’t need Pieces Long-Term Memories, then disable the MCP server to reduce your token usage, re-enabling it when you need it.
Use Pieces in your MCP-powered workflow!
If you are not already using Pieces, then give it a try. Once you are up and running with Pieces, add it to your MCP client of choice; check our documentation for details on how to do this.
Also please share your thoughts on MCP and what having Pieces available over MCP unlocks for you on X, Bluesky, LinkedIn, or our Discord.

