Context length in LLMs: how to make the most out of it
Learn how you can get the best out of your LLMs by refining LLM context length to get more accurate and concise results.
Building with AI depends on how well you can instruct the LLM to get the desired output. This is where the concept of context length in LLMs comes into play. Context length is the maximum amount of text an LLM can process at once.
A larger context window allows an LLM to process longer inputs and add more information to its outputs. This can lead to more concise responses, fewer hallucinations, and improved accuracy, but it also has some drawbacks.
In this article, we’ll explain the concept of LLM context length, how we can improve it, and the advantages and disadvantages of varying context lengths.
We will also cover how one can improve model performance by applying specific AI context in copilots.
What is context length in LLMs?
Context length in Large Language Models (LLMs) refers to the maximum number of tokens that a model can process at once. It is the maximum length of the input sequence and can be thought of as the model's working memory. Increasing it can often lead to better LLM performance.
Tokens are the units a model actually reads: a tokenizer splits text into words or word fragments and maps each one to a numerical ID.
For instance, 100 words correspond to approximately 130 tokens. If a model encounters an unfamiliar word, it splits the word into multiple tokens.
The context length of an LLM determines the maximum volume of information it can accept as input for a query. In simpler terms, a larger context length or LLM context window allows a user to input more information into a prompt to elicit a response.
While it's intuitive to consider LLM context length in terms of words, language models actually quantify content based on token length. Typically, a token corresponds to four characters in English or roughly ¾ of a word. Therefore, 100 tokens equate to about 75 words.
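The rules of thumb above can be turned into a quick estimator. The sketch below is a rough heuristic, not a real tokenizer: the function names are illustrative, and actual counts vary by model and tokenizer.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # roughly 3/4 of a word per token


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count from the number of characters."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate the token count from a word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_words_from_tokens(token_count: int) -> int:
    """Estimate how many English words fit in a token budget."""
    return math.floor(token_count * WORDS_PER_TOKEN)
```

With these ratios, a 100-token budget holds about 75 words, and 75 words need about 100 tokens. For exact numbers, use the tokenizer that ships with the model you are calling.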
Are new line characters helpful in the LLM context?
The short answer is yes. Newline characters improve readability, separate distinct pieces of context, and help the model follow the logical structure of the prompt.
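As a small illustration of using newlines for context separation, the sketch below joins the parts of a prompt with blank lines. The section labels ("Instructions", "Context", "Question") and the function name are assumptions made for this example, not a required format.

```python
def build_prompt(instructions: str, context_blocks: list[str], question: str) -> str:
    """Join prompt sections with blank lines so each part is clearly separated."""
    sections = [f"Instructions:\n{instructions.strip()}"]
    for index, block in enumerate(context_blocks, start=1):
        sections.append(f"Context {index}:\n{block.strip()}")
    sections.append(f"Question:\n{question.strip()}")
    # A blank line between sections marks where one piece of context ends
    # and the next begins.
    return "\n\n".join(sections)


prompt = build_prompt(
    "Answer using only the context provided.",
    ["India is a country in South Asia.", "New Delhi is the capital of India."],
    "What is the capital of India?",
)
print(prompt)
```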
Here are the context lengths of some of the most prominent LLMs:
Llama: 2K
Llama 2: 4K
GPT-3.5-turbo: 4K (GPT-3.5-turbo-16k has a context length of 16K)
GPT-4: 8K (GPT-4-32k has a context length of up to 32K)
Mistral 7B: 8K
PaLM 2: 8K
Gemini: 32K
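These limits matter in practice because the prompt and the generated response share the same window. The sketch below checks whether a request fits. It assumes "K" means 1,024 tokens, and the dictionary keys are illustrative labels rather than official model identifiers.

```python
# Context lengths taken from the list above, assuming "K" means 1,024 tokens.
CONTEXT_LENGTHS = {
    "llama": 2 * 1024,
    "llama-2": 4 * 1024,
    "gpt-3.5-turbo": 4 * 1024,
    "gpt-4": 8 * 1024,
    "mistral-7b": 8 * 1024,
}


def fits_in_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if the prompt plus the reserved output budget fits the window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_LENGTHS[model]
```

For example, a 2,000-token prompt with 500 tokens reserved for the answer does not fit the 2K window, but fits comfortably in a 4K one.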
How to set context length
A model's maximum context length is fixed by its architecture and training, but the context window actually used at inference time can usually be configured, either from a GUI or through the API.
Many frameworks expose a parameter such as `max_seq_len`, and you can change the context length by updating it.
For example, in Ollama the context length is controlled by the `num_ctx` parameter, which you can set in a Modelfile or pass in the options of an API request. Tools built on llama.cpp call the equivalent setting `n_ctx`. Other tools read the value from the model's configuration file (e.g., config.json), where the name of the parameter varies by model.
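As a minimal sketch, here is how a request body for Ollama's `/api/generate` endpoint could carry a context length through the `num_ctx` option. The model name "llama2" and the helper function are placeholders, and the code only builds the JSON body. It does not send anything, so you would still need to POST it to a running Ollama server.

```python
import json


def build_generate_request(model: str, prompt: str, context_length: int) -> str:
    """Build a JSON body that asks for a specific context window via `num_ctx`."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    payload = {
        "model": model,
        "prompt": prompt,
        # `num_ctx` sets the context window size (in tokens) for this request.
        "options": {"num_ctx": context_length},
    }
    return json.dumps(payload)


body = build_generate_request("llama2", "Why is the sky blue?", 4096)
print(body)
```

Asking for a window larger than the model was trained for generally does not improve results, so treat the trained maximum as the upper bound.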
What is input sequence length vs query length in LLM?
To understand LLM context length better, you also need to understand what input sequence length and query length mean.
Input sequence length refers to the total number of tokens in everything sent to the model: the user's query, the system prompt, and any previous conversation context.
Query length is the number of tokens in the prompt a user sends to the model.
Here’s an example:
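User prompt: "Who is the President of India?"
Input sequence length: Longer than the query length whenever the model also receives a system prompt or previous context.
Query length: About 7 tokens ("Who", "is", "the", "President", "of", "India", "?"). The exact count depends on the tokenizer.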
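The difference between the two lengths can be sketched in code. The counter below treats every word and punctuation mark as one token, which is a simplification of real tokenizers, and the system prompt and conversation history are made up for illustration.

```python
import re


def count_tokens(text: str) -> int:
    """Count words and punctuation marks as a stand-in for real tokenization."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def input_sequence_length(system_prompt: str, history: list[str], query: str) -> int:
    """Total tokens sent to the model: system prompt, earlier turns, and the query."""
    history_tokens = sum(count_tokens(turn) for turn in history)
    return count_tokens(system_prompt) + history_tokens + count_tokens(query)


query = "Who is the President of India?"
system_prompt = "You are a helpful assistant."  # hypothetical system prompt
history = [  # hypothetical earlier turns
    "What is the capital of India?",
    "The capital of India is New Delhi.",
]

query_length = count_tokens(query)
total_length = input_sequence_length(system_prompt, history, query)
print(query_length, total_length)
```

With this simple counter the question alone comes to 7 tokens, while the full input sequence comes to 28 once the system prompt and the two earlier turns are included.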
Challenges of having large context windows vs using retrieval instead
LLMs may generate inaccurate results. To deal with this, we can increase the context length or build a Retrieval-Augmented Generation (RAG) pipeline.
Both methods have their own upsides and downsides. A large context window requires more computational resources to process extensive contexts, and it raises costs: LLM providers charge per token, so a long context (i.e., more tokens) makes each query pricier.
Retrieval systems, by contrast, are less resource intensive and cheaper to run, but they can be harder to integrate, and they depend on external data sources that may themselves be outdated or inaccurate.
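To make the cost difference concrete, here is a back-of-the-envelope sketch. The price and token counts are hypothetical values chosen for illustration, so check your provider's actual pricing before relying on any figure.

```python
def query_cost(input_tokens: int, price_per_1k_tokens: float) -> float:
    """Cost of one query when the provider bills per input token."""
    return input_tokens / 1000 * price_per_1k_tokens


# Hypothetical numbers for illustration only.
PRICE_PER_1K_TOKENS = 0.01      # assumed price in dollars per 1,000 input tokens
FULL_DOCUMENT_TOKENS = 30_000   # sending a whole document in a large context window
RETRIEVED_TOKENS = 3 * 500      # sending three retrieved chunks of 500 tokens each

long_context_cost = query_cost(FULL_DOCUMENT_TOKENS, PRICE_PER_1K_TOKENS)
retrieval_cost = query_cost(RETRIEVED_TOKENS, PRICE_PER_1K_TOKENS)

print(f"Long context: ${long_context_cost:.3f} per query")
print(f"Retrieval:    ${retrieval_cost:.3f} per query")
```

Under these assumptions the retrieval query sends one twentieth of the tokens and costs one twentieth as much, though it adds the work of building and maintaining the retrieval step.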
Setting context length through AI copilot
In the above paragraphs, we learned how setting context can lead to better output and accuracy. Let’s take a step forward, and learn how to do it with the AI Copilot.
Pieces for Developers is an AI tool that can run on browsers and IDEs and help generate code, remember context, and improve general code management. The Pieces Copilot, which is the chat-like assistant, is already contextually aware.
Let’s say you want to make some changes to your code. It can understand the contents of the open file and then suggest changes or updates accordingly.
However, you can improve the response even more by adding context from folders, files, code snippets, websites, and messages of your choice. All of this can be done locally, which also improves security.
Let’s take a look at how you can set the context in Pieces Copilot inside Pieces for Developers:
Go to Copilot chat on your Pieces Desktop app. You can navigate to the Copilot by searching in the search bar.
Click "Set your Context" at the bottom of the Copilot chat.
Choose how you want to set your AI context:
Long-term memory can shadow your day-to-day work, capture relevant workflow materials, and add them as context to your Pieces Copilot.
Code snippets that you've previously created and saved to Pieces can assist you in asking questions about your code.
Code folders can be uploaded so that the LLM takes the document structure into context.
If you're interested in learning how to build your own copilot using Pieces OS SDK and add context to it, read the linked blog post and join the Discord.
Setting context in different developer tools with Pieces Copilot
Pieces Drive can be used with the tools of your choice, such as VS Code, JetBrains IDEs, Obsidian, JupyterLab, Chrome, and other web extension integrations.
Similar to the Desktop App, you can use files, long-term memory context, folders, and snippets for context in AI conversations with your personalized copilot.
In some of our integrations, like VS Code, you can use directives to quickly reuse materials as context. You can create your own custom directive, which lets you define frequently used context sets for your questions. There are also some default directives, such as:
@recent: uses your recently opened files as context
@workspace: uses your current workspace as context
If the copilot used a file as relevant context, it will show it in the chat window, and you can click on it to view it.
How to handle long context in LLMs
When conducting an LLM context window comparison, it's important to understand the various tradeoffs.
Through this article, we have learned about context length in LLMs and how it can help us get more concise and accurate results. However, we have also learned that it can lead to some issues, like increased cost. Some other ways to handle long context are:
Using an assistant like Pieces that is contextually aware and retains memory.
Chunking and summarization
Retrieval Augmented Generation (RAG)
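The chunking and retrieval ideas in this list can be sketched in a few lines. This is a simplified illustration under stated assumptions: chunks are measured in words rather than tokens, and retrieval ranks chunks by keyword overlap, whereas production RAG systems typically use embedding-based similarity search. The function names are hypothetical.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def retrieve(chunks: list[str], query: str, top_k: int) -> list[str]:
    """Rank chunks by how many query words they contain and keep the best `top_k`."""
    query_words = set(query.lower().split())

    def score(chunk: str) -> int:
        return len(query_words & set(chunk.lower().split()))

    return sorted(chunks, key=score, reverse=True)[:top_k]
```

Only the chunks returned by `retrieve` are placed in the prompt, which keeps the input sequence short even when the source material is far larger than the context window. The overlap between chunks reduces the chance that a relevant sentence is cut in half at a chunk boundary.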
Here are some resources that you can read to learn more about context lengths:
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Extending Context Window of Large Language Models via Positional Interpolation
This article was first published on January 31st, 2024 and was improved by Haimantika Mitra as of January 2nd, 2025 to improve your experience and share the latest information.