What are LLM parameters and what is their role?
Language models, both large and small, have their size measured in parameters. The big cloud models, like GPT-4o, use trillions of parameters (1.8 trillion according to rumor – the actual value has not been published), whereas smaller models you can run locally have billions of parameters.
But what are ‘parameters’? Let’s find out.
What is a parameter?
The size of generative AI models is measured in parameters – these are the numbers that make up the internals of the model. Internally, these models are complex neural networks with interconnected nodes, and as values pass from node to node, the parameters are used to transform them.
Example – parameters in a simple mathematical model
For example, imagine we wanted to create a model to estimate the price of a Lego set (Lego being a huge passion of mine, especially Star Wars Lego). We could start with a model that uses the number of pieces in a set to gauge the price.
number_of_pieces * <parameter> = price
In this model, the price of a set is equal to the number of pieces in the set, multiplied by some number. This number is a parameter for the model. We have one input (number_of_pieces), one parameter, and one output (price).
Taking the Ultimate Collector's Series Millennium Falcon as an example, it has 7,541 pieces and costs $849.99, giving:
7541 * <parameter> = 849.99
Which is:
7541 * 0.1127 ≈ 849.99
The value for our parameter is therefore approximately 0.1127.
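As a minimal sketch in Python, 'training' this one-parameter model is just solving for the parameter from a known set:

pieces = 7541
price = 849.99

# 'Train' the model: solve for the single parameter
parameter = price / pieces
print(round(parameter, 4))  # 0.1127

# Use the 'trained' model to estimate a price from a piece count
def estimate_price(number_of_pieces):
    return number_of_pieces * parameter

print(round(estimate_price(7541), 2))  # 849.99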
If we consider the discontinued Super Star Destroyer, a set with 4,784 pieces, our formula will give:
4784 * 0.1127 ≈ 539.23
Now, this discontinued product is currently selling for around $1,699.99 – higher than the original retail price because it is discontinued. Our formula falls down here, giving a substantially lower price. There's probably another parameter we could add related to whether or not the product is discontinued.
This would give a formula of:
(number_of_pieces * <parameter 1>) + (is_discontinued * <parameter 2>) = price
We now have an additional input, is_discontinued, and an additional parameter. We could set the is_discontinued input to 0 or 1, and multiply by the parameter. Some algebra later, and we have:
(number_of_pieces * 0.1127) + (is_discontinued * 1160.76) = price
Now if we plug our values into this formula, we have:
Millennium Falcon: (7541 * 0.1127) + (0 * 1160.76) ≈ 849.99
Super Star Destroyer: (4784 * 0.1127) + (1 * 1160.76) ≈ 1699.99
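The 'algebra' here can also be done in code. Here's a sketch using numpy to solve the two equations for the two parameters (the values above are rounded, hence the small differences):

import numpy as np

# One row per set: [number_of_pieces, is_discontinued]
inputs = np.array([
    [7541, 0],  # UCS Millennium Falcon
    [4784, 1],  # Super Star Destroyer (discontinued)
], dtype=float)
prices = np.array([849.99, 1699.99])

# Two equations, two unknowns - solve for both parameters at once
parameters = np.linalg.solve(inputs, prices)
print(parameters.round(4))  # approximately [0.1127, 1160.76]

# Plugging the inputs back in reproduces the prices
print(inputs @ parameters)  # [849.99, 1699.99]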
We have encoded ‘knowledge’ of the price of Lego in just 2 numbers. Now, this is a drastic over-simplification, but it gives you a basic idea of how parameters contribute to a model.
We have ‘trained’ the model by working out the values of these parameters. To store this model, we would need to save 2 things — a representation of the formula, so we know what inputs are needed, and where in the equation they go, and the parameters.
Parameters in neural networks
The models used by LLMs are neural networks – essentially interconnected neurons, where numbers entering one neuron are multiplied by a parameter and sent on to the next. If you have worked with neural networks before, you may have heard these values referred to as weights. Parameters are more than just weights, however – they include other values, such as biases, that influence the model.
There can be multiple connections between neurons, with multiple inputs and outputs. For example, we could model our Lego price function using neurons:
[Image: a single neuron with two inputs, number_of_pieces and is_discontinued, connected by arrows]
The arrows here represent the parameters, with the neuron adding up the values of number_of_pieces * parameter 1 and is_discontinued * parameter 2.
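In code, a neuron like this is just a weighted sum – a minimal sketch:

# A neuron multiplies each input by its parameter and sums the results
def neuron(inputs, parameters):
    return sum(i * p for i, p in zip(inputs, parameters))

# number_of_pieces = 7541, is_discontinued = 0
print(neuron([7541, 0], [0.1127, 1160.76]))  # roughly 850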
If you look at other Lego sets, then once again the formula falls down – the Identity and Landscape Kit has only 2,808 pieces yet costs $789.99, while the Eiffel Tower has 10,001 pieces but is only $629.99. So there must be other factors contributing to the price.
To get a better model, we could probably use a neural network with multiple layers fed by all the inputs. Something like this:
[Image: a neural network with 6 inputs feeding into 3 layers of interconnected neurons]
I’ll leave it as an exercise for the reader to work out the correct inputs, network layout, and parameters, but this shows the basic idea. In this case, we have 6 inputs to the price, and with all the connections between 3 layers of neurons, we have 22 parameters.
Assuming this gives us a better model, we’ve created something 11 times the size, with more information encoded in it to help give a better output.
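To make the parameter counting concrete, here's a sketch using a hypothetical 6 → 3 → 1 layout (a different layout to the diagram above, so the count differs slightly):

import numpy as np

rng = np.random.default_rng(0)

# A hypothetical layout: 6 inputs -> 3 hidden neurons -> 1 output
layer_sizes = [6, 3, 1]

# One parameter per connection between adjacent layers
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
print(sum(w.size for w in weights))  # 21 connection weights in this layout

# Values flow through the network layer by layer
def forward(x, weights):
    for w in weights:
        x = x @ w
    return x

print(forward(np.ones(6), weights))  # a single output value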
Now imagine that instead of modeling the price of Lego sets, we wanted a model to represent a massive corpus of human knowledge. The neural networks behind LLMs are orders of magnitude more complex than this, with multiple layers doing different tasks.
This means they need orders of magnitude more parameters – in the billions. If you were to write out all the parameters of a typical small model in a notebook, you would need about 35,000,000 pages – a notebook about a mile thick!
If you want a more detailed overview of the transformer architecture used in LLMs, then I recommend you read “Attention Is All You Need”, the original paper from Google that introduced this architecture.
Why numbers? I thought these were language models
Computers are fancy calculators – they work by doing arithmetic on numbers. Even when we have text, a computer needs to treat the text as numbers.
The first way text was stored was as ASCII values – ASCII is the American Standard Code for Information Interchange, first published in 1963. This is essentially a look-up table of numbers from 0-127 that represent letters, numbers, and other characters.
This allows you to store and manipulate text, then render it on screen or to a printer by looking up the value, then having code to draw the corresponding character.
For example, if you wanted to store the string “Hello World!”, you would actually store [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]. 72 is the ASCII code for a capital H, 101 for a lowercase e, 33 for an exclamation point, and so on. Any character you need is stored as 1 byte.
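You can see this for yourself in Python, where ord gives the numeric value of a character and chr converts a value back to a character:

text = "Hello World!"

# Each character maps to its ASCII value...
codes = [ord(c) for c in text]
print(codes)  # [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]

# ...and the values map back to characters
print("".join(chr(c) for c in codes))  # Hello World!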
LLMs use a different technique to represent text, called tokenization. Rather than representing individual characters as numbers, tokenization represents words or parts of words as tokens.
For example, if I had the sentence “Lego Star Wars is the best type of Lego, especially the Millennium Falcon”, then this would be converted to 15 tokens — [43, 15332, 11307, 25778, 382, 290, 1636, 1490, 328, 92395, 11, 6980, 290, 119341, 94314].
These tokens include words, punctuation, and even the spaces between words.
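As a sketch, you can reproduce this with OpenAI's open-source tiktoken library. I'm assuming the o200k_base encoding used by the GPT-4o family here – different models use different encodings, so the exact token IDs will vary:

import tiktoken  # pip install tiktoken

# Load the encoding used by GPT-4o family models
encoding = tiktoken.get_encoding("o200k_base")

tokens = encoding.encode(
    "Lego Star Wars is the best type of Lego, especially the Millennium Falcon")
print(len(tokens), tokens)

# Decoding turns the token IDs back into the original text
print(encoding.decode(tokens))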
You can try this yourself — OpenAI provides a tokenizer site where you can enter text and see the tokens based on a selection of their models.
[Image: OpenAI's tokenizer site showing a sentence split into tokens]
LLMs have encoding and decoding steps built in. These convert the text of your prompt into tokens for the input, the model processes the tokens as numbers, and finally the tokens that come out of the model are decoded back into text.
So if we pass in “Lego Star Wars is the best type of Lego, especially the Millennium Falcon”, the encoder converts this to [43, 15332, 11307, 25778, 382, 290, 1636, 1490, 328, 92395, 11, 6980, 290, 119341, 94314]. This is processed by the model, which will then return something like [35037, 11, 480, 382, 290, 1636, 92395, 13], which is decoded back to “Correct, it is the best Lego.”
Parameters and quantization
We’ve looked at parameters, and seen how these are stored as numbers. But from a computer science perspective, what do we mean by storing them as a number?
There are multiple ways to store a number, each taking up different amounts of space.
For example, do you store a parameter as a 1-byte integer, from 0-255, or as a 32-bit floating point number with a range of -3.4028235 × 10³⁸ to +3.4028235 × 10³⁸, or something in between?
Typically you will train a model with 32-bit floats. This means every parameter takes up 4 bytes of storage space on disk, or 4 bytes of memory. For a model like GPT-4o, with its rumored 1.8 trillion parameters, that means 7.2 trillion bytes, or 7.2 terabytes, just to store the model.
If you still use physical media, think of a stack of 144 Xbox game disks just to store the model parameters.
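The arithmetic behind those numbers is straightforward – a back-of-envelope sketch:

# 32-bit floats take 4 bytes per parameter
parameters = 1.8e12        # rumored GPT-4o parameter count
bytes_per_parameter = 4

total_bytes = parameters * bytes_per_parameter
print(total_bytes / 1e12)  # 7.2 terabytes

# A dual-layer Blu-ray game disk holds roughly 50 GB
print(total_bytes / 50e9)  # 144 disks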
You can make models smaller, using a technique called quantization. This is the process of converting the values into smaller numbers to reduce both the storage space and memory needed to run the model.
For example, small language models that can run locally are often quantized to INT4, or 4-bit integers, representing a value from 0-15.
Quantization isn’t a simple process – you don’t take the entire range of a 32-bit floating point number, divide it into 16 ranges, and then assign a number from 0-15 depending on which range the original value falls in.
Instead, a range of optimizations are used – for example, looking at the range of actual values in the model and aligning to those. So if most of the weights fall between 0 and 16,000,000, you could use 0 for 0-999,999, 1 for 1,000,000-1,999,999, and so on.
If you have a small number of values outside this range, you can clip them to the minimum or maximum value, so a parameter of 20,000,000 would be clipped to 16,000,000.
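Here's a minimal sketch of that bucketing-and-clipping idea. Real quantization schemes are more sophisticated, but the principle is similar:

def quantize(value, bucket_size=1_000_000, levels=16):
    # Map the value into one of 16 buckets (INT4)...
    index = int(value // bucket_size)
    # ...clipping anything outside the expected range
    return max(0, min(levels - 1, index))

print(quantize(1_500_000))   # 1 (the 1,000,000-1,999,999 bucket)
print(quantize(20_000_000))  # 15 (clipped to the top bucket)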
Model sizes
Models come in a range of sizes. For the cloud models, such as those from OpenAI or Google, the parameter counts and quantization are often not published. You can also get smaller models, known as Small Language Models, or SLMs, that are small enough to download and run offline.
For these, the parameter counts and quantization are known.
Cloud-based models have hundreds of billions, or even trillions, of parameters, and will typically be 32-bit floats. GPT-4o is rumored to have 1.8 trillion parameters, and reports for Google Gemini 1.5 Pro range from 200 billion to 1.5 trillion parameters.
DeepSeek v3 is available with 671 billion parameters.
For small language models, the parameter counts are in the low billions – mainly to allow them to run on desktops and laptops. Popular models include Microsoft Phi-4 at 14 billion parameters, or Llama 3.2 in 1 billion, 3 billion, 11 billion, and 90 billion variants.
The different sizes are sometimes related to capabilities built into the model. For example, with Llama 3.2, the 1 and 3 billion parameter variants can only handle text, but the 11 and 90 billion variants include vision capabilities to understand images.
Finding the right model
Parameters are how LLMs store information – they are the values in the neural network that encode the information the model is trained on.
The more parameters, the more information encoded, and in theory, the better the model. Large models have hundreds of billions, if not trillions, of parameters; smaller ones are in the low billions and can be run on-device.
With Pieces, you have access to a wide range of models, from large cloud models to smaller on-device models. Give some of them a try, see which ones work best for you, and share your findings with me on X, Bluesky, LinkedIn, or our Discord.
