
What is an LLM

An LLM is a language model trained on massive amounts of text to predict and generate coherent language. It's the engine behind ChatGPT, Claude, Gemini, and most of the generative AI tools you use today.

What is it

An LLM (Large Language Model) is an AI model trained to understand and generate text. It does this by predicting what comes next: given a context, it calculates which token is most likely to follow.

That seemingly simple idea is what makes it possible for a model to answer questions, write code, translate languages, summarize documents, or hold a coherent conversation. It's all text prediction at scale.

"Large" isn't just a marketing term — it refers to the number of parameters (the numerical values the model learns during training). Modern models have anywhere from billions to trillions of parameters. That scale is what gives them generalization ability: they don't memorize answers; they learn patterns from human language.

Mental model

Think of a system that has read practically the entire internet, every digitized book, all the public code on GitHub. Not to memorize, but to learn the structure of language: which words go together, how arguments are built, what typically follows "the result was".

When you ask it a question, the model doesn't "look up" the answer in a database. It constructs the answer token by token, choosing at each step what makes the most sense given what it's already written.
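This token-by-token loop can be sketched with a toy model. The probability table below is invented purely for illustration — a real LLM computes these distributions with a transformer over the full context:

```python
# Toy "model": maps the last token to a probability distribution
# over possible next tokens. Hand-built for illustration only;
# a real LLM computes this from the entire context.
NEXT_TOKEN_PROBS = {
    "<start>": {"The": 0.6, "A": 0.4},
    "The": {"result": 0.7, "model": 0.3},
    "result": {"was": 0.9, "is": 0.1},
    "was": {"positive": 0.5, "negative": 0.3, "<end>": 0.2},
    "positive": {"<end>": 1.0},
    "negative": {"<end>": 1.0},
}

def generate(max_tokens=10):
    tokens = []
    current = "<start>"
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[current]
        # Greedy decoding: always pick the most likely next token.
        current = max(probs, key=probs.get)
        if current == "<end>":
            break
        tokens.append(current)
    return " ".join(tokens)

print(generate())  # greedy path: "The result was positive"
```

Real models don't always pick the single most likely token — sampling with some randomness (temperature) is what makes two runs of the same prompt differ.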

flowchart LR
    A[Input prompt] --> B[Tokenization]
    B --> C[Probability calculation\nby the transformer]
    C --> D[Next token selection]
    D --> E{Done?}
    E -- No --> C
    E -- Yes --> F[Complete response]

A token isn't a word — it's a fragment of text. "developer" might be 2 tokens. A special character can be 1. On average, a token is about 4 characters in English. This matters because models have a limit on how many tokens they can process at once — the context window.
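The 4-characters-per-token figure gives a quick back-of-the-envelope estimate. The sketch below uses that heuristic — it is not a real tokenizer (real ones, like BPE, split on learned subword boundaries), and the context window size is just an illustrative order of magnitude:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token in English.
    Real tokenizers split on learned subword units, so actual
    counts vary by model and language."""
    return max(1, round(len(text) / 4))

CONTEXT_WINDOW = 200_000  # illustrative order of magnitude

prompt = "Explain what tokenization is in one sentence."
print(estimate_tokens(prompt))  # roughly a dozen tokens
if estimate_tokens(prompt) > CONTEXT_WINDOW:
    raise ValueError("Prompt would not fit in the context window")
```

Checks like this matter in practice: everything you send — instructions, conversation history, retrieved documents — competes for the same window.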

How it's used

The most direct way to use an LLM is through an API. You send a message, you get a response.

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,  # upper bound on tokens generated in the reply
    messages=[
        {"role": "user", "content": "Explain what tokenization is in one sentence."}
    ]
)

# The response is a list of content blocks; the first holds the text.
print(response.content[0].text)

What you send is called a prompt. How you construct it directly affects the quality of the response — that's what prompting is about.
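For example, the same request phrased two ways — one vague, one with explicit constraints. The prompts below are illustrative, not a prescribed format:

```python
# A vague prompt leaves the model to guess length, audience, and format.
vague_prompt = "Tell me about tokenization."

# A constructed prompt pins down role, task, constraints, and format —
# each line removes a degree of freedom the model would otherwise fill in.
structured_prompt = (
    "You are a technical writer.\n"
    "Explain tokenization to a junior developer.\n"
    "Constraints: one paragraph, no jargon, include one example.\n"
)
```

Either string would go into the `content` field of the user message in the API call above; the second reliably produces a more usable answer.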

LLMs can also serve as the foundation for more complex systems: agents that use tools, RAG pipelines that combine them with knowledge bases, or workflows where multiple models collaborate. But all of that builds on this foundation: a model that predicts text.

When to use it / when not to

Use it when:

  • You need to process or generate text with flexibility: summarizing, translating, classifying, extracting information.
  • The task requires reasoning over natural language.
  • The input might be ambiguous or in unstructured formats.
  • You want a conversational interface to expose functionality in your system.

Don't use it when:

  • You need deterministic precision. LLMs are probabilistic — for exact calculation, use code.
  • The task requires real-time, up-to-date information. The model has a knowledge cutoff.
  • You're handling sensitive data that can't leave your infrastructure (unless using on-premise models like Llama).
  • You need full auditability of why a specific decision was made.
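The first point is worth making concrete: use the LLM for the fuzzy part (pulling numbers out of messy text) and plain code for the exact part (the arithmetic). In this sketch a regex stands in for the extraction step; in practice you might ask the model to return the amounts as JSON:

```python
import re

# Unstructured input of the kind an LLM is good at parsing.
invoice_text = "Widgets: $19.99, shipping came to $4.50, plus $2.00 tax."

# Regex as a stand-in for the model's extraction step (illustration only).
amounts = [float(m) for m in re.findall(r"\$(\d+\.\d{2})", invoice_text)]

# The arithmetic itself stays in code: deterministic and auditable.
total = round(sum(amounts), 2)
print(total)  # 26.49
```

The split keeps each part where it's strongest: the model handles ambiguity, the code guarantees the sum is right every time.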


History and evolution

Language models have existed since the 1990s, but they were statistical and limited — n-gram models predicting the next word from frequency counts over short windows.

The fundamental leap came in 2017 with Google's "Attention Is All You Need" paper, which introduced the transformer architecture. The attention mechanism allowed models to efficiently relate distant words in a text — something recurrent networks (LSTM, GRU) did poorly.

In 2018, OpenAI published GPT-1 and Google published BERT — the first large pre-trained models that could be fine-tuned for specific tasks. Until then, every NLP task required a model trained from scratch.

GPT-2 (2019) and GPT-3 (2020) showed that scaling the model and data produced emergent capabilities nobody had explicitly designed — reasoning, few-shot learning, translation without specific training.

ChatGPT (2022) was the moment all of this reached the general public. It used RLHF (Reinforcement Learning from Human Feedback) to align the model with human preferences, making it more useful in conversation.

Today the ecosystem includes proprietary models (GPT-4, Claude, Gemini) and open models (Llama, Mistral, Gemma) with comparable capabilities across many tasks.