How does an LLM generate an answer?

It predicts the most likely next token given everything so far, appends it, and repeats — building the response token by token.
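The loop described above can be sketched in a few lines. This is a minimal illustration, not how a production model is implemented: the probability table `NEXT_TOKEN_PROBS`, the `<start>` and `<end>` markers, and the function names are all hypothetical, and the toy looks only at the last token, whereas a real LLM conditions on the whole context through a neural network.

```python
# Hypothetical hand-written probabilities standing in for a trained model.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def next_token_probs(context):
    """Return next-token probabilities for a context (toy: last token only)."""
    return NEXT_TOKEN_PROBS[context[-1]]


def generate(max_tokens=10):
    """Greedy decoding: pick the most likely token, append it, repeat."""
    context = ["<start>"]
    for _ in range(max_tokens):
        probs = next_token_probs(context)
        next_token = max(probs, key=probs.get)  # most likely next token
        if next_token == "<end>":
            break
        context.append(next_token)  # the choice becomes part of the context
    return context[1:]  # drop the <start> marker


print(" ".join(generate()))  # prints "the cat sat"
```

Always taking the single most likely token is called greedy decoding; real systems often sample from the probabilities instead, which makes the output vary from run to run.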

What is attention in a transformer?

A mechanism that lets the model weigh which earlier words matter most for predicting the next one, capturing context and relationships.
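One common form of this weighting is scaled dot-product attention. The sketch below computes it for a single query in plain Python. The vectors are made-up 2-dimensional examples and the function names are hypothetical; in a real transformer the queries, keys, and values come from learned projections of the token representations and have many more dimensions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Score the query against each earlier token's key, then normalize."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def attend(query, keys, values):
    """Blend the value vectors according to the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Hypothetical vectors for three earlier tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]  # the position that is predicting the next token

weights = attention_weights(query, keys)
print([round(w, 3) for w in weights])
print(attend(query, keys, values))
```

Tokens whose keys line up with the query get larger weights, so their values contribute more to the result. Here the second token's key is orthogonal to the query, so it receives the smallest weight.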

Why do LLMs hallucinate?

They optimize for plausible-sounding text based on patterns, not truth, so they can produce confident but incorrect statements.
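A toy model makes the point concrete. In the sketch below, the corpus is invented for illustration and the "model" is only a table of word-pair counts, far simpler than an LLM. The incorrect claim appears more often than the correct one, so the model repeats it: nothing in the procedure checks which statement is true.

```python
from collections import Counter, defaultdict

# Invented toy corpus: the wrong answer is more frequent than the right one.
CORPUS = [
    "the capital of australia is sydney",    # common but incorrect
    "the capital of australia is sydney",
    "the capital of australia is canberra",  # correct
]


def train_bigram_counts(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word` in the training text."""
    return counts[word].most_common(1)[0][0]


counts = train_bigram_counts(CORPUS)
print(most_likely_next(counts, "is"))  # prints "sydney"
```

The model's output is the statistically dominant pattern, stated without any signal of doubt. Real hallucinations have more causes than skewed training data, but the underlying issue is the same: the objective rewards likely text, not verified text.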


How do large language models work?

Large language models work by predicting the next token (a word or word fragment) in a sequence. Trained on vast amounts of text using the transformer architecture, they learn statistical patterns of language, then generate responses one token at a time.

Step by step

1. Text is converted into tokens, each represented by a number; the model processes them through billions of learned parameters.
2. The transformer's 'attention' mechanism weighs how each token relates to the others.
3. Training adjusts the parameters to predict the next token accurately across huge datasets.
4. Because the models predict plausible text rather than verify facts, they can 'hallucinate': produce output that sounds right but is wrong.
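Steps 1 and 3 can be sketched together on a tiny scale. The example below is a simplified illustration under stated assumptions: it splits on whitespace instead of using a real subword tokenizer, its "parameters" are a small table of scores (`logits`) indexed by the previous token rather than a transformer, and the training text, learning rate, and function names are made up.

```python
import math

text = "the cat sat on the mat the cat ran"
words = text.split()

# Step 1: map each distinct word to an integer token id.
vocab = {word: i for i, word in enumerate(sorted(set(words)))}
token_ids = [vocab[w] for w in words]
V = len(vocab)

# Parameters: one row of next-token scores per previous token, all zero at first.
logits = [[0.0] * V for _ in range(V)]


def softmax(row):
    """Turn a row of scores into probabilities."""
    highest = max(row)
    exps = [math.exp(x - highest) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def average_loss(params, ids):
    """Average negative log-probability assigned to the actual next tokens."""
    pairs = list(zip(ids, ids[1:]))
    return -sum(math.log(softmax(params[p])[n]) for p, n in pairs) / len(pairs)


def train_epoch(params, ids, lr=0.1):
    """Step 3: nudge the parameters toward the observed next tokens."""
    for prev, nxt in zip(ids, ids[1:]):
        probs = softmax(params[prev])
        for j in range(V):
            # Gradient of the cross-entropy loss with respect to each score.
            grad = probs[j] - (1.0 if j == nxt else 0.0)
            params[prev][j] -= lr * grad


loss_before = average_loss(logits, token_ids)
for _ in range(200):
    train_epoch(logits, token_ids)
loss_after = average_loss(logits, token_ids)

print(token_ids)
print(round(loss_before, 3), round(loss_after, 3))
```

The loss falls as training proceeds, meaning the table assigns higher probability to the tokens that actually came next. It never reaches zero here, because "the" is followed by both "cat" and "mat" in the text, so some uncertainty is built into the data.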
