typst/ARCHITECTURE.md

# Typst Compiler Architecture
Wondering how to contribute or just curious how Typst works? This document
covers the general architecture of Typst's compiler, so you get an understanding
of what's where and how everything fits together.

The source-to-PDF compilation process of a Typst file proceeds in four phases.

1. **Parsing:** Turns a source string into a syntax tree.
2. **Evaluation:** Turns a syntax tree and its dependencies into content.
4. **Layout:** Layouts content into frames.
5. **Export:** Turns frames into an output format like PDF or a raster graphic.

The Typst compiler is _incremental:_ Recompiling a document that was compiled
previously is much faster than compiling from scratch. Most of the hard work is
done by [`comemo`], an incremental compilation framework we have written for
Typst. However, the compiler is still carefully written with incrementality in
mind. Below we discuss the four phases and how incrementality affects each of
them.


## Parsing
The syntax tree and parser are located in `src/syntax`. Parsing is a pure
function `&str -> SyntaxNode` without any further dependencies. The result is a
concrete syntax tree reflecting the whole file structure, including whitespace
and comments. Parsing cannot fail. If there are syntactic errors, the returned
syntax tree contains error nodes instead. It's important that the parser deals
well with broken code because it is also used for syntax highlighting and IDE
functionality.

**Typedness:**
The syntax tree is untyped, any node can have any `SyntaxKind`. This makes it
very easy to (a) attach spans to each node (see below), (b) traverse the tree
when doing highlighting or IDE analyses (no extra complications like a visitor
pattern). The `typst::syntax::ast` module provides a typed API on top of
the raw tree. This API resembles a more classical AST and is used by the
interpreter.

**Spans:**
After parsing, the syntax tree is numbered with _span numbers._ These numbers
are unique identifiers for syntax nodes that are used to trace back errors in
later compilation phases to a piece of syntax. The span numbers are ordered so
that the node corresponding to a number can be found quickly.

**Incremental:**
Typst has an incremental parser that can reparse a segment of markup or a
code/content block. After incremental parsing, span numbers are reassigned
locally. This way, span numbers further away from an edit stay mostly stable.
This is important because they are used pervasively throughout the compiler,
also as input to memoized functions. The less they change, the better for
incremental compilation.


## Evaluation
The evaluation phase lives in `src/eval`. It takes a parsed `Source` file and
evaluates it to a `Module`. A module consists of the `Content` that was written
in it and a `Scope` with the bindings that were defined within it.

A source file may depend on other files (imported sources, images, data files),
which need to be resolved. Since Typst is deployed in different environments
(CLI, web app, etc.) these system dependencies are resolved through a general
interface called a `World`. Apart from files, the world also provides
configuration and fonts.

**Interpreter:**
Typst implements a tree-walking interpreter. To evaluate a piece of source, you
first create a `Vm` with a scope stack. Then, the AST is recursively evaluated
through trait impls of the form `fn eval(&self, vm: &mut Vm) -> Result<Value>`.
An interesting detail is how closures are dealt with: When the interpreter sees
a closure / function definition, it walks the body of the closure and finds all
accesses to variables that aren't defined within the closure. It then clones the
values of all these variables (it _captures_ them) and stores them alongside the
closure's syntactical definition in a closure value. When the closure is called,
a fresh `Vm` is created and its scope stack is initialized with the captured
variables.

**Incremental:**
In this phase, incremental compilation happens at the granularity of the module
and the closure. Typst memoizes the result of evaluating a source file across
compilations. Furthermore, it memoizes the result of calling a closure with a
certain set of parameters. This is possible because Typst ensures that all
functions are pure. The result of a closure call can be recycled if the closure
has the same syntax and captures, even if the closure values stems from a
different module evaluation (i.e. if a module is reevaluated, previous calls to
closures defined in the module can still be reused).


## Layout
The layout phase takes `Content` and produces one `Frame` per page for it. To
layout `Content`, we first have to _realize_ it by applying all relevant show
rules to the content. Since show rules may be defined as Typst closures,
realization can trigger closure evaluation, which in turn produces content that
is recursively realized. Realization is a shallow process: While collecting list
items into a list that we want to layout, we don't realize the content within
the list items just yet. This only happens lazily once the list items are
layouted.

When we a have realized the content into a layoutable element, we can then
layout it into _regions,_ which describe the space into which the content shall
be layouted. Within these, an element is free to layout itself as it sees fit,
returning one `Frame` per region it wants to occupy.

**Introspection:**
How content layouts (and realizes) may depend on how _it itself_ is layouted
(e.g., through page numbers in the table of contents, counters, state, etc.).
Typst resolves these inherently cyclical dependencies through the _introspection
loop:_ The layout phase runs in a loop until the results stabilize. Most
introspections stabilize after one or two iterations. However, some may never
stabilize, so we give up after five attempts.

**Incremental:**
Layout caching happens at the granularity of the element. This is important
because overall layout is the most expensive compilation phase, so we want to
reuse as much as possible.


## Export
Exporters live in `src/export`. They turn layouted frames into an output file
format.

- The PDF exporter takes layouted frames and turns them into a PDF file.
- The built-in renderer takes a frame and turns it into a pixel buffer.
- HTML export does not exist yet, but will in the future. However, this requires
  some complex compiler work because the export will start with `Content`
  instead of `Frames` (layout is the browser's job).


## IDE
The `src/ide` module implements IDE functionality for Typst. It builds heavily
on the other modules (most importantly, `syntax` and `eval`).

**Syntactic:**
Basic IDE functionality is based on a file's syntax. However, the standard
syntax node is a bit too limited for writing IDE tooling. It doesn't provide
access to its parents or neighbours. This is a fine for an evaluation-like
recursive traversal, but impractical for IDE use cases. For this reason, there
is an additional abstraction on top of a syntax node called a `LinkedNode`,
which is used pervasively across the `ide` module.

**Semantic:**
More advanced functionality like autocompletion requires semantic analysis of
the source. To gain semantic information for things like hover tooltips, we
directly use other parts of the compiler. For instance, to find out the type of
a variable, we evaluate and realize the full document equipped with a `Tracer`
that emits the variable's value whenever it is visited. From the set of
resulting values, we can then compute the set of types a value takes on. Thanks
to incremental compilation, we can recycle large parts of the compilation that
we had to do anyway to typeset the document.

**Incremental:**
Syntactic IDE stuff is relatively cheap for now, so there are no special
incrementality concerns. Semantic analysis with a tracer is relatively
expensive. However, large parts of a traced analysis compilation can reuse
memoized results from a previous normal compilation. Only the module evaluation
of the active file and layout code that somewhere within evaluates source code
in the active file needs to re-run. This is all handled automatically by
`comemo` because the tracer is wrapped in a `comemo::TrackedMut` container.


## Tests
Typst has an extensive suite of integration tests. A test file consists of
multiple tests that are separated by `---`. For each test file, we store a
reference image defining what the compiler _should_ output. To manage the
reference images, you can use the VS code extension in `tools/test-helper`.

The integration tests cover parsing, evaluation, realization, layout and
rendering. PDF output is sadly untested, but most bugs are in earlier phases of
the compiler; the PDF output itself is relatively straight-forward. IDE
functionality is also mostly untested. PDF and IDE testing should be added in
the future.

[`comemo`]: https://github.com/typst/comemo/