If you have any questions or feedback, pleasefill out this form
This post is translated by ChatGPT and originally written in Mandarin, so there may be some inaccuracies or mistakes.
Introduction
This series of articles focuses on exploring the implementation principles of Svelte, aiming to provide readers with a deeper understanding of Svelte's compilation mechanism and code generation. Since the Svelte compilation process involves code parsing, this article will primarily discuss what an Abstract Syntax Tree (AST) is and further explain its role and significance.
What is an Abstract Syntax Tree (AST)?
Let’s start with a definition from Wikipedia:
In computer science, an Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code. Each node of the tree denotes a construct occurring in the source code. The term "abstract" refers to the fact that the syntax does not represent every detail of the actual syntax.
The importance of the Abstract Syntax Tree lies in our desire to have a structured way to describe programming languages (or markup languages) that makes it easier for computers to manipulate. Let’s take a look at an example of HTML
:
<h1>This is a title</h1>
<p>This is a paragraph</p>
<ul>
<li data-item="1">List item</li>
<li data-item="2">List item</li>
<li data-item="3">List item</li>
</ul>
In HTML, tags like <xxx>
are referred to as tags
, which succinctly describe the information about the content, similar to labels on clothes. The tags
also contain information that the HTML represents.
The need to convert strings into a tree structure arises because, while HTML
is easy for humans to understand and has a clear structure, to computers it is just a string. Therefore, we must first parse the string to create a tree-like data structure for easier manipulation. The above HTML can be represented using an Abstract Syntax Tree:
Once HTML
is transformed into a tree structure, we can traverse the tree using algorithms to visit each node and perform corresponding operations. For instance, if I want to search for li
elements within the tree, I can utilize a tree traversal algorithm and return results when tagName
equals li
.
Not only HTML, but other programming languages also follow similar steps based on their defined syntax, such as SQL, GraphQL, etc., which also convert strings into an Abstract Syntax Tree before performing further operations.
Representation of Abstract Syntax Trees
In the previous section, we used an image to represent an Abstract Syntax Tree. However, in addition to that, an Abstract Syntax Tree may also store other information. Taking HTML as an example:
- Whether there are
attributes
, and if so, they should also be stored within the nodes - Whether it is a
self-closing
tag (like<input />
,<video />
, etc.) - The position of parsing (which line and column)
For those interested in a closer examination of an actual Abstract Syntax Tree, you can explore AST Explorer, which allows you to parse various programming languages into Abstract Syntax Trees. Here, we’ll use HTML as an example:
Next, let’s take a look at the Abstract Syntax Tree for JavaScript
:
Implementation of Abstract Syntax Trees
The implementation of Abstract Syntax Trees mainly falls into two categories: one is to define the syntax manually and implement it directly through programming (handwritten); the other involves defining syntax rules (BNF) and then generating parsers using generators like yacc
or PEG.js
.
I previously wrote about how to create a JSON parser from scratch, and if you're interested, you can check it out here:
In my earlier article introducing linaria
, I also mentioned some principles related to ASTs, which you may find interesting as well.
However, implementing an Abstract Syntax Tree correctly is not easy. For simpler markup languages like HTML
, it’s relatively straightforward, but for programming languages like C
, Java
, and JavaScript
, creating a correct Abstract Syntax Tree requires time and must adhere to the specifications described in the documentation. If you’re looking to practice, I recommend implementing a portion of the syntax instead.
In fact, within Svelte
, aside from manually writing parsers for HTML and Svelte template syntax, other languages like JavaScript and CSS rely on parsers developed by others. Future articles will discuss how to implement a simple Svelte
syntax parser, but I won't go into detail here.
If you’re eager to see some implementations, you can refer to my project tiny-svelte, or directly check out the Svelte source code for implementation details.
Uses of Abstract Syntax Trees
While Abstract Syntax Trees are crucial, having just the syntax tree alone doesn’t accomplish much. To compile code, it also requires a code generation step. Beyond code compilation, Abstract Syntax Trees can be used for:
- Code highlighting
- Code auto-completion
- Code prettification
- ESLint
(Additionally, contributions from former colleague @kai include)
- Writing Babel plugins
- Static analysis (TypeScript)
- Code transformation (codemod)
These tools are fundamentally built upon Abstract Syntax Trees, highlighting their importance. With an Abstract Syntax Tree in hand, it feels like anything is possible—imagination knows no bounds.
Conclusion
This article primarily explained the existence and uses of Abstract Syntax Trees, aiming to provide a basic understanding of them. Since the Svelte compilation process first converts code into an Abstract Syntax Tree, it is essential to explain what an Abstract Syntax Tree is for better comprehension of upcoming articles.
If you found this article helpful, please consider buying me a coffee ☕ It'll make my ordinary day shine ✨
☕Buy me a coffee