Kalan's Blog

Kalan 頭像照片,在淡水拍攝,淺藍背景

四零二曜日電子報上線啦!訂閱訂起來

Software Engineer / Taiwanese / Life in Fukuoka
This blog supports RSS feed (all content), you can click RSS icon or setup through third-party service. If there are special styles such as code syntax in the technical article, it is still recommended to browse to the original website for the best experience.

Current Theme light

我會把一些不成文的筆記或是最近的生活雜感放在短筆記,如果有興趣的話可以來看看唷!

Please notice that currenly most of posts are translated by AI automatically and might contain lots of confusion. I'll gradually translate the post ASAP

An in-depth understanding of Svelte (0) — What is an Abstract Syntax Tree?

Introduction

This series of articles focuses on exploring the implementation of Svelte's principles. It aims to provide readers with a deeper understanding of Svelte's compilation mechanism and code generation. Since the Svelte compilation process involves code parsing, this article will primarily discuss what an Abstract Syntax Tree (AST) is and further explain the role and importance of the AST.

What is an Abstract Syntax Tree (AST)?

Let's start with the definition from Wikipedia:

In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is "abstract" in that it does not represent every detail appearing in the real syntax.

The AST is important because we want a structured way to describe programming languages (or markup languages) that is convenient for computers to operate on. Let's take an HTML example:

<h1>This is a heading</h1>
<p>This is a paragraph</p>
<ul>
  <li data-item="1">List item</li>
  <li data-item="2">List item</li>
  <li data-item="3">List item</li>
</ul>

In HTML, <xxx> tags represent elements, similar to labels on clothes that briefly describe the information about the clothes. The tags also record the information represented by the HTML.

The reason we need to convert strings into a tree-like structure is that while HTML is easy for humans to understand and has a structure, for computers, it is just a string. Therefore, we need to parse the string beforehand and build a tree-like data structure for easy manipulation. The above HTML can be represented using an Abstract Syntax Tree:

HTML

Once we have transformed HTML into a tree-like structure, we can traverse the tree using algorithms such as tree traversal to perform corresponding operations, making it easy to find specific nodes. For example, if I want to search for li in the tree, I can traverse the tree using an algorithm and return the result when the tagName equals li.

Not only HTML, but other programming languages also follow similar steps based on their defined syntax. SQL, GraphQL, and others also convert strings into Abstract Syntax Trees before performing other operations.

Representation of Abstract Syntax Trees

In the previous paragraph, we used images to represent an Abstract Syntax Tree. However, besides the image representation, an Abstract Syntax Tree may also contain additional information. Taking HTML as an example:

  • Whether it has attributes, which should also be stored in the node
  • Whether it is a self-closing tag (like <input />, <video />, etc.)
  • Parsing position (line number and column number)

If you want to observe actual Abstract Syntax Trees more closely, you can use AST Explorer. It provides various programming languages that can be parsed into Abstract Syntax Trees. Let's take HTML as an example:

Screenshot_2021-02-07 AST explorer(1)

Now let's take a look at the Abstract Syntax Tree of JavaScript:

Screenshot_2021-02-07 AST explorer(2)

Implementation of Abstract Syntax Trees

The implementation of Abstract Syntax Trees can be divided into two main approaches. One is to define the syntax and directly implement it through programming (manual implementation). The other is to define the syntax rules (BNF) and generate parsers using tools like yacc or PEG.js.

I have previously written an article on how to implement a JSON parser from scratch. If you're interested, you can refer to:

I also mentioned some principles about AST in the article about linaria, which you can refer to if interested: Zero-runtime Linaria: CSS-in-JS.

However, implementing an Abstract Syntax Tree correctly is not easy. For relatively simple markup languages like HTML, it is relatively simple. But for programming languages like C, Java, JavaScript, it takes time to implement a correct and error-free Abstract Syntax Tree. It also requires following the specifications. If you want to practice, it is recommended to implement only a part of the syntax.

In fact, in Svelte, besides HTML and Svelte template syntax, other parts like JavaScript and CSS are implemented using parsers written by others. In the upcoming articles, we will mention how to implement a simple Svelte syntax parser, but let's not delve into it here.

If you can't wait to see the implementation, you can refer to the implementation in tiny-svelte, or directly check the source code of Svelte.

Applications of Abstract Syntax Trees

Although the Abstract Syntax Tree is crucial, it alone cannot accomplish much. To compile code, it needs to go through the code generation step. In addition to code compilation, the Abstract Syntax Tree can also be used for:

  • Code highlighting
  • Code autocompletion
  • Prettifying code
  • ESLint

(Additional information provided by former colleague @kai):

  • Writing Babel plugins
  • Static analysis (TypeScript)
  • Code replacement (codemod)

These tools are built upon the foundation of Abstract Syntax Trees, which demonstrates the importance of Abstract Syntax Trees. With a syntax tree in hand, it seems like there's nothing we can't do. The possibilities are endless.

Conclusion

This article primarily explains the existence and applications of Abstract Syntax Trees, hoping to provide readers with a basic understanding of Abstract Syntax Trees. Since the Svelte compilation process requires code to be transformed into Abstract Syntax Trees, it is necessary to explain what an Abstract Syntax Tree is for better comprehension in the subsequent articles.

Prev

2020 Review/2020 Year in Review — Technology

Next

In-depth understanding of Svelte (1) — Svelte Compilation Process

If you found this article helpful, please consider buy me a drink ☕️ It'll make my ordinary day shine✨

Buy me a coffee