Write a JSON parser from scratch (2)

This post is translated by ChatGPT and originally written in Mandarin, so there may be some inaccuracies or mistakes.

In part 1, we discussed how to write a JSON parser and implemented string parsing. Now we will add the remaining pieces. (In fact, once you understand the basic principles, implementing the rest is straightforward.)

Number

(Figure: JSON grammar for numbers)

Parsing numbers isn't too difficult. The parts that are easy to overlook are negative numbers, decimal fractions, and exponential notation (like 1e6). (By the way, I only just discovered that an uppercase E is allowed too.)

function number(parser) {
  let str = "";
  if (parser.current() === "-") {
    str += "-";
    parser.index += 1;
  }

  let curr = "";
  while (((curr = parser.current()), curr >= "0" && curr <= "9")) {
    str += curr;
    parser.index += 1;
  }

  let isFloat = false;
  // float number
  if (parser.next(".")) {
    str += ".";
    isFloat = true;
    while (((curr = parser.current()), curr >= "0" && curr <= "9")) {
      str += curr;
      parser.index += 1;
    }
  }

  // exponential expression
  let expo = "";
  if (parser.next("e") || parser.next("E")) {
    // the exponent may carry an explicit sign
    if (parser.next("-")) {
      expo += "-";
    } else {
      parser.next("+");
    }

    while (((curr = parser.current()), curr >= "0" && curr <= "9")) {
      expo += curr;
      parser.index += 1;
    }
  }

  if (expo) {
    // parseFloat takes no radix argument; only parseInt does
    return isFloat
      ? parseFloat(str) * Math.pow(10, +expo)
      : parseInt(str, 10) * Math.pow(10, +expo);
  }

  return isFloat ? parseFloat(str) : parseInt(str, 10);
}

In the first part, we check if it is a negative number.
In the second part, we run a while loop to capture the numeric part.
In the third part, we determine if there is a decimal point.
- If there is, we loop again to capture the digits.
In the fourth part, we check for exponential notation (either uppercase or lowercase e).
- If present, we loop again to capture the digits.
The fifth part converts the string into a number (using parseInt or parseFloat).
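
As a cross-check for the steps above, the JSON number grammar can also be written as a single regular expression. This is only a sketch, not part of the parser in this series: `JSON_NUMBER` and `parseJsonNumber` are names made up for illustration, and the conversion relies on the built-in `Number()` instead of the hand-written loops.

```javascript
// Sketch only: a standalone cross-check, not the parser's own number() function.
// JSON number grammar: optional "-", an integer part without leading zeros,
// an optional fraction, and an optional exponent ("e" or "E" with optional sign).
const JSON_NUMBER = /^-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?$/;

// parseJsonNumber is a hypothetical helper name used for illustration.
function parseJsonNumber(text) {
  if (!JSON_NUMBER.test(text)) {
    throw new SyntaxError(`invalid JSON number: ${text}`);
  }
  // Number() understands the same notation, including exponents.
  return Number(text);
}

console.log(parseJsonNumber("-12.5")); // -12.5
console.log(parseJsonNumber("1e6"));   // 1000000
console.log(parseJsonNumber("2E+3"));  // 2000
```

Feeding the same inputs to the hand-written number function and comparing the results is a cheap way to catch a missed case such as an uppercase E or an explicit + in the exponent.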

Keywords (true, false, null)

function keyword(parser) {
  if (parser.next("true")) {
    return true;
  } else if (parser.next("false")) {
    return false;
  } else if (parser.next("null")) {
    return null;
  }
}

This part is quite simple: we check whether the input at the current position matches one of the three keywords and return the corresponding JavaScript value.
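
The keyword function leans on the parser helpers from part 1. For readers who do not have that post at hand, here is a minimal sketch of what those helpers might look like. The class name `SketchParser` and the exact behavior of `current`, `next`, and `skip` are assumptions inferred from how they are called in this post, and `strictKeyword` is a hypothetical variant that throws instead of silently returning undefined when nothing matches.

```javascript
// Assumed shape of the helper class from part 1; the real one may differ.
class SketchParser {
  constructor(input) {
    this.input = input;
    this.index = 0;
  }

  // Character at the current position, or undefined at the end of the input.
  current() {
    return this.input[this.index];
  }

  // If the input continues with `text`, consume it and report success.
  next(text) {
    if (this.input.startsWith(text, this.index)) {
      this.index += text.length;
      return true;
    }
    return false;
  }

  // Skip over JSON whitespace (space, tab, newline, carriage return).
  skip() {
    while (
      this.index < this.input.length &&
      " \t\n\r".includes(this.current())
    ) {
      this.index += 1;
    }
  }
}

// Hypothetical stricter variant of keyword(): fail loudly on unknown input.
function strictKeyword(parser) {
  if (parser.next("true")) return true;
  if (parser.next("false")) return false;
  if (parser.next("null")) return null;
  throw new SyntaxError(`unexpected token at index ${parser.index}`);
}

console.log(strictKeyword(new SketchParser("null"))); // null
```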

Array

(Figure: JSON grammar for arrays)

function array(parser) {
  const arr = [];

  if (parser.current() === "[") {
    parser.next("[");
    parser.skip();

    if (parser.next("]")) {
      return arr;
    }
    while (parser.current()) {
      const val = value(parser);
      arr.push(val);

      parser.skip();

      if (parser.current() === "]") {
        parser.next("]");
        return arr;
      }
      // anything other than "]" must be a separator
      if (!parser.next(",")) {
        throw new Error("expected ',' or ']' in array");
      }
      parser.skip();
    }
  }

  return arr;
}

In the first part, we check if it starts with [.
- If we encounter a ], it indicates an empty array.
In the second part, we run a while loop executing the value function and adding the results to the array.
Encountering ] signifies the end of the array, and we return the array.
A comma indicates that there are more elements, so we continue processing.
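
To see the recursion in isolation, the loop described above can be sketched as a tiny standalone parser that only understands integers and nested arrays. It does not use the Parser class from this series; `parseArrayText` and its inner helpers are hypothetical names, and the error handling (rejecting a missing comma, a trailing comma, or an unterminated array) is one possible choice.

```javascript
// Sketch only: a standalone parser for nested arrays of integers.
function parseArrayText(text) {
  let index = 0;

  // Skip over JSON whitespace.
  function skip() {
    while (index < text.length && " \t\n\r".includes(text[index])) {
      index += 1;
    }
  }

  // A value is either a nested array or an integer.
  function value() {
    skip();
    return text[index] === "[" ? array() : integer();
  }

  function integer() {
    const start = index;
    if (text[index] === "-") index += 1;
    const digitsStart = index;
    while (text[index] >= "0" && text[index] <= "9") index += 1;
    if (index === digitsStart) {
      throw new SyntaxError(`expected a value at index ${start}`);
    }
    return parseInt(text.slice(start, index), 10);
  }

  function array() {
    const arr = [];
    index += 1; // consume "["
    skip();
    if (text[index] === "]") {
      index += 1; // empty array
      return arr;
    }
    while (index < text.length) {
      arr.push(value()); // recursion happens here
      skip();
      if (text[index] === "]") {
        index += 1;
        return arr;
      }
      if (text[index] !== ",") {
        throw new SyntaxError(`expected "," or "]" at index ${index}`);
      }
      index += 1; // consume ","
    }
    throw new SyntaxError("unterminated array");
  }

  const result = value();
  skip();
  if (index !== text.length) {
    throw new SyntaxError(`unexpected input at index ${index}`);
  }
  return result;
}

console.log(JSON.stringify(parseArrayText("[1, [2, 3], []]"))); // [1,[2,3],[]]
```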

With that, we're almost there. You can take a look at the code implementations in the Repository. You’ll notice that one of the special-character tests fails because there may be some escape characters within strings. Let's try implementing that.

const escape = {
  '"': '"',
  t: "\t",
  r: "\r",
  "\\": "\\",
};

while (((curr = parser.current()), curr)) {
  if (parser.next('"')) {
    return str;
  } else if (curr === "\\") {
    parser.index += 1;
    const escapeCh = parser.current();
    if (escape[escapeCh]) {
      str += escape[escapeCh];
    }
  } else {
    str += curr;
  }
  parser.index += 1;
}

We create a table of escape characters and replace each escape sequence with the character it stands for. So far the table only covers \", \\, \t and \r, which is enough to pass the basic JSON tests 🍻. A complete implementation also needs the remaining escapes (\n, \b, \f, \/) and, more importantly, \u for Unicode, which is quite an important feature.
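
One way the missing escapes, including \u, could be handled is sketched below. It works on the text between the quotes rather than on the Parser object, so it is not a drop-in replacement for the loop above. `SIMPLE_ESCAPES` and `decodeEscapes` are hypothetical names, and throwing on an unknown escape is an assumption (the loop above silently drops it).

```javascript
// Sketch only: decodes the body of a JSON string (the text between the quotes).
const SIMPLE_ESCAPES = new Map([
  ['"', '"'],
  ["\\", "\\"],
  ["/", "/"],
  ["b", "\b"],
  ["f", "\f"],
  ["n", "\n"],
  ["r", "\r"],
  ["t", "\t"],
]);

// decodeEscapes is a hypothetical helper name used for illustration.
function decodeEscapes(body) {
  let out = "";
  for (let i = 0; i < body.length; i += 1) {
    const ch = body[i];
    if (ch !== "\\") {
      out += ch;
      continue;
    }
    const esc = body[i + 1];
    if (esc === "u") {
      // \u is followed by exactly four hex digits (one UTF-16 code unit).
      const hex = body.slice(i + 2, i + 6);
      if (!/^[0-9a-fA-F]{4}$/.test(hex)) {
        throw new SyntaxError(`bad \\u escape at index ${i}`);
      }
      out += String.fromCharCode(parseInt(hex, 16));
      i += 5; // skip "uXXXX"; the loop skips the backslash itself
    } else if (SIMPLE_ESCAPES.has(esc)) {
      out += SIMPLE_ESCAPES.get(esc);
      i += 1; // skip the escape letter
    } else {
      throw new SyntaxError(`unknown escape at index ${i}`);
    }
  }
  return out;
}

console.log(decodeEscapes("\\u0041\\tB")); // "A", a tab, then "B"
```

Because each \u escape yields one UTF-16 code unit, a surrogate pair such as \ud83c\udf7b joins into a single character (🍻) without any special handling.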

Custom Feature: $template$

Since we are writing the parser ourselves, we can certainly add new syntax! Let's say we want a templating feature in which any variable wrapped in $ signs is replaced by the corresponding value from an object passed to the parser. For example:

{
  "name": $name$
}

parsed together with a variables object, produces:

new Parser(string, { name: 'kalan' }).parse();
// { name: "kalan" }

Implementation

function template(parser) {
  parser.skip();
  if (parser.next("$")) {
    parser.skip();

    if (parser.next("$")) {
      throw new Error("template cannot be empty");
    }
    let curr = "";
    let key = "";
    while (((curr = parser.current()), curr)) {
      if (parser.next("$")) {
        return parser.variables[key];
      }
      key += curr;
      parser.index += 1;
    }
  }
}

First, we match $.
We start reading the content until we see another $.
When we encounter $, we stop the while loop and replace the template variable with its corresponding value, returning the result.
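
The three steps above can also be sketched as a standalone function that reads one placeholder from a string. This is an illustration under assumptions, not the code on the template branch: `readTemplate`, its `{ value, nextIndex }` return shape, and the errors for an unterminated placeholder or an unknown variable are all hypothetical choices.

```javascript
// Sketch only: reads one $name$ placeholder that starts at `start`.
function readTemplate(text, start, variables) {
  if (text[start] !== "$") {
    throw new SyntaxError(`expected "$" at index ${start}`);
  }
  // Find the closing "$".
  const end = text.indexOf("$", start + 1);
  if (end === -1) {
    throw new SyntaxError("unterminated template");
  }
  const key = text.slice(start + 1, end).trim();
  if (key === "") {
    throw new Error("template cannot be empty");
  }
  // Only accept keys the caller actually passed in.
  if (!Object.prototype.hasOwnProperty.call(variables, key)) {
    throw new ReferenceError(`unknown template variable: ${key}`);
  }
  // nextIndex points just past the closing "$".
  return { value: variables[key], nextIndex: end + 1 };
}

const source = '{ "name": $name$ }';
const found = readTemplate(source, source.indexOf("$"), { name: "kalan" });
console.log(found.value); // kalan
```

Failing on an unknown variable, instead of returning undefined, makes a misspelled placeholder visible at parse time.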

You can check the detailed implementation on the template branch and review the test results located in the test/template folder.

Conclusion

By writing our own parser, we can express more complex implementations with simpler syntax. We can even extend existing grammars (like JSON in this case) to add features we want. While this may not be super practical, the goal is to showcase what can be achieved through parsing.

Although parsing itself is crucial and interesting, parsing a language is just the first step. For instance, merely converting JSX to JavaScript code without React's support is not very useful; transforming SQL into an abstract syntax tree without a database implementation also feels somewhat lacking. The purpose of parsing a language is to facilitate subsequent processing (executing queries, rendering to the DOM).

In fact, there are already many tools that let you skip writing the parser by hand, such as the well-known Bison or PEG.js. They take a grammar written in a BNF-like syntax and generate a stable parser from it, saving you time and letting you focus directly on the language implementation.

Our JSON parser produces the final value directly, without building an abstract syntax tree first. In the next phase, we will attempt to parse simple HTML, convert it into a syntax tree, and then render it using JavaScript's DOM API.
