Kalan's Blog

Kalan 頭像照片,在淡水拍攝,淺藍背景

四零二曜日電子報上線啦!訂閱訂起來

Software Engineer / Taiwanese / Life in Fukuoka
This blog supports RSS feed (all content), you can click RSS icon or setup through third-party service. If there are special styles such as code syntax in the technical article, it is still recommended to browse to the original website for the best experience.

Current Theme light

我會把一些不成文的筆記或是最近的生活雜感放在短筆記,如果有興趣的話可以來看看唷!

Please notice that currenly most of posts are translated by AI automatically and might contain lots of confusion. I'll gradually translate the post ASAP

Write a JSON parser from scratch (2)

In part1, we mentioned how to write a JSON parser and implemented the string parsing functionality. Now let's add the remaining functions. (In fact, once you understand the basic principles, implementing the remaining functions is just following the same pattern.)

Number

json-grammer

Implementing the number function is not difficult. The tricky parts are handling decimal points, negative numbers, floating-point numbers, and exponential notation (e.g., 1e6). (By the way, I just realized that E can also be used.)

function number(parser) {
  let str = "";
  if (parser.current() === "-") {
    str += "-";
    parser.index += 1;
  }

  let curr = "";
  while (((curr = parser.current()), curr >= "0" && curr <= "9")) {
    str += curr;
    parser.index += 1;
  }

  let isFloat = false;
  // float number
  if (parser.next(".")) {
    str += ".";
    isFloat = true;
    while (((curr = parser.current()), curr >= "0" && curr <= "9")) {
      str += curr;
      parser.index += 1;
    }
  }

  // exponential expression
  let expo = "";
  if (parser.next("e")) {
    curr = "";
    if (parser.next("-")) {
      expo += "-";
    }

    while (((curr = parser.current()), curr >= "0" && curr <= "9")) {
      expo += curr;
      parser.index += 1;
    }
  }

  if (expo) {
    return isFloat
      ? parseFloat(str, 10) * Math.pow(10, +expo)
      : parseInt(str, 10) * Math.pow(10, +expo);
  }

  return isFloat ? parseFloat(str, 10) : parseInt(str, 10);
}
  • In the first part, we check if it is a negative number.

  • In the second part, we run a while loop to extract the number part.

  • In the third part, we check if there is a decimal point.

    • If there is, we extract the decimal part.
  • In the fourth part, we check if there is an exponential expression (using uppercase or lowercase e).

    • If there is, we extract the exponential part.
  • In the fifth part, we convert the string to a number (using parseInt or parseFloat).

Keywords (true, false, null)

function keyword(parser) {
  if (parser.next("true")) {
    return true;
  } else if (parser.next("false")) {
    return false;
  } else if (parser.next("null")) {
    return null;
  }
}

This part is simple. We check if the value matches any of the keywords.

Array

json-grammer

function array(parser) {
  const arr = [];

  if (parser.current() === "[") {
    parser.next("[");
    parser.skip();

    if (parser.next("]")) {
      return arr;
    }
    let i = 0;
    while (parser.current()) {
      const val = value(parser);
      arr.push(val);

      parser.skip();

      if (parser.current() === "]") {
        parser.next("]");
        return arr;
      }
      parser.next(",");
      parser.skip();
    }
  }

  return arr;
}
  • In the first part, we check if it starts with [.

    • If we encounter ], it means it is an empty array.
  • In the second part, we run a while loop and call the value function to parse and add values to the array.

  • If we encounter ], it means the array ends, so we return the array.

  • If we encounter a comma, it means there is another element, so we continue parsing.

This is almost it. You can take a look at the implementation in the Repository to see the code in action. You will notice that one of the tests, specical-character, fails because there might be escape characters in the string. Let's try to implement it.

const escape = {
  '"': '"',
  t: "\t",
  r: "\r",
  "\\": "\\",
};

while (((curr = parser.current()), curr)) {
    if (parser.next('"')) {
      return str;
    } else if (curr === "\\") {
      parser.index += 1;
      const escapeCh = parser.current();
      if (escape[escapeCh]) {
        str += escape[escapeCh];
      }
    } else {
      str += curr;
    }
    parser.index += 1;
  }

We create an escape character table and replace the corresponding characters with their actual values. Here, we only implemented \t and \r. This way, we pass the basic JSON tests 🍻. However, besides the mentioned escape characters, we also need to implement \u, which represents Unicode. This functionality is quite important.

Custom Feature: templatetemplate

Since we are writing our own parser, we can add new syntax! Let's say we want to implement a template feature, where any variable enclosed in $$ will be replaced with the corresponding value from the passed object. For example:

{
  "name": $name$
}

will become:

new Parser(string, { name: 'kalan' }).parse();
// { name: "kalan" }

Implementation

function template(parser) {
  parser.skip();
  if (parser.next("$")) {
    parser.skip();

    if (parser.next("$")) {
      throw new Error("template can not be empty");
    }
    let curr = "";
    let key = "";
    while (((curr = parser.current()), curr)) {
      if (parser.next("$")) {
        return parser.variables[key];
      }
      key += curr;
      parser.index += 1;
    }
  }
}
  • First, we match $.
  • Then, we start reading the content until we see $.
  • When we encounter $, we stop the while loop and replace the template variable with the corresponding variable value, and return the result.

You can check the detailed implementation in the template branch. Take a look at the test results in the test/template folder, too.

Conclusion

By writing our own parser, we can express complex implementations more easily in terms of syntax. We can even extend existing grammars (like JSON in this case) with our desired functionality. Although parsing itself is important and interesting, parsing a language is just the first step. It's like converting JSX into JavaScript code without the help of React, which wouldn't be useful; or transforming SQL into an abstract syntax tree without implementing a database, which would be a bit pointless. The purpose of parsing a language is to facilitate further processing (executing queries, rendering to the DOM).

In fact, there are now many libraries that help you skip the parsing part, such as the famous Bison or PEG.js, which automatically generate a stable parser for you based on a BNF-like syntax, saving you time in parsing and allowing you to focus directly on language implementation.

Our JSON parser in this case doesn't convert to an abstract syntax tree and then generate the final result. So, in the next stage, we will try to parse simple HTML and convert it into a syntax tree, then render it using JavaScript's DOM API.

Prev

Write a JSON parser from scratch (1)

Next

Technology always comes from humanity (Svelte Society: Questions Questions Notes)

If you found this article helpful, please consider buy me a drink ☕️ It'll make my ordinary day shine✨

Buy me a coffee