### Journey into Analyzing Japanese Television Subtitles

Written by Kalan
💡 If you have any questions or feedback, please fill out this form.

This post was originally written in Mandarin and translated with ChatGPT, so there may be some inaccuracies or mistakes.

Introduction

Have you ever watched television in Japan? In Japan, the broadcast signals contain a wealth of information, including program schedules, video feeds (which can switch resolutions), subtitles, and more. This means you can toggle subtitles on and off on your TV.

But how does this actually work? Today, let's delve into the technicalities of subtitle parsing in Japanese television broadcasting.

My interest in this technology stemmed from an earlier tweet by Kian, along with my curiosity about whether I could connect a tuner to my computer to watch television. Along the way, I unexpectedly discovered many technical details worth learning.

Japanese Television Broadcasting

Japan's television broadcasting follows its own standard, ISDB (Integrated Services Digital Broadcasting), which transmits signals over the air from broadcast towers. Each household receives the television signal, then decrypts and restores it to its original data format.

B-CAS

To prevent unauthorized copying and protect copyrights, Japanese television signals are encrypted before transmission, as specified in ARIB STD-B25. B-CAS acts like a key: you must have this card to watch television. A television bought in Japan usually comes with a B-CAS card; without it, you won't be able to see any content.

Broadcasters sign contracts with manufacturers willing to comply with copying regulations and directly provide these manufacturers with the key information. Manufacturers then produce hardware that includes the key, such as many USB tuners available now, which can receive signals and allow you to watch TV simply by plugging them into a computer. However, these types of tuners often require you to download their dedicated player to function properly, as this prevents casual copying.

MPEG2-TS

MPEG2-TS is a container format commonly used for broadcasting in Japan; the "TS" stands for Transport Stream. Video is typically encoded with H.262 (MPEG-2 Video), while audio is encoded with AAC. Because H.262 is an older codec, files come out larger at the same video quality. By comparison, the .mp4 videos we usually watch on computers are encoded with H.264.

In terms of packet transmission, each TS packet has a fixed size of 188 bytes. This small, fixed size helps reduce latency and makes error recovery easier, which matters because wireless transmission can introduce noise and bit errors.

You can refer to this diagram for various descriptions of TS packets.

Source: https://www.researchgate.net/figure/Detailed-structure-of-the-MPEG-2-transport-stream-TS_fig16_41949828


Several important fields include:

  • sync_byte: always 0x47, marking the start of each packet
  • PID (Packet Identifier): identifies what kind of content the packet carries
  • PAT (Program Association Table): carried on PID 0; maps each program to the PID of its PMT (Program Map Table), which in turn lists the PIDs of that program's video, audio, and caption streams
  • PES (Packetized Elementary Stream): the packets that carry the actual subtitle, audio, and video data
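The header fields above can be pulled out of a packet with a few bit operations. A minimal sketch following the MPEG-2 TS header layout; the function and field names here are my own, not from any library:

```javascript
// Extract the basic fields from one 188-byte TS packet header.
function parseTsHeader(packet) {
  if (packet[0] !== 0x47) {
    throw new Error("not a TS packet: sync_byte must be 0x47");
  }
  return {
    syncByte: packet[0],
    // payload_unit_start_indicator: bit 0x40 of the second byte
    payloadUnitStart: (packet[1] & 0x40) !== 0,
    // PID is 13 bits: the low 5 bits of byte 1 plus all of byte 2
    pid: ((packet[1] & 0x1f) << 8) | packet[2],
    // continuity_counter: low 4 bits of byte 3
    continuityCounter: packet[3] & 0x0f,
  };
}
```

For example, a packet starting with `0x47 0x40 0x00 …` has PID 0, which is where the PAT lives.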

Parsing Subtitles

While there are many broadcasting details that could be explored, due to space limitations we'll focus solely on subtitle parsing. Subtitle encoding and transmission are specified in ARIB STD-B24.

The general process is as follows:

  • Scan the byte stream until the sync byte 0x47 is found
  • Slice the data into 188-byte packets for easier parsing
  • Look up PID mappings from the PAT (the PAT itself is carried on PID 0)
  • Once the caption PID is found, begin parsing its payload
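The first two steps can be sketched as a small packet iterator. This is my own sketch, not the post's code; the resync-on-corruption behavior is an assumption about how you'd handle noisy input:

```javascript
const TS_PACKET_SIZE = 188;

// Walk a captured byte stream, align on the 0x47 sync byte, and yield
// fixed 188-byte packets. If alignment is lost (noise, truncation),
// scan forward to the next sync byte and continue.
function* tsPackets(buffer) {
  let offset = buffer.indexOf(0x47); // align to the first sync byte
  while (offset >= 0 && offset + TS_PACKET_SIZE <= buffer.length) {
    const packet = buffer.subarray(offset, offset + TS_PACKET_SIZE);
    if (packet[0] === 0x47) {
      yield packet;
      offset += TS_PACKET_SIZE;
    } else {
      offset = buffer.indexOf(0x47, offset + 1); // resync after corruption
    }
  }
}
```

From here, each yielded packet's PID would be checked against the mappings recovered from the PAT/PMT to find the caption stream.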

Data Unit

Each caption data unit contains not only the "text message" but also a lot of additional information, such as color, shape, and the position where the subtitles should appear. This data is differentiated through data units, with the text message data unit being 0x20.
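Before reaching the text itself, the caption payload has to be split into data units. A minimal sketch, assuming the ARIB STD-B24 framing of a unit_separator byte (0x1F), a one-byte data_unit_parameter (0x20 for text), and a three-byte big-endian size:

```javascript
// Iterate over ARIB STD-B24 data units in a caption payload.
// Assumed layout per unit: 0x1F separator, 1-byte parameter,
// 3-byte big-endian size, then `size` bytes of data.
function* dataUnits(bytes) {
  let i = 0;
  while (i + 5 <= bytes.length) {
    if (bytes[i] !== 0x1f) break; // not a unit_separator: stop
    const parameter = bytes[i + 1];
    const size = (bytes[i + 2] << 16) | (bytes[i + 3] << 8) | bytes[i + 4];
    yield { parameter, data: bytes.subarray(i + 5, i + 5 + size) };
    i += 5 + size;
  }
}
```

Units whose parameter is 0x20 would then be handed to the text parser below.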

```javascript
function parseText(data, length) {
  const str = data.slice(0, length);
  let result = "";
  let i = 0;

  while (i < length) {
    if (str[i] === 0x20) {
      // single-byte space
      result += " ";
      i += 1;
      continue;
    }

    if (str[i] > 0xa0 && str[i] < 0xff) {
      // two-byte JIS X 0208 character (EUC-JP range)
      const char = str.slice(i, i + 2);
      if (str[i] >= 0xfa) {
        // ARIB gaiji: outside JIS X 0208, needs a custom lookup table
        result += parseGaiji(char);
      } else {
        result += new TextDecoder("EUC-JP").decode(char);
      }
      i += 2;
    } else if (Object.values(JIS_CONTROL_FUNCTION_TABLE).includes(str[i])) {
      // control function (positioning, character size, etc.); skipped here
      i += 1;
    } else if (str[i] >= 0x80 && str[i] <= 0x87) {
      // color control codes
      i += 1;
    } else {
      i += 1;
    }
  }

  document.querySelector("#result").innerHTML += result + "<br/>";
}
```

Since the text here is encoded using the JIS X 0208 character set, it requires additional decoding. In JavaScript, however, TextDecoder conveniently supports EUC-JP, so you can decode directly with new TextDecoder('EUC-JP').decode.
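For example, the two EUC-JP bytes 0xa4 0xa2 decode to the hiragana あ:

```javascript
// Decoding a JIS X 0208 character via the EUC-JP decoder
// built into the standard TextDecoder API.
const decoder = new TextDecoder("EUC-JP");
const text = decoder.decode(new Uint8Array([0xa4, 0xa2]));
console.log(text); // "あ"
```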

Additionally, there are "gaiji" characters defined by ARIB that do not exist in the JIS X 0208 character set. They are used mainly for program information, scrolling tickers, and the like. https://zh.wikipedia.org/wiki/ARIB%E5%A4%96%E5%AD%97


Such characters require separate parsing. Once the subtitles are successfully parsed, the remaining work is displaying them on screen at the correct time; the stream carries timing information for this synchronization.
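The gaiji parsing typically comes down to a hand-built lookup table keyed by the two-byte code. A sketch of what `parseGaiji` might look like; the single table entry below is a placeholder, not an actual ARIB assignment:

```javascript
// Map two-byte ARIB gaiji codes to displayable strings. A real table
// covers all the gaiji rows defined in ARIB STD-B24; this entry is
// a placeholder for illustration only.
const GAIJI_TABLE = {
  0xfa40: "★", // placeholder mapping, not a real ARIB assignment
};

function parseGaiji(bytes) {
  const code = (bytes[0] << 8) | bytes[1];
  // fall back to the geta mark (〓) for codes missing from the table
  return GAIJI_TABLE[code] ?? "〓";
}
```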

Demo

Due to the inability to upload video data directly, I can only show you the code and screenshots here. If you're interested, you can try implementing it yourself.


https://github.com/kjj6198/ts-arib-parse/tree/master

Other Technical Details

Because the video is encoded with H.262, most browsers cannot decode it natively. To play it on a webpage, you would need to decode H.262 with WebAssembly, which is very CPU-intensive and requires downloading a large WASM file up front. That can work for experimentation, but the real user experience would be quite poor. In practice, you would use a tool like ffmpeg to convert H.262 to H.264 for playback on the web.

A well-known open-source solution is mirakurun, which runs a server that continuously converts the broadcast signal to H.264 for streaming, allowing direct viewing in the browser.

In addition, there are other open-source solutions worth referencing.

Conclusion

Even just displaying subtitles involves many technical details worth learning. Through this implementation I learned a lot, such as the MPEG-TS format, and I got to know ARIB and ISDB-T, standards I previously knew nothing about even though they have supported Japanese broadcasting for many years.

After trying it in practice, you'll find it's not as difficult as you might imagine (at least for subtitle parsing; I have no understanding of how the original signal is demodulated). As long as you patiently read the documentation, you can implement it successfully.

If you found this article helpful, please consider buying me a coffee ☕ It'll make my ordinary day shine ✨

Buy me a coffee