Notes on AirBox Data Management

Ling-Jyh Chen
Oct 6, 2021


What is AirBox?

A 27-min talk about the AirBox project

The Architecture of the Data Service Platform

  • Version 1 (2015/10–2016/12)

The first version of the AirBox data service platform was very simple. It consisted of an MQTT Broker and a virtual machine. The MQTT Broker was deployed on AWS (under the free tier) and was contributed by Wuulong (the founder of LASS). The virtual machine was contributed by LJ; it was originally hosted at AS-IIS and later moved to AWS for better service quality.

AirBox Data Service Platform Architecture: version 1

The virtual machine was the core of the entire data service platform. It hosted a database server (MongoDB), MQTT subscribers (monitoring data streams from the MQTT Broker), data crawlers (obtaining data from other providers), and MQTT publishers (passing data from the crawlers to the MQTT Broker).
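To make the subscriber's job concrete, here is a minimal sketch of what a message handler on the VM might do: decode a payload, check that it names a device, and store it. It assumes, for illustration only, a pipe-delimited `|key=value|` payload with fields such as `device_id` and `s_d0`, a hypothetical topic name, and an in-memory list standing in for the MongoDB collection. The actual LASS message format and the platform's schema may differ, and the MQTT client library that would invoke the callback is left out so the sketch stays self-contained.

```python
from dataclasses import dataclass, field


def parse_payload(payload: str) -> dict[str, str]:
    """Split an assumed '|key=value|key=value|' payload into a dict."""
    record: dict[str, str] = {}
    for part in payload.strip().strip("|").split("|"):
        if "=" not in part:
            continue  # skip empty or malformed segments
        key, value = part.split("=", 1)
        record[key.strip()] = value.strip()
    return record


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the MongoDB collection on the VM."""
    rows: list[dict[str, str]] = field(default_factory=list)

    def insert(self, record: dict[str, str]) -> None:
        self.rows.append(record)


def on_message(store: InMemoryStore, topic: str, payload: bytes) -> None:
    """Roughly what a subscriber callback might do: decode, parse, tag, store."""
    record = parse_payload(payload.decode("utf-8"))
    if "device_id" not in record:
        return  # drop messages that cannot be attributed to a device
    record["topic"] = topic  # remember where the message came from
    store.insert(record)


store = InMemoryStore()
on_message(
    store,
    "LASS/Test/PM25",  # hypothetical topic name
    b"|device_id=74DA38000001|date=2016-05-01|time=12:00:00|s_d0=23|",
)
print(store.rows[0]["s_d0"])  # prints 23
```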

The data service platform provided two types of data APIs: static JSON files and dynamic PHP programs. The dynamic PHP programs queried the database directly to support a variety of data query requirements. At the same time, to reduce the computational load on the database, we also provided static JSON files, which were generated by dumping the latest measurement of each AirBox from the database.
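The static-file approach boils down to computing a "latest reading per device" view once and serving the resulting file to every client, so the database is touched once per dump instead of once per request. The sketch below shows that reduction under stated assumptions: each record carries a `device_id` and an ISO-8601 `timestamp` string in a single time zone (so string comparison orders correctly), and the top-level `feeds` key of the output file is hypothetical rather than the platform's actual JSON layout.

```python
import json
from typing import Iterable


def latest_per_device(records: Iterable[dict]) -> dict[str, dict]:
    """Keep only the newest record for each device_id."""
    latest: dict[str, dict] = {}
    for rec in records:
        dev = rec["device_id"]
        # ISO-8601 strings in one time zone sort chronologically.
        if dev not in latest or rec["timestamp"] > latest[dev]["timestamp"]:
            latest[dev] = rec
    return latest


def write_snapshot(records: Iterable[dict], path: str) -> None:
    """Dump the latest-per-device view to a static JSON file."""
    snapshot = {"feeds": list(latest_per_device(records).values())}  # 'feeds' is hypothetical
    with open(path, "w", encoding="utf-8") as f:
        json.dump(snapshot, f, ensure_ascii=False)


records = [
    {"device_id": "A", "timestamp": "2016-05-01T12:00:00", "pm25": 20},
    {"device_id": "A", "timestamp": "2016-05-01T12:05:00", "pm25": 25},
    {"device_id": "B", "timestamp": "2016-05-01T12:01:00", "pm25": 12},
]
latest = latest_per_device(records)
print(latest["A"]["pm25"], len(latest))  # prints 25 2
```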

  • Version 2 (2016/12–2019/5)

The first version of the data service platform became overloaded after one year of service, and there was an urgent need to upgrade it to improve its scalability. The main design considerations for the second version were: 1) a time-series database to process IoT data streams, and 2) high availability and load balancing to provide reliable and scalable services.

AirBox Data Service Platform Architecture: version 2

For the time-series database, we chose KairosDB and kept only the last 30 days of data in it. A separate program moved the “cold” data (older than 30 days) from the online database to offline local storage (MongoDB) for archiving.
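The retention policy amounts to partitioning records by age against a 30-day cutoff. Below is a minimal sketch of that split, assuming each record carries a timezone-aware `time` field; the actual KairosDB query and MongoDB insert calls are deliberately omitted, and the record layout is illustrative only.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # window kept in the online database


def split_hot_cold(records: list[dict], now: datetime) -> tuple[list[dict], list[dict]]:
    """Partition records into (hot, cold) by age relative to `now`.

    Records older than RETENTION are 'cold': they would be copied to the
    archive and only then removed from the online time-series store.
    """
    cutoff = now - RETENTION
    hot: list[dict] = []
    cold: list[dict] = []
    for rec in records:
        (cold if rec["time"] < cutoff else hot).append(rec)
    return hot, cold


now = datetime(2017, 3, 1, tzinfo=timezone.utc)
records = [
    {"device_id": "A", "time": now - timedelta(days=2)},
    {"device_id": "A", "time": now - timedelta(days=45)},
]
hot, cold = split_hot_cold(records, now)
print(len(hot), len(cold))  # prints 1 1
```

Copying cold records to the archive before deleting them from the online store means a failed run leaves duplicates rather than gaps, which is usually the safer failure mode for an archiving job.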

For high availability and load balancing, we were fortunate to have a group of collaborators (from NCHC, NCTU, NCKU, THU, NTU, and UCSD) who contributed their machines to this data service cluster. We installed Cassandra on each participating machine to form a distributed database cluster, and ran KairosDB on top of Cassandra to achieve high availability of the database service. We also used nginx to load-balance HTTP requests across all participating machines, and used lsyncd to synchronize the web document folders.
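nginx distributes requests across the servers of an `upstream` group in round-robin order by default, and its passive health checks (the `max_fails` and `fail_timeout` server parameters) temporarily take failing servers out of rotation. The Python sketch below only illustrates that idea; it is not nginx code, and the host names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate through backends, skipping any currently marked down."""

    def __init__(self, backends: list[str]) -> None:
        self.backends = list(backends)
        self.down: set[str] = set()
        self._ring = cycle(self.backends)

    def mark_down(self, host: str) -> None:
        self.down.add(host)

    def mark_up(self, host: str) -> None:
        self.down.discard(host)

    def next_backend(self) -> str:
        # Try at most one full rotation before giving up.
        for _ in range(len(self.backends)):
            host = next(self._ring)
            if host not in self.down:
                return host
        raise RuntimeError("no healthy backend available")


lb = RoundRobinBalancer(["node-a", "node-b", "node-c"])  # hypothetical hosts
lb.mark_down("node-b")
print([lb.next_backend() for _ in range(4)])
# prints ['node-a', 'node-c', 'node-a', 'node-c']
```

A balancer of this kind copes well with a host that stays down, but a host that keeps flapping between up and down forces repeated failovers, which is one way machine instability translates into instability of the whole service.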

  • Version 3 (2019/5–now)

The second version of the data service platform ran smoothly at first, but a few months later it ran into several problems: 1) the Cassandra cluster slowed down as the data volume grew and because of long network delays among hosts; 2) some participating machines were unexpectedly shut down and restarted for various reasons, resulting in high recovery costs and overall system instability; 3) some participating machines withdrew from the collaboration when their research funding ended.

AirBox Data Service Platform Architecture: version 3

Therefore, in designing the third-generation data service platform, we decided to: 1) remove the distributed Cassandra layer and replace KairosDB with InfluxDB to improve efficiency; 2) retire all participating machines except one at NCHC, which is powerful, reliable, and well supported; 3) insert the data crawler results directly into the database, skipping the redundant MQTT publish/subscribe step; 4) separate the database and the web services onto two GCP machines to improve information security.

The Data Flow

The data flow of the AirBox project

The AirBox data service platform receives data through four channels:

  • MQTT: The data service platform is also a subscriber of the LASS MQTT service. Most of the data arriving through this channel is contributed by the maker community, whose members build low-cost air sensing devices themselves and voluntarily upload their measurements.
  • Secure MQTT: The data service platform maintains a secure MQTT Broker, which only accepts data uploads from trusted devices.
  • RESTful: The data service platform provides a RESTful API for data uploads. Most of the data arriving through this channel is contributed by collaborating projects, such as NTU4AQ and PiM25.
  • Data Crawler: The data service platform actively seeks cooperation with other open-data, low-cost air sensing projects. Once an agreement is reached, a customized data crawler is implemented and deployed to fetch data from the partner project at a fixed rate (depending on the agreement and the data frequency of the partner project), and the data is then incorporated into the platform for further integration and application.
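Although the four channels differ in transport, they all feed the same store, so it can help to picture a single ingestion step behind them. The sketch below is hypothetical: the `Measurement` record, the channel labels, and the allowlist are illustrative assumptions. In practice a secure MQTT broker would normally enforce trust through its own authentication (credentials or client certificates); the allowlist here merely stands in for that. The deduplication reflects a common concern with fixed-rate crawlers, which may re-fetch records they have already seen.

```python
from dataclasses import dataclass

CHANNELS = {"mqtt", "secure_mqtt", "restful", "crawler"}  # hypothetical labels


@dataclass(frozen=True)
class Measurement:
    channel: str
    device_id: str
    timestamp: int  # seconds since the Unix epoch (assumed)
    pm25: float


class Ingestor:
    """Hypothetical single entry point shared by all four channels."""

    def __init__(self, trusted_devices: set[str]) -> None:
        self.trusted = set(trusted_devices)
        self.seen: set[tuple[str, int]] = set()
        self.accepted: list[Measurement] = []

    def ingest(self, m: Measurement) -> bool:
        if m.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {m.channel}")
        # Stand-in for broker-side authentication on the secure channel.
        if m.channel == "secure_mqtt" and m.device_id not in self.trusted:
            return False
        # (device_id, timestamp) serves as the deduplication key.
        key = (m.device_id, m.timestamp)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.accepted.append(m)
        return True


ing = Ingestor(trusted_devices={"dev-1"})
results = [
    ing.ingest(Measurement("secure_mqtt", "dev-1", 1000, 12.0)),
    ing.ingest(Measurement("secure_mqtt", "dev-9", 1000, 30.0)),  # untrusted
    ing.ingest(Measurement("crawler", "ext-7", 1000, 18.0)),
    ing.ingest(Measurement("crawler", "ext-7", 1000, 18.0)),      # re-fetched
]
print(results)  # prints [True, False, True, False]
```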


Written by Ling-Jyh Chen

A researcher at Academia Sinica and the PI of the AirBox project
