TL;DR – Clarity’s Lab Team is excited to release version 2 of our patent-pending PM2.5 Global Calibration! This new and improved version is trained on a much larger dataset of collocated measurements from all over the world, enhancing the calibration’s performance in diverse environments. In this blog post, we explain how we developed this new calibration and share performance metrics that show a significant improvement in sensor accuracy.

Background

Since the release of v1 of the PM2.5 Global Calibration in 2021, our partners have collocated hundreds of our Node-S devices with Federal Equivalent Method (FEM)-grade instruments worldwide. Initially developed to improve performance during wildfire smoke events in the United States, the v1 calibration has proven effective throughout the year and in regions without wildfire smoke. Building on this success, we aimed to leverage a much larger dataset for v2 to enhance accuracy and representativeness even further.

Calibration Development

The Global Collocation Dataset

Clarity’s clients have collocated hundreds of Node-S devices with FEM PM2.5 instruments globally. We carefully reviewed each collocation as part of a rigorous quality-control process, resulting in a dataset comprising 2.4 million hours of collocated Node-S and FEM data from:

  • 84 different cities
  • 623 nodes
  • 98 reference sites

This dataset represents a 12-fold increase in the number of hourly measurements compared to the dataset used for v1!

Figure 1: A map of the 98 reference sites included in the Global Collocation Dataset. Lighter blue dots are sites with data available via AirNow, EEA, or OpenAQ. Darker blue dots are sites whose data may not be openly available (the locations of these sites have been randomly offset).

Model Development

Our Lab used leave-one-reference-site-out cross-validation to evaluate different modeling approaches: each candidate model is trained on data from all reference sites but one and then evaluated on the held-out site, so the scores reflect performance at locations the model has not seen. We tested Gaussian mixture regression, ensemble forest models, neural networks, and multiple linear regression. We also experimented with various derived features (e.g., dew point and interaction terms between different metrics).
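To make the evaluation scheme concrete, here is a minimal sketch of leave-one-site-out cross-validation in plain Python. The record layout, the site labels, the sample values, and the single-feature line fit are all made up for illustration; this is not the Lab's actual pipeline, which compared far richer models.

```python
from collections import defaultdict


def leave_one_site_out(records):
    """Yield (held_out_site, train, test), holding out one reference site per fold."""
    by_site = defaultdict(list)
    for rec in records:
        by_site[rec["site"]].append(rec)
    for site in sorted(by_site):
        test = by_site[site]
        train = [r for s, rows in by_site.items() if s != site for r in rows]
        yield site, train, test


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(pred, obs):
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5


# Hypothetical hourly pairs: raw sensor reading vs. reference reading.
records = [
    {"site": "A", "raw": 10.0, "ref": 6.0},
    {"site": "A", "raw": 20.0, "ref": 11.0},
    {"site": "A", "raw": 30.0, "ref": 15.0},
    {"site": "B", "raw": 12.0, "ref": 8.0},
    {"site": "B", "raw": 25.0, "ref": 14.0},
    {"site": "B", "raw": 40.0, "ref": 22.0},
    {"site": "C", "raw": 5.0, "ref": 4.0},
    {"site": "C", "raw": 15.0, "ref": 9.0},
    {"site": "C", "raw": 35.0, "ref": 18.0},
]

# Train on every site except one, then score on the held-out site.
scores = {}
for site, train, test in leave_one_site_out(records):
    slope, intercept = fit_line([r["raw"] for r in train], [r["ref"] for r in train])
    pred = [slope * r["raw"] + intercept for r in test]
    scores[site] = rmse(pred, [r["ref"] for r in test])

for site, score in scores.items():
    print(f"held-out site {site}: RMSE = {score:.2f}")
```

Splitting by site rather than by individual hour keeps measurements from the same location out of both the training and test sets, which is what makes the held-out score a fair estimate for a new site.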

The Final Calibration

After extensive experimentation, the Lab selected a multiple linear regression model that incorporates several features. This model was chosen for its transparency, ability to extrapolate outside the training domain, and most importantly, its ability to enhance the accuracy of Clarity’s PM2.5 measurements worldwide.

The model combines features measured directly by the Node with features derived from mathematical combinations of these measurements. Notably, no external data (e.g., land use, traffic, air pollution models, or satellite data) are used.

The uncorrected PM2.5 mass concentration output from the Plantower sensor correlates well with gravimetric instruments under stable environmental conditions and consistent particle composition but is affected by changes in these factors. To address this, the calibration model includes:

  • Two additional outputs from the PM sensor for size-resolved particle mass and number concentration. These outputs help detect changes in particle size distribution, addressing variability in particle composition.
  • Three terms related to environmental conditions. These terms account for the uptake of water by particles, which can alter particle sizes and optical properties, addressing the impact of changing environmental conditions.

The features are:

Feature Name          | Type                           | Description
----------------------|--------------------------------|------------
pm2_5MassConc_raw     | Measured                       | The raw PM2.5 mass concentration reading from the Plantower sensor
pm10MassConc_raw      | Measured                       | The raw PM10 mass concentration reading from the Plantower sensor
pm1NumConc_raw        | Measured                       | The raw PM1 number concentration reading from the Plantower sensor
relHumidity_raw       | Measured                       | The raw internal relative humidity reading from the Node-S
pm_rh_interaction     | Derived from Node Measurements | The interaction term (i.e., the product) of the raw PM2.5 mass concentration reading and the raw relative humidity reading
temperature_minus_dew | Derived from Node Measurements | The difference between the raw temperature reading and the calculated dew point

After training the model on our Global Collocation dataset, the form of the calibration is:

v2 Calibrated PM2.5 =
	pm2_5MassConc_raw * 0.274821 +
	pm10MassConc_raw * 0.263883 +
	pm1NumConc_raw * 0.171146 +
	relHumidity_raw * -0.073857 +
	pm_rh_interaction * -0.004631 +
	temperature_minus_dew * -0.149043 +
	8.076738

Where:
	pm_rh_interaction = pm2_5MassConc_raw * relHumidity_raw
	temperature_minus_dew = temperature_raw - dewPoint
	dewPoint is calculated from relHumidity_raw and temperature_raw using the Magnus formula
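For readers who want to try the formula, here is a minimal Python sketch of it. The coefficients and the intercept are copied from the calibration above. Everything else is an assumption: the post does not say which Magnus constants were used (the sketch uses the common 17.62 and 243.12 °C pair), and it assumes temperature in °C and relative humidity in percent (0 to 100). The function names are illustrative, not part of any Clarity API.

```python
import math

# Coefficients and intercept as published in the v2 formula.
COEFFICIENTS = {
    "pm2_5MassConc_raw": 0.274821,
    "pm10MassConc_raw": 0.263883,
    "pm1NumConc_raw": 0.171146,
    "relHumidity_raw": -0.073857,
    "pm_rh_interaction": -0.004631,
    "temperature_minus_dew": -0.149043,
}
INTERCEPT = 8.076738

# ASSUMPTION: one common choice of Magnus constants; the post does not
# state which constants the calibration actually uses.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(temperature_c, rel_humidity_pct):
    """Magnus approximation of the dew point in °C (requires RH > 0)."""
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_A * temperature_c / (MAGNUS_B + temperature_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def calibrate_pm25_v2(pm2_5_raw, pm10_raw, pm1_num_raw, rel_humidity_raw, temperature_raw):
    """Apply the published linear model to one set of raw Node readings.

    ASSUMPTION: temperature_raw is in °C and rel_humidity_raw is in percent.
    """
    features = {
        "pm2_5MassConc_raw": pm2_5_raw,
        "pm10MassConc_raw": pm10_raw,
        "pm1NumConc_raw": pm1_num_raw,
        "relHumidity_raw": rel_humidity_raw,
        # Derived features, as defined in the "Where" block.
        "pm_rh_interaction": pm2_5_raw * rel_humidity_raw,
        "temperature_minus_dew": temperature_raw - dew_point_c(temperature_raw, rel_humidity_raw),
    }
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())


# Example call with made-up readings.
print(round(calibrate_pm25_v2(12.0, 15.0, 20.0, 55.0, 22.0), 2))
```

A quick sanity check on the dew point helper: at 100% relative humidity the Magnus formula returns the air temperature itself, so temperature_minus_dew is zero and that term drops out of the model.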

How Well Does v2 Perform?

As shown in Figure 2, the v2 PM2.5 Global Calibration significantly enhances Clarity’s PM2.5 measurement performance compared to the already effective v1. The median R² increases to 0.79 (+7% from v1), and the median RMSE decreases to 2.7 μg/m³ (-12% from v1). Additionally, benefitting from the diverse Global Collocation Dataset, v2 generally outperforms custom collocation-based calibrations, indicating higher reliability over a wider range of pollutant concentrations and environmental conditions.

Figure 2: Distribution of monthly performance metrics for different calibrations calculated on the Global Collocation Dataset, covering a wide range of environmental conditions.
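As a rough illustration of how monthly metrics like these can be computed, the sketch below groups paired calibrated and reference readings by month and reports R² and RMSE for each group. It rests on several assumptions: R² is taken as the squared Pearson correlation (the post does not say which definition was used), the grouping is by month only, and the sample rows are invented.

```python
import math
from collections import defaultdict
from statistics import median


def rmse(pred, obs):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def r_squared(pred, obs):
    """ASSUMPTION: R² defined as the squared Pearson correlation."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    spo = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    spp = sum((p - mp) ** 2 for p in pred)
    soo = sum((o - mo) ** 2 for o in obs)
    return spo ** 2 / (spp * soo)


def monthly_metrics(rows):
    """rows: iterable of (month, calibrated_pm25, reference_pm25) tuples."""
    by_month = defaultdict(lambda: ([], []))
    for month, calibrated, reference in rows:
        by_month[month][0].append(calibrated)
        by_month[month][1].append(reference)
    return {
        month: {"r2": r_squared(pred, obs), "rmse": rmse(pred, obs)}
        for month, (pred, obs) in by_month.items()
    }


# Invented hourly pairs for two months.
rows = [
    ("2024-01", 8.1, 7.5), ("2024-01", 12.4, 13.0), ("2024-01", 20.3, 18.9),
    ("2024-02", 5.2, 6.0), ("2024-02", 9.8, 9.1), ("2024-02", 15.5, 16.2),
]

metrics = monthly_metrics(rows)
median_r2 = median(m["r2"] for m in metrics.values())
median_rmse = median(m["rmse"] for m in metrics.values())
print(f"median R2 = {median_r2:.2f}, median RMSE = {median_rmse:.2f} ug/m3")
```

Summarizing with the median across months, as in the figures quoted above, keeps a single unusual month from dominating the headline number.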

Awesome! So How Can We Use It?

If you are currently using the v1 Global Calibration Profile for PM2.5, your devices will automatically be upgraded to v2 on July 1st, 2024, unless you opt out by emailing support@clarity.io by June 28th, 2024. If you would like to transition to v2 before July 1st, 2024, or switch from a custom calibration to the v2 Global Calibration Profile, please contact support@clarity.io.