TL;DR—Clarity has generated a new and improved PM₂.₅ calibration model to account for elevated particulate matter air pollution during the 2021 wildfire season in Western North America. This blog post describes the approach we took in developing this new model and provides performance metrics for the model, which represent a substantial improvement in sensor performance during periods of elevated ambient particulate matter.
Background: Why we generated a new and improved calibration model for the 2021 wildfire season
With droughts intensifying and record-setting heat plaguing Western North America, we unfortunately anticipate a long and challenging wildfire season; in fact, some of our partners have already begun to be impacted by smoke. And, while the summer and fall are often smoky out west, it’s not often this smoky this early.
In fact, as I write this at a coffee shop in Reno, NV I can see smoke from the Beckwourth Complex headed east across the horizon; I suspect we’ll have our first day of smoke-filled skies this season today.
It’s with this in mind that we have developed an improved correction model for the 2021 wildfire season. In 2020 Clarity released its first iteration of a PM₂.₅ calibration model to improve the performance of our sensors during wildfire smoke events. This model performed well during the 2020 wildfire season, but over the past year, we have established collocations at a much broader range of locations across the Western United States, providing us with a more robust dataset with which to improve on the previous iteration of our wildfire calibration model.
The new wildfire calibration model developed for PM₂.₅ for 2021 is the following:
Particulate Matter (PM₂.₅) Mass Concentration - Calibrated =
-0.501 * Particulate Matter (1) Mass Concentration - Raw +
-0.266 * Particulate Matter (2.5) Mass Concentration - Raw +
0.307 * Particulate Matter (10) Mass Concentration - Raw +
-0.321 * Particulate Matter (1) Number Concentration - Raw +
-2.987 * Particulate Matter (2.5) Number Concentration - Raw +
3.825 * Particulate Matter (10) Number Concentration - Raw +
-0.051 * Relative Humidity - Raw +
The model provides a substantial improvement in sensor performance and particularly reduces the overestimation that Plantower sensors (the PM sensor used in the Clarity Node-S) often demonstrate during wildfire smoke episodes. The model was developed using historical data and performed well against a subset of this dataset that we reserved for testing purposes. Of course, the performance will also be tested in this 2021 wildfire season.
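To make the equation concrete, here is a minimal Python sketch that applies the published coefficients to a single raw reading. The dictionary keys and the `calibrate_pm25` function name are our own labels for illustration, not part of any Clarity API. The equation as printed ends with a trailing "+", so a constant term may be missing; the `intercept` argument is a placeholder for it and defaults to zero purely as an assumption.

```python
# Coefficients copied from the model equation in this post.
# The key names are hypothetical labels chosen for this sketch.
COEFFICIENTS = {
    "pm1_mass": -0.501,
    "pm2_5_mass": -0.266,
    "pm10_mass": 0.307,
    "pm1_number": -0.321,
    "pm2_5_number": -2.987,
    "pm10_number": 3.825,
    "relative_humidity": -0.051,
}


def calibrate_pm25(raw, intercept=0.0):
    """Return calibrated PM2.5 for one reading given as a dict of raw values.

    `intercept` is an assumption: the printed equation shows no constant
    term, so it defaults to 0.0 here and should be replaced if one applies.
    """
    return intercept + sum(coef * raw[name] for name, coef in COEFFICIENTS.items())
```

A reading must supply all seven raw fields in the units the sensor reports; a missing field raises a `KeyError` rather than being silently treated as zero.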
The rest of this post describes the approach we took in developing the new model and reports its performance metrics.
Developing Training and Testing Datasets
Using collocated Clarity Node-S devices at 13 reference sites (see map below) across five states (California, Idaho, Montana, Oregon, and Washington), we generated a dataset of collocated particulate matter measurements for the period from July 2019 to July 2021. We separated this dataset into model training and model testing datasets. The one location in Montana was kept out of the training dataset to serve as a completely independent site and to test how well the model translates across different geographies and climates.
Map of the reference site collocations used to develop Clarity's 2021 wildfire calibration model
Since not all data from those sites during that time period were collected during wildfire events, we wanted to ensure that both the training and testing datasets had sufficient data points from high-concentration periods to train and test the model. We used a daily PM₂.₅ concentration of 35.5 µg/m³ (the lower bound of the Unhealthy for Sensitive Groups category of the PM₂.₅ AQI) to divide the days into high and low concentrations. For example, a daily PM₂.₅ concentration of 35.5 µg/m³ or greater at one or more sites would automatically classify that date, along with all data reported on that date, into the high pollution category.
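The date classification rule can be sketched in a few lines of Python. This assumes the daily mean PM₂.₅ per site has already been computed; the tuple layout and the `classify_dates` name are hypothetical choices for the example.

```python
HIGH_PM25_THRESHOLD = 35.5  # µg/m³ daily mean, the cut-off stated in the post


def classify_dates(daily_means):
    """Map each date to "high" or "low".

    daily_means: iterable of (date, site, daily_mean_pm25) tuples.
    A date is "high" as soon as any single site reaches the threshold.
    """
    labels = {}
    for date, _site, pm25 in daily_means:
        if pm25 >= HIGH_PM25_THRESHOLD:
            labels[date] = "high"
        else:
            # Only mark "low" if no other site has already made it "high".
            labels.setdefault(date, "low")
    return labels
```

Because the label attaches to the date rather than to a site, a clean site is still grouped with the high-pollution days whenever another site is smoky on the same date.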
After classifying days by high or low PM₂.₅, we randomly split the dates within each group, assigning 80% of the dates to the training dataset and 20% to the model testing dataset. Combining the two 80% subsets (from the high- and low-concentration groups) gave our final training dataset, and combining the two 20% subsets gave our final testing dataset.
Because the split was made by date and the number of devices reporting varied from day to day, the final train/test split of the measurements was closer to 75/25 than 80/20. In addition, the Montana site was completely excluded from the training dataset.
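A split of this kind can be sketched as follows. The function names, the fixed seed, and the row layout are assumptions for illustration, and the sketch assumes every row from the held-out site goes to the test set regardless of date.

```python
import random


def split_dates(labels, train_fraction=0.8, seed=0):
    """Stratified split of dates; labels maps date -> "high" or "low"."""
    rng = random.Random(seed)
    train, test = set(), set()
    for group in ("high", "low"):
        dates = sorted(d for d, label in labels.items() if label == group)
        rng.shuffle(dates)
        cut = round(train_fraction * len(dates))
        train.update(dates[:cut])
        test.update(dates[cut:])
    return train, test


def assign_rows(rows, train_dates, holdout_site):
    """Route (date, site, ...) rows to the training or testing list.

    Rows from holdout_site always go to testing (an assumption of this
    sketch), as do rows whose date was not drawn for training.
    """
    train_rows, test_rows = [], []
    for row in rows:
        date, site = row[0], row[1]
        if site != holdout_site and date in train_dates:
            train_rows.append(row)
        else:
            test_rows.append(row)
    return train_rows, test_rows
```

Splitting by date keeps all measurements from one day on the same side of the split, which is also why the row-level ratio can drift away from the 80/20 ratio of dates.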
For this analysis, we only considered multiple linear regressions using the following pool of independent variables from the Clarity Node-S:
- Particulate Matter (1) Mass Concentration - Raw
- Particulate Matter (2.5) Mass Concentration - Raw
- Particulate Matter (10) Mass Concentration - Raw
- Particulate Matter (1) Number Concentration - Raw
- Particulate Matter (2.5) Number Concentration - Raw
- Particulate Matter (10) Number Concentration - Raw
- Temperature - Raw
- Relative Humidity - Raw
Some argue that the multicollinearity of these parameters poses an issue; however, multicollinearity is primarily an issue when one is using regression for inferring causation, not when one is using it for prediction.
To choose which combination of the above independent variables provided the best predictive power (measured by R² and Root-Mean-Square Error [RMSE]), we performed 10-fold cross-validation, repeated 5 times, on the training dataset for every combination of those variables. The final model selected was the one shown above.
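The selection procedure can be sketched in dependency-free Python as below. This is an illustration of exhaustive subset search with repeated k-fold cross-validation, not the code used for the analysis: it ranks subsets by mean held-out RMSE only (the post does not say how R² and RMSE were combined), it solves the normal equations directly, and all function names are hypothetical. In practice a library such as scikit-learn would be a more robust choice.

```python
import itertools
import math
import random


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def repeated_kfold_rmse(X, y, k=10, repeats=5, seed=0):
    """Mean held-out RMSE over `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        for fold in range(k):
            held = set(idx[fold::k])
            beta = fit_ols([X[i] for i in idx if i not in held],
                           [y[i] for i in idx if i not in held])
            sq = [(y[i] - predict(beta, X[i])) ** 2 for i in held]
            scores.append(math.sqrt(sum(sq) / len(sq)))
    return sum(scores) / len(scores)


def best_subset(columns, y, k=10, repeats=5):
    """Score every non-empty subset of named columns; return (rmse, names)."""
    results = []
    for size in range(1, len(columns) + 1):
        for subset in itertools.combinations(list(columns), size):
            X = list(zip(*(columns[name] for name in subset)))
            results.append((repeated_kfold_rmse(X, y, k, repeats), subset))
    return min(results)
```

With eight candidate variables there are 255 non-empty subsets, so an exhaustive search is still cheap. Perfectly collinear columns would make the normal equations singular, which this sketch does not handle.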
Test Dataset Results
The test dataset includes 49,529 hourly measurements from different sensors. Of these, 15,704 hours were from the Montana site. Below, we look at the performance metrics both for the full test dataset and for the Montana site alone.
We used the recently released USEPA (United States Environmental Protection Agency) PM₂.₅ Performance Targets as a guide in presenting the metrics for daily-averaged PM₂.₅. While the USEPA guidance was not meant for this application, the target metrics are shown below as a comparison, given the limited availability of guidance on low-cost sensor (LCS) performance.
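For readers who want to reproduce this kind of comparison on their own collocation data, the sketch below computes four common agreement metrics between sensor and reference values: regression slope, intercept, R², and RMSE. It assumes the sensor is regressed on the reference (sensor as y, reference as x); the function name is hypothetical and the USEPA target values themselves are not reproduced here.

```python
import math


def performance_metrics(sensor, reference):
    """Regress sensor (y) on reference (x); return slope, intercept, R², RMSE."""
    n = len(sensor)
    mean_x = sum(reference) / n
    mean_y = sum(sensor) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in sensor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, sensor))
    slope = sxy / sxx
    return {
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r2": sxy ** 2 / (sxx * syy),
        # RMSE of the sensor against the reference, in the input units.
        "rmse": math.sqrt(sum((y - x) ** 2 for x, y in zip(reference, sensor)) / n),
    }
```

The same function works for daily or hourly averages, as long as the two input lists are paired by time and have the same length.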
The calibrated daily PM₂.₅ values calculated using the updated wildfire model at the Montana site met the performance targets set by the USEPA (Table 1).
For the uncalibrated PM₂.₅ data, not all the performance targets were met, but this collocation also did not meet all of the USEPA’s guidelines for performance testing. The USEPA guidelines recommend conducting evaluations during a month in which at least one daily PM₂.₅ concentration is equal to or greater than 25 µg/m³. At the Montana site, this daily concentration requirement was met only from July through September 2020.
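The monthly concentration check described above is simple to automate. The sketch below assumes daily mean PM₂.₅ values keyed by ISO date strings; the function name and input layout are hypothetical.

```python
from collections import defaultdict

QUALIFYING_DAILY_PM25 = 25.0  # µg/m³, the guideline value as described in the post


def qualifying_months(daily_pm25):
    """Return sorted "YYYY-MM" months with at least one daily mean >= threshold.

    daily_pm25: dict mapping ISO dates ("YYYY-MM-DD") to daily mean PM2.5.
    """
    peak = defaultdict(float)
    for date, value in daily_pm25.items():
        month = date[:7]
        peak[month] = max(peak[month], value)
    return sorted(m for m, v in peak.items() if v >= QUALIFYING_DAILY_PM25)
```

Running this over a site's daily record shows at a glance which months fall inside the recommended evaluation conditions and which do not.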
The low particulate matter concentrations across many of these sites may be one reason why not all the USEPA performance targets are being met for the uncalibrated data. For instance, from January through March of 2021 at the Montana site, the maximum PM₂.₅ concentration was only 12.26 µg/m³.
Table 1. Daily averaged PM₂.₅ results for the Montana site (uncalibrated and calibrated) in the Clarity test data as compared to the USEPA performance targets for low-cost sensors
At higher time resolutions, we would expect more noise in the data, and we do see a small change in the metrics compared to the daily PM₂.₅ results. Even so, the calibrated hourly PM₂.₅ data at the Montana site and for the full test dataset also meet the USEPA performance targets (Table 2).
Table 2. Hourly averaged PM₂.₅ results for the full test dataset and the Montana-only site
Regularly adjusted calibration models ensure the best possible sensor performance during wildfire season
With the proliferation of low-cost sensor technology around the world and the consequent availability of expanded air quality datasets, calibrated low-cost sensor performance is improving every year. Clarity recognizes the importance of proper calibration to ensure the most accurate sensor data possible and to adhere to protocols such as the USEPA’s Air Sensor Performance Targets and Testing Protocols. We regularly adjust the regional and event-specific calibration models that we apply to our sensors around the world to ensure the best possible performance of our sensors.
If you are a Clarity customer with sensors located in Western North America, our team will be in touch over the coming weeks to help you determine whether you would like this improved calibration model applied to your sensors for the 2021 wildfire season.