 pm2


A Two sided Calibration Theorem

Neural Information Processing Systems

Theorem 2. Suppose that the predictive distribution Q has sufficient capacity to approximate the true unknown distribution P, and that the data are i.i.d. Eqn. (13) holds by minimizing the MMD loss L B.1 Baselines MC-Dropout (MCD) [12]: A variant of standard dropout, known as Monte-Carlo Dropout. Epistemic uncertainty can be quantified by Monte-Carlo sampling, keeping dropout active in the network during the test phase without changing the NN model itself. For all experiments, the dropout probability was set to 0.3. The conventional MSE loss is used in this method.
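The MC-Dropout baseline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the tiny network `toy_forward`, its weights, and the sample count are hypothetical stand-ins, and only the dropout probability of 0.3 comes from the excerpt. The idea shown is that dropout stays active at test time, and the spread of repeated stochastic forward passes serves as an estimate of epistemic uncertainty.

```python
import random
import statistics


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale survivors."""
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


def toy_forward(x, weights, p, rng):
    """Hypothetical one-hidden-layer regressor with dropout on the hidden units."""
    hidden = [max(0.0, w * x) for w in weights["hidden"]]  # ReLU activations
    hidden = dropout(hidden, p, rng)                       # dropout kept ON at test time
    return sum(h * w for h, w in zip(hidden, weights["out"]))


def mc_dropout_predict(x, weights, p=0.3, n_samples=200, seed=0):
    """Return (mean, standard deviation) over n_samples stochastic forward passes."""
    rng = random.Random(seed)
    samples = [toy_forward(x, weights, p, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical weights, chosen only for illustration.
weights = {"hidden": [0.5, -1.0, 2.0], "out": [1.0, 1.0, 0.5]}
mean, std = mc_dropout_predict(1.0, weights, p=0.3)
```

With `p=0.0` every pass is identical, so the standard deviation collapses to zero and the mean equals the deterministic network output.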


Air Pollution Forecasting in Bucharest

Şerban, Dragoş-Andrei, Smădu, Răzvan-Alexandru, Cercel, Dumitru-Clementin

arXiv.org Artificial Intelligence

Air pollution, especially the particulate matter 2.5 (PM2.5), has become a growing concern in recent years, primarily in urban areas. Being exposed to air pollution is linked to developing numerous health problems, like the aggravation of respiratory diseases, cardiovascular disorders, lung function impairment, and even cancer or early death. Forecasting future levels of PM2.5 has become increasingly important over the past few years, as it can provide early warnings and help prevent diseases. This paper aims to design, fine-tune, test, and evaluate machine learning models for predicting future levels of PM2.5 over various time horizons. Our primary objective is to assess and compare the performance of multiple models, ranging from linear regression algorithms and ensemble-based methods to deep learning models, such as advanced recurrent neural networks and transformers, as well as large language models, on this forecasting task.


Long-Term PM2.5 Forecasting Using a DTW-Enhanced CNN-GRU Model

Naeini, Amirali Ataee, Naeini, Arshia Ataee, Mohammadi, Fatemeh Karami, Ghaffarpasand, Omid

arXiv.org Artificial Intelligence

Reliable long-term forecasting of PM2.5 concentrations is critical for public health early-warning systems, yet existing deep learning approaches struggle to maintain prediction stability beyond 48 hours, especially in cities with sparse monitoring networks. This paper presents a deep learning framework that combines Dynamic Time Warping (DTW) for intelligent station similarity selection with a CNN-GRU architecture to enable extended-horizon PM2.5 forecasting in Isfahan, Iran, a city characterized by complex pollution dynamics and limited monitoring coverage. Unlike existing approaches that rely on computationally intensive transformer models or external simulation tools, our method integrates three key innovations: (i) DTW-based historical sampling to identify similar pollution patterns across peer stations, (ii) a lightweight CNN-GRU architecture augmented with meteorological features, and (iii) a scalable design optimized for sparse networks. Experimental validation using multi-year hourly data from eight monitoring stations demonstrates superior performance compared to state-of-the-art deep learning methods, achieving R² = 0.91 for 24-hour forecasts. Notably, this is the first study to demonstrate stable 10-day PM2.5 forecasting (R² = 0.73 at 240 hours) without performance degradation, addressing critical early-warning system requirements. The framework's computational efficiency and independence from external tools make it particularly suitable for deployment in resource-constrained urban environments.
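As a rough illustration of DTW-based station similarity, the sketch below computes the classic dynamic-programming DTW distance and ranks peer stations by it. The abstract does not specify the distance variant, window constraints, or selection rule, so the absolute-difference cost, the helper `most_similar_stations`, and the station names are all assumptions made for this example.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def most_similar_stations(target, peers, k=2):
    """Return the names of the k peer series closest to target under DTW."""
    ranked = sorted(peers, key=lambda name: dtw_distance(target, peers[name]))
    return ranked[:k]


# Hypothetical hourly PM2.5 readings for a target station and three peers.
target = [1, 2, 3, 4]
peers = {
    "station_A": [1, 2, 2, 3, 4],      # same shape, stretched in time
    "station_B": [10, 11, 12, 13],     # same shape, very different level
    "station_C": [1, 2, 3, 5],         # small deviation at the end
}
closest = most_similar_stations(target, peers, k=2)
```

Because DTW allows elastic alignment in time, `station_A` matches the target exactly even though its series is one step longer.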


An Analysis of Temporal Dropout in Earth Observation Time Series for Regression Tasks

Miranda, Miro, Mena, Francisco, Dengel, Andreas

arXiv.org Artificial Intelligence

Missing instances in time series data impose a significant challenge to deep learning models, particularly in regression tasks. In the Earth Observation field, satellite failure or cloud occlusion frequently results in missing time-steps, introducing uncertainties in the predicted output and causing a decline in predictive performance. While many studies address missing time-steps through data augmentation to improve model robustness, the uncertainty arising at the input level is commonly overlooked. To address this gap, we introduce Monte Carlo Temporal Dropout (MC-TD), a method that explicitly accounts for input-level uncertainty by randomly dropping time-steps during inference using a predefined dropout ratio, thereby simulating the effect of missing data. To bypass the need for costly searches for the optimal dropout ratio, we extend this approach with Monte Carlo Concrete Temporal Dropout (MC-ConcTD), a method that learns the optimal dropout distribution directly. Both MC-TD and MC-ConcTD are applied during inference, leveraging Monte Carlo sampling for uncertainty quantification. Experiments on three EO time-series datasets demonstrate that MC-ConcTD improves predictive performance and uncertainty calibration compared to existing approaches. Additionally, we highlight the advantages of adaptive dropout tuning over manual selection, making uncertainty quantification more robust and accessible for EO applications.
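The inference procedure described for MC-TD can be sketched as follows, under stated assumptions. The abstract does not say how dropped time-steps are represented (removed, masked, or imputed); this example simply removes them, and `toy_model` is a hypothetical stand-in regressor rather than the authors' network. The learned dropout distribution of MC-ConcTD is not modeled here.

```python
import random
import statistics


def temporal_dropout(series, ratio, rng):
    """Drop each time-step independently with probability ratio (keep at least one)."""
    kept = [x for x in series if rng.random() >= ratio]
    return kept if kept else [rng.choice(series)]


def toy_model(series):
    """Hypothetical stand-in regressor: the mean of the observed time-steps."""
    return sum(series) / len(series)


def mc_temporal_dropout(series, ratio=0.2, n_samples=100, seed=0):
    """Monte Carlo estimate: (mean prediction, standard deviation) over random drops."""
    rng = random.Random(seed)
    preds = [toy_model(temporal_dropout(series, ratio, rng)) for _ in range(n_samples)]
    return statistics.mean(preds), statistics.stdev(preds)


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mean_pred, uncertainty = mc_temporal_dropout(series, ratio=0.5)
```

The standard deviation across samples reflects how sensitive the prediction is to missing inputs; with `ratio=0.0` no time-step is dropped and it is exactly zero.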



Air Quality PM2.5 Index Prediction Model Based on CNN-LSTM

Guo, Zicheng, Wu, Shuqi, Zhu, Meixing, Guandi, He

arXiv.org Artificial Intelligence

With the intensification of global climate change, accurate prediction of air quality indicators, especially PM2.5 concentration, has become increasingly important in fields such as environmental protection, public health, and urban management. To address this, we propose an air quality PM2.5 index prediction model based on a hybrid CNN-LSTM architecture. The model effectively combines Convolutional Neural Networks (CNN) for local spatial feature extraction and Long Short-Term Memory (LSTM) networks for modeling temporal dependencies in time series data. Using a multivariate dataset collected from an industrial area in Beijing between 2010 and 2015 -- which includes hourly records of PM2.5 concentration, temperature, dew point, pressure, wind direction, wind speed, and precipitation -- the model predicts the average PM2.5 concentration over 6-hour intervals. Experimental results show that the model achieves a root mean square error (RMSE) of 5.236, outperforming traditional time series models in both accuracy and generalization. This demonstrates its strong potential in real-world applications such as air pollution early warning systems. However, due to the complexity of multivariate inputs, the model demands high computational resources, and its ability to handle diverse atmospheric factors still requires optimization. Future work will focus on enhancing scalability and expanding support for more complex multivariate weather prediction tasks.


Inside a plan to use AI to amplify doubts about the dangers of pollutants

The Guardian

An industry-backed researcher who has forged a career sowing doubt about the dangers of pollutants is attempting to use artificial intelligence (AI) to amplify his perspective. Louis Anthony "Tony" Cox Jr, a Denver-based risk analyst and former Trump adviser who once reportedly claimed there is no proof that cleaning air saves lives, is developing an AI application to scan academic research for what he sees as the false conflation of correlation with causation. Cox has described the project as an attempt to weed "propaganda" out of epidemiological research and perform "critical thinking at scale" in emails to industry researchers, which were obtained via Freedom of Information Act requests by the Energy and Policy Institute, a non-profit advocacy group, and exclusively reviewed by the Guardian. He has long leveled accusations of flimsiness at research linking exposure to chemical compounds with health dangers, including on behalf of polluting interests such as cigarette manufacturer Philip Morris and the American Petroleum Institute – a fossil fuel lobbying group he has even allowed to "copy edit" his findings. Both the tobacco and oil industries have a history of weaponizing scientific uncertainty, experts say, with some arguing that similar tactics drive the Trump administration's current deregulatory efforts. The president's May "gold standard" science order, for instance, empowered his appointees to "correct scientific information" and "discipline" those who breach the administration's views, prompting outrage from some scientists. Cox has obtained funding to develop the new AI reviewer from the American Chemistry Council (ACC), the nation's largest chemical industry advocacy group, which counts oil and chemical giants such as Exxon and DuPont as members.


Instructor-Worker Large Language Model System for Policy Recommendation: a Case Study on Air Quality Analysis of the January 2025 Los Angeles Wildfires

Gao, Kyle, Lu, Dening, Li, Liangzhi, Chen, Nan, He, Hongjie, Xu, Linlin, Li, Jonathan

arXiv.org Artificial Intelligence

The Los Angeles wildfires of January 2025 caused more than 250 billion dollars in damage and lasted for nearly an entire month before containment. Following our previous work, the Digital Twin Building, we modify and leverage the multi-agent large language model framework as well as the cloud-mapping integration to study the air quality during the Los Angeles wildfires. Recent advances in large language models have allowed for out-of-the-box automated large-scale data analysis. We use a multi-agent large language system comprised of an Instructor agent and Worker agents. Upon receiving the users' instructions, the Instructor agent retrieves the data from the cloud platform and produces instruction prompts to the Worker agents. The Worker agents then analyze the data and provide summaries. The summaries are finally input back into the Instructor agent, which then provides the final data analysis. We test this system's capability for data-based policy recommendation by assessing our Instructor-Worker LLM system's health recommendations based on air quality during the Los Angeles wildfires.


AirCast: Improving Air Pollution Forecasting Through Multi-Variable Data Alignment

Nedungadi, Vishal, Munir, Muhammad Akhtar, Rußwurm, Marc, Sarafian, Ron, Athanasiadis, Ioannis N., Rudich, Yinon, Khan, Fahad Shahbaz, Khan, Salman

arXiv.org Artificial Intelligence

Air pollution remains a leading global health risk, exacerbated by rapid industrialization and urbanization, contributing significantly to morbidity and mortality rates. In this paper, we introduce AirCast, a novel multi-variable air pollution forecasting model, by combining weather and air quality variables. AirCast employs a multi-task head architecture that simultaneously forecasts atmospheric conditions and pollutant concentrations, improving its understanding of how weather patterns affect air quality. Predicting extreme pollution events is challenging due to their rare occurrence in historic data, resulting in a heavy-tailed distribution of pollution levels. To address this, we propose a novel Frequency-weighted Mean Absolute Error (fMAE) loss, adapted from the class-balanced loss for regression tasks. Informed by domain knowledge, we investigate the selection of key variables known to influence pollution levels. Additionally, we align existing weather and chemical datasets across spatial and temporal dimensions. AirCast's integrated approach, combining multi-task learning, frequency-weighted loss and domain-informed variable selection, enables more accurate pollution forecasts. Our source code and models are made public here (https://github.com/vishalned/AirCast.git).
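The idea behind a frequency-weighted MAE can be illustrated with a simplified sketch. This is not the paper's fMAE definition, which the abstract does not give: here targets are grouped into fixed-width bins and each sample is weighted by the inverse of its bin count, a plain inverse-frequency scheme chosen for illustration. The `bin_width` parameter and function names are assumptions.

```python
from collections import Counter


def mae(y_true, y_pred):
    """Plain mean absolute error, for comparison."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def frequency_weighted_mae(y_true, y_pred, bin_width=10.0):
    """MAE where each sample is weighted by 1 / (count of targets in its bin)."""
    bins = [int(t // bin_width) for t in y_true]
    counts = Counter(bins)
    weights = [1.0 / counts[b] for b in bins]
    total = sum(weights)
    return sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred)) / total


# Four common low readings predicted perfectly, one rare extreme event missed by 40.
y_true = [5.0, 6.0, 7.0, 8.0, 95.0]
y_pred = [5.0, 6.0, 7.0, 8.0, 55.0]
plain = mae(y_true, y_pred)                        # 8.0
weighted = frequency_weighted_mae(y_true, y_pred)  # 20.0
```

The error on the rare extreme value counts for more under the weighted loss (20.0 versus 8.0), which is the behavior a heavy-tailed pollution distribution calls for.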


Analysis of Premature Death Rates in Texas Counties: The Impact of Air Quality, Socioeconomic Factors, and COPD Prevalence

Rich, Richard, Diaz, Ernesto

arXiv.org Artificial Intelligence

Understanding factors contributing to premature mortality is critical for public health planning. This study examines the relationships between premature death rates and multiple risk factors across several Texas counties, utilizing EPA air quality data, Census information, and county health records from recent years. We analyze the impact of air quality (PM2.5 levels), socioeconomic factors (median household income), and health conditions (COPD prevalence) through statistical analysis and modeling techniques. Results reveal COPD prevalence as a strong predictor of premature death rates, with higher prevalence associated with a substantial increase in years of potential life lost. While socioeconomic factors show a significant negative correlation, air quality demonstrates more complex indirect relationships. These findings emphasize the need for integrated public health interventions that prioritize key health conditions while addressing underlying socioeconomic disparities.