AITopics | Gaithersburg

Diverse Community Data for Benchmarking Data Privacy Algorithms

Neural Information Processing SystemsFeb-16-2026, 06:02:15 GMT

Deidentification algorithms are vulnerable to the same bias and privacy issues that impact other data analytics and machine learning applications, and it can even amplify those issues by contaminating downstream applications.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe (0.14)
(5 more...)

Genre: Research Report (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.30)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Add feedback

Nonparametric_Uncertainty_Quantification_for_Single_Deterministic_Neural_Network (1)

Nikita Kotelevskii

Neural Information Processing SystemsFeb-12-2026, 16:02:41 GMT

dataset, international conference, learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
(15 more...)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Add feedback

SoftMatcha 2: A Fast and Soft Pattern Matcher for Trillion-Scale Corpora

Yoneda, Masataka, Matsushita, Yusuke, Kamoda, Go, Suenaga, Kohei, Akiba, Takuya, Waga, Masaki, Yokoi, Sho

arXiv.org Machine LearningFeb-12-2026

We present an ultra-fast and flexible search algorithm that enables search over trillion-scale natural language corpora in under 0.3 seconds while handling semantic variations (substitution, insertion, and deletion). Our approach employs string matching based on suffix arrays that scales well with corpus size. To mitigate the combinatorial explosion induced by the semantic relaxation of queries, our method is built on two key algorithmic ideas: fast exact lookup enabled by a disk-aware design, and dynamic corpus-aware pruning. We theoretically show that the proposed method suppresses exponential growth in the search space with respect to query length by leveraging statistical properties of natural language. In experiments on FineWeb-Edu (Lozhkov et al., 2024) (1.4T tokens), we show that our method achieves significantly lower search latency than existing methods: infini-gram (Liu et al., 2024), infini-gram mini (Xu et al., 2025), and SoftMatcha (Deguchi et al., 2025). As a practical application, we demonstrate that our method identifies benchmark contamination in training corpora, unidentified by existing approaches. We also provide an online demo of fast, soft search across corpora in seven languages.

large language model, machine learning, pattern recognition, (25 more...)

arXiv.org Machine Learning

2602.10908

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Austria > Styria > Graz (0.04)
Europe > Austria > Vienna (0.04)
(17 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Leisure & Entertainment > Sports > Olympic Games (0.95)
Health & Medicine > Therapeutic Area > Immunology (0.92)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(5 more...)

Add feedback

ac4920f4085b5662133dd751493946a6-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 07:23:20 GMT

arxiv preprint arxiv, procedure, proceedings, (11 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)

Add feedback

3151e460c41ba67dc55412861184ef35-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 18:17:55 GMT

activation memory, fine-tuning, gradient, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(20 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Hybrid Physics-ML Model for Forward Osmosis Flux with Complete Uncertainty Quantification

Ratn, Shiv, Rampriyan, Shivang, Ray, Bahni

arXiv.org Machine LearningDec-12-2025

Forward Osmosis (FO) is a promising low-energy membrane separation technology, but challenges in accurately modelling its water flux (Jw) persist due to complex internal mass transfer phenomena. Traditional mechanistic models struggle with empirical parameter variability, while purely data-driven models lack physical consistency and rigorous uncertainty quantification (UQ). This study introduces a novel Robust Hybrid Physics-ML framework employing Gaussian Process Regression (GPR) for highly accurate, uncertainty-aware Jw prediction. The core innovation lies in training the GPR on the residual error between the detailed, non-linear FO physical model prediction (Jw_physical) and the experimental water flux (Jw_actual). Crucially, we implement a full UQ methodology by decomposing the total predictive variance (sigma2_total) into model uncertainty (epistemic, from GPR's posterior variance) and input uncertainty (aleatoric, analytically propagated via the Delta method for multi-variate correlated inputs). Leveraging the inherent strength of GPR in low-data regimes, the model, trained on a meagre 120 data points, achieved a state-of-the-art Mean Absolute Percentage Error (MAPE) of 0.26% and an R2 of 0.999 on the independent test data, validating a truly robust and reliable surrogate model for advanced FO process optimization and digital twin development.

delta method, gpr, variance, (14 more...)

arXiv.org Machine Learning

2512.10457

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
Asia > India > NCT > New Delhi (0.04)

Genre: Research Report (0.40)

Industry: Water & Waste Management > Water Management > Lifecycle > Reuse (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Privacy-Preserving Generative Modeling and Clinical Validation of Longitudinal Health Records for Chronic Disease

Ballyk, Benjamin D., Gupta, Ankit, Konda, Sujay, Subramanian, Kavitha, Landon, Chris, Naseer, Ahmed Ammar, Maierhofer, Georg, Swaminathan, Sumanth, Venkateshwaran, Vasudevan

arXiv.org Machine LearningDec-2-2025

Data privacy is a critical challenge in modern medical workflows as the adoption of electronic patient records has grown rapidly. Stringent data protection regulations limit access to clinical records for training and integrating machine learning models that have shown promise in improving diagnostic accuracy and personalized care outcomes. Synthetic data offers a promising alternative; however, current generative models either struggle with time-series data or lack formal privacy guaranties. In this paper, we enhance a state-of-the-art time-series generative model to better handle longitudinal clinical data while incorporating quantifiable privacy safeguards. Using real data from chronic kidney disease and ICU patients, we evaluate our method through statistical tests, a Train-on-Synthetic-Test-on-Real (TSTR) setup, and expert clinical review. Our non-private model (Augmented TimeGAN) outperforms transformer- and flow-based models on statistical metrics in several datasets, while our private model (DP-TimeGAN) maintains a mean authenticity of 0.778 on the CKD dataset, outperforming existing state-of-the-art models on the privacy-utility frontier. Both models achieve performance comparable to real data in clinician evaluations, providing robust input data necessary for developing models for complex chronic conditions without compromising data privacy.

dataset, synthetic longitudinal ehr, timegan, (13 more...)

arXiv.org Machine Learning

2512.00434

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(11 more...)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Therapeutic Area > Nephrology (0.88)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.64)

Add feedback

5d1f02132ef51602adf07000ca5b6138-Paper-Conference.pdf

Neural Information Processing SystemsNov-18-2025, 20:48:22 GMT

code change, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Austria > Vienna (0.14)
(18 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

A New Paradigm for Protecting Homes from Disastrous Fires

The New YorkerOct-22-2025, 10:00:00 GMT

Scientists have identified more than fifty ways that houses can ignite. It's possible to defend against all of them--but it's arduous, and homeowners can't do it alone. In June, 2012, hundreds of homes in Mountain Shadows, Colorado, a subdivision in the foothills of the Rockies, were reduced to ash during the wind-whipped Waldo Canyon Fire. On a cul-de-sac called Hot Springs Court, however, four dwellings somehow remained standing. The mystery of their survival nagged at Alex Maranghides, a fire-protection engineer at the National Institute of Standards and Technology (), who worked with several colleagues on a meticulous reconstruction of the fire. How did the homes make it through? Was there something special about them--a fireproof roof, say, or a fancy sprinkler system? The team collected weather reports, topographic data, G.P.S. records from fire engines, photos, videos, and property-damage reports.

california, maranghide, neighbor, (16 more...)

The New Yorker

Country: