Goto

Collaborating Authors

 focal



A Doubly Robust Machine Learning Approach for Disentangling Treatment Effect Heterogeneity with Functional Outcomes

Salmaso, Filippo, Testa, Lorenzo, Chiaromonte, Francesca

arXiv.org Machine Learning

Causal inference is paramount for understanding the effects of interventions, yet extracting personalized insights from increasingly complex data remains a significant challenge for modern machine learning. This is the case, in particular, when considering functional outcomes observed over a continuous domain (e.g., time, or space). Estimation of heterogeneous treatment effects, known as CATE, has emerged as a crucial tool for personalized decision-making, but existing meta-learning frameworks are largely limited to scalar outcomes, failing to provide satisfying results in scientific applications that leverage the rich, continuous information encoded in functional data. Here, we introduce FOCaL (Functional Outcome Causal Learning), a novel, doubly robust meta-learner specifically engineered to estimate a functional heterogeneous treatment effect (F-CATE). FOCaL integrates advanced functional regression techniques for both outcome modeling and functional pseudo-outcome reconstruction, thereby enabling the direct and robust estimation of F-CATE. We provide a rigorous theoretical derivation of FOCaL, demonstrate its performance and robustness compared to existing non-robust functional methods through comprehensive simulation studies, and illustrate its practical utility on diverse real-world functional datasets. FOCaL advances the capabilities of machine intelligence to infer nuanced, individualized causal effects from complex data, paving the way for more precise and trustworthy AI systems in personalized medicine, adaptive policy design, and fundamental scientific discovery.


Few-ShotContinualActiveLearningbyaRobot

Neural Information Processing Systems

The framework also uses uncertainty measures on the Gaussian representations of thepreviously learned classes tofindthemost informativesamples tobelabeled in an increment. We evaluate our approach on the CORe-50 dataset and on a real humanoid robot for the object classification task.


FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space

Neural Information Processing Systems

This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the shared information between sensory modalities, but do not explicitly consider the exclusive modality information that could be critical to understanding the underlying sensing physics. Besides, contrastive frameworks for time series have not handled the temporal information locality appropriately. FOCAL solves these challenges by making the following contributions: First, given multimodal time series, it encodes each modality into a factorized latent space consisting of shared features and private features that are orthogonal to each other. The shared space emphasizes feature patterns consistent across sensory modalities through a modal-matching objective.



ForTIFAI: Fending Off Recursive Training Induced Failure for AI Model Collapse

Shabgahi, Soheil Zibakhsh, Aghazadeh, Pedram, Mirhoseini, Azalia, Koushanfar, Farinaz

arXiv.org Artificial Intelligence

The increasing reliance on generative AI models is rapidly increasing the volume of synthetic data, with some projections suggesting that most available new data for training could be machine-generated by 2030 Gartner, Inc. (2022). This shift to a mainly synthetic content presents a critical challenge: repeated training in synthetic data leads to a phenomenon known as model collapse, where model performance degrades over generations of training, eventually rendering the models ineffective. While the causes of model collapse are increasingly understood, effective mitigation strategies remain scarce. We address this challenge by leveraging a key insight: auto-regressive models tend to generate text sequences to which they assign high confidence (i.e., high log-likelihood). Based on this observation, we introduce the Truncated-Cross-Entropy (TCE) loss function. Our experiments demonstrate that models trained with TCE not only learn effectively but also exhibit significantly increased resilience, tolerating over 2.3 more synthetic data before the onset of collapse. In addition, we provide an open-source benchmark for collapse dynamics in mixed-data settings. Our results demonstrate that confidence-aware training objectives can substantially delay collapse onset, offering a practical and generalizable tool for model robustness under synthetic-data exposure. Generative models have become the foundation for modern AI applications in several modalities, including text, image, code, and audio. Large Language Models (LLMs) such as ChatGPT (OpenAI et al., 2024), LLaMA (Grattafiori et al., 2024) and Gemma (Team et al., 2025), as well as image generators DALL-E (Ramesh et al., 2021) and Imagen (Saharia et al., 2022), all rely on large datasets scraped from the Web. As these models are continuously updated to reflect recent knowledge and linguistic patterns, the need for ever larger and frequently refreshed training corpora has grown substantially. However, this demand is colliding with a shift in the data landscape: synthetic content is increasingly populating the Internet, contaminating the very datasets used for model training. This shift raises fundamental concerns.



FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space

Neural Information Processing Systems

This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the shared information between sensory modalities, but do not explicitly consider the exclusive modality information that could be critical to understanding the underlying sensing physics. Besides, contrastive frameworks for time series have not handled the temporal information locality appropriately.



FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space

Neural Information Processing Systems

This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the shared information between sensory modalities, but do not explicitly consider the exclusive modality information that could be critical to understanding the underlying sensing physics. Besides, contrastive frameworks for time series have not handled the temporal information locality appropriately. FOCAL solves these challenges by making the following contributions: First, given multimodal time series, it encodes each modality into a factorized latent space consisting of shared features and private features that are orthogonal to each other. The shared space emphasizes feature patterns consistent across sensory modalities through a modal-matching objective.