Goto

Collaborating Authors





A Proof of Proposition 1

Neural Information Processing Systems

Proof: First, it is straightforward to show that the IPW estimator of the ground-truth treatment effect, δ̂, ... We proceed to compute the variances of each estimator; the proof also holds trivially for the non-zero-mean case. Causal model details for Section 5.2: we include a wide range of machine-learning-based causal inference methods to evaluate the performance of the causal error estimators. All other configurations are kept at their defaults.
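
For reference, the inverse probability weighting (IPW) estimator mentioned in this fragment has a standard closed form. Below is a minimal sketch; the logistic propensity model and the variable names are illustrative assumptions, not the paper's own code.

    # Minimal sketch of an IPW estimator of the average treatment effect (ATE).
    # The logistic propensity model and variable names are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def ipw_ate(X, t, y):
        """X: covariates, t: binary treatment (0/1), y: outcomes."""
        # Estimate propensity scores e(x) = P(T = 1 | X = x).
        e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
        e = np.clip(e, 1e-3, 1 - 1e-3)  # avoid extreme weights
        # delta_hat = mean( T*Y/e(X) - (1-T)*Y/(1-e(X)) )
        return np.mean(t * y / e - (1 - t) * y / (1 - e))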


Incorporating data drift to perform survival analysis on credit risk

Peng, Jianwei, Lessmann, Stefan

arXiv.org Machine Learning

Survival analysis has become a standard approach for modelling time to default with time-varying covariates in credit risk. Unlike most existing methods that implicitly assume a stationary data-generating process, in practice mortgage portfolios are exposed to various forms of data drift caused by changing borrower behaviour, macroeconomic conditions, policy regimes and so on. This study investigates the impact of data drift on survival-based credit risk models and proposes a dynamic joint modelling framework to improve robustness under non-stationary environments. The proposed model integrates a longitudinal behavioural marker derived from balance dynamics with a discrete-time hazard formulation, combined with landmark one-hot encoding and isotonic calibration. Three types of data drift (sudden, incremental and recurring) are simulated and analysed on mortgage loan datasets from Freddie Mac. Experiments show that the proposed landmark-based joint model consistently outperforms classical survival models, tree-based drift-adaptive learners and gradient boosting methods in terms of discrimination and calibration across all drift scenarios, which confirms the superiority of the proposed model design.
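
The abstract names three concrete ingredients: a discrete-time hazard formulation, landmark one-hot encoding, and isotonic calibration. The sketch below shows one plausible way these pieces fit together, fitting a discrete-time hazard via logistic regression on person-period data and then recalibrating the predicted hazards; the column names, the person-period layout, and the single behavioural marker are assumptions, not the paper's implementation.

    # Sketch of a discrete-time hazard model with landmark one-hot features
    # and isotonic calibration, in the spirit of the framework described above.
    # Column names and the person-period layout are illustrative assumptions.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.isotonic import IsotonicRegression
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import make_pipeline

    def fit_discrete_hazard(panel: pd.DataFrame):
        """panel: one row per loan-period with columns
        ['landmark', 'balance_marker', 'ltv', 'rate', 'event'] (event = default in period)."""
        features = ColumnTransformer(
            [("landmark", OneHotEncoder(handle_unknown="ignore"), ["landmark"])],
            remainder="passthrough",
        )
        hazard = make_pipeline(features, LogisticRegression(max_iter=2000))
        hazard.fit(panel.drop(columns="event"), panel["event"])

        # Isotonic calibration of the raw hazard probabilities (monotone recalibration).
        raw = hazard.predict_proba(panel.drop(columns="event"))[:, 1]
        calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw, panel["event"])
        return hazard, calibrator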


You Need Better Attention Priors

Litman, Elon, Guo, Gabe

arXiv.org Machine Learning

We generalize the attention mechanism by viewing it through the lens of Entropic Optimal Transport, revealing that standard attention corresponds to a transport problem regularized by an implicit uniform prior. We introduce Generalized Optimal transport Attention with Trainable priors (GOAT), a new attention mechanism that replaces this naive assumption with a learnable, continuous prior. This prior maintains full compatibility with optimized kernels such as FlashAttention. GOAT also provides an EOT-based explanation of attention sinks and materializes a solution for them, avoiding the representational trade-offs of standard attention. Finally, by absorbing spatial information into the core attention computation, GOAT learns an extrapolatable prior that combines the flexibility of learned positional embeddings with the length generalization of fixed encodings.
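
Standard softmax attention corresponds to an entropic optimal transport problem with an implicit uniform prior over keys; a non-uniform prior can be realized as an additive log-prior bias on the attention logits, which is one way such a prior can remain compatible with fused kernels. The sketch below shows one such parameterization (a learnable bias indexed by clipped relative distance); it is an illustrative guess at the general idea, not the authors' GOAT formulation.

    # Sketch: softmax attention with a learnable (log-)prior added to the logits.
    # Standard attention corresponds to prior_logits == 0 (i.e., a uniform prior).
    # The relative-position parameterization here is an illustrative assumption.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PriorAttention(nn.Module):
        def __init__(self, dim, max_rel_dist=128):
            super().__init__()
            self.scale = dim ** -0.5
            # One learnable log-prior value per (clipped) relative distance.
            self.log_prior = nn.Parameter(torch.zeros(2 * max_rel_dist + 1))
            self.max_rel_dist = max_rel_dist

        def forward(self, q, k, v):  # q, k, v: (batch, seq, dim)
            logits = q @ k.transpose(-2, -1) * self.scale
            i = torch.arange(q.size(1), device=q.device)
            rel = (i[:, None] - i[None, :]).clamp(-self.max_rel_dist, self.max_rel_dist)
            logits = logits + self.log_prior[rel + self.max_rel_dist]  # add log-prior bias
            return F.softmax(logits, dim=-1) @ v

Because the prior enters only as an additive bias on the logits, it could in principle be passed to kernels that accept an attention bias, which is consistent with the compatibility claim above.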


A Multilayered Approach to Classifying Customer Responsiveness and Credit Risk

Afolabi, Ayomide, Ogburu, Ebere, Kimitei, Symon

arXiv.org Machine Learning

This study evaluates the performance of various classifiers in three distinct models: response, risk, and response-risk, concerning credit card mail campaigns and default prediction. In the response model, the Extra Trees classifier demonstrates the highest recall (79.1%), emphasizing its effectiveness in identifying potential responders to targeted credit card offers. Conversely, in the risk model, the Random Forest classifier exhibits remarkable specificity of 84.1%, crucial for identifying customers least likely to default. Furthermore, in the multi-class response-risk model, the Random Forest classifier achieves the highest accuracy (83.2%), indicating its efficacy in discerning both potential responders to credit card mail campaigns and low-risk credit card users. In this study, we optimized various performance metrics to solve a specific credit risk and mail responsiveness business problem.
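
The study reports a different headline metric for each model (recall, specificity, accuracy). Specificity is not a built-in scikit-learn scorer, so the short sketch below shows how the three metrics can be computed; the synthetic data and the classifiers stand in for the study's campaign and default datasets.

    # Sketch of the three metrics reported above (recall, specificity, accuracy)
    # on synthetic data; the real study uses credit card campaign and default data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
    from sklearn.metrics import recall_score, accuracy_score, confusion_matrix
    from sklearn.model_selection import train_test_split

    def specificity(y_true, y_pred):
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return tn / (tn + fp)  # true-negative rate

    X, y = make_classification(n_samples=2000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    resp = ExtraTreesClassifier(random_state=0).fit(X_tr, y_tr)   # response model
    risk = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)  # risk model
    print("recall:     ", recall_score(y_te, resp.predict(X_te)))
    print("specificity:", specificity(y_te, risk.predict(X_te)))
    print("accuracy:   ", accuracy_score(y_te, risk.predict(X_te)))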


Stop Using Your Keyboard and Start Using Handy, a Free Speech-to-Text App

WIRED

It's called Handy, and it uses AI models to accurately convert your speaking voice into text--all for free. If old sci-fi shows are anything to go by, we're all using our computers wrong. We're still typing with our fingers, like cave people, instead of talking out loud the way the future was supposed to be. Have you ever seen Picard touch a keyboard? And it's odd because our computers are all capable of turning speech into text by default.


Time-aware UNet and super-resolution deep residual networks for spatial downscaling

Sipilä, Mika, Maggio, Sabrina, De Iaco, Sandra, Nordhausen, Klaus, Palma, Monica, Taskinen, Sara

arXiv.org Machine Learning

Satellite data of atmospheric pollutants are often available only at coarse spatial resolution, limiting their applicability in local-scale environmental analysis and decision-making. Spatial downscaling methods aim to transform the coarse satellite data into high-resolution fields. In this work, two widely used deep learning architectures, the super-resolution deep residual network (SRDRN) and the encoder-decoder-based UNet, are considered for spatial downscaling of tropospheric ozone. Both methods are extended with a lightweight temporal module, which encodes observation time using either sinusoidal or radial basis function (RBF) encoding, and fuses the temporal features with the spatial representations in the networks. The proposed time-aware extensions are evaluated against their baseline counterparts in a case study on ozone downscaling over Italy. The results suggest that, while only slightly increasing computational complexity, the temporal modules significantly improve downscaling performance and convergence speed.
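
The temporal module described above encodes observation time with either sinusoidal or radial basis function (RBF) features and fuses them with the spatial representations. A minimal sketch of the two encodings follows; the dimensions, the yearly period, and fusion by channel-wise concatenation are assumptions about how such a module might look, not the paper's exact design.

    # Sketch: sinusoidal vs. RBF encodings of observation time, as a small module
    # whose output could be broadcast and concatenated with spatial feature maps.
    # Dimensions and the fusion-by-concatenation choice are illustrative assumptions.
    import math
    import torch
    import torch.nn as nn

    class TimeEncoding(nn.Module):
        def __init__(self, dim=16, mode="sinusoidal", period=365.0):
            super().__init__()
            self.dim, self.mode, self.period = dim, mode, period
            # RBF centres spread over one period (e.g. one year of daily observations).
            self.centres = nn.Parameter(torch.linspace(0, period, dim), requires_grad=False)
            self.width = period / dim

        def forward(self, t):  # t: (batch,) observation times
            t = t[:, None].float()
            if self.mode == "sinusoidal":
                k = torch.arange(self.dim // 2, device=t.device).float()
                freq = 2 * math.pi * (k + 1) / self.period
                return torch.cat([torch.sin(freq * t), torch.cos(freq * t)], dim=-1)
            # RBF: Gaussian bumps centred on a grid of times.
            return torch.exp(-((t - self.centres) ** 2) / (2 * self.width ** 2))

    # The (batch, dim) encoding can then be expanded to (batch, dim, H, W) and
    # concatenated with UNet or SRDRN feature maps along the channel dimension.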


SyGra: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data

Pradhan, Bidyapati, Dasgupta, Surajit, Saha, Amit Kumar, Anustoop, Omkar, Puttagunta, Sriram, Mittal, Vipul, Sarda, Gopal

arXiv.org Artificial Intelligence

The advancement of large language models (LLMs) is critically dependent on the availability of high-quality datasets for Supervised Fine-Tuning (SFT), alignment tasks like Direct Preference Optimization (DPO), etc. In this work, we present a comprehensive synthetic data generation framework that facilitates scalable, configurable, and high-fidelity generation of synthetic data tailored for these training paradigms. Our approach employs a modular and configuration-based pipeline capable of modeling complex dialogue flows with minimal manual intervention. This framework uses a dual-stage quality tagging mechanism, combining heuristic rules and LLM-based evaluations, to automatically filter and score data extracted from OASST-formatted conversations, ensuring the curation of high-quality dialogue samples. The resulting datasets are structured under a flexible schema supporting both SFT and DPO use cases, enabling seamless integration into diverse training workflows. Together, these innovations offer a robust solution for generating and managing synthetic conversational data at scale, significantly reducing the overhead of data preparation in LLM training pipelines.
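
The dual-stage quality tagging described above combines heuristic rules with LLM-based evaluation over OASST-formatted conversations. A hedged sketch of such a two-stage filter is shown below; the specific heuristics, the threshold, and the llm_score callable are placeholders, not the framework's actual rules or API.

    # Sketch of a dual-stage quality filter for OASST-style conversation records:
    # stage 1 applies cheap heuristic rules, stage 2 asks an LLM judge to score
    # the survivors. Heuristics, thresholds and `llm_score` are placeholders.
    from typing import Callable

    def heuristic_ok(sample: dict) -> bool:
        turns = sample.get("messages", [])
        text = " ".join(m.get("text", "") for m in turns)
        return (
            len(turns) >= 2                  # at least one prompt/response pair
            and 20 <= len(text) <= 8000      # not trivially short or overly long
            and "lorem ipsum" not in text.lower()
        )

    def tag_quality(samples: list[dict], llm_score: Callable[[dict], float],
                    threshold: float = 0.7) -> list[dict]:
        tagged = []
        for s in samples:
            if not heuristic_ok(s):
                continue                     # stage 1: heuristic rejection
            score = llm_score(s)             # stage 2: LLM-as-judge score in [0, 1]
            if score >= threshold:
                tagged.append({**s, "quality_score": score})
        return tagged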


Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time

Richman, Daniel D., Karaguesian, Jessica, Suomivuori, Carl-Mikael, Dror, Ron O.

arXiv.org Artificial Intelligence

The function of biomolecules such as proteins depends on their ability to interconvert between a wide range of structures or "conformations." Researchers have endeavored for decades to develop computational methods to predict the distribution of conformations, which is far harder to determine experimentally than a static folded structure. We present ConforMix, an inference-time algorithm that enhances sampling of conformational distributions using a combination of classifier guidance, filtering, and free energy estimation. Our approach upgrades diffusion models -- whether trained for static structure prediction or conformational generation -- to enable more efficient discovery of conformational variability without requiring prior knowledge of major degrees of freedom. ConforMix is orthogonal to improvements in model pretraining and would benefit even a hypothetical model that perfectly reproduced the Boltzmann distribution. Remarkably, when applied to a diffusion model trained for static structure prediction, ConforMix captures structural changes including domain motion, cryptic pocket flexibility, and transporter cycling, while avoiding unphysical states. Case studies of biologically critical proteins demonstrate the scalability, accuracy, and utility of this method.
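
Of the three ingredients named above (classifier guidance, filtering, free energy estimation), classifier guidance is the most generic. The sketch below shows classifier-guided annealed Langevin sampling as one way such guidance can be wired into a diffusion sampler; the score_model and classifier interfaces, the noise schedule, and the step-size rule are illustrative assumptions, not the ConforMix algorithm.

    # Sketch of the classifier-guidance ingredient: bias each reverse-diffusion step
    # toward conformations a classifier labels as the target state. The score_model
    # and classifier interfaces and the step-size rule are illustrative assumptions.
    import torch

    @torch.no_grad()
    def guided_langevin(score_model, classifier, x, sigmas, n_steps=20, guidance=2.0):
        """Annealed Langevin sampling with classifier guidance.
        x: (batch, n_atoms, 3) coordinates; sigmas: decreasing noise levels."""
        for sigma in sigmas:
            step = 0.5 * sigma ** 2  # per-level step size (common heuristic)
            for _ in range(n_steps):
                with torch.enable_grad():
                    xg = x.detach().requires_grad_(True)
                    # log-probability that xg is in the target conformational state
                    log_p = classifier(xg, sigma).log_softmax(-1)[..., 1].sum()
                    grad = torch.autograd.grad(log_p, xg)[0]
                drift = score_model(x, sigma) + guidance * grad
                x = x + step * drift + (2 * step) ** 0.5 * torch.randn_like(x)
        return x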