How Shivon Zilis Operated as Elon Musk's OpenAI Insider
Messages presented at trial reveal how Zilis, the mother of four of Musk's children, acted as an intermediary between him and OpenAI. As the first week of the trial comes to a close, one person has emerged as a critical behind-the-scenes manager of communications and egos in OpenAI's early years: Shivon Zilis. A longtime employee of Musk and the mother of four of his children, Zilis first joined OpenAI as an advisor in 2016. She later served as a director of its nonprofit board from 2020 until 2023 and has also worked as an executive at Musk's other companies, Neuralink and Tesla. When asked about the nature of his relationship with Zilis in court, Musk offered several answers.
Value-Aware Product Recommendation by Customer Segmentation using a suitable High-Dimensional Similarity Measure
Acosta, María Florencia, Arancibia, Rodrigo García, Llop, Pamela, Lovatto, Mariel, Mansilla, Lucas
This paper presents a novel value-aware approach to product recommendation that simultaneously addresses the high dimensionality and sparsity of user-item data while explicitly incorporating the contribution of each product and user to overall sales revenue. The proposed framework encodes revenue contributions in the user-item matrix and computes customer similarity directly on this basis using suitable distance measures. This enables the segmentation of users according to the revenue-based similarity of their purchase baskets and supports recommendations aligned with profitability objectives. We compare conventional similarity metrics with a novel alternative tailored to high-dimensional contexts and propose three recommendation strategies based on revenue share, product popularity, and expected profit generation. The effectiveness of the proposed method is validated through simulation experiments and a real-world application using the UCI Online Retail dataset.
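The idea of encoding revenue contributions in the user-item matrix and comparing customers on that basis can be illustrated with a minimal sketch. The function names and toy data below are hypothetical, and cosine similarity is used only as a conventional stand-in: the abstract does not specify the paper's novel high-dimensional measure.

```python
from math import sqrt

def revenue_matrix(purchases, prices):
    """Build a sparse user-item matrix whose entries are revenue contributions.

    purchases: iterable of (user, item, quantity) rows (hypothetical layout).
    prices: dict mapping item -> unit price.
    """
    matrix = {}
    for user, item, qty in purchases:
        row = matrix.setdefault(user, {})
        row[item] = row.get(item, 0.0) + qty * prices[item]
    return matrix

def cosine_similarity(row_a, row_b):
    """Cosine similarity between two sparse revenue rows (dicts)."""
    dot = sum(value * row_b.get(item, 0.0) for item, value in row_a.items())
    norm_a = sqrt(sum(v * v for v in row_a.values()))
    norm_b = sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty basket is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

# Toy data: u1 and u2 buy the same products in the same revenue proportions.
purchases = [("u1", "a", 2), ("u1", "b", 1),
             ("u2", "a", 4), ("u2", "b", 2),
             ("u3", "c", 1)]
prices = {"a": 5.0, "b": 10.0, "c": 3.0}
matrix = revenue_matrix(purchases, prices)
```

Customers with proportional revenue baskets score close to 1 and customers with disjoint baskets score 0, which is the kind of signal a segmentation step could cluster on.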
Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution
Dobiczek, Karol, Mozolewski, Maciej, Bobek, Szymon, Szafarczyk, Michał, van Dam, Peter, Nalepa, Grzegorz J.
Deep learning models for 12-lead electrocardiogram (ECG) analysis achieve high diagnostic performance but lack the intuitive interpretability required for clinical integration. Standard feature attribution methods are limited by the inherent difficulty in mapping abstract waveform fluctuations to physical anatomical pathologies. To resolve this, we propose a cross-modal method that projects feature attributions from high-performance 12-lead ECG models onto the CineECG 3D anatomical space. Our study reveals that while models trained directly on CineECG signals suffer from reduced accuracy and incoherent attributions, the proposed mapping mechanism effectively recovers clinically relevant feature rankings. Validated against a ground-truth dataset of 20 cases annotated by domain experts, the mapped explanations yield a Dice score of 0.56, significantly outperforming the 0.47 baseline of standard 12-lead attributions. These findings indicate that cross-modal averaging mapping effectively filters attribution instability and improves the localization of pathological features, combining the diagnostic expressiveness of standard ECG with the intuitive clarity of anatomical visualization.
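For readers unfamiliar with the metric, the Dice score measures overlap between two selections. The sketch below uses the standard set form, 2|A ∩ B| / (|A| + |B|); how the paper derives the compared sets from attributions and expert annotations is not stated in the abstract, so the segment labels here are purely illustrative.

```python
def dice_score(predicted, reference):
    """Dice coefficient between two collections of selected elements."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # convention: two empty selections agree perfectly
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))

# Hypothetical example: segments highlighted by a model vs. by an expert.
model_segments = {"seg1", "seg2", "seg3"}
expert_segments = {"seg2", "seg3", "seg4"}
score = dice_score(model_segments, expert_segments)
```

A score of 1 means the two selections coincide and 0 means they share nothing, so the reported move from 0.47 to 0.56 indicates closer agreement with the expert annotations.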
SCOPE-FE: Structured Control of Operator and Pairwise Exploration for Feature Engineering
Park, Minhee, Son, Seongyeon, Lee, Yonghyun, Kim, Eunchan
Automatic feature engineering is an effective approach for improving predictive performance in tabular learning. However, expand-and-reduce methods, such as OpenFE, become increasingly computationally expensive as the input dimensionality grows. This limitation arises primarily from the combinatorial explosion of candidate features generated through operator-feature combinations. To address this issue, we propose SCOPE-FE, a structured search space control framework that improves efficiency by reducing the candidate space prior to feature generation. SCOPE-FE jointly regulates two major sources of combinatorial growth: the operator space and feature-pair space. First, OperatorProbing estimates the dataset-specific utility of candidate operators and eliminates low-contribution operators in advance. Second, FeatureClustering employs spectral embedding and fuzzy c-means clustering to group structurally related features, thereby restricting candidate generation to relevant within-cluster combinations. In addition, we introduce ReliabilityScoring, which incorporates variance across subsamples to stabilize pruning decisions. Experiments on ten benchmark datasets demonstrate that SCOPE-FE substantially reduces feature engineering time while maintaining competitive predictive performance relative to existing baselines. The efficiency gains are particularly pronounced for high-dimensional datasets. These results indicate that structured control of the search space is an effective strategy for scalable automatic feature engineering. The code will be made publicly available upon acceptance.
Linear Models, Variable Selection, Artificial Intelligence
Alrawkan, Riyadh, Boone, Edward, Ghanam, Ryad, Westveld, Anton
Variable selection in linear regression models has been a problem since hypothesis testing began. Deciding which variables to include in or exclude from a model is not an easy task. Techniques such as Forward, Backward, and Stepwise Regression sequentially add or delete variables from a model. Penalized likelihood methods such as AIC and BIC seek to choose variables that make a significant contribution to the likelihood. Penalized sum-of-squares methods such as LASSO and Elastic Net shrink small coefficients so that only variables with large coefficients remain in the model. This work introduces an Artificial Intelligence approach to model selection in which an ANN is trained to determine the significance of the variables based on OLS estimates. A simulation study shows the accuracy across various sample sizes and variances. Furthermore, a simulation study is conducted to compare the performance of the approach against Forward, Backward, AIC, BIC and LASSO. The approach is illustrated using a dataset from the World Health Organization regarding Life Expectancy. A GitHub link is provided to the pretrained ANN, which can handle up to 100 predictor variables, along with the original WHO dataset and the subset used in this work.
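As background for the penalized likelihood baselines mentioned above, here is a minimal sketch of AIC and BIC for a Gaussian linear model, written in terms of the residual sum of squares and dropping additive constants that do not affect model comparison. The function names and the numbers in the example are hypothetical; this is not the paper's ANN method.

```python
from math import log

def gaussian_aic(rss, n, k):
    """AIC (up to an additive constant) for an OLS fit.

    rss: residual sum of squares, n: sample size, k: number of fitted parameters.
    """
    return n * log(rss / n) + 2 * k

def gaussian_bic(rss, n, k):
    """BIC (up to an additive constant) for an OLS fit."""
    return n * log(rss / n) + k * log(n)

# Hypothetical comparison: three extra predictors barely reduce the RSS.
n = 50
small = {"rss": 120.0, "k": 2}
large = {"rss": 118.0, "k": 5}
prefer_small_by_aic = gaussian_aic(small["rss"], n, small["k"]) < gaussian_aic(large["rss"], n, large["k"])
prefer_small_by_bic = gaussian_bic(small["rss"], n, small["k"]) < gaussian_bic(large["rss"], n, large["k"])
```

Lower values are better under both criteria. BIC's per-parameter penalty of log(n) exceeds AIC's penalty of 2 once n is larger than about 7, so BIC tends to select smaller models.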
FoReco and FoRecoML: A Unified Toolbox for Forecast Reconciliation in R
Girolimetto, Daniele, Rombouts, Jeroen, Wilms, Ines, Yang, Yangzhuoran Fin
In this paper, we introduce the forecast reconciliation packages FoReco and FoRecoML for R (R Core Team 2026). Forecast reconciliation adjusts forecasts for linearly constrained multiple time series (such as hierarchical or grouped series, or series observed at different temporal frequencies) so that they are coherent with respect to the underlying constraints, improving both accuracy and consistency for informed decision making. The contributions of the packages are threefold. First, FoReco and FoRecoML are the first to offer functionality for forecast reconciliation methods across cross-sectional, temporal and cross-temporal frameworks. Second, the packages provide a comprehensive set of forecast reconciliation approaches, including classical (e.g., top-down, bottom-up and middle-out) and regression-based reconciliation methods in FoReco, as well as non-linear reconciliation methods using machine learning in FoRecoML. A third key contribution is their unified design, which enables easy-to-use forecast reconciliation functions built on the same philosophy, regardless of the reconciliation framework or method.
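To make the notion of coherence concrete, the following is a minimal sketch of the classical bottom-up approach for a two-level hierarchy (Total = A + B). It is written in plain Python for illustration and does not use or mirror the FoReco API; the function name, summing matrix, and forecast values are assumptions.

```python
def bottom_up(bottom_forecasts, summing_matrix):
    """Reconcile by aggregating bottom-level forecasts through the summing matrix.

    Each row of summing_matrix states which bottom series add up to one
    series in the hierarchy, so the output is coherent by construction.
    """
    return [sum(weight * forecast for weight, forecast in zip(row, bottom_forecasts))
            for row in summing_matrix]

# Rows: Total, A, B. Columns: bottom-level series A and B.
S = [[1, 1],
     [1, 0],
     [0, 1]]

# Hypothetical base forecasts: the Total (17.0) disagrees with A + B (15.0).
base = {"Total": 17.0, "A": 10.0, "B": 5.0}
reconciled = bottom_up([base["A"], base["B"]], S)
```

Bottom-up discards the upper-level base forecast entirely. Regression-based methods instead combine information from all levels, at the cost of estimating how forecast errors are related across series.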
Optimized Deferral for Imbalanced Settings
Cortes, Corinna, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao
Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or other classifiers, often faces challenges due to an expert imbalance problem. This imbalance can lead to suboptimal performance, with deferral algorithms favoring the majority expert. We present a comprehensive study of two-stage learning to defer in expert imbalance settings. We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings. Extensive experiments demonstrate the effectiveness of our approach, showing clear improvements over existing baselines on both image classification and real-world Large Language Model (LLM) routing tasks.
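The two-stage deferral setting can be sketched as follows: each fixed expert incurs a cost on an input, and the deferral loss is the cost of whichever expert the router picks. The cost form used here (misclassification indicator plus a per-expert usage cost `beta`) is a common choice in the learning-to-defer literature and is an assumption, not necessarily the paper's exact definition; all names are hypothetical.

```python
def expert_cost(prediction, label, beta):
    """Cost of using one expert: 0/1 error plus an assumed usage cost beta."""
    return (0.0 if prediction == label else 1.0) + beta

def deferral_loss(chosen, predictions, label, betas):
    """Loss incurred when the router sends the input to expert `chosen`."""
    return expert_cost(predictions[chosen], label, betas[chosen])

def best_expert(predictions, label, betas):
    """Index of the lowest-cost expert, i.e. the target a router would learn."""
    costs = [expert_cost(p, label, b) for p, b in zip(predictions, betas)]
    return min(range(len(costs)), key=costs.__getitem__)

# Toy example: a cheap expert that is wrong and a costlier expert that is right.
predictions = ["cat", "dog"]
betas = [0.0, 0.3]
target = best_expert(predictions, "dog", betas)
```

Expert imbalance arises when one expert is the lowest-cost choice on most inputs. A router trained on such targets can default to that majority expert, which is the failure mode the abstract describes.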
Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction
The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yield slow square-root consistency bounds, while piecewise-linear losses (like Hinge) offer fast linear consistency rates but suffer from non-differentiability. We propose Linear-Core (LC) Surrogates, a new family of convex loss functions that resolve this tension by stitching a linear core to a smooth tail. We prove that these surrogates are differentiable everywhere while retaining strict linear $H$-consistency bounds, effectively combining the optimization benefits of smoothness with the statistical efficiency of margin-based losses. In the structured prediction setting, we show that this smoothness unlocks a massive computational and energy advantage: it allows for an unbiased stochastic gradient estimator that bypasses the quadratic complexity $O(|\mathscr{Y}|^2)$ of exact inference (e.g., Viterbi). Empirically, our method achieves a 23$\times$ speedup over Structured SVMs on large-vocabulary sequence tagging tasks and demonstrates superior robustness to instance-dependent label noise, outperforming Cross-Entropy by 2.6% on corrupted CIFAR-10.