On the Variance, Admissibility, and Stability of Empirical Risk Minimization
It is well known that Empirical Risk Minimization (ERM) may attain minimax suboptimal rates in terms of the mean squared error (Birgé and Massart, 1993). In this paper, we prove that, under relatively mild assumptions, the suboptimality of ERM must be due to its bias. Namely, the variance error term of ERM (in terms of the bias and variance decomposition) enjoys the minimax rate. In the fixed design setting, we provide an elementary proof of this result using the probabilistic method. Then, we extend our proof to the random design setting for various models. In addition, we provide a simple proof of Chatterjee's admissibility theorem (Chatterjee, 2014, Theorem 1.4), which states that in the fixed design setting, ERM cannot be ruled out as an optimal method, and then we extend this result to the random design setting. We also show that our estimates imply stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes. Finally, we highlight the somewhat irregular nature of the loss landscape of ERM in the non-Donsker regime, by showing that functions can be close to ERM, in terms of $L_2$ distance, while still being far from almost-minimizers of the empirical loss.
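The bias and variance decomposition referenced above, written in standard notation for an estimator $\hat f$ of the regression function $f^*$ (the paper's claim is that the second term attains the minimax rate):

```latex
\mathbb{E}\,\|\hat f - f^*\|_{L_2}^2
  \;=\; \underbrace{\|\mathbb{E}\hat f - f^*\|_{L_2}^2}_{\text{bias}}
  \;+\; \underbrace{\mathbb{E}\,\|\hat f - \mathbb{E}\hat f\|_{L_2}^2}_{\text{variance}}
```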
Reinforcement Learning for Self-Healing Material Systems
Chatterjee, Maitreyi, Agarwal, Devansh, Chatterjee, Biplab
The transition to autonomous material systems necessitates adaptive control methodologies to maximize structural longevity. This study frames the self-healing process as a Reinforcement Learning (RL) problem within a Markov Decision Process (MDP), enabling agents to autonomously derive optimal policies that efficiently balance structural integrity maintenance against finite resource consumption. A comparative evaluation of discrete-action (Q-learning, DQN) and continuous-action (TD3) agents in a stochastic simulation environment revealed that RL controllers significantly outperform heuristic baselines, achieving near-complete material recovery. Crucially, the TD3 agent utilizing continuous dosage control demonstrated superior convergence speed and stability, underscoring the necessity of fine-grained, proportional actuation in dynamic self-healing applications.
- Asia > India > West Bengal > Kolkata (0.06)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
A more efficient method for large-sample model-free feature screening via multi-armed bandits
Ouyang, Xiaxue, Kang, Xinlai, Li, Mengyu, Dou, Zhenxing, Yu, Jun, Meng, Cheng
We consider model-free feature screening in large-scale ultrahigh-dimensional data analysis. Existing feature screening methods often face substantial computational challenges when dealing with large sample sizes. To alleviate the computational burden, we propose a rank-based model-free sure independence screening method (CR-SIS) and its efficient variant, BanditCR-SIS. The CR-SIS method, based on Chatterjee's rank correlation, is as straightforward to implement as the sure independence screening (SIS) method based on Pearson correlation introduced by Fan and Lv (2008), but it is significantly more powerful in detecting nonlinear relationships between variables. Motivated by the multi-armed bandit (MAB) problem, we reformulate the feature screening procedure to significantly reduce the computational complexity of CR-SIS. For a predictor matrix of size $n \times p$, the computational cost of CR-SIS is $O(n\log(n)p)$, while BanditCR-SIS reduces this to $O(\sqrt{n}\log(n)p + n\log(n))$. Theoretically, we establish the sure screening property for both CR-SIS and BanditCR-SIS under mild regularity conditions. Furthermore, we demonstrate the effectiveness of our methods through extensive experimental studies on both synthetic and real-world datasets. The results highlight their superior performance compared to classical screening methods, at significantly lower computational cost.
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.48)
- Information Technology > Data Science > Data Mining > Big Data (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
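The screening statistic above is built on Chatterjee's rank correlation, which is simple enough to state in a few lines. A minimal sketch of the no-ties version of the statistic (the function name is ours, not from the paper):

```python
def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n (no-ties version).

    Sort the pairs by x, rank the y-values, and measure how much
    consecutive ranks jump: small jumps mean y is close to some
    (possibly nonlinear) function of x.
    """
    n = len(x)
    # y-values reordered so that x is increasing
    y_sorted = [yv for _, yv in sorted(zip(x, y))]
    # r[i] = number of j with y_sorted[j] <= y_sorted[i]
    r = [sum(1 for yj in y_sorted if yj <= yi) for yi in y_sorted]
    jumps = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * jumps / (n * n - 1)
```

Unlike Pearson correlation, the statistic stays large for any functional relationship, e.g. it is far from zero for `y = x**2` even though the Pearson correlation there is near zero. (This quadratic naive-ranking implementation is for clarity; with sorting, the statistic costs $O(n\log n)$ per feature, matching the $O(n\log(n)p)$ figure in the abstract.)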
Apple quietens Wall Street's fears of China struggles and slow AI progress
Apple has been under pressure this year. It's playing catch-up to its fellow tech giants on artificial intelligence, it's seen its stock fall by double digits since the year began, it closed a store in China for the first time ever this week, and looming US tariffs on Beijing threaten its supply chain. On Thursday, the company released its third-quarter earnings of the fiscal year as investors scrutinize how the iPhone maker might turn things around. Despite the gloomy outlook, the company is still worth more than $3tn, and it beat Wall Street's expectations for profit and revenue this quarter. Apple reported a massive 10% year-over-year increase in revenue to $94.04bn, and $1.57 per share in earnings.
- North America > United States > New York > New York County > New York City (0.63)
- Asia > China > Beijing > Beijing (0.26)
- Asia > India (0.07)
- Asia > Vietnam (0.06)
- Banking & Finance > Trading (0.77)
- Government > Regional Government > North America Government > United States Government (0.52)
AI takes backseat as Apple unveils software revamp and new apps
Apple's artificial intelligence features took a backseat on Monday at its latest annual Worldwide Developers Conference. The company announced a revamped software design called Liquid Glass, new phone and camera apps as well as new features on Apple Watch and Vision Pro. But in spite of pressure to compete with firms that have gone all-in on AI, Apple's AI announcements were limited to incremental features and upgrades. Users will have a few new Apple Intelligence-powered features to look forward to including live translation, a real-time language translation feature that will be integrated into messages, FaceTime and the Phone app. The Android operating system has offered a similar feature for several years.
XicorAttention: Time Series Transformer Using Attention with Nonlinear Correlation
Kimura, Daichi, Izumitani, Tomonori, Kashima, Hisashi
Various Transformer-based models have been proposed for time series forecasting. These models leverage the self-attention mechanism to capture long-term temporal or variate dependencies in sequences. Existing methods can be divided into two approaches: (1) reducing the computational cost of attention by making the calculations sparse, and (2) reshaping the input data to aggregate temporal features. However, existing attention mechanisms may not adequately capture inherent nonlinear dependencies present in time series data, leaving room for improvement. In this study, we propose a novel attention mechanism based on Chatterjee's rank correlation coefficient, which measures nonlinear dependencies between variables. Specifically, we replace the matrix multiplication in standard attention mechanisms with this rank coefficient to measure the query-key relationship. Since computing Chatterjee's correlation coefficient involves sorting and ranking operations, we introduce a differentiable approximation employing SoftSort and SoftRank. We integrate the proposed mechanism, "XicorAttention," into several state-of-the-art Transformer models. Experimental results on real-world datasets demonstrate that incorporating nonlinear correlation into the attention mechanism improves forecasting accuracy by up to approximately 9.1% compared to existing models.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Modeling & Simulation (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
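The core substitution described above (rank correlation in place of the query-key dot product) can be sketched without the SoftSort/SoftRank relaxation. The hard, non-differentiable version below is illustrative only; the function names are ours, and the paper's trainable variant replaces the sorting and ranking with their soft approximations:

```python
import math

def xi(x, y):
    # Chatterjee's rank correlation (no-ties version), used here as a
    # nonlinear query-key similarity in place of the dot product.
    n = len(x)
    ys = [yv for _, yv in sorted(zip(x, y))]
    r = [sum(1 for yj in ys if yj <= yi) for yi in ys]
    jumps = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * jumps / (n * n - 1)

def xicor_attention(Q, K, V):
    """Attention where score[i][j] = xi(Q[i], K[j]) instead of Q[i] . K[j]."""
    out = []
    for q in Q:
        scores = [xi(q, k) for k in K]
        # standard numerically-stable softmax over the key axis
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        # output row = attention-weighted sum of value rows
        out.append([sum(wj * V[j][d] for j, wj in enumerate(w))
                    for d in range(len(V[0]))])
    return out
```

One consequence of this choice is visible even in the sketch: xi scores a key equally whether it is an increasing or a decreasing function of the query, since it measures functional dependence rather than linear alignment.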
Value Iteration with Guessing for Markov Chains and Markov Decision Processes
Chatterjee, Krishnendu, JafariRaviz, Mahdi, Saona, Raimundo, Svoboda, Jakub
Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path. The widely studied algorithmic approach for these problems is the Value Iteration (VI) algorithm, which iteratively applies local updates called Bellman updates. There are many practical approaches for VI in the literature, but they all require exponentially many Bellman updates for MCs in the worst case. Here, a preprocessing step is a discrete, graph-theoretic algorithm that requires linear space. An important open question is whether, after polynomial-time preprocessing, VI can be achieved with sub-exponentially many Bellman updates. In this work, we present a new approach for VI based on guessing values. Our theoretical contributions are twofold. First, for MCs, we present an almost-linear-time preprocessing algorithm after which, along with guessing values, VI requires only sub-exponentially many Bellman updates. Second, we present an improved analysis of the speed of convergence of VI for MDPs. Finally, we present a practical algorithm for MDPs based on our new approach. Experimental results show that our approach provides a considerable improvement over existing VI-based approaches on several benchmark examples from the literature.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Europe > Austria (0.04)
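The Bellman updates at the heart of VI for reachability can be sketched in a few lines. This is plain VI on a toy Markov chain, not the paper's guessing-based algorithm, and the representation of the chain is our own choice:

```python
def value_iteration_reach(P, target, iters=1000):
    """Plain value iteration for reachability probabilities in a Markov chain.

    P maps each state to a list of (successor, probability) pairs;
    v[s] converges from below to the probability of reaching `target`.
    """
    v = {s: 0.0 for s in P}
    v[target] = 1.0
    for _ in range(iters):
        for s in P:
            if s == target:
                continue
            v[s] = sum(p * v[t] for t, p in P[s])  # Bellman update
    return v
```

On a small chain where s0 goes to s1 or an absorbing sink with probability 1/2 each, and s1 goes to the goal or back to s0 with probability 1/2 each, the values converge to the exact reachability probabilities (1/3 from s0, 2/3 from s1). The convergence speed of exactly such sweeps is what the abstract's worst-case exponential bound refers to.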
Early-Stopped Mirror Descent for Linear Regression over Convex Bodies
Wegel, Tobias, Kur, Gil, Rebeschini, Patrick
Early-stopped iterative optimization methods are widely used as alternatives to explicit regularization, and direct comparisons between early-stopping and explicit regularization have been established for many optimization geometries. However, most analyses depend heavily on the specific properties of the optimization geometry or strong convexity of the empirical objective, and it remains unclear whether early-stopping could ever be less statistically efficient than explicit regularization for some particular shape constraint, especially in the overparameterized regime. To address this question, we study the setting of high-dimensional linear regression under additive Gaussian noise when the ground truth is assumed to lie in a known convex body and the task is to minimize the in-sample mean squared error. Our main result shows that for any convex body and any design matrix, up to an absolute constant factor, the worst-case risk of unconstrained early-stopped mirror descent with an appropriate potential is at most that of the least squares estimator constrained to the convex body. We achieve this by constructing algorithmic regularizers based on the Minkowski functional of the convex body.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.27)
- North America (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
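With the Euclidean potential, mirror descent reduces to ordinary gradient descent, so the early-stopping phenomenon above can be illustrated on a toy least-squares problem. This sketch does not use the paper's Minkowski-functional potentials; the function name and constants are ours:

```python
def early_stopped_gd(X, y, step=0.01, iters=50):
    """Gradient descent on the least-squares objective, stopped after
    `iters` steps. Starting from the origin, early stopping keeps w
    small, acting like an implicit norm constraint (the Euclidean
    special case of mirror descent)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # residuals r = Xw - y
        r = [sum(X[i][k] * w[k] for k in range(d)) - y[i] for i in range(n)]
        # gradient of (1/(2n)) * ||Xw - y||^2 is (1/n) X^T r
        g = [sum(X[i][k] * r[i] for i in range(n)) / n for k in range(d)]
        w = [w[k] - step * g[k] for k in range(d)]
    return w
```

Stopping early yields an iterate with smaller norm than the fully converged least-squares solution; the paper's result replaces this implicit Euclidean ball with an arbitrary convex body via the choice of potential.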
Apple unveils souped up version of its cheapest iPhone
Apple has released a sleeker and more expensive version of its lowest-priced iPhone in an attempt to widen the audience for a bundle of artificial intelligence technology that the company has been hoping will revive demand for its most profitable product lineup. The iPhone 16e unveiled Wednesday is the fourth generation of a model that's sold at a dramatically lower price than the iPhone's standard and premium models. The previous bargain-bin models were called the iPhone SE, with the last version coming out in 2022. Like the higher-priced iPhone 16 lineup unveiled last September, the iPhone 16e includes the souped-up computer chip needed to process an array of AI features that automatically summarise text and audio and create on-the-fly emojis while smartening up the device's virtual assistant, Siri. It will also have a more powerful battery and camera.
- Asia > China (0.08)
- North America > United States > California > Santa Clara County > Cupertino (0.06)