Two police officers killed in explosion in Moscow
Three people - including two police officers - have been killed in an explosion in Moscow, Russian authorities have said. Two traffic police officers saw a suspicious individual near a police car on the city's Yeletskaya Street, and when they approached the suspect to detain him, an explosive device was detonated, Russia's Investigative Committee has said. The two police officers died from their injuries, along with another individual who was standing nearby. The attack comes two days after a senior Russian general was killed in a car bombing in the capital on Monday. Lt Gen Fanil Sarvarov died after an explosive device - which had been planted under a car - was detonated.
Stranger Things: What could happen next as the show's finale looms?
Spoiler warning: This contains some details about what has happened in the show so far, but does not reveal anything about the final four episodes. A Christmas feast may be around the corner, or perhaps another chocolate (no strawberry creams, thanks), but for fans of Stranger Things, another gift is waiting to be consumed. The grand finale of Netflix's hugely popular sci-fi fantasy horror series, which also showcases some questionable 80s fashion choices, is looming. Fans last saw the inhabitants of Hawkins in a perilous place as season five opened, with Demogorgons running rampant, along with the monstrous Vecna.
Russia-Ukraine war: List of key events, day 1,399
Russian forces began a "massive attack" on Ukraine on Monday night, killing three people and targeting 13 regions with 650 drones and 30 missiles, Ukrainian President Volodymyr Zelenskyy said in a post on X. Those killed in the overnight attack included a four-year-old girl in the central Zhytomyr region, Governor Vitalii Bunechko said on Telegram.
Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Learnable Channel Attention
We study the problem of learning a low-degree spherical polynomial of degree $\ell_0 = \Theta(1) \ge 1$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural network (NN) with channel attention. Our main result is the significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\varepsilon \in (0,1)$, a carefully designed two-layer NN with channel attention and finite width of $m \ge \Theta(n^4 \log(2n/\delta)/d^{2\ell_0})$ trained by the vanilla gradient descent (GD) requires the lowest sample complexity of $n \asymp \Theta(d^{\ell_0}/\varepsilon)$ with probability $1-\delta$ for every $\delta \in (0,1)$, in contrast with the representative sample complexity $\Theta\left(d^{\ell_0} \max\{\varepsilon^{-2}, \log d\}\right)$, where $n$ is the training data size. Moreover, such sample complexity is not improvable since the trained network renders a sharp rate of the nonparametric regression risk of the order $\Theta(d^{\ell_0}/n)$ with probability at least $1-\delta$. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank $\Theta(d^{\ell_0})$ is $\Theta(d^{\ell_0}/n)$, so that the rate of the nonparametric regression risk of the network trained by GD is minimax optimal. The training of the two-layer NN with channel attention consists of two stages. In Stage 1, a provable learnable channel selection algorithm identifies the ground-truth channel number $\ell_0$ from the initial $L \ge \ell_0$ channels in the first-layer activation, with high probability. This learnable selection is achieved by an efficient one-step GD update on both layers, enabling feature learning for low-degree polynomial targets. In Stage 2, the second layer is trained by standard GD using the activation function with the selected channels.
Structure-Preserving Nonlinear Sufficient Dimension Reduction for Tensors
Lin, Dianjun, Li, Bing, Xue, Lingzhou
We introduce two nonlinear sufficient dimension reduction methods for regressions with tensor-valued predictors. Our goal is two-fold: the first is to preserve the tensor structure when performing dimension reduction, particularly the meaning of the tensor modes, for improved interpretation; the second is to substantially reduce the number of parameters in dimension reduction, thereby achieving model parsimony and enhancing estimation accuracy. Our two tensor dimension reduction methods echo the two commonly used tensor decomposition mechanisms: one is the Tucker decomposition, which reduces a larger tensor to a smaller one; the other is the CP-decomposition, which represents an arbitrary tensor as a sequence of rank-one tensors. We developed the Fisher consistency of our methods at the population level and established their consistency and convergence rates. Both methods are easy to implement numerically: the Tucker-form can be implemented through a sequence of least-squares steps, and the CP-form can be implemented through a sequence of singular value decompositions. We investigated the finite-sample performance of our methods and showed substantial improvement in accuracy over existing methods in simulations and two data applications.
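As a rough illustration of the two decomposition mechanisms the abstract refers to (not the authors' dimension reduction methods themselves), the sketch below builds a small third-order tensor in CP form and another in Tucker form with NumPy. The sizes, ranks, and variable names are arbitrary assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = (4, 5, 6)  # hypothetical tensor shape, chosen for illustration

# CP form: a sum of R rank-one tensors a_r (outer) b_r (outer) c_r.
R = 2
A = rng.normal(size=(dims[0], R))
B = rng.normal(size=(dims[1], R))
C = rng.normal(size=(dims[2], R))
cp_tensor = np.einsum("ir,jr,kr->ijk", A, B, C)

# Tucker form: a small core tensor multiplied by one factor matrix per mode.
core_shape = (2, 3, 2)
core = rng.normal(size=core_shape)
U1 = rng.normal(size=(dims[0], core_shape[0]))
U2 = rng.normal(size=(dims[1], core_shape[1]))
U3 = rng.normal(size=(dims[2], core_shape[2]))
tucker_tensor = np.einsum("abc,ia,jb,kc->ijk", core, U1, U2, U3)

# Parameter counts: both structured forms use far fewer numbers than the full tensor.
full_params = int(np.prod(dims))
cp_params = R * sum(dims)
tucker_params = core.size + U1.size + U2.size + U3.size
print(full_params, cp_params, tucker_params)
```

Both forms keep one factor per tensor mode, which is the sense in which mode-wise structure can stay interpretable while the parameter count drops.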
Robust Causal Directionality Inference in Quantum Inference under MNAR Observation and High-Dimensional Noise
In quantum mechanics, observation actively shapes the system, paralleling the statistical notion of Missing Not At Random (MNAR). This study introduces a unified framework for robust causal directionality inference in quantum engineering, determining whether relations are system$\to$observation, observation$\to$system, or bidirectional. The method integrates CVAE-based latent constraints, MNAR-aware selection models, GEE-stabilized regression, penalized empirical likelihood, and Bayesian optimization. It jointly addresses quantum and classical noise while uncovering causal directionality, with theoretical guarantees for double robustness, perturbation stability, and oracle inequalities. Simulation and real-data analyses (TCGA gene expression, proteomics) show that the proposed MNAR-stabilized CVAE+GEE+AIPW+PEL framework achieves lower bias and variance, near-nominal coverage, and superior quantum-specific diagnostics. This establishes robust causal directionality inference as a key methodological advance for reliable quantum engineering.
One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing
Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise operate as black boxes. While permutation-based methods are a standard tool for this task, classical implementations rely on repeated random permutations, introducing computational overhead and stochastic instability. In this paper, we show that by replacing multiple random permutations with a single, deterministic, and optimal permutation, we achieve a method that retains the core principles of permutation-based importance while being non-random, faster, and more stable. We validate this approach across nearly 200 scenarios, including real-world household finance and credit risk applications, demonstrating improved bias-variance tradeoffs and accuracy in challenging regimes such as small sample sizes, high dimensionality, and low signal-to-noise ratios. Finally, we introduce Systemic Variable Importance, a natural extension designed for model stress-testing that explicitly accounts for feature correlations. This framework provides a transparent way to quantify how shocks or perturbations propagate through correlated inputs, revealing dependencies that standard variable importance measures miss. Two real-world case studies demonstrate how this metric can be used to audit models for hidden reliance on protected attributes (e.g., gender or race), enabling regulators and practitioners to assess fairness and systemic risk in a principled and computationally efficient manner.
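For context, the classical baseline that the abstract contrasts against (repeated random permutations) can be sketched as follows. This is a minimal illustration of that baseline only, not the single optimal permutation proposed in the paper; the function name, the MSE loss, and the toy model are assumptions made for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Classical permutation importance (hypothetical helper).

    Returns, per feature, the mean increase in MSE when that
    feature's column is randomly shuffled n_repeats times.
    """
    rng = np.random.default_rng(seed)
    base_loss = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with y but keeps its marginal.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - base_loss)
        scores[j] = np.mean(increases)
    return scores

# Toy black box: depends strongly on feature 0, weakly on 1, not at all on 2.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 3))
predict = lambda Z: 3.0 * Z[:, 0] + 0.5 * Z[:, 1]
y = predict(X)

scores = permutation_importance(predict, X, y)
print(scores)
```

The scores depend on the random seed and on `n_repeats`, which is the stochastic instability and computational overhead the abstract mentions.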
Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting
In lifelong learning systems based on artificial neural networks, one of the biggest obstacles is the inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we propose a new kind of connectionist architecture, the Sequential Neural Coding Network, that is robust to forgetting when learning from streams of data points and, unlike networks of today, does not learn via the popular back-propagation of errors. Grounded in the neurocognitive theory of predictive coding, our model adapts its synapses in a biologically-plausible fashion while another neural system learns to direct and control this cortex-like structure, mimicking some of the task-executive control functionality of the basal ganglia. In our experiments, we demonstrate that our self-organizing system experiences significantly less forgetting compared to standard neural models, outperforming a swath of previously proposed methods, including rehearsal/data buffer-based methods, on both standard (SplitMNIST, Split Fashion MNIST, etc.) and custom benchmarks even though it is trained in a stream-like fashion. Our work offers evidence that emulating mechanisms in real neuronal systems, e.g., local learning, lateral competition, can yield new directions and possibilities for tackling the grand challenge of lifelong machine learning.
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $\sigma$, which is defined as the index of the first nonzero Hermite coefficient of $\sigma$. Ben Arous et al. (2021) showed that $n \gtrsim d^{k^\star-1}$ samples suffice for learning $w^\star$ and that this is tight for online SGD. However, the CSQ lower bound for gradient based methods only shows that $n \gtrsim d^{k^\star/2}$ samples are necessary. In this work, we close the gap between the upper and lower bounds by showing that online SGD on a smoothed loss learns $w^\star$ with $n \gtrsim d^{k^\star/2}$ samples. We also draw connections to statistical analyses of tensor PCA and to the implicit regularization effects of minibatch SGD on empirical losses.
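The information exponent defined above can be estimated numerically for a given link function. The sketch below is one way to do so under stated assumptions: it uses probabilists' Hermite polynomials with Gauss-Hermite quadrature, normalizes coefficients by $\sqrt{k!}$, skips the constant term $k = 0$, and treats coefficients below a tolerance as zero. The helper names and default parameters are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as H

def hermite_coefficients(sigma, max_degree=8, quad_points=60):
    """Approximate E[sigma(Z) He_k(Z)] / sqrt(k!) for Z ~ N(0, 1), k = 0..max_degree."""
    x, w = H.hermegauss(quad_points)   # nodes/weights for the weight exp(-x^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)       # rescale so weights give a Gaussian expectation
    coeffs = []
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                 # coefficient vector selecting He_k
        he_k = H.hermeval(x, basis)
        coeffs.append(np.sum(w * sigma(x) * he_k) / math.sqrt(math.factorial(k)))
    return np.array(coeffs)

def information_exponent(sigma, tol=1e-8, **kwargs):
    """Index of the first nonzero Hermite coefficient with k >= 1 (assumed convention)."""
    coeffs = hermite_coefficients(sigma, **kwargs)
    for k in range(1, len(coeffs)):
        if abs(coeffs[k]) > tol:
            return k
    return None  # no nonzero coefficient found up to max_degree

print(information_exponent(np.tanh))
print(information_exponent(lambda x: x**2 - 1))
print(information_exponent(lambda x: x**3 - 3 * x))
```

Under the bounds quoted in the abstract, a link with $k^\star = 3$ would need on the order of $d^{2}$ samples for plain online SGD versus $d^{3/2}$ for the smoothed loss.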
Three killed after Russia launches 'massive' attack across Ukraine
Russia carried out a massive overnight attack on several Ukrainian cities, President Volodymyr Zelensky has said, a day after he warned of strikes over the Christmas period. At least three people were killed, according to Ukrainian officials, including a four-year-old child, while energy infrastructure was also targeted, leaving several regions without power. Russia launched 635 drones and 38 missiles, Ukraine's air force said, adding that 621 of them were downed. Zelensky said people simply want to be with their families, at home, and safe in the run-up to Christmas, and said the strikes sent an extremely clear signal about Russia's priorities despite ongoing peace talks. He added that Russian President Vladimir Putin still cannot accept that he must stop killing.