Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case

arXiv.org Machine Learning

A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$. The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z\mid Y$. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $\Sigma_{ZS}$ has full range. The resulting linear feature extractor adopts a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.


Random Gradient-Free Optimization in Infinite Dimensional Spaces

arXiv.org Machine Learning

In this paper, we propose a random gradient-free method for optimization in infinite dimensional Hilbert spaces, applicable to functional optimization in diverse settings. Though such problems are often solved through finite-dimensional gradient descent over a parametrization of the functions, such as neural networks, an interesting alternative is to instead perform gradient descent directly in the function space by leveraging its Hilbert space structure, thus enabling provable guarantees and fast convergence. However, infinite-dimensional gradients are often hard to compute in practice, hindering the applicability of such methods. To overcome this limitation, our framework requires only the computation of directional derivatives and a pre-basis for the Hilbert space domain, i.e., a linearly-independent set whose span is dense in the Hilbert space. This fully resolves the tractability issue, as pre-bases are much more easily obtained than full orthonormal bases or reproducing kernels -- which may not even exist -- and individual directional derivatives can be easily computed using forward-mode scalar automatic differentiation. We showcase the use of our method to solve partial differential equations à la physics-informed neural networks (PINNs), where it effectively enables provable convergence.
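
As a rough illustration of the idea of descending with directional derivatives only, here is a finite-dimensional toy sketch. It is not the paper's algorithm. The quadratic objective, the step size, the two non-orthogonal directions standing in for a pre-basis, and the function names are all assumptions made for this example. A central finite difference replaces the forward-mode automatic differentiation mentioned in the abstract.

```python
import random

def directional_derivative(f, x, v, h=1e-5):
    """Central finite-difference estimate of the derivative of f at x along v."""
    forward = f([xi + h * vi for xi, vi in zip(x, v)])
    backward = f([xi - h * vi for xi, vi in zip(x, v)])
    return (forward - backward) / (2 * h)

def random_direction_descent(f, x0, directions, step=0.2, n_steps=500, seed=0):
    """Repeatedly pick a random direction and step against the slope along it."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        v = rng.choice(directions)
        slope = directional_derivative(f, x, v)
        # Move along v only; no full gradient is ever formed.
        x = [xi - step * slope * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical toy problem: squared distance to a fixed target in R^2.
TARGET = [3.0, -1.0]

def objective(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))

# Linearly independent but not orthogonal directions (a stand-in for a pre-basis).
DIRECTIONS = [[1.0, 0.0], [1.0, 1.0]]

x_final = random_direction_descent(objective, [0.0, 0.0], DIRECTIONS)
print(x_final, objective(x_final))
```

The step size is chosen small enough that a step along either direction cannot increase this particular quadratic. A different objective or direction set would need its own choice.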


Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding

Neural Information Processing Systems

Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem include neural-guided search (e.g. using reinforcement learning) and genetic programming. In this work, we introduce a hybrid neural-guided/genetic programming approach to symbolic regression and other combinatorial optimization problems. We propose a neural-guided component used to seed the starting population of a random restart genetic programming component, gradually learning better starting populations. On a number of common benchmark tasks to recover underlying expressions from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. We demonstrate that running many genetic programming generations without interdependence on the neural-guided component performs better for symbolic regression than alternative formulations where the two are more strongly coupled. Finally, we introduce a new set of 22 symbolic regression benchmark problems with increased difficulty over existing benchmarks.


Giving a 140 pound stingray a check up requires 8 people

Popular Science

The male leopard whiptail ray also boasts a four-foot-three-inch wingspan. Leopard whiptail rays have spotted skin and a long, thin tail they use for balance, steering, and defense. Getting that annual check-up can feel daunting for anyone. At the weight of an adult human with a four-foot-three-inch wingspan, just moving the giant fish from its habitat to an exam pool is an exercise in teamwork.


How Should We Approach A.I. in 2026?

The New Yorker

The rapid normalization of artificial intelligence is forcing a reckoning with how much of the future is being shaped by hype rather than utility. The writers Charles Duhigg, Cal Newport, and Anna Wiener join Tyler Foggatt for a conversation about artificial intelligence and the promises, myths, and anxieties surrounding it. The discussion was recorded before a live audience at The New Yorker Festival this fall. They explore the gap between Silicon Valley's sweeping claims and what generative A.I. can actually do today; how people are using the technology for work, creativity, and emotional support; and why the tech's most immediate political consequences may be the hardest to grapple with.


Big Bird: Transformers for Longer Sequences

Neural Information Processing Systems

Transformer-based models, such as BERT, have been among the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS) that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences up to 8x longer than was previously possible on similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.
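
To make the linear-versus-quadratic claim concrete, the sketch below builds a sparse attention mask and counts how many query-key pairs it allows. The abstract only mentions global tokens. The sliding-window and random-key components, the parameter names, and the function names are assumptions added for illustration, and this is not the paper's blocked implementation.

```python
import random

def sparse_attention_mask(n, num_global=1, window=1, num_random=1, seed=0):
    """Return an n x n boolean mask; mask[i][j] is True if query i may attend to key j."""
    rng = random.Random(seed)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        # Sliding window: neighbours within `window` positions (includes i itself).
        for j in range(max(0, i - window), min(n, i + window + 1)):
            mask[i][j] = True
        # Every query can see the global tokens (here: the first positions).
        for g in range(min(num_global, n)):
            mask[i][g] = True
        # A few randomly chosen extra keys per query.
        for j in rng.sample(range(n), min(num_random, n)):
            mask[i][j] = True
    # Global tokens themselves attend to the entire sequence.
    for g in range(min(num_global, n)):
        mask[g] = [True] * n
    return mask

def count_attended(mask):
    """Number of (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)

for n in (16, 64, 256):
    allowed = count_attended(sparse_attention_mask(n))
    print(n, allowed, n * n)  # sparse pair count vs. full attention
```

With the window size and the numbers of global and random tokens held fixed, each non-global row allows a bounded number of keys, so the total grows roughly linearly in `n` while full attention grows as `n * n`.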


Two police officers killed in explosion in Moscow

BBC News

Three people - including two police officers - have been killed in an explosion in Moscow, Russian authorities have said. Two traffic police officers saw a suspicious individual near a police car on the city's Yeletskaya Street, and when they approached the suspect to detain him, an explosive device was detonated, Russia's Investigative Committee has said. The two police officers died from their injuries, along with another individual who was standing nearby. The attack comes two days after a senior Russian general was killed in a car bombing in the capital on Monday. Lt Gen Fanil Sarvarov died after an explosive device - which had been planted under a car - was detonated.


Stranger Things: What could happen next as the show's finale looms?

BBC News

Spoiler warning: This contains some details about what has happened in the show so far, but does not reveal anything about the final four episodes. A Christmas feast may be around the corner, or perhaps another chocolate (no strawberry creams, thanks), but for fans of Stranger Things, another gift is waiting to be consumed. The grand finale of Netflix's hugely popular sci-fi fantasy horror series, which also showcases some questionable 80s fashion choices, is looming. Fans last saw the inhabitants of Hawkins in a perilous place as season five opened, with Demogorgons running rampant, along with the monstrous Vecna.


Russia-Ukraine war: List of key events, day 1,399

Al Jazeera

Russian forces began a "massive attack" on Ukraine on Monday night, killing three people and targeting 13 regions with 650 drones and 30 missiles, Ukrainian President Volodymyr Zelenskyy said in a post on X. Those killed in the overnight attack included a four-year-old girl in the central Zhytomyr region, Governor Vitalii Bunechko said on Telegram.


One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing

arXiv.org Machine Learning

Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise operate as black boxes. While permutation-based methods are a standard tool for this task, classical implementations rely on repeated random permutations, introducing computational overhead and stochastic instability. In this paper, we show that by replacing multiple random permutations with a single, deterministic, and optimal permutation, we achieve a method that retains the core principles of permutation-based importance while being non-random, faster, and more stable. We validate this approach across nearly 200 scenarios, including real-world household finance and credit risk applications, demonstrating improved bias-variance tradeoffs and accuracy in challenging regimes such as small sample sizes, high dimensionality, and low signal-to-noise ratios. Finally, we introduce Systemic Variable Importance, a natural extension designed for model stress-testing that explicitly accounts for feature correlations. This framework provides a transparent way to quantify how shocks or perturbations propagate through correlated inputs, revealing dependencies that standard variable importance measures miss. Two real-world case studies demonstrate how this metric can be used to audit models for hidden reliance on protected attributes (e.g., gender or race), enabling regulators and practitioners to assess fairness and systemic risk in a principled and computationally efficient manner.
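
For context, the classical baseline the abstract contrasts itself with can be sketched as follows: shuffle one feature column several times and average the resulting increase in error. This is the repeated-random-permutation approach only, not the single optimal permutation the paper proposes, which the abstract does not specify. The function name, the mean-squared-error metric, and the toy model are assumptions.

```python
import random

def permutation_importance(predict, X, y, column, n_repeats=10, seed=0):
    """Mean increase in MSE when `column` of X is randomly shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    increases = []
    for _ in range(n_repeats):
        # Shuffle only the chosen column, leaving the other features intact.
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        permuted = [row[:column] + [v] + row[column + 1:]
                    for row, v in zip(X, shuffled)]
        increases.append(mse(permuted) - baseline)
    return sum(increases) / n_repeats

# Hypothetical toy model that uses feature 0 and ignores feature 1.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
model = lambda row: 2.0 * row[0]

print(permutation_importance(model, X, y, column=0))  # positive: feature matters
print(permutation_importance(model, X, y, column=1))  # zero: feature is unused
```

The estimate for a used feature depends on `seed` and `n_repeats`, which is the stochastic instability and repeated cost that the abstract aims to remove.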