
AIhub monthly digest: May 2026 – AI for science, the lottery ticket hypothesis, and world models

AIHub

Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we learn about AI for science, delve into world models, research transparent and trustworthy AI, and hear about the lottery ticket hypothesis. The latest interview in our series with the AAAI/SIGAI Doctoral Consortium participants featured Ximing Wen, who is researching transparent and trustworthy AI systems. We found out more about her work, her experience as a research intern, and what inspired her to study AI. In a wide-ranging conversation, Jonathan Frankle delves into empiricism versus theoretical proofs, how the approach to computer science has changed (even if the fundamental problems haven't), how younger researchers are rapidly adapting to a world that values impact above all else, and what it means to be a researcher.


UWM-JEPA: Predictive World Models That Imagine in Belief Space

arXiv.org Machine Learning

World models for partially observed environments must imagine multiple compatible hidden futures and steer between them under counterfactual actions. Joint Embedding Predictive Architectures (JEPAs) do this in latent space, but a vector-valued latent has no internal structure for carrying the belief over hidden continuations through blind rollout. We introduce the Unitary World Model JEPA (UWM-JEPA), a JEPA world model with a density-matrix latent on a joint system-environment space and a learned unitary predictor. The construction preserves the joint-state spectrum exactly during rollout, so the predictor itself cannot dissipate the represented uncertainty. On a hidden-velocity indicator task requiring five-step forward simulation under a given action sequence with the target observation masked, UWM-JEPA reaches 0.77 accuracy and degrades monotonically as actions are perturbed; a parameter-matched LSTM-JEPA trained under the same counterfactual-target objective and action head collapses to majority-class accuracy (0.53) under every action condition. Under blind rollout, UWM-JEPA loses fewer than ten points of probe R^2 at short horizons while vector-latent baselines lose forty-one and sixty-eight; both nevertheless tie on a held-out context probe, locating the separation in the predictor rather than the encoder. Action sensitivity itself requires training against counterfactual rather than teacher-forced targets, a finding that applies beyond the unitary parameterisation. For JEPA world models to imagine under partial observability, latent geometry and predictor dynamics matter, not frozen context-encoding capacity alone.
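The claim that a unitary predictor preserves the latent spectrum can be checked on a toy case. The following is a minimal sketch, not the authors' implementation: it assumes a single 2x2 density matrix rather than a joint system-environment state, and the fixed rotation in `unitary` is a hypothetical stand-in for a learned predictor. The `depolarize` map is likewise an assumed non-unitary contrast, included only to show a step that does change the spectrum.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def unitary(theta, phi):
    """Hypothetical fixed 2x2 unitary (rotation with a phase)."""
    c, s = complex(math.cos(theta)), complex(math.sin(theta))
    p = cmath.exp(1j * phi)
    return [[c, -s * p.conjugate()], [s * p, c]]


def eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return (tr / 2 - disc, tr / 2 + disc)


def rollout(rho, u, steps):
    """Apply rho -> U rho U^dagger repeatedly, with no observations in between."""
    for _ in range(steps):
        rho = matmul(matmul(u, rho), dagger(u))
    return rho


def depolarize(rho, lam):
    """Non-unitary contrast: mix rho with the maximally mixed state I/2."""
    return [[(1 - lam) * rho[i][j] + (lam / 2 if i == j else 0)
             for j in range(2)] for i in range(2)]


rho0 = [[0.7 + 0j, 0j], [0j, 0.3 + 0j]]           # mixed state, spectrum (0.3, 0.7)
rho5 = rollout(rho0, unitary(0.4, 1.1), steps=5)  # five blind unitary steps

spec0 = eigenvalues(rho0)
spec5 = eigenvalues(rho5)
spec_mixed = eigenvalues(depolarize(rho0, 0.5))

print("initial spectrum:       ", spec0)
print("after 5 unitary steps:  ", spec5)
print("after one depolarizing: ", spec_mixed)
```

The matrix entries change under the rollout, but the eigenvalues agree with the initial ones up to floating-point error, whereas the depolarizing step pulls them toward 0.5.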


World Models: 10 Things That Matter in AI Right Now

MIT Technology Review

Join a subscriber-only discussion live on Thursday, May 21. A woman's uterus has been kept alive outside the body for the first time (Jessica Hamzelou). The team behind the feat plan to study uterine disorders and the early stages of pregnancy--and potentially grow a human fetus. Want to understand the current state of AI? Check out these charts. According to Stanford's 2026 AI Index, AI is sprinting, and we're struggling to keep up. The ultimate plan to live forever is a brand new body.


Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

arXiv.org Machine Learning

How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2\to\alpha$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.
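The correlation matrix memory and top-1 retrieval described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the paper's experimental setup: keys and targets are drawn i.i.d. standard Gaussian, the sizes `n` and `d` and the seed are arbitrary choices, and the helper names are hypothetical. It stores $M=\sum_i v_i k_i^\top$ and counts a query as correct when the true target has the strictly largest score $v_j^\top M k_i$.

```python
import random


def gaussian_vectors(n, d, rng):
    """Draw n vectors of dimension d with i.i.d. standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_memory(keys, targets):
    """Correlation matrix memory: superpose the outer products v_i k_i^T."""
    d = len(keys[0])
    m = [[0.0] * d for _ in range(d)]
    for k, v in zip(keys, targets):
        for r in range(d):
            row, vr = m[r], v[r]
            for c in range(d):
                row[c] += vr * k[c]
    return m


def top1_accuracy(n, d, seed=0):
    """Fraction of keys whose own target gets the highest retrieval score."""
    rng = random.Random(seed)
    keys = gaussian_vectors(n, d, rng)
    targets = gaussian_vectors(n, d, rng)
    m = build_memory(keys, targets)
    correct = 0
    for i, k in enumerate(keys):
        readout = [dot(row, k) for row in m]          # M k_i
        scores = [dot(v, readout) for v in targets]   # v_j^T M k_i for every j
        if max(range(n), key=scores.__getitem__) == i:
            correct += 1
    return correct / n


light = top1_accuracy(n=8, d=64)    # d^2 far above n log n
heavy = top1_accuracy(n=200, d=8)   # d^2 far below n log n
print("light load accuracy:", light)
print("heavy load accuracy:", heavy)
```

With $d^2$ well above $n\log n$ retrieval should be near perfect, and with $d^2$ well below it the distractors should win most queries; the exact numbers depend on the seed.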


The problem of cosmic inflation and how to solve it

New Scientist

One of the best-performing models in cosmology is also one with the least physical rationale behind it. Can a theory of quantum gravity illuminate what happened just after the big bang? Cosmic inflation is a problem. During the first tiny fraction of a second of the universe, it is generally believed that the universe expanded by a factor of around 10. And then, as quickly as it began, this exponential growth just stopped.


The Partial Testimony of Logs: Evaluation of Language Model Generation under Confounded Model Choice

arXiv.org Machine Learning

Offline evaluation of language models from usage logs is biased when model choice is confounded: the same user-side factors that influence which model is used can also influence how its output is judged, so raw comparisons of logged scores mix self-selected populations rather than estimating a common quantity of interest. A small randomized experiment can break this bias by overriding model choice, but in practice such experiments are scarce and costly. We study a three-source design that combines a large confounded observational log (OBS) for scale, a small randomized experiment (EXP) for unconfounded scoring, and an offline simulator (SIM) that replays candidate models on cached contexts. Our main result is an identification theorem showing that the randomized experiment and the simulator are together enough to recover causal model values; the observational log enters only afterward, to reduce estimation error rather than to make the causal comparison valid. Six estimator families are evaluated in a controlled semi-synthetic validation and in two real-task cached benchmarks for summarization and coding. No family dominates every regime; relative performance depends on the amount of unbiased EXP supervision and on how closely the target reward aligns with OBS-derived structure.
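The confounding problem in the first sentence can be made concrete with exact arithmetic. The numbers below are entirely hypothetical and are not taken from the paper: two context types, two models, and a logged choice rule in which users with hard contexts mostly pick model A. The sketch only illustrates why raw logged averages and randomized averages can disagree; it does not implement the paper's estimators or its identification result.

```python
# Hypothetical population: half "easy" and half "hard" contexts.
share = {"easy": 0.5, "hard": 0.5}

# Hypothetical mean judged score of each model within each context type.
# Model A is better than model B by 0.1 in both strata.
score = {
    "A": {"easy": 0.9, "hard": 0.6},
    "B": {"easy": 0.8, "hard": 0.5},
}

# Hypothetical logged behaviour: probability that a user picks model A.
pick_a = {"easy": 0.1, "hard": 0.9}


def choice_prob(model, ctx):
    """Probability that the given model is the one chosen in this context."""
    return pick_a[ctx] if model == "A" else 1.0 - pick_a[ctx]


def logged_mean(model):
    """Raw average of logged scores: contexts are weighted by self-selection."""
    weight = {c: share[c] * choice_prob(model, c) for c in share}
    total = sum(weight.values())
    return sum(weight[c] * score[model][c] for c in share) / total


def randomized_mean(model):
    """Average under randomized assignment: both models see the same mix."""
    return sum(share[c] * score[model][c] for c in share)


for name in ("A", "B"):
    print(name, "logged:", round(logged_mean(name), 3),
          "randomized:", round(randomized_mean(name), 3))
# A logged: 0.63 randomized: 0.75
# B logged: 0.77 randomized: 0.65
```

In this toy setup the logs rank B above A even though A is better in every stratum, because A's logged scores come mostly from hard contexts.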


Cyber-Insecurity in the AI Era

MIT Technology Review

Cybersecurity was already under strain before AI entered the stack. Now, as AI expands the attack surface and adds new complexity, the limits of legacy approaches are becoming harder to ignore. This session from MIT Technology Review's EmTech AI conference explores why security must be rethought with AI at its core, not layered on after the fact. A prolific inventor and internationally recognized authority in knowledge representation, inference calculus, and AI planning, Tarique has spent his career applying autonomously collaborative AI to solve complex, ultra-high-scale challenges across cybersecurity, data security, and compliance -- with deep expertise spanning the Data Classification, DLP, and DSPM industries. His groundbreaking innovations and multiple USPTO patents have earned him global recognition, including frequent invitations to deliver keynote addresses at prestigious international security conferences and forums. At GCCybersecurity, Tarique architected the core AI algorithms powering the company's 4th and 5th generation fully autonomous data leak protection and exfiltration platform -- among the most advanced platforms of its kind.


012a91467f210472fab4e11359bbfef6-AuthorFeedback.pdf

Neural Information Processing Systems

First, as R4 suggested, "symbolic tree" was more approachable for people in the ML community. Second, the symbolic tree is declared by the user using decorators and serves to represent high-level program constructs, which is different from the AST that represents all the syntactic structures for the program. For example, the full Python AST contains information about objects' class methods, whereas our symbolic representation does not. R4: "Second, most of their tool/language design could be summarized as adding some kind of non deterministic/parametric choice ... It's extension to ML does not introduce anything particularly new ..." We agree with R4 that symbolic programming and non-deterministic programming are well-studied topics in the PL community. However, we would like to emphasize that this work is the first to introduce such concepts to AutoML to significantly reduce engineering effort, which is a novel and useful contribution. For example, PyGlove leverages symbolic manipulation to decouple the search algorithm, search space and child program, which enabled us to unify the interface among search methods with and without weight sharing. To enable symbolic programming in Python, PyGlove implements an object model for maintaining the consistency of program state during symbolic manipulation. R4: "Provide the grammar in the main text" We understand the "grammar" here as a reference to the formal definition of the search space specification. We will revise current Appendix Table 3 into a formal definition, and add it to the "search space" sub-section.


SCOPE-FE: Structured Control of Operator and Pairwise Exploration for Feature Engineering

arXiv.org Machine Learning

Automatic feature engineering is an effective approach for improving predictive performance in tabular learning. However, expand-and-reduce methods, such as OpenFE, become increasingly computationally expensive as the input dimensionality grows. This limitation arises primarily from the combinatorial explosion of candidate features generated through operator-feature combinations. To address this issue, we propose SCOPE-FE, a structured search space control framework that improves efficiency by reducing the candidate space prior to feature generation. SCOPE-FE jointly regulates two major sources of combinatorial growth: the operator space and feature-pair space. First, OperatorProbing estimates the dataset-specific utility of candidate operators and eliminates low-contribution operators in advance. Second, FeatureClustering employs spectral embedding and fuzzy c-means clustering to group structurally related features, thereby restricting candidate generation to relevant within-cluster combinations. In addition, we introduce ReliabilityScoring, which incorporates variance across subsamples to stabilize pruning decisions. Experiments on ten benchmark datasets demonstrate that SCOPE-FE substantially reduces feature engineering time while maintaining competitive predictive performance relative to existing baselines. The efficiency gains are particularly pronounced for high-dimensional datasets. These results indicate that structured control of the search space is an effective strategy for scalable automatic feature engineering. The code will be made publicly available upon acceptance.
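The combinatorial growth that motivates pruning both the operator space and the feature-pair space can be illustrated with a simple count. This is a back-of-the-envelope sketch under simplifying assumptions that are not from the paper: every operator is binary and symmetric, clusters are hard and disjoint (the abstract describes fuzzy c-means, which is soft), and the feature, operator, and cluster counts are made up for illustration. The function names are hypothetical.

```python
from math import comb


def full_candidates(n_features, n_operators):
    """Candidates when every operator is applied to every unordered feature pair."""
    return comb(n_features, 2) * n_operators


def pruned_candidates(cluster_sizes, n_kept_operators):
    """Candidates when pairs are restricted to within-cluster combinations
    and only the retained operators are applied."""
    within_cluster_pairs = sum(comb(size, 2) for size in cluster_sizes)
    return within_cluster_pairs * n_kept_operators


# Hypothetical setting: 100 features, 10 binary operators.
full = full_candidates(n_features=100, n_operators=10)

# Hypothetical pruning: keep 6 operators, split features into 5 clusters of 20.
pruned = pruned_candidates(cluster_sizes=[20] * 5, n_kept_operators=6)

print("full search space:  ", full)     # 49500
print("pruned search space:", pruned)   # 5700
print("reduction factor:   ", round(full / pruned, 2))
```

Because the pair count grows quadratically in the number of features, restricting pairs to clusters shrinks the space faster as dimensionality rises, which is consistent with the abstract's remark that gains are most pronounced on high-dimensional datasets.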