Industry
Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution
End-to-end autonomous driving methods aim to directly map raw sensor inputs to future driving actions such as planned trajectories, bypassing traditional modular pipelines. While these approaches have shown promise, they often operate under a one-shot paradigm that relies heavily on the current scene context, potentially underestimating the importance of scene dynamics and their temporal evolution. This limitation restricts the model's ability to make informed and adaptive decisions in complex driving scenarios. We propose a new perspective: the future trajectory of an autonomous vehicle is closely intertwined with the evolving dynamics of its environment, and conversely, the vehicle's own future states can influence how the surrounding scene unfolds. Motivated by this bidirectional relationship, we introduce SeerDrive, a novel end-to-end framework that jointly models future scene evolution and trajectory planning in a closed-loop manner. Our method first predicts future bird's-eye view (BEV) representations to anticipate the dynamics of the surrounding scene, then leverages this foresight to generate future-context-aware trajectories. Two key components enable this: (1) future-aware planning, which injects predicted BEV features into the trajectory planner, and (2) iterative scene modeling and vehicle planning, which refines both future scene prediction and trajectory generation through collaborative optimization. Extensive experiments on the NAVSIM and nuScenes benchmarks show that SeerDrive significantly outperforms existing state-of-the-art methods.
Multiscale guidance of protein structure prediction with heterogeneous cryo-EM data
Protein structure prediction models are now capable of generating accurate 3D structural hypotheses from sequence alone. However, they routinely fail to capture the conformational diversity of dynamic biomolecular complexes, often requiring heuristic MSA subsampling approaches for generating alternative states. In parallel, cryo-electron microscopy (cryo-EM) has emerged as a powerful tool for imaging near-native structural heterogeneity, but is challenged by arduous pipelines to transform raw experimental data into atomic models. Here, we bridge the gap between these modalities, combining cryo-EM density maps with the rich sequence and biophysical priors learned by protein structure prediction models. Our method, CryoBoltz, guides the sampling trajectory of a pretrained biomolecular structure prediction model using both global and local structural constraints derived from density maps, driving predictions towards conformational states consistent with the experimental data. We demonstrate that this flexible yet powerful inferencetime approach allows us to build atomic models into heterogeneous cryo-EM maps across a variety of dynamic biomolecular systems including transporters and antibodies.
Geometric Algebra-Enhanced Bayesian Flow Network for RNAInverse Design
With the development of biotechnology, RNA therapies have shown great potential. However, different from proteins, the sequences corresponding to a single RNA three-dimensional structure are more abundant. Most of the existing RNA design methods merely take into account the secondary structure of RNA, or are only capable of generating a limited number of candidate sequences. To address these limitations, we propose a geometric-algebra-enhanced Bayesian Flow Network for the inverse design of RNA, called RBFN. RBFN uses a Bayesian Flow Network to model the distribution of nucleotide sequences in RNA, enabling the generation of more reasonable RNA sequences. Meanwhile, considering the more flexible characteristics of RNA conformations, we utilize geometric algebra to enhance the modeling ability of the RNA three-dimensional structure, facilitating a better understanding of RNA structural properties. In addition, due to the scarcity of RNA structures and the limitation that there are only four types of nucleic acids, we propose a new time-step distribution sampling to address the scarcity of RNA structure data and the relatively small number of nucleic acid types. Evaluation on the single-state fixed-backbone re-design benchmark and multi-state fixedbackbone benchmark indicates that RBFN can outperform existing RNA design methods in various RNA design tasks, enabling effective RNA sequence design.
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
Large language models (LLMs) excel in tasks like question answering and dialogue, but complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning. Reinforcement learning (RL) fine-tuning can enable such planning in principle, but suffers from drawbacks that hinder scalability. In particular, multi-turn RL training incurs high memory and computational costs, which are exacerbated when training LLMs as policies. Furthermore, the largest LLMs do not expose the APIs necessary to be trained in such manner. As a result, modern methods to improve the reasoning of LLMs rely on sophisticated prompting mechanisms rather than RL fine-tuning. To remedy this, we propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents, that scales even to large API-based models. These value functions predict how a task will unfold given an action, allowing the LLM agent to evaluate multiple possible outcomes, both positive and negative, to plan effectively. In addition, these value functions are trained over reasoning steps rather than full actions, to be a concise and light-weight module that facilitates decisionmaking in multi-turn interactions.
Apple reportedly has three more iOS 27 features coming in the fall
Apple is still holding some of its iOS 27 cards close to its chest. Beyond the upcoming teasers shown off at WWDC 2026, 's Mark Gurman reported that there are three more unannounced features coming with iOS 27. In the newsletter, Gurman said Apple is working on a new watch face, a customizable Camera app and functionality for third-party chatbots within Siri. Gurman said he's unsure why these feature weren't shown off, but some of them are confirmed as they already exist within the developer beta for iOS 27. According to Gurman, a new watch face will likely arrive when Apple refreshes its smartwatches in the fall. Previous rumors detailed that an upcoming watch face could be based on the Modular Ultra option that's exclusive to the Apple Watch Ultra, but one that's more simplified and designed for the other Apple Watch products.
Towards Accurate Time Series Forecasting via Implicit Decoding
Recent booming time series models have demonstrated remarkable forecasting performance. However, these methods often place greater focus on more effectively modelling the historical series, largely neglecting the forecasting phase, which generates long-term forecasts by separately predicting multiple time points. Given that real-world time series typically consist of various long short-term dynamics, independent predictions over individual time points may fail to express complex underlying patterns and can lead to a lack of global views. To address these issues, this work explores new perspectives from the forecasting phase and proposes a novel Implicit Forecaster (IF) as an additional decoding module. Inspired by decomposition forecasting, IF adopts a more nuanced approach by implicitly predicting constituent waves represented by their frequency, amplitude, and phase, thereby accurately forming the time series. Extensive experimental results from multiple real-world datasets show that IF can consistently boost mainstream time series models, achieving state-of-the-art forecasting performance.
TROVE: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models
Vision-language models (VLMs) have made great strides in addressing temporal understanding tasks, which involve characterizing visual changes across a sequence of images. However, recent works have suggested that when making predictions, VLMs may rely on static feature biases, such as background or object features, rather than dynamic visual changes. Static feature biases are a type of shortcut and can contribute to systematic prediction errors on downstream tasks; as a result, identifying and characterizing error-inducing static feature biases is critical prior to real-world model deployment. Existing approaches for identifying such systematic failure modes in trained models (i) are typically designed for nontemporal settings and (ii) are challenging to evaluate in temporal settings due to the lack of quantitative evaluation frameworks. In this work, we address these challenges by introducing TROVE, an automated approach for discovering errorinducing static feature biases learned by temporal VLMs. Given a trained VLM and an annotated validation dataset associated with a downstream classification task, TROVE extracts candidate static features from the dataset and scores each feature by (i) the effect of the feature on classification errors as well as (ii) the extent to which the VLM relies on the feature when making predictions. In order to quantitatively evaluate TROVE, we introduce an evaluation framework consisting of 101 trained temporal VLMs paired with ground-truth annotations for learned static feature biases. We use this framework to demonstrate that TROVE can accurately identify error-inducing static feature biases in VLMs, achieving a 28.6% improvement over the closest baseline. Finally, we apply TROVE to 7 off-the-shelf VLMs and 2 temporal understanding tasks, surfacing previouslyunknown static feature biases and demonstrating that knowledge of learned biases can aid in improving model performance at test time.
0e4b12a79106789483fe6746702f4cb0-Paper-Conference.pdf
As large language models (LLMs) continue to advance, their capacity to function effectively across a diverse range of languages has shown marked improvement. Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts. This has led to the widespread assumption that LLMs may "think" in English.
Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy Phuc Tran VinUniversity Nisheeth K. Vishnoi Yale University Van H. Vu Yale University
A central challenge in machine learning is to understand how noise or measurement errors affect low-rank approximations--particularly in the spectral norm. This question is especially important in differentially private low-rank approximation, where one aims to preserve the top-pstructure of a data-derived matrix while ensuring privacy.
Towards Multi-Table Learning: ANovel Paradigm for Complementarity Quantification and Integration
Multi-table data integrate various entities and attributes, with potential interconnections between them. However, existing tabular learning methods often struggle to describe and leverage the underlying complementarity across distinct tables. To address this limitation, we propose the first unified paradigm for multi-table learning that systematically quantifies and integrates complementary information across tables. Specifically, we introduce a metric called complementarity strength (CS), which captures inter-table complementarity by incorporating relevance, similarity, and informativeness. For the first time, we systematically formulate the paradigm towards multi-table learning by establishing formal definitions of tasks and loss functions. Correspondingly, we present a network for multi-table learning that combines Adaptive Table encoder and Cross table Attention mechanism (ATCANet), achieving the simultaneous integration of complementary information from distinct tables. Experiments show that ATCA-Net effectively leverages complementary information and that the CS metric accurately quantifies the richness of complementarity across multiple tables. To the best of our knowledge, this is the first work to establish theoretical and practical foundations for multi-table learning.