recipe
Realms for Integrated Agent Intelligence
AI agents today are mostly siloed -- they either retrieve and reason over vast amount of digital information and knowledge obtained online; or interact with the physical world through embodied perception, planning and action -- but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce EMBODIEDWEBAGENTS, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning.
Beef tea was all the rage in the 1800s
A cup a day kept the doctor away--at least according to these 19th century remedies. More information Adding us as a Preferred Source in Google by using this link indicates that you would like to see more of our content in Google News results. Australian boxing manager and trainer Tom Maguire gives Australian boxer Dave Sands a cup of beef tea in his bedroom at their London hotel in April 1949. Breakthroughs, discoveries, and DIY tips sent six days a week. By signing up, you confirm you are 16+, will receive newsletters and promotional content and agree to our Terms of Use and acknowledge the data practices in our Privacy Policy .
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning
Hybrid language models that combine Attention and State Space Models (SSMs) have been shown to achieve state-of-the-art accuracy and runtime performance. Recent work has also demonstrated that applying pruning and distillation to Attention-only models yields smaller, more accurate models at a fraction of the training cost. In this work, we explore the effectiveness of compressing Hybrid architectures. To this end, we introduce a novel group-aware pruning method for Mamba layers that preserves the structural integrity of SSM blocks and their sequence modeling capabilities. We combine this method with FFN, embedding dimension, and layer pruning, along with knowledge distillation-based retraining to obtain a unified compression recipe for hybrid models. Using this recipe, we compress the Nemotron-H 8B Hybrid model down to 4B parameters with up to $40\times$ fewer training tokens compared to similarly-sized models.
Reinforcement Learning with Action Chunking
Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a *chunked* action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
Co-PatcheR: Collaborative Software Patching with Component-specific Small Reasoning Models
Motivated by the success of general purpose large language models (LLMs) in software patching, recent works started to train specialized patching models. Most works trained one model to handle the end to end patching pipeline (including issue localization, patch generation, and patch validation). However, it is hard for a small model to handle all tasks, as different sub-tasks have different workflows and require different expertise. As such, by using a 70 billion model, SOTA methods can only reach up to 41% resolved rate on SWE-bench-Verified. Motivated by the collaborative nature, we propose Co-PatcheR, the first collaborative patching system with small and specialized reasoning models for individual components.
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
The need for training multilingual multi-task speech processing (MSP) models that perform both automatic speech recognition and speech-to-text translation is increasingly evident. However, a significant challenge arises from the conflicts among multiple objectives when using a single model. Multi-objective optimization can address this challenge by facilitating the optimization of multiple conflicting objectives and aligning the gradient updates in a common descent direction. While multi-objective optimization helps avoid conflicting gradient updates, a critical issue is that when there are many objectives, such as in MSP, it is often {\em difficult to find} a common descent direction. This leads to an important question: Is it more effective to separate highly conflicting objectives into different optimization levels or to keep them in a single level? To address this question, this paper investigates three multi-objective MSP formulations, which we refer to as \textbf{objective soup recipes}. These formulations apply multi-objective optimization at different optimization levels to mitigate potential conflicts among all objectives. To keep computation and memory overhead low, we incorporate a lightweight layer selection strategy that detects the most conflicting layers and uses only their gradients when computing the conflict avoidance direction. We conduct an extensive investigation using the CoVoST v2 dataset for combined multilingual ASR and ST tasks, along with the LibriSpeech and AISHELL-1 datasets for multilingual ASR, to identify highly conflicting objectives and determine the most effective training recipe among the three proposed multi-objective optimization algorithms.
Gemini for Google Home will no longer freak out if you ask it how to make a margarita
Google has updated Gemini for Home so that it no longer acts like a strict parent when you ask it for cocktail recipes. In the past, you may have encountered a message that says I cannot provide recipes for alcoholic beverages when you ask the AI assistant for a margarita recipe on Google smart home devices, such as the Nest Hub . Now, Google has updated its safeguards to prevent adult users from encountering filters meant for younger ones. Adults will now experience improved availability for general queries, including recipes for age-gated beverages, the company said in the Google Home support page . If Gemini still isn't responding when you ask it for instructions on how to make a cocktail, you may have to check you Parental Control settings and your Gemini for Home response filter settings in the Google Home app.
ProteinJEPA: Latent prediction complements protein language models
Ofer, Dan, Shahaf, Dafna, Linial, Michal
Protein language models are trained primarily with masked language modeling (MLM), which predicts amino-acid identities at masked positions. We ask whether latent-space prediction can complement these token-level objectives under matched wall-clock budget. Across pretrained and random-init protein sequence encoders at 35--150M parameters, we find that the best protein-JEPA design is not all-position latent prediction but a variant: predicting latent targets only at masked positions, and retaining the MLM cross-entropy. We call this recipe masked-position MLM+JEPA. On a 16-task downstream suite (15 frozen linear probes plus SCOPe-40 zero-shot fold retrieval), under matched wall-clock budgets, this recipe wins more tasks than it loses against MLM-only continuation: 10 wins / 3 losses / 3 ties (hereafter W/L/T) on pretrained ESM2-35M, 11/2/3 on ESM2-150M while results in pretraining from scratch are mixed (6/8/2). Gains are seen for multiple models on 11 of 16 tasks, including stability, \b{eta}β\b{eta}-lactamase fitness, variant effect, intrinsic disorder, remote homology, enzyme classification, and SCOPe-40 fold retrieval. Tasks with more losses than wins are Fluorescence (TAPE) and Peptide-HLA Binding. All-position MLM+JEPA matches MLM-only overall but does not reproduce the masked-position gains. JEPA-only (no MLM) collapses in nearly every experiment. We conclude that JEPA, when combined with MLM, is competitive and can outperform pure MLM in pretraining and continued training, even under matched wall-clock budgets.
Unleashing the Power of Randomization in Auditing Differentially Private ML
We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. We take a first principles approach based on three key components. First, we introduce Lifted Differential Privacy (LiDP) which expands the definition of differential privacy to handle randomized datasets. This gives us the freedom to design randomized canaries. Second, we audit LiDP by trying to distinguish between the model trained with K canaries versus K 1 canaries in the dataset, leaving one canary out. By drawing the canaries i.i.d., LiDP can leverage the symmetry in the design and reuse each privately trained model to run multiple statistical tests, one for each canary. Third, we introduce novel confidence intervals that take advantage of the multiple test statistics by adapting to the empirical higher-order correlations. Together, this new recipe demonstrates significant improvements in sample complexity, both theoretically and empirically, using synthetic and real data. Further, recent advances in designing stronger canaries can be readily incorporated into the new framework.