Genre
745-mile whale graveyard found at the bottom of Indian Ocean
A 5.3-million-year old fossil was lurking inside this extensive whale fall. More information Adding us as a Preferred Source in Google by using this link indicates that you would like to see more of our content in Google News results. Fragmentary whale skeletal remains are abundant on the deep seafloor of the Diamantina Zone, reflecting long-term exposure and slow carcass degradation. These bones are typically colonised by hard-substrate animals, including stalked sea anemones, sponges, and sea stars. Photographs taken from the Chinese submersible'Fendouzhe.' Breakthroughs, discoveries, and DIY tips sent six days a week.
Multi-Agent Imitation by Learning and Sampling from Factorized Soft Q-Function
Learning from multi-agent expert demonstrations, known as Multi-Agent Imitation Learning (MAIL), provides a promising approach to sequential decision-making. However, existing MAIL methods including Behavior Cloning (BC) and Adversarial Imitation Learning (AIL) face significant challenges: BC suffers from the compounding error issue, while the very nature of adversarial optimization makes AIL prone to instability. In this work, we propose \textbf{M}ulti-\textbf{A}gent imitation by learning and sampling from \textbf{F}actor\textbf{I}zed \textbf{S}oft Q-function (MAFIS), a novel method that addresses these limitations for both online and offline MAIL settings. Built upon the single-agent IQ-Learn framework, MAFIS introduces the value decomposition network to factorize the imitation objective at agent level, thus enabling scalable training for multi-agent systems. Moreover, we observe that the soft Q-function implicitly defines the optimal policy as an energy-based model, from which we can sample actions via stochastic gradient Langevin dynamics. This allows us to estimate the gradient of the factorized optimization objective for continuous control tasks, avoiding the adversarial optimization between the soft Q-function and the policy required by prior work. By doing so, we obtain a tractable and \emph{non-adversarial} objective for both discrete and continuous multi-agent control. Experiments on common benchmarks including the discrete control tasks StarCraft Multi-Agent Challenge v2 (SMACv2), Gold Miner, and Multi Particle Environments (MPE), as well as the continuous control task Multi-Agent MuJoCo (MaMuJoCo), demonstrate that MAFIS achieves superior performance compared with baselines. Our code is available at https://github.com/LAMDA-RL/MAFIS.
Could raccoons become the new dogs?
Could raccoons become the new dogs? They're undeniably cute, but they'd also be a pretty annoying pet. More information Adding us as a Preferred Source in Google by using this link indicates that you would like to see more of our content in Google News results. Breakthroughs, discoveries, and DIY tips sent six days a week. By signing up, you confirm you are 16+, will receive newsletters and promotional content and agree to our Terms of Use and acknowledge the data practices in our Privacy Policy .
ModelโBehavior Alignment under Flexible Evaluation: When the Best-Fitting Model Isn't the Right One
Linearly transforming stimulus representations of deep neural networks yields high-performing models of behavioral and neural responses to complex stimuli. But does the test accuracy of such predictions identify genuine representational alignment? We addressed this question through a large-scale model-recovery study. Twenty diverse vision models were linearly aligned to 4.5 million behavioral judgments from the THINGS odd-one-out dataset and calibrated to reproduce human response variability. For each model in turn, we sampled synthetic responses from its probabilistic predictions, fitted all candidate models to the synthetic data, and tested whether the data-generating model would re-emerge as the best predictor of the simulated data. Model recovery accuracy improved with training-set size but plateaued below 80%, even at millions of simulated trials. Regression analyses linked misidentification primarily to shifts in representational geometry induced by the linear transformation, as well as to the effective dimensionality of the transformed features. These findings demonstrate that, even with massive behavioral data, overly flexible alignment metrics may fail to guide us toward artificial representations that are genuinely more human-aligned. Model comparison experiments must be designed to balance the trade-off between predictive accuracy and identifiability--ensuring that the best-fitting model is also the right one.
Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
State-space models (SSMs), particularly Mamba, emerge as an efficient Transformer alternative with linear complexity for long-sequence modeling. Recent empirical works demonstrate Mamba's in-context learning (ICL) capabilities competitive with Transformers, a critical capacity for large foundation models. However, theoretical understanding of Mamba's ICL remains limited, restricting deeper insights into its underlying mechanisms. Even fundamental tasks such as linear regression ICL, widely studied as a standard theoretical benchmark for Transformers, have not been thoroughly analyzed in the context of Mamba. To address this gap, we study the training dynamics of Mamba on the linear regression ICL task. By developing novel techniques tackling non-convex optimization with gradient descent related to Mamba's structure, we establish an exponential convergence rate to ICL solution, and derive a loss bound that is comparable to Transformer's. Importantly, our results reveal that Mamba can perform a variant of \textit{online gradient descent} to learn the latent function in context. This mechanism is different from that of Transformer, which is typically understood to achieve ICL through gradient descent emulation. The theoretical results are verified by experimental simulation.
Diffusion Beats Autoregressive in Data-Constrained Settings
Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings--where training involves repeated passes over limited data--and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. Finally, we explain why diffusion models excel in this regime: their randomized masking objective implicitly trains over a rich distribution of token orderings, acting as an implicit data augmentation that AR's fixed left-to-right factorization lacks. Our results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm.
Get officially certified in Claude AI for just 19.99
When you purchase through links in our articles, we may earn a small commission. Get officially certified in Claude AI for just $19.99 A Claude AI Professional E-Degree is on sale for $19.99 (reg. AI skills are no longer a nice-to-have. A verifiable credential in one of the most popular AI models on the market is a real resume differentiator, and right now, you can get an e-degree in Claude for just $19.99 (reg. While plenty of people have dabbled with Claude, there's a big difference between "I've used it a few times" and actually knowing how to make it work for you.
Baltic states fear Russia-Ukraine war spillover after drone incursions
Recent incidents heighten anxieties that hybrid warfare tactics could trigger military confrontation with Russia. Lithuanian armed special forces and members of the Lithuanian Riflemen's Union take part in a military exercise in central Lithuania [File: Nils Adler/Al Jazeera] A member of the Lithuanian Riflemen's Union joins in military exercises in central Lithuania [File: Nils Adler/Al Jazeera] Along the forests and marshlands that separate the Baltic states from Russia and Belarus, workers are digging anti-tank ditches, pouring concrete bunkers and erecting rows of dragon's teeth - jagged concrete obstacles designed to slow and channel advancing armour - to buy precious time in the event of an attack. Russia's full-scale invasion of Ukraine in 2022 reignited old fears in Estonia, Latvia and Lithuania, where memories of Soviet rule remain close to the surface. In the years since, those fears have been channelled into preparation. Defence budgets have surged, military exercises have intensified, and new fortifications have emerged even as daily life largely continues as normal.
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP) where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model under the uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach on the composite value function, either, as the worst-case models may be different for the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function--to satisfy the constraints; on the other hand, when all the constraints are satisfied, it can simply maximize the robust reward value function. We prove that such an algorithm finds a policy with at most $\epsilon$ sub-optimality and a feasible policy after $O(\epsilon^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search; thus, we reduce the computation time and achieve a better performance, especially for continuous state-space.
Investigating and Mitigating Catastrophic Forgetting in Medical Knowledge Injection through Internal Knowledge Augmentation Learning
Large Language Models (LLMs) are expected to possess comprehensive medical knowledge to support real-world clinical applications. While domain-specific fine-tuning effectively injects medical knowledge into LLMs, it often causes catastrophic forgetting of previously acquired knowledge and instruction-following capabilities. In this paper, we investigate this issue and reveal a pattern of proximity-dependent forgetting: knowledge that is semantically or topically close to the injected content is more likely to be forgotten, while unrelated knowledge shows minimal degradation. Moreover, we observe that existing mitigation techniques fail to address this type of forgetting effectively. Motivated by this observation and inspired by human learning mechanisms, we proposeInternAL (\Internal Knowledge Augmentation Learning), a novel approach that leverages LLMs' own internal knowledge to mitigate forgetting.