Technology
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
We prove rich algebraic structures of the solution space for 2-layer neural networks with quadratic activation and L2 loss, trained on reasoning tasks in Abelian groups (e.g., modular addition). Such a rich structure enables analytical construction of global optimal solutions from partial solutions that only satisfy part of the loss, despite its high nonlinearity.
ChartSketcher: Reasoning with Feedback and Reflection for Chart Understanding
Charts are high-density visualization carriers for complex data, serving as a crucial medium for information extraction and analysis. Automated chart understanding poses significant challenges to existing multimodal large language models (MLLMs) due to the need for precise and complex visual reasoning. Current step-by-step reasoning models primarily focus on text-based logical reasoning for chart understanding. However, they struggle to refine or correct their reasoning when errors stem from flawed visual understanding, as they lack the ability to leverage multimodal interaction for deeper comprehension. Inspired by human cognitive behavior, we propose ChartSketcher, a multimodal feedback-driven step-by-step reasoning method designed to address these limitations. ChartSketcher is a chart understanding model that employs Sketch-CoT, enabling MLLMs to annotate intermediate reasoning steps directly onto charts using a programmatic sketching library, iteratively feeding these visual annotations back into the reasoning process. This mechanism enables the model to visually ground its reasoning and refine its understanding over multiple steps. We employ a two-stage training strategy: a cold start phase to learn sketch-based reasoning patterns, followed by off-policy reinforcement learning to enhance reflection and generalization. Experiments demonstrate that ChartSketcher achieves promising performance on chart understanding benchmarks and general vision tasks, providing an interactive and interpretable approach to chart comprehension.
ChatGPT can be made to generate sexualised and violent images, researchers find
The latest public version of ChatGPT can be made to generate sexualised images or depict scenes of graphic violence with a simple prompt, researchers have told the BBC. British AI security startup Mindgard figured out how to make ChatGPT create graphic pictures by slightly altering a widely shared instruction, or prompt, which was originally designed to produce humorous results. After being contacted by the BBC, ChatGPT's maker OpenAI said it had taken action to stop the chatbot responding with those types of images. "After investigating this trend, we've introduced additional safeguards against this type of prompt," it said in a statement. It also said it has multiple layers of protection to prevent users making content which breaches its terms and conditions.
'We had to get out of the way': The backlash over delivery robots
The first time Chicago resident John Roberts saw a delivery robot trundling down the sidewalk on his street he was impressed. "I actually thought they were kind of neat - it felt futuristic," he says. But his attitude started to change when, soon after, he was out for a walk with his family. As another robot approached, they found themselves having to dodge it. "To us it felt a little off - the fact that we were on the one strip reserved for walking, and we were having to get out of the way," says Roberts.
MoCha: Towards Movie-Grade Talking Character Generation
Recent advancements in video generation have achieved impressive motion realism, yet they often overlook character-driven storytelling, a crucial task for automated film and animation generation. We introduce Talking Characters, a more realistic task to generate talking character animations directly from speech and text.
Simultaneous Statistical Inference for Off-Policy Evaluation in Reinforcement Learning
This work presents the first theoretically justified simultaneous inference framework for off-policy evaluation (OPE). In contrast to existing methods that focus on point estimates or pointwise confidence intervals (CIs), the new framework quantifies global uncertainty across an infinite or continuous initial state space, offering valid inference over the entire state space.
The Korean Telecom Giant at the Center of Anthropic's Mythos Controversy
Days before Anthropic took its most advanced AI models offline, the White House ordered the company to revoke SK Telecom's access to Claude Mythos over claims of alleged ties to China. The Trump administration's move to impose export controls on Anthropic's most powerful AI technology followed a spat over the company granting South Korean telecom giant SK Telecom access to its Claude Mythos model, according to people familiar with the matter. US officials were concerned about what they alleged were SK Telecom's ties to China, those people said. Those concerns appear to have compounded when Amazon later flagged vulnerabilities it identified in Fable 5 to the White House. Fable 5 is a highly safeguarded version of Mythos that Anthropic released to the public on June 9.
Chain of Execution Supervision Promotes General Reasoning in Large Language Models
Building robust and general reasoning ability is a central goal in the development of large language models (LLMs). Recent efforts increasingly turn to code as a rich training source, given its inherent logical structure and diverse reasoning paradigms--such as divide-and-conquer, topological ordering, and enumeration. However, reasoning in code is often expressed implicitly and entangled with syntactic or implementation noise, making direct training on raw code suboptimal. To address this, we introduce Tracepile, a large-scale corpus of 2.6 million samples that transforms code execution into explicit, step-by-step chain-of-thought style rationales, which we call Chain of Execution (CoE). The corpus spans domains including mathematics, classical algorithms and algorithmic competition, and is enriched with variable-tracing questions and code rewritings to enhance logical granularity and code diversity. We evaluate Tracepile using three training setups--continue-pretraining, instruction tuning after pretraining, and two-stage finetuning. Experiments across four base models (LLaMA 3, LLaMA 3.1, Qwen-2.5, and Qwen-2.5 Coder) and 20 benchmarks covering math, code, logic, and algorithms demonstrate consistent improvements. Notably, Tracepile boosts LLaMA3-8B by 9.2% on average across nine math datasets and delivers clear gains on LiveCodeBench, CRUX, and Zebra Logic under two-stage finetuning.
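To make the idea of turning code execution into a step-by-step rationale concrete, here is a minimal Python sketch. It is not the authors' pipeline: the helper names `trace_execution` and `render_rationale`, the output wording, and the `gcd` example are all assumptions chosen for illustration. It only uses the standard `sys.settrace` hook to record the local variables before each line of a function runs.

```python
import sys


def trace_execution(func, *args):
    """Run func(*args) and record (relative line, locals) before each line executes."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames that belong to other functions
        if event == "line":
            # Snapshot the variables as they are just before this line runs.
            steps.append((frame.f_lineno - code.co_firstlineno, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return result, steps


def render_rationale(steps, result):
    """Turn the recorded steps into a plain-text, step-by-step rationale (hypothetical format)."""
    lines = [
        f"Step {i}: before line {line}, variables are {variables}"
        for i, (line, variables) in enumerate(steps, 1)
    ]
    lines.append(f"Final answer: {result}")
    return "\n".join(lines)


def gcd(a, b):
    # Euclid's algorithm, used here only as a small traced example.
    while b:
        a, b = b, a % b
    return a


result, steps = trace_execution(gcd, 48, 18)
print(render_rationale(steps, result))
```

Each recorded step pairs a line offset inside the traced function with the variable values at that point, which is the kind of variable-tracing signal the abstract describes; a real corpus would presumably rewrite these raw snapshots into natural-language explanations.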
Multi-Kernel Correlation-Attention Vision Transformer for Enhanced Contextual Understanding and Multi-Scale Integration
Significant progress has been achieved using Vision Transformers (ViTs) in computer vision. However, challenges persist in modeling multi-scale spatial relationships, hindering effective integration of fine-grained local details and long-range global dependencies. To address this limitation, a Multi-Kernel Correlation-Attention Vision Transformer (MK-CAViT) grounded in the Hirschfeld-Gebelein-Rényi (HGR) theory was proposed, introducing three key innovations. A parallel multi-kernel architecture was utilized to extract multi-scale features through small, medium, and large kernels, overcoming the single-scale constraints of conventional ViTs. The cross-scale interactions were enhanced through the Fast-HGR attention mechanism, which models nonlinear dependencies and applies adaptive scaling to weigh connections and refine contextual reasoning. Additionally, a stable multi-scale fusion strategy was adopted, integrating dynamic normalization and staged learning to mitigate gradient variance, progressively fusing local and global contexts, and improving training stability.
Conformal Arbitrage: Risk-Controlled Balancing of Competing Objectives in Language Models
Modern language-model deployments must often balance competing objectives--for example, helpfulness versus harmlessness, cost versus accuracy, and reward versus safety. We introduce Conformal Arbitrage, a post-hoc framework that learns a data-driven threshold to mediate between a Primary model optimized for a primary objective and a more conservative Guardian--which could be another model or a human domain expert--aligned with a guardrail objective. The threshold is calibrated with conformal risk control, yielding finite-sample, distribution-free guarantees that the long-run frequency of undesirable events (such as factual errors or safety violations) does not exceed a user-specified quota. Because Conformal Arbitrage operates wholly at the API level--without requiring access to model logits or updating model weights--it complements weight-based alignment techniques and integrates seamlessly with existing cost-aware cascades. Empirically, Conformal Arbitrage traces an efficient frontier, allowing users to define an acceptable performance level for one objective while maximizing utility in another. We observe that our method outperforms (in terms of accuracy on multiple-choice style questions) cost-matched random routing between models. These properties make Conformal Arbitrage a practical, theoretically grounded tool for trustworthy and economical deployment of large language models across a broad range of potentially competing objectives.
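As a rough illustration of how a threshold can be calibrated with conformal risk control, the Python sketch below picks the smallest confidence threshold whose adjusted empirical risk stays within a quota. It rests on assumptions that are not taken from the paper: the routing rule (answer with the Primary model when its confidence is at least the threshold, otherwise defer to the Guardian), the premise that deferring incurs zero loss, and the names `calibrate_threshold`, `confidences`, `primary_errors`, and `grid` are all hypothetical.

```python
def calibrate_threshold(confidences, primary_errors, alpha, grid):
    """Smallest threshold in `grid` whose conformal risk bound is at most alpha.

    confidences    : Primary model confidence for each calibration example (assumed score)
    primary_errors : 1 if the Primary answer is an undesirable event, else 0
    alpha          : user-specified risk quota
    grid           : candidate thresholds

    The per-example loss is primary_errors[i] when the Primary is used
    (confidence >= threshold) and 0 when deferring to the Guardian (an assumption).
    This loss is bounded by 1 and non-increasing in the threshold, so the
    conformal risk control condition is (sum of losses + 1) / (n + 1) <= alpha.
    """
    n = len(confidences)
    for threshold in sorted(grid):
        loss_sum = sum(
            error
            for confidence, error in zip(confidences, primary_errors)
            if confidence >= threshold
        )
        if (loss_sum + 1.0) / (n + 1) <= alpha:
            return threshold
    return float("inf")  # no grid value meets the quota: always defer to the Guardian


# Toy calibration set: errors concentrate at low confidence.
confidences = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
primary_errors = [0, 0, 0, 0, 1, 0, 1, 1, 1]
grid = [i / 10 for i in range(11)]

threshold = calibrate_threshold(confidences, primary_errors, alpha=0.25, grid=grid)
print(threshold)
```

Raising the quota `alpha` lets the Primary model answer more often, while lowering it pushes more queries to the Guardian, which is one simple way to trace the kind of trade-off frontier the abstract mentions.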