AITopics

2508.01173

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

MIT Technology ReviewDec-3-2025, 13:10:00 GMT

The Download: AI and coding, and Waymo's aggressive driverless cars

Plus: the FDA's newly-appointed head drug regulator is out AI has already transformed how code is written, but a new wave of autonomous systems promise to make the process even smoother and less prone to making mistakes. Amazon Web Services has just revealed three new "frontier" AI agents, its term for a more sophisticated class of autonomous agents capable of working for days at a time without human intervention. One of them, called Kiro, is designed to work independently without the need for a human to constantly point it in the right direction. Another, AWS Security Agent, scans a project for common vulnerabilities: an interesting development given that many AI-enabled coding assistants can end up introducing errors. To learn more about the exciting direction AI-enhanced coding is heading in, check out our team's reporting: Are we ready for what could happen next? They remember previous sessions and continuously learn from a company's codebase.

artificial intelligence, machine learning, social media, (15 more...)

MIT Technology Review

Country:

Asia > China (0.06)
North America > United States > Missouri (0.05)
North America > United States > Massachusetts (0.05)

Industry:

Health & Medicine (0.94)
Information Technology > Services (0.50)
Transportation > Passenger (0.42)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Makur, Anuran, Singh, Japneet

Hypothesis Testing for Generalized Thurstone Models

arXiv.org Machine LearningDec-3-2025

In this work, we develop a hypothesis testing framework to determine whether pairwise comparison data is generated by an underlying \emph{generalized Thurstone model} $\mathcal{T}_F$ for a given choice function $F$. While prior work has predominantly focused on parameter estimation and uncertainty quantification for such models, we address the fundamental problem of minimax hypothesis testing for $\mathcal{T}_F$ models. We formulate this testing problem by introducing a notion of separation distance between general pairwise comparison models and the class of $\mathcal{T}_F$ models. We then derive upper and lower bounds on the critical threshold for testing that depend on the topology of the observation graph. For the special case of complete observation graphs, this threshold scales as $Θ((nk)^{-1/2})$, where $n$ is the number of agents and $k$ is the number of comparisons per pair. Furthermore, we propose a hypothesis test based on our separation distance, construct confidence intervals, establish time-uniform bounds on the probabilities of type I and II errors using reverse martingale techniques, and derive minimax lower bounds using information-theoretic methods. Finally, we validate our results through experiments on synthetic and real-world datasets.

pairwise comparison model, theorem 3, threshold, (12 more...)

arXiv.org Machine Learning

2512.02912

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Canada > British Columbia > Vancouver (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(11 more...)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Sports (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Xiong, Yuan, Miao, Ziqi, Li, Lijun, Qian, Chen, Li, Jie, Shao, Jing

While Multimodal Large Language Models (MLLMs) show remarkable capabilities, their safety alignments are susceptible to jailbreak attacks. Existing attack methods typically focus on text-image interplay, treating the visual modality as a secondary prompt. This approach underutilizes the unique potential of images to carry complex, contextual information. To address this gap, we propose a new image-centric attack method, Contextual Image Attack (CIA), which employs a multi-agent system to subtly embeds harmful queries into seemingly benign visual contexts using four distinct visualization strategies. To further enhance the attack's efficacy, the system incorporate contextual element enhancement and automatic toxicity obfuscation techniques. Experimental results on the MMSafetyBench-tiny dataset show that CIA achieves high toxicity scores of 4.73 and 4.83 against the GPT-4o and Qwen2.5-VL-72B models, respectively, with Attack Success Rates (ASR) reaching 86.31\% and 91.07\%. Our method significantly outperforms prior work, demonstrating that the visual modality itself is a potent vector for jailbreaking advanced MLLMs.

harmful query, large language model, machine learning, (19 more...)

2512.02973

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (0.67)
Law Enforcement & Public Safety (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

VLM as Strategist: Adaptive Generation of Safety-critical Testing Scenarios via Guided Diffusion

Wu, Xinzheng, Chen, Junyi, Zhong, Naiting, Shen, Yong

Autonomous driving technology is spearheading a transformation in the global automotive industries, and its safe and reliable implementation is the core prerequisite for large-scale adoption (Ren et al., 2025). Comprehensive testing and evaluation of autonomous driving systems (ADSs) are essential to ensuring their safety, in which the identification and generation of safety-critical scenarios represent a core challenge (Yang et al., 2025). "Safety-critical scenarios" specifically refer to rare driving situations with potentially high risks (Ding et al., 2023). Conducting tests under such scenarios enables effective evaluation of the ADSs' safety performance, as well as the clarification and iterative refinement of its Operational Design Domain (ODD). However, due to the rarity of safety-critical scenarios in naturalistic driving environments (Feng et al., 2023), real-world road testing is inefficient and cost-prohibitive, making it unsuitable for large-scale testing of high-level ADSs. As a more efficient and practical solution, simulation-based testing has garnered significant industrial and scholarly attention (Sun et al., 2022). In recent years, engineers in enterprises generally extract safety-critical testing scenarios by directly replaying vehicle-collected data in simulation environments (Liu et al., 2024), while some researchers achieve accelerated sampling of safety-critical scenarios through optimization-based search within a predefined scenario parameter space (Wu et al., 2024, 2026). However, the background vehicles (BVs) in the safety-critical testing scenarios generated by the aforementioned methods exhibit fixed behaviors and cannot dynamically respond to the actions of the vehicle under test (VUT). As a remedy, some other studies have introduced reinforcement learning to train adversarial BV driver models, thereby constructing naturalistic adversarial driving environments (NADE) (Feng et al., 2021) or evolving scenarios (Ma et al., 2024; Wu et al., 2025).

large language model, machine learning, natural language, (19 more...)

2512.02844

Country: North America > United States (0.93)

Genre:

Research Report (1.00)
Workflow (0.94)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
(2 more...)

Enhancing Automated Paper Reproduction via Prompt-Free Collaborative Agents

Lin, Zijie, Cai, Qilin, Shen, Liang, Xiao, Mingjun

Automated paper reproduction has emerged as a promising approach to accelerate scientific research, employing multi-step workflow frameworks to systematically convert academic papers into executable code. However, existing frameworks often lack mechanisms to verify and refine the outputs at each generation step, or rely heavily on manually designed prompts for self-refinement, which limits their adaptability and scalability. To address these limitations, we propose a prompt-free collaborative agent framework that automatically enhances the quality of paper-to-code generation. Our approach employs two collaborative agents: a verification agent that examines whether the outputs at each step satisfy the requirements specified in the corresponding system prompt, and a refinement agent that revises the outputs based on the identified issues. Unlike previous methods that require human experts to craft specific refinement prompts for each step, our framework achieves automatic verification and improvement by leveraging only the original system prompts. We integrate our collaborative agents into the Paper2Code framework and conduct comprehensive experiments on PaperBench Code-Dev and Paper2CodeBench datasets. Experimental results demonstrate that our approach significantly improves the accuracy and completeness of reproduced code, achieving performance gains of approximately 15\% and 13\%, respectively, compared to the baseline without our agents. Furthermore, comparative experiments against Self-Refine validate the robustness and consistency of our prompt-free approach across different datasets.

agent, artificial intelligence, arxiv preprint arxiv, (13 more...)

2512.02812

Country: Asia > China (0.28)

Genre:

Workflow (1.00)
Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

CogDrive: Cognition-Driven Multimodal Prediction-Planning Fusion for Safe Autonomy

Huang, Heye, Yang, Yibin, Fan, Mingfeng, Wang, Haoran, Zhao, Xiaocong, Wang, Jianqiang

Safe autonomous driving in mixed traffic requires a unified understanding of multimodal interactions and dynamic planning under uncertainty. Existing learning based approaches struggle to capture rare but safety critical behaviors, while rule based systems often lack adaptability in complex interactions. To address these limitations, CogDrive introduces a cognition driven multimodal prediction and planning framework that integrates explicit modal reasoning with safety aware trajectory optimization. The prediction module adopts cognitive representations of interaction modes based on topological motion semantics and nearest neighbor relational encoding. With a differentiable modal loss and multimodal Gaussian decoding, CogDrive learns sparse and unbalanced interaction behaviors and improves long horizon trajectory prediction. The planning module incorporates an emergency response concept and optimizes safety stabilized trajectories, where short term consistent branches ensure safety during replanning cycles and long term branches support smooth and collision free motion under low probability switching modes. Experiments on Argoverse2 and INTERACTION datasets show that CogDrive achieves strong performance in trajectory accuracy and miss rate, while closed loop simulations confirm adaptive behavior in merge and intersection scenarios. By combining cognitive multimodal prediction with safety oriented planning, CogDrive offers an interpretable and reliable paradigm for safe autonomy in complex traffic.

machine learning, prediction, trajectory, (20 more...)

2512.02777

Country: North America > United States (0.67)

Genre: Research Report (0.82)

Industry: Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Self-Improving AI Agents through Self-Play

Chojecki, Przemyslaw

We extend the moduli-theoretic framework of psychometric batteries to the domain of dynamical systems. While previous work established the AAI capability score as a static functional on the space of agent representations, this paper formalizes the agent as a flow $ν_r$ parameterized by computational resource $r$, governed by a recursive Generator-Verifier-Updater (GVU) operator. We prove that this operator generates a vector field on the parameter manifold $Θ$, and we identify the coefficient of self-improvement $κ$ as the Lie derivative of the capability functional along this flow. The central contribution of this work is the derivation of the Variance Inequality, a spectral condition that is sufficient (under mild regularity) for the stability of self-improvement. We show that a sufficient condition for $κ> 0$ is that, up to curvature and step-size effects, the combined noise of generation and verification must be small enough. We then apply this formalism to unify the recent literature on Language Self-Play (LSP), Self-Correction, and Synthetic Data bootstrapping. We demonstrate that architectures such as STaR, SPIN, Reflexion, GANs and AlphaZero are specific topological realizations of the GVU operator that satisfy the Variance Inequality through filtration, adversarial discrimination, or grounding in formal systems.

large language model, machine learning, natural language, (18 more...)

2512.02731

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

Bisconti, Piercosma, Galisai, Marcello, Pierucci, Federico, Bracale, Marcantonio, Prandi, Matteo

Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions

This paper examines why safety mechanisms designed for human-model interaction do not scale to environments where large language models (LLMs) interact with each other. Most current governance practices still rely on single-agent safety containment, prompts, fine-tuning, and moderation layers that constrain individual model behavior but leave the dynamics of multi-model interaction ungoverned. These mechanisms assume a dyadic setting: one model responding to one user under stable oversight. Yet research and industrial development are rapidly shifting toward LLM-to-LLM ecosystems, where outputs are recursively reused as inputs across chains of agents. In such systems, local compliance can aggregate into collective failure even when every model is individually aligned. We propose a conceptual transition from model-level safety to system-level safety, introducing the framework of the Emergent Systemic Risk Horizon (ESRH) to formalize how instability arises from interaction structure rather than from isolated misbehavior. The paper contributes (i) a theoretical account of collective risk in interacting LLMs, (ii) a taxonomy connecting micro, meso, and macro-level failure modes, and (iii) a design proposal for InstitutionalAI, an architecture for embedding adaptive oversight within multi-agent systems.

artificial intelligence, large language model, natural language, (16 more...)

2512.02682

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.47)

IACT: A Self-Organizing Recursive Model for General AI Agents: A Technical White Paper on the Architecture Behind kragent.ai

Lu, Pengju

This technical white paper introduces the Interactive Agents Call Tree (IACT), a computational model designed to address the limitations of static, hard-coded agent workflows. Unlike traditional systems that require pre-defined graphs or specialized programming, IACT operates as a general-purpose autonomous system driven purely by user dialogue. Given a high-level objective, the system autonomously grows a dynamic, recursive agent topology incrementally tailored to the problem's structure. This allows it to scale its organizational complexity to match open-ended tasks. To mitigate the error propagation inherent in unidirectional function calls, IACT introduces interactional redundancy by replacing rigid invocations with bidirectional, stateful dialogues. This mechanism enables runtime error correction and ambiguity resolution. We describe the architecture, design principles, and practical lessons behind the production deployment of this model in the kragent.ai system, presenting qualitative evidence from real-world workflows rather than exhaustive benchmark results.

agent, artificial intelligence, machine learning, (18 more...)

2512.02605

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)