AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing Systems, Feb-16-2026, 22:47:39 GMT

INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness

Hung Le

INDICT: a new framework that empowers LLMs with Internal Dialogues of Critiques for both safety and helpfulness guidance.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Feb-16-2026, 03:42:00 GMT

ToolQA: A Dataset for LLM Question Answering with External Tools

Our development of ToolQA involved a scalable, automated process for dataset curation, along with 13 specialized tools designed for interaction with external knowledge in order to answer questions.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Country:

North America > Canada (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Consumer Products & Services > Travel (0.93)
Information Technology > Security & Privacy (0.93)
Transportation > Passenger (0.67)
Transportation > Air (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)

WIRED, Jan-19-2026, 11:30:15 GMT

Dumbphone Owners Have Lost Their Minds

All my Gen Z friends want to ditch their smartphones. But there's more at stake than they think. My friend Lilah is the crunchiest person I know. She refuses to kill bugs and rats. She once made me try her homemade wine (disastrous). A few years ago, she quit her food-justice nonprofit job to live in a yurt, and after that she went to grad school and moved into an attic, where her roommates were squirrels. Against her will, she did own an iPhone for a time.

brain, lilah, smartphone, (14 more...)

WIRED

Country:

South America > Venezuela > Capital District > Caracas (0.04)
North America > United States > California (0.04)
North America > Central America (0.04)
(2 more...)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.70)

arXiv.org Artificial Intelligence, Oct-30-2025

OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning

Hu, Ziyou, Shi, Zhengliang, Zhu, Minghang, Li, Haitao, Sun, Teng, Ren, Pengjie, Verberne, Suzan, Ren, Zhaochun

Reward models (RMs) have become essential for aligning large language models (LLMs), serving as scalable proxies for human evaluation in both training and inference. However, existing RMs struggle on knowledge-intensive and long-form tasks, where evaluating correctness requires grounding beyond the model's internal knowledge. This limitation hinders them from reliably discriminating subtle quality differences, especially when external evidence is necessary. To address this, we introduce OpenRM, a tool-augmented long-form reward model that systematically judges open-ended responses by invoking external tools to gather relevant evidence. We train OpenRM with Group Relative Policy Optimization (GRPO) on over 27K synthesized pairwise examples generated through a controllable data synthesis framework. The training objective jointly supervises intermediate tool usage and final outcome accuracy, incentivizing our reward model to learn effective evidence-based judgment strategies. Extensive experiments on three newly-collected datasets and two widely-used benchmarks demonstrate that OpenRM substantially outperforms existing reward modeling approaches. As a further step, we integrate OpenRM into both inference-time response selection and training-time data selection. This yields consistent gains in downstream LLM alignment tasks, highlighting the potential of tool-augmented reward models for scaling reliable long-form evaluation.
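As a rough illustration of the group-relative step that gives GRPO its name, the sketch below normalizes rewards within one group of sampled judgments. The composite reward (outcome correctness plus a small bonus for a valid tool call) and its 0.2 weight are assumptions made for illustration, not OpenRM's actual reward; the abstract only states that tool usage and final accuracy are supervised jointly. Using the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev
from typing import List


def composite_reward(outcome_correct: bool, tool_call_valid: bool,
                     tool_weight: float = 0.2) -> float:
    """Hypothetical reward: 1 for a correct final judgment, plus a small
    bonus when the intermediate tool call was well formed."""
    return float(outcome_correct) + tool_weight * float(tool_call_valid)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and standard deviation of
    its own group, so samples are scored relative to their siblings."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four sampled judgments for the same prompt.
rewards = [
    composite_reward(True, True),
    composite_reward(True, False),
    composite_reward(False, True),
    composite_reward(False, False),
]
advantages = group_relative_advantages(rewards)
```

Judgments that beat the group average receive a positive advantage and the rest a negative one; when every sample in a group earns the same reward, all advantages are zero and the group contributes no learning signal.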

large language model, machine learning, natural language, (18 more...)

2510.24636

Country:

Europe (0.28)
Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence, Oct-14-2025

Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning

Lee, Kuan-Yi, Lin, Tsung-En, Lee, Hung-Yi

Recent advancements in large multimodal models (LMMs) have shown strong capabilities in audio understanding. However, most systems rely solely on end-to-end reasoning, limiting interpretability and accuracy for tasks that require structured knowledge or specialized signal analysis. In this work, we present Audio-Maestro -- a tool-augmented audio reasoning framework that enables audio-language models to autonomously call external tools and integrate their timestamped outputs into the reasoning process. This design allows the model to analyze, transform, and interpret audio signals through specialized tools rather than relying solely on end-to-end inference. Experiments show that Audio-Maestro consistently improves general audio reasoning performance: Gemini-2.5-flash's average accuracy on MMAU-Test rises from 67.4% to 72.1%, DeSTA-2.5 from 58.3% to 62.8%, and GPT-4o from 60.8% to 63.9%. To our knowledge, Audio-Maestro is the first framework to integrate structured tool output into the large audio language model reasoning process.

large language model, machine learning, natural language, (19 more...)

2510.11454

Country: Asia > Taiwan (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

arXiv.org Artificial Intelligence, Oct-13-2025

RA-Gen: A Controllable Code Generation Framework Using ReAct for Multi-Agent Task Execution

Liu, Aofan, Li, Haoxuan, Wang, Bin, Yang, Ao, Li, Hui

Code generation models based on large language models (LLMs) have gained wide adoption, but challenges remain in ensuring safety, accuracy, and controllability, especially for complex tasks. Existing methods often lack dynamic integration of external tools, transparent reasoning, and user control over safety. To address these issues, we propose a controllable code generation framework utilizing the ReAct paradigm for multi-agent task execution. This framework is a multi-agent system designed to enable efficient, precise, and interpretable code generation through dynamic interactions between LLMs and external resources. The framework adopts a collaborative architecture comprising four specialized agents: a Planner for task decomposition, a Searcher that leverages the ReAct framework for reasoning and tool integration, a CodeGen agent for accurate code generation, and an Extractor for structured data retrieval. The ReAct-based Searcher alternates between generating reasoning traces and executing actions, facilitating seamless integration of internal knowledge with external tools (such as search engines) to enhance accuracy and user control. Experimental results show the framework's effectiveness across multiple languages, achieving a 94.8% security rate on the SVEN dataset with CodeQL, outperforming existing approaches. Its transparent reasoning process fosters user trust and improves controllability.
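The alternation between reasoning traces and actions described for the Searcher can be sketched as a small loop. This is a minimal illustration of the general ReAct pattern, not RA-Gen's implementation: `react_loop`, `scripted_policy`, and `toy_search` are hypothetical names, and the scripted policy stands in for an LLM call.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One recorded step: (thought, action name, action input, observation).
Step = Tuple[str, str, str, str]
# A policy returns (thought, action name, action input) for the next step.
Policy = Callable[[str, List[Step]], Tuple[str, str, str]]


def react_loop(policy: Policy, tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> Tuple[Optional[str], List[Step]]:
    """Alternate between a reasoning step and a tool call until the
    policy chooses the special 'finish' action or the step budget ends."""
    trace: List[Step] = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, trace)
        if action == "finish":
            return arg, trace
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown tool: {action}"
        trace.append((thought, action, arg, observation))
    return None, trace


def scripted_policy(question: str, trace: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: search once, then answer with the observation."""
    if not trace:
        return ("I should look this up.", "search", question)
    return ("The observation answers the question.", "finish", trace[-1][3])


def toy_search(query: str) -> str:
    """Stand-in for a search engine backed by a one-entry corpus."""
    return {"capital of France": "Paris"}.get(query, "no result")


answer, trace = react_loop(scripted_policy, {"search": toy_search}, "capital of France")
```

Keeping the full trace of thoughts, actions, and observations is what makes the process inspectable afterwards, which is the transparency property the abstract emphasizes.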

large language model, machine learning, natural language, (18 more...)

2510.08665

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Neural Information Processing Systems, Oct-10-2025, 11:07:51 GMT

INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness

Hung Le

INDICT: a new framework that empowers LLMs with Internal Dialogues of Critiques for both safety and helpfulness guidance.

arxiv preprint arxiv, indict, language model, (14 more...)

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Oct-9-2025, 02:44:15 GMT

ToolQA: A Dataset for LLM Question Answering with External Tools

Yuchen Zhuang, Yue Yu, Kuan Wang, Haotian Sun, Chao Zhang

College of Computing, Georgia Institute of Technology, Atlanta GA

Our development of ToolQA involved a scalable, automated process for dataset curation, along with 13 specialized tools designed for interaction with external knowledge in order to answer questions.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.40)
North America > Canada (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Consumer Products & Services > Travel (0.93)
Information Technology > Security & Privacy (0.93)
Transportation > Passenger (0.67)
Transportation > Air (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)

arXiv.org Artificial Intelligence, Oct-3-2025

A Study on the MCP x A2A Framework for Enhancing Interoperability of LLM-based Autonomous Agents

Jeong, Cheonsu

This paper provides an in-depth technical analysis and implementation methodology of the open-source Agent-to-Agent (A2A) protocol developed by Google and the Model Context Protocol (MCP) introduced by Anthropic. While the evolution of LLM-based autonomous agents is rapidly accelerating, efficient interactions among these agents and their integration with external systems remain significant challenges. In modern AI systems, collaboration between autonomous agents and integration with external tools have become essential elements for building practical AI applications. A2A offers a standardized communication method that enables agents developed in heterogeneous environments to collaborate effectively, while MCP provides a structured I/O framework for agents to connect with external tools and resources. Prior studies have focused primarily on the features and applications of either A2A or MCP individually. In contrast, this study takes an integrated approach, exploring how the two protocols can complement each other to address interoperability issues and facilitate efficient collaboration within complex agent ecosystems.

agent, artificial intelligence, natural language, (15 more...)

doi: 10.13088/jiis.2025.31.3.141

2506.01804

Genre:

Research Report (1.00)
Workflow (0.69)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)