AITopics | search tool

Collaborating Authors

search tool

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Google's AI Searches Love to Refer You Back to Google

WIREDMar-13-2026, 12:31:55 GMT

The app reads your email inbox and your meeting calendar, then gives you a short audio summary. It can help you spend less time scrolling, but of course, there are privacy drawbacks to consider.

artificial intelligence, machine learning, natural language, (22 more...)

WIRED

Country:

North America > United States > California (0.15)
Asia > Middle East > Iran (0.05)
Europe > Slovakia (0.05)
(3 more...)

Industry:

Information Technology > Security & Privacy (0.48)
Information Technology > Services (0.33)

Technology:

Information Technology > Communications (1.00)
Information Technology > Information Management > Search (0.79)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.30)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Add feedback

Look It Up: Analysing Internal Web Search Capabilities of Modern LLMs

Kale, Sahil

arXiv.org Artificial IntelligenceNov-25-2025

Modern large language models integrate web search to provide real-time answers, yet it remains unclear whether they are efficiently calibrated to use search when it is actually needed. We introduce a benchmark evaluating both the necessity and effectiveness of web access across commercial models with no access to internal states or parameters. The dataset includes a static split of 783 temporally anchored questions answerable from pre-cutoff knowledge, aimed at testing whether models invoke search based on low internal confidence, and a dynamic split of 288 post-cutoff queries designed to test whether models recognise when search is required and retrieve updated information. Web access substantially improves static accuracy for GPT-5-mini and Claude Haiku 4.5, though confidence calibration worsens. On dynamic queries, both models frequently invoke search yet remain below 70 percent accuracy due to weak query formulation. Costs per accuracy-improving call remain low, but returns diminish once initial retrieval fails. Selective invocation helps, but models become overconfident and inconsistent after search. Overall, built-in web search meaningfully improves factual accuracy and can be invoked selectively, yet models remain overconfident, skip retrieval when it is essential, and falter once initial search queries underperform. Taken together, internal web search works better as a good low-latency verification layer than a reliable analytical tool, with clear room for improvement.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.18931

Country:

Asia (0.69)
North America > United States (0.46)
North America > Mexico (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Add feedback

InfoAgent: Advancing Autonomous Information-Seeking Agents

Zhang, Gongrui, Zhu, Jialiang, Yang, Ruiqi, Qiu, Kai, Zhang, Miaosen, Wu, Zhirong, Dai, Qi, Liu, Bei, Luo, Chong, Yang, Zhengyuan, Li, Linjie, Wang, Lijuan, Chen, Weizhu, Zhang, Yuan, Li, Xin, Liu, Zhaoyi, Geng, Xin, Guo, Baining

arXiv.org Artificial IntelligenceSep-30-2025

Building Large Language Model agents that expand their capabilities by interacting with external tools represents a new frontier in AI research and applications. In this paper, we introduce InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and orchestrated web search tools. To construct challenging, hard-to-find queries, we build entity trees and apply sub-tree sampling with entity fuzzification to systematically increase question difficulty. Unlike prior work that relies heavily on commercial search tools, we develop a dedicated self-hosted search infrastructure, enhancing transparency of agent environments and facilitating further advancement of agent capacity. We evaluate the effectiveness of our data pipeline by measuring the average number of tool calls required to correctly answer a question, and also show that our agent yields better performance when equipped with our tools. Our InfoAgent is post-trained from Qwen3-14B using a two-stage recipe: cold-start supervised finetun-ing to instill long-horizon search behaviors, followed by reinforcement learning which significantly improves reasoning-driven tool use. With our methods, InfoAgent achieves 15.3% accuracy on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS, outperforming prior open-source deep research agents such as WebSailor-72B and DeepDive-32B. The Internet has revolutionized the way people acquire knowledge, yet the tools that mediate access to online information have evolved unevenly (Zhang et al., 2025). Recently, researchers have enhanced Large Language Models (LLMs) with agentic capabilities via Reinforcement Learning (RL), which allows them to autonomously plan, search, and learn in an ongoing loop (OpenAI, 2025b). Deep Research Agents (DRAs) are distinguished by their ability to plan, reason, execute multi-step information-seeking actions, such as retrieving documents from the Internet via given tools, and complete complex research tasks. Recognizing their potential, major AI providers have raced to deliver commercial implementations (OpenAI, 2025a; Perplexity, 2025; xAI, 2025a; Google, 2025). This phenomenon shows that deep research is becoming a defining feature of next-generation information platforms. The implementation of DRA faces two challenges: effective strategy for data synthesis and the establishment of an efficient interactive environment. Existing open-source DRAs often perform shallow searches, mainly because they are trained on relatively simple data (Jin et al., 2025; Li et al., 2025c). Training dataset must encompass a broad range of data, which is of various uncertain types, so that the agent is forced to link disparate pieces of information and infer new knowledge when retrieving documents. Meanwhile, some agents are trained in simulated environments, which are underpowered when confronted with challenging real-world problems (Jin et al., 2025).

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.25189

Country: North America > United States > Virginia > Arlington County (0.15)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Sports > Baseball (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation

Wang, Yuechen, Qiao, Yuming, Meng, Dan, Yang, Jun, Lu, Haonan, Yang, Zhenyu, Zhang, Xudong

arXiv.org Artificial IntelligenceAug-13-2025

Multimodal Retrieval-Augmented Generation (mRAG) has emerged as a promising solution to address the temporal limitations of Multimodal Large Language Models (MLLMs) in real-world scenarios like news analysis and trending topics. However, existing approaches often suffer from rigid retrieval strategies and under-utilization of visual information. To bridge this gap, we propose E-Agent, an agent framework featuring two key innovations: a mRAG planner trained to dynamically orchestrate multimodal tools based on contextual reasoning, and a task executor employing tool-aware execution sequencing to implement optimized mRAG workflows. E-Agent adopts a one-time mRAG planning strategy that enables efficient information retrieval while minimizing redundant tool invocations. To rigorously assess the planning capabilities of mRAG systems, we introduce the Real-World mRAG Planning (RemPlan) benchmark. This novel benchmark contains both retrieval-dependent and retrieval-independent question types, systematically annotated with essential retrieval tools required for each instance. The benchmark's explicit mRAG planning annotations and diverse question design enhance its practical relevance by simulating real-world scenarios requiring dynamic mRAG decisions. Experiments across RemPlan and three established benchmarks demonstrate E-Agent's superiority: 13% accuracy gain over state-of-the-art mRAG methods while reducing redundant searches by 37%.

efficientagent, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.08816

Country: North America > United States > Hawaii (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

Tan, Jiejun, Dou, Zhicheng, Yu, Yan, Cheng, Jiehan, Ju, Qiang, Xie, Jian, Wen, Ji-Rong

arXiv.org Artificial IntelligenceAug-12-2025

Recently, large reasoning models have demonstrated strong mathematical and coding abilities, and deep search leverages their reasoning capabilities in challenging information retrieval tasks. Existing deep search works are generally limited to a single knowledge source, either local or the Web. However, enterprises often require private deep search systems that can leverage search tools over both local and the Web corpus. Simply training an agent equipped with multiple search tools using flat reinforcement learning (RL) is a straightforward idea, but it has problems such as low training data efficiency and poor mastery of complex tools. To address the above issue, we propose a hierarchical agentic deep search framework, HierSearch, trained with hierarchical RL. At the low level, a local deep search agent and a Web deep search agent are trained to retrieve evidence from their corresponding domains. At the high level, a planner agent coordinates low-level agents and provides the final answer. Moreover, to prevent direct answer copying and error propagation, we design a knowledge refiner that filters out hallucinations and irrelevant evidence returned by low-level agents. Experiments show that HierSearch achieves better performance compared to flat RL, and outperforms various deep search and multi-source retrieval-augmented generation baselines in six benchmarks across general, finance, and medical domains.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.08088

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

Chen, Zijian, Ma, Xueguang, Zhuang, Shengyao, Nie, Ping, Zou, Kai, Liu, Andrew, Green, Joshua, Patel, Kshama, Meng, Ruoxi, Su, Mingyi, Sharifymoghaddam, Sahel, Li, Yanxi, Hong, Haoran, Shi, Xinyu, Liu, Xuye, Thakur, Nandan, Zhang, Crystina, Gao, Luyu, Chen, Wenhu, Lin, Jimmy

arXiv.org Artificial IntelligenceAug-12-2025

Deep-Research agents, which integrate large language models (LLMs) with search tools, have shown success in improving the effectiveness of handling complex queries that require iterative search planning and reasoning over search results. Evaluations on current benchmarks like BrowseComp relies on black-box live web search APIs, have notable limitations in (1) fairness: dynamic and opaque web APIs hinder fair comparisons and reproducibility of deep research methods; (2) transparency: lack of control over the document corpus makes it difficult to isolate retriever contributions. In other words, the current evaluations may compare a complete deep research system at a given time, but they do not foster well-controlled experiments to provide insights into the capability of underlying deep research LLMs. To address these challenges, we introduce BrowseComp-Plus, a benchmark derived from BrowseComp, employing a fixed, carefully curated corpus. Each query in BrowseComp-Plus includes human-verified supporting documents and mined challenging negatives, enabling controlled experimentation. The benchmark is shown to be effective in distinguishing the performance of deep research systems. For instance, the open-source model Search-R1, when paired with the BM25 retriever, achieves 3.86% accuracy, whereas the GPT-5 achieves 55.9%. Integrating the GPT-5 with the Qwen3-Embedding-8B retriever further enhances its accuracy to 70.1% with fewer search calls. This benchmark allows comprehensive evaluation and disentangled analysis of deep research agents and retrieval methods, fostering insights into retrieval effectiveness, citation accuracy, and context engineering in Deep-Research system.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2508.066

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

RAVine: Reality-Aligned Evaluation for Agentic Search

Xu, Yilong, Long, Xiang, Zheng, Zhi, Gao, Jinhua

arXiv.org Artificial IntelligenceAug-1-2025

Agentic search, as a more autonomous and adaptive paradigm of retrieval augmentation, is driving the evolution of intelligent search systems. However, existing evaluation frameworks fail to align well with the goals of agentic search. First, the complex queries commonly used in current benchmarks often deviate from realistic user search scenarios. Second, prior approaches tend to introduce noise when extracting ground truth for end-to-end evaluations, leading to distorted assessments at a fine-grained level. Third, most current frameworks focus solely on the quality of final answers, neglecting the evaluation of the iterative process inherent to agentic search. To address these limitations, we propose RAVine -- a Reality-Aligned eValuation framework for agentic LLMs with search. RAVine targets multi-point queries and long-form answers that better reflect user intents, and introduces an attributable ground truth construction strategy to enhance the accuracy of fine-grained evaluation. Moreover, RAVine examines model's interaction with search tools throughout the iterative process, and accounts for factors of efficiency. We benchmark a series of models using RAVine and derive several insights, which we hope will contribute to advancing the development of agentic search systems. The code and datasets are available at https://github.com/SwordFaith/RAVine.

information retrieval, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2507.16725

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre:

Research Report (1.00)
Workflow (0.67)

Industry:

Law (1.00)
Health & Medicine (1.00)
Banking & Finance > Economy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.92)

Add feedback

Agent-as-Tool: A Study on the Hierarchical Decision Making with Reinforcement Learning

Zhang, Yanfei

arXiv.org Artificial IntelligenceJul-3-2025

Large Language Models (LLMs) have emerged as one of the most significant technological advancements in artificial intelligence in recent years. Their ability to understand, generate, and reason with natural language has transformed how we interact with AI systems. With the development of LLM-based agents and reinforcement-learning-based reasoning models, the study of applying reinforcement learning in agent frameworks has become a new research focus. However, all previous studies face the challenge of deciding the tool calling process and the reasoning process simultaneously, and the chain of reasoning was solely relied on the unprocessed raw result with redundant information and symbols unrelated to the task from the tool, which impose a heavy burden on the model's capability to reason. Therefore, in our research, we proposed a hierarchical framework Agent-as-tool that detach the tool calling process and the reasoning process, which enables the model to focus on the verbally reasoning process while the tool calling process is handled by another agent. Our work had achieved comparable results with only a slight reinforcement fine-tuning on 180 samples, and had achieved exceptionally well performance in Bamboogle with 63.2% of exact match and 75.2% in cover exact match, exceeding Search-R1 by 4.8% in exact match and 3.2% in cover exact match.

large language model, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2507.01489

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.91)

Add feedback

MMSearch-R1: Incentivizing LMMs to Search

Wu, Jinming, Deng, Zihao, Li, Wei, Liu, Yiding, You, Bo, Li, Bo, Ma, Zejun, Liu, Ziwei

arXiv.org Artificial IntelligenceJun-26-2025

Robust deployment of large multimodal models (LMMs) in real-world scenarios requires access to external knowledge sources, given the complexity and dynamic nature of real-world information. Existing approaches such as retrieval-augmented generation (RAG) and prompt engineered search agents rely on rigid pipelines, often leading to inefficient or excessive search behaviors. We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables LMMs to perform on-demand, multi-turn search in real-world Internet environments. Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them guided by an outcome-based reward with a search penalty. To support training, We collect a multimodal search VQA dataset through a semi-automated pipeline that covers diverse visual and textual knowledge needs and curate a search-balanced subset with both search-required and search-free samples, which proves essential for shaping efficient and on-demand search behavior. Extensive experiments on knowledge-intensive and info-seeking VQA tasks show that our model not only outperforms RAG-based baselines of the same model size, but also matches the performance of a larger RAG-based model while reducing search calls by over 30%. We further analyze key empirical findings to offer actionable insights for advancing research in multimodal search.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2506.2067

Country: Europe (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation

Szczepanik, Kamil, Chudziak, Jarosław A.

arXiv.org Artificial IntelligenceJun-24-2025

TRIZ, the Theory of Inventive Problem Solving, is a structured, knowledge-based framework for innovation and abstracting problems to find inventive solutions. However, its application is often limited by the complexity and deep interdisciplinary knowledge required. Advancements in Large Language Models (LLMs) have revealed new possibilities for automating parts of this process. While previous studies have explored single LLMs in TRIZ applications, this paper introduces a multi-agent approach. We propose an LLM-based multi-agent system, called TRIZ agents, each with specialized capabilities and tool access, collaboratively solving inventive problems based on the TRIZ methodology. This multi-agent system leverages agents with various domain expertise to efficiently navigate TRIZ steps. The aim is to model and simulate an inventive process with language agents. We assess the effectiveness of this team of agents in addressing complex innovation challenges based on a selected case study in engineering. We demonstrate the potential of agent collaboration to produce diverse, inventive solutions. This research contributes to the future of AI-driven innovation, showcasing the advantages of decentralized problem-solving in complex ideation tasks.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.5220/0013321900003890

2506.18783

Country: Asia > Japan (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback