AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

Neural Information Processing Systems Feb-15-2026, 18:33:07 GMT

8b94879b177d9780c17f5a78f62a6a8a-Supplemental-Datasets_and_Benchmarks.pdf

artificial intelligence, json, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Neural Information Processing Systems Feb-15-2026, 10:41:27 GMT

62c6d7893b13a13c659cb815852dd00d-Supplemental-Datasets_and_Benchmarks_Track.pdf

large language model, machine learning, natural language

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Hong Kong (0.04)
Europe > Italy > Lazio > Rome (0.04)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)

Neural Information Processing Systems Feb-11-2026, 14:25:41 GMT

Supplementary Material

A sitting is a meeting of parliament members. While in the virtual environment, you will need to install the specific Gensim version needed for the Compass approach. In other instances, the beginning of the line that specifies the speaker consists of the role of the parliament member, for example "SPEAKER OF THE PARLIAMENT" (meaning the presiding member of parliament), sometimes followed by the person's full name in parentheses. The id is a unique number we assigned to each file. The main challenge of translating the files from Greek to English was converting the Greek alphabetic numerals to Indo-Arabic numerals.
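
The numeral conversion mentioned above can be illustrated with a small sketch. It assumes the standard Greek alphabetic (Ionic) numeral values, treats a trailing keraia (ʹ) or apostrophe as a marker to ignore, reads the lower-left numeral sign (͵) as "multiply the next letter by 1000", and accepts the modern digraph στ for 6. The function name `greek_to_int` is hypothetical; the source does not say how its conversion was implemented.

```python
# Hypothetical helper (not from the source): convert a Greek alphabetic numeral
# such as "ιβ'" or "ΙΖʹ" into an Indo-Arabic integer.

GREEK_VALUES = {
    # units (\u03db = stigma)
    "α": 1, "β": 2, "γ": 3, "δ": 4, "ε": 5, "\u03db": 6, "ζ": 7, "η": 8, "θ": 9,
    # tens (\u03df = koppa)
    "ι": 10, "κ": 20, "λ": 30, "μ": 40, "ν": 50, "ξ": 60, "ο": 70, "π": 80, "\u03df": 90,
    # hundreds (\u03e1 = sampi)
    "ρ": 100, "σ": 200, "τ": 300, "υ": 400, "φ": 500, "χ": 600, "ψ": 700, "ω": 800, "\u03e1": 900,
}
KERAIA = {"\u0374", "'", "\u2019"}  # Greek numeral sign, ASCII apostrophe, right single quote
THOUSANDS_SIGN = "\u0375"          # lower-left keraia: the next letter counts x1000


def greek_to_int(numeral: str) -> int:
    # Lowercase one character at a time so Python's final-sigma rule never yields ς.
    s = "".join(ch.lower() for ch in numeral.strip() if ch not in KERAIA)
    total, i, multiplier = 0, 0, 1
    while i < len(s):
        if s[i] == THOUSANDS_SIGN:
            multiplier = 1000
            i += 1
            continue
        if s.startswith("στ", i):  # modern digraph standing in for stigma (6)
            value, step = 6, 2
        elif s[i] in GREEK_VALUES:
            value, step = GREEK_VALUES[s[i]], 1
        else:
            raise ValueError(f"not a Greek numeral character: {s[i]!r}")
        total += value * multiplier
        multiplier = 1
        i += step
    return total


print(greek_to_int("ΙΖ'"))        # 17
print(greek_to_int("\u0375βκε"))  # 2025
```

Because numerals are written from largest to smallest value, a greedy left-to-right scan is enough; the στ check runs first so that 6 is not misread as σ (200) plus τ (300).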

artificial intelligence, machine learning, natural language

Country:

Europe > Greece (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Florida > Hillsborough County > Tampa (0.04)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems Feb-8-2026, 08:27:28 GMT

Appendix: Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

For VaTeX captioning and retrieval, we use the latest version (v1.1), which contains 25,991 videos for training and 6,000 videos for public testing. The statistics can be found in Table 1. Visual Genome synsets are key-value pairs, where the keys are noisy natural-language phrases and the values are the mapped WordNet synsets [6]. If a visual token occurs in multiple frames, we use its averaged frame index as its temporal indicator. Specifically, for UniVL, we set the number of epochs to 50 and the number of linear warmup steps to 40.
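
The temporal indicator described above (the averaged frame index of a visual token that appears in several frames) is a per-token mean. A minimal sketch, assuming each frame is given as a list of detected token strings and frames are indexed from 0; the function name is hypothetical.

```python
from collections import defaultdict


def temporal_indicators(frame_tokens: list[list[str]]) -> dict[str, float]:
    """Map each visual token to the mean index of the frames it occurs in."""
    frames_seen: dict[str, list[int]] = defaultdict(list)
    for frame_idx, tokens in enumerate(frame_tokens):
        # set() so a token repeated within one frame counts that frame only once
        for token in set(tokens):
            frames_seen[token].append(frame_idx)
    return {token: sum(idxs) / len(idxs) for token, idxs in frames_seen.items()}


result = temporal_indicators([["dog", "ball"], ["dog"], ["dog", "grass"]])
print(result["dog"], result["ball"], result["grass"])  # 1.0 0.0 2.0
```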

artificial intelligence, machine learning, natural language

Country:

North America > United States > Illinois (0.05)
Asia > China (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.74)

Lee, Kevin, Spiewak, Russell, Walsh, James

Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning

arXiv.org Artificial Intelligence Nov-27-2025

Scientific reasoning through Large Language Models in heliophysics involves more than just recalling facts: it requires incorporating physical assumptions, maintaining consistent units, and providing clear scientific formats through coordinated approaches. To address these challenges, we present Reasoning With a Star, a newly contributed heliophysics dataset applicable to reasoning; we also provide an initial benchmarking approach. Our data are constructed from National Aeronautics and Space Administration & University Corporation for Atmospheric Research Living With a Star summer school problem sets and compiled into a readily consumable question-and-answer structure with question contexts, reasoning steps, expected answer type, ground-truth targets, format hints, and metadata. A programmatic grader checks the predictions using unit-aware numerical tolerance, symbolic equivalence, and schema validation. We benchmark a single-shot baseline and four multi-agent patterns, finding that decomposing workflows through systems engineering principles outperforms direct prompting on problems requiring deductive reasoning rather than pure inductive recall.
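
To make "unit-aware numerical tolerance" concrete, here is a minimal sketch of one such check. The unit table, the 2% relative tolerance, and the function name are assumptions for illustration, not the paper's grader, which per the abstract also checks symbolic equivalence and schema validity.

```python
import math

# Hypothetical unit table: unit -> (SI dimension label, factor that converts into SI).
UNIT_TO_SI = {
    "m": ("m", 1.0), "km": ("m", 1e3), "cm": ("m", 1e-2),
    "s": ("s", 1.0), "min": ("s", 60.0), "h": ("s", 3600.0),
    "m/s": ("m/s", 1.0), "km/s": ("m/s", 1e3),
    "T": ("T", 1.0), "nT": ("T", 1e-9), "G": ("T", 1e-4),
}


def grade_numeric(pred_value: float, pred_unit: str,
                  target_value: float, target_unit: str,
                  rel_tol: float = 0.02) -> bool:
    """Accept a prediction if it matches the target within rel_tol after converting both to SI."""
    if pred_unit not in UNIT_TO_SI or target_unit not in UNIT_TO_SI:
        return False  # unknown unit: fail closed
    pred_dim, pred_scale = UNIT_TO_SI[pred_unit]
    target_dim, target_scale = UNIT_TO_SI[target_unit]
    if pred_dim != target_dim:
        return False  # e.g. a length submitted where a time was expected
    return math.isclose(pred_value * pred_scale, target_value * target_scale, rel_tol=rel_tol)


print(grade_numeric(400, "km/s", 4.0e5, "m/s"))  # True
print(grade_numeric(1e5, "nT", 1, "G"))          # True
print(grade_numeric(1, "km", 1, "s"))            # False
```

Converting both sides to SI before comparing means a correct answer is not penalized for using a different but compatible unit, while a dimension mismatch is rejected outright.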

large language model, machine learning, natural language

2511.20694

Country: North America > United States > California > Los Angeles County (0.28)

Genre:

Workflow (1.00)
Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Energy (0.46)
Aerospace & Defense (0.46)
Government > Space Agency (0.34)
Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Hofer, Marvin, Rahm, Erhard

KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs

arXiv.org Artificial Intelligence Nov-25-2025

Building high-quality knowledge graphs (KGs) from diverse sources requires combining methods for information extraction, data transformation, ontology mapping, entity matching, and data fusion. Numerous methods and tools exist for each of these tasks, but support for combining them into reproducible and effective end-to-end pipelines is still lacking. We present a new framework, KGpipe, for defining and executing integration pipelines that can combine existing tools or LLM (Large Language Model) functionality. To evaluate different pipelines and the resulting KGs, we propose a benchmark to integrate heterogeneous data of different formats (RDF, JSON, text) into a seed KG. We demonstrate the flexibility of KGpipe by running and comparatively evaluating several pipelines integrating sources of the same or different formats using selected performance and quality metrics.
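
The idea of composing integration steps into one reproducible pipeline can be sketched as a list of stages, each mapping a knowledge graph to a new one. This example is hypothetical and is not KGpipe's API: it represents a KG as a set of (subject, predicate, object) triples and uses a naive exact-label entity matcher before fusing a source into the seed KG.

```python
from typing import Callable

Triple = tuple[str, str, str]
KG = set[Triple]
Stage = Callable[[KG], KG]


def run_pipeline(seed: KG, stages: list[Stage]) -> KG:
    """Apply each stage in order, feeding the output KG into the next stage."""
    kg = set(seed)
    for stage in stages:
        kg = stage(kg)
    return kg


def match_by_label(target: KG, source: KG) -> dict[str, str]:
    """Naive entity matching: map a source entity to a target entity with the same label (case-insensitive)."""
    target_labels = {o.lower(): s for s, p, o in target if p == "rdfs:label"}
    return {
        s: target_labels[o.lower()]
        for s, p, o in source
        if p == "rdfs:label" and o.lower() in target_labels
    }


def fusion_stage(source: KG) -> Stage:
    """Build a stage that rewrites matched source IDs to KG IDs, then unions the triples in."""
    def stage(kg: KG) -> KG:
        mapping = match_by_label(kg, source)
        fused = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in source}
        return kg | fused
    return stage


seed = {("ex:m1", "rdfs:label", "Alien"), ("ex:m1", "ex:year", "1979")}
source = {("src:42", "rdfs:label", "alien"), ("src:42", "ex:director", "Ridley Scott")}
kg = run_pipeline(seed, [fusion_stage(source)])
print(("ex:m1", "ex:director", "Ridley Scott") in kg)  # True
```

Keeping every step behind the same `KG -> KG` signature is what makes stages swappable, for instance replacing the label matcher with an LLM-based one without touching the rest of the pipeline.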

artificial intelligence, large language model, natural language

2511.18364

Country:

North America > United States (0.46)
Europe > Germany (0.29)

Genre: Research Report (0.82)

Industry:

Media > Film (0.93)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence Nov-18-2025

FunReason-MT Technical Report: Advanced Data Synthesis Solution for Real-world Multi-Turn Tool-use

Xu, Zengzhuang, Hao, Bingguang, Wang, Zechuan, Wen, Yuntao, Xu, Xinyi, Liu, Yang, Chen, Long, Wang, Dong, Wang, Maolin, Zhao, Tong, Chen, Yicheng, Peng, Cunyin, Gu, Jinjie, Gan, Leilei, Zhao, Xiangyu, Zhuang, Chenyi, Gu, Shi

Function calling (FC) empowers large language models (LLMs) and autonomous agents to interface with external tools, a critical capability for solving complex, real-world problems. As this ability becomes increasingly central to advanced AI systems, the need for high-quality, multi-turn training data to develop and refine it cannot be overstated. Existing data synthesis methods, such as random environment sampling or multi-agent role-playing, are not powerful enough to generate high-quality data in real-world environments. The practical challenges are threefold: targeted data synthesis, hard query construction, and multi-turn logical dependency. To address these structural deficiencies, we present FunReason-MT, a novel data synthesis framework for real-world multi-turn tool use. FunReason-MT resolves the complexity barrier in multi-turn FC data by employing 1) Environment-API Graph Interactions to gather varied high-quality trajectories targeting specific tools, 2) Advanced Tool-Query Synthesis to simplify hard query construction, and 3) Guided Iterative Chain for sophisticated CoT generation. Evaluations on Berkeley Function-Calling Leaderboard (BFCLv3) demonstrate the power of our framework: a 4B model built upon FunReason-MT generated data achieves state-of-the-art performance among comparable-sized models. Further performance improvements on BFCLv4 confirm that FunReason-MT provides a reliable and robust source for agentic learning.
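
One way to picture a trajectory aimed at a targeted tool within an environment-API graph is to collect every API the target depends on and order the calls so prerequisites run first. This is a hypothetical illustration using Python's standard `graphlib`, not FunReason-MT's actual procedure; the tool names and the `requires` mapping are invented.

```python
from graphlib import TopologicalSorter


def trajectory_for(target: str, requires: dict[str, list[str]]) -> list[str]:
    """Return a call order that reaches `target`, with every prerequisite API first."""
    # Collect the target and all of its transitive prerequisites.
    needed: set[str] = set()
    stack = [target]
    while stack:
        tool = stack.pop()
        if tool not in needed:
            needed.add(tool)
            stack.extend(requires.get(tool, []))
    # graphlib expects node -> predecessors; static_order() yields predecessors first.
    subgraph = {tool: requires.get(tool, []) for tool in needed}
    return list(TopologicalSorter(subgraph).static_order())


# Hypothetical API graph: each tool lists the tools whose outputs it consumes.
requires = {
    "book_flight": ["search_flights", "get_user_profile"],
    "search_flights": ["resolve_city"],
    "cancel_booking": ["list_bookings"],
}
# resolve_city precedes search_flights, and book_flight comes last;
# cancel_booking and list_bookings are excluded as irrelevant to the target.
print(trajectory_for("book_flight", requires))
```

Restricting the walk to the target's ancestors keeps each synthesized trajectory focused on one tool, while unrelated branches of the graph can seed other trajectories.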

large language model, machine learning, natural language

2510.24645

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Nov-18-2025

TimeStampEval: A Simple LLM Eval and a Little Fuzzy Matching Trick to Improve Search Accuracy

McCammon, James

Traditional fuzzy matching often fails when searching for quotes that are semantically identical but syntactically different across documents, a common issue when aligning official written records with speech-to-text transcripts. We introduce TimeStampEval, a benchmark for retrieving precise millisecond timestamps from long transcripts given non-verbatim quotes. Our simple two-stage method dramatically improves retrieval accuracy while cutting inference costs by over 90%. The motivating use case is an automated long-form podcast that assembles Congressional Record clips into AI-hosted narration. The technical challenge: given a sentence-timestamped transcript and a target quote that may differ due to transcription or editorial drift, return exact start and end boundaries. Standard algorithms handle verbatim text but break under fuzzier variants. Evaluating six modern LLMs on a 2,800-sentence (120k-token) transcript revealed four key findings. (1) Prompt design matters more than model choice: placing the query before the transcript and using compact formatting improved accuracy by 3-20 points while reducing token count by 30-40%. (2) Off-by-one errors form a distinct category, showing models understand the task but misplace boundaries. (3) A modest reasoning budget (600-850 tokens) raises accuracy from 37% to 77% for weak setups and to above 90% for strong ones. (4) Our "Assisted Fuzzy" approach (RapidFuzz pre-filtering followed by LLM verification on short snippets) improves fuzzy match accuracy by up to 50 points while halving latency and reducing cost per correct result by up to 96%. Extended tests on ten transcripts (50k-900k tokens, 1989-2025) confirm robustness to transcript length, vocabulary drift, and domain change, maintaining 95-100% rejection accuracy for absent targets.
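
The two-stage "Assisted Fuzzy" idea (cheap fuzzy pre-filtering, then verification of a few short snippets) can be sketched as follows. This sketch swaps RapidFuzz for the standard-library `difflib.SequenceMatcher` so it runs without dependencies, and the `verify` callback stands in for the LLM check; the window size, `top_k`, and all names are assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass
class Sentence:
    text: str
    start_ms: int
    end_ms: int


def candidate_spans(quote: str, transcript: list[Sentence],
                    max_window: int = 3, top_k: int = 5) -> list[tuple[float, int, int, str]]:
    """Stage 1: rank windows of 1..max_window consecutive sentences by fuzzy similarity."""
    q = quote.lower()
    scored = []
    for i in range(len(transcript)):
        for w in range(1, max_window + 1):
            window = transcript[i:i + w]
            if len(window) < w:  # window would run past the end of the transcript
                break
            text = " ".join(s.text for s in window)
            score = SequenceMatcher(None, q, text.lower()).ratio()
            scored.append((score, window[0].start_ms, window[-1].end_ms, text))
    scored.sort(key=lambda c: c[0], reverse=True)
    return scored[:top_k]


def assisted_match(quote: str, transcript: list[Sentence],
                   verify: Callable[[str, str], bool]) -> Optional[tuple[int, int]]:
    """Stage 2: return (start_ms, end_ms) of the best-ranked candidate the verifier accepts."""
    for _score, start, end, text in candidate_spans(quote, transcript):
        if verify(quote, text):  # in the abstract's approach, an LLM makes this judgment
            return start, end
    return None  # treat the quote as absent from the transcript


transcript = [
    Sentence("The committee will come to order.", 0, 2100),
    Sentence("We are here today to discuss the appropriations bill.", 2100, 5600),
    Sentence("I yield back the balance of my time.", 5600, 8000),
]
quote = "we're here today to discuss the appropriations bill"
# Keyword check as a stand-in for LLM verification.
print(assisted_match(quote, transcript, lambda q, s: "appropriations" in s.lower()))  # (2100, 5600)
```

Only the few top-ranked snippets reach the verifier, which is where the latency and cost savings described above would come from; returning `None` when every candidate is rejected covers the absent-target case.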

large language model, machine learning, natural language

2511.11594

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Law (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Nov-13-2025

Simpliflow: A Lightweight Open-Source Framework for Rapid Creation and Deployment of Generative Agentic AI Workflows

Panchal, Deven

Generative Agentic AI systems are emerging as a powerful paradigm for automating complex, multi-step tasks. However, many existing frameworks for building these systems introduce significant complexity, a steep learning curve, and substantial boilerplate code, hindering rapid prototyping and deployment. This paper introduces simpliflow, a lightweight, open-source Python framework designed to address these challenges. simpliflow enables the rapid development and orchestration of linear, deterministic agentic workflows through a declarative, JSON-based configuration. Its modular architecture decouples agent management, workflow execution, and post-processing, promoting ease of use and extensibility. By integrating with LiteLLM, it supports over 100 Large Language Models (LLMs) out-of-the-box. We present the architecture, operational flow, and core features of simpliflow, demonstrating its utility through diverse use cases ranging from software development simulation to real-time system interaction. A comparative analysis with prominent frameworks like LangChain and AutoGen highlights simpliflow's unique position as a tool optimized for simplicity, control, and speed in deterministic workflow environments.
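
A linear, deterministic workflow driven by a JSON configuration can be sketched in a few lines. The config schema, field names, and `call_llm` callback below are hypothetical rather than simpliflow's actual format; in practice the callback could wrap a LiteLLM completion call, but here a fake function keeps the example self-contained.

```python
import json
from typing import Callable

# Hypothetical config schema: an ordered list of agents, each with a model and a prompt template.
CONFIG = """
{
  "workflow": "spec_to_code",
  "agents": [
    {"name": "analyst", "model": "example-model", "prompt": "Write requirements for: {input}"},
    {"name": "developer", "model": "example-model", "prompt": "Implement these requirements: {input}"}
  ]
}
"""

# (model, prompt) -> completion text; a real LLM client would be plugged in here.
LLMCall = Callable[[str, str], str]


def run_linear_workflow(config_json: str, user_input: str,
                        call_llm: LLMCall) -> list[tuple[str, str]]:
    """Run agents strictly in config order, piping each agent's output into the next prompt."""
    config = json.loads(config_json)
    data = user_input
    outputs: list[tuple[str, str]] = []
    for agent in config["agents"]:
        prompt = agent["prompt"].format(input=data)
        data = call_llm(agent["model"], prompt)
        outputs.append((agent["name"], data))
    return outputs


def fake_llm(model: str, prompt: str) -> str:
    """Deterministic stand-in for an LLM so the example runs offline."""
    return f"[{model}] {prompt}"


for name, text in run_linear_workflow(CONFIG, "a to-do app", fake_llm):
    print(name, "->", text)
```

Because the agent order lives entirely in the config and each step depends only on the previous output, the same input and model responses always produce the same run, which is the determinism the abstract emphasizes.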

large language model, machine learning, natural language

2510.10675

Genre: Workflow (1.00)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)