AITopics

Recent research has explored using Large Language Models for recommendation tasks by transforming user interaction histories and item metadata into text prompts, then having the LLM produce rankings or recommendations. A promising approach involves connecting collaborative filtering knowledge to LLM representations through compact adapter networks, which avoids expensive fine-tuning while preserving the strengths of both components. Yet several challenges persist in practice: collaborative filtering models often use static snapshots that miss rapidly changing user preferences; many real-world items contain rich visual and audio content beyond textual descriptions; and current systems struggle to provide trustworthy explanations backed by concrete evidence. Our work introduces \model{}, a framework that tackles these limitations through three key innovations. We develop an online adaptation mechanism that continuously incorporates new user interactions through lightweight modules, avoiding the need to retrain large models. We create a unified representation that seamlessly combines collaborative signals with visual and audio features, handling cases where some modalities may be unavailable. Finally, we design an explanation system that grounds recommendations in specific collaborative patterns and item attributes, producing natural language rationales users can verify. Our approach maintains the efficiency of frozen base models while adding minimal computational overhead, making it practical for real-world deployment.

artificial intelligence, large language model, natural language, (14 more...)

2510.01606

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.94)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Chitty-Venkata, Krishna Teja, Emani, Murali

ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models

We develop ImageNet-Think, a multimodal reasoning dataset designed to aid the development of Vision Language Models (VLMs) with explicit reasoning capabilities. Our dataset is built on 250,000 images from the ImageNet-21k dataset, providing structured thinking tokens and corresponding answers. Our synthetic dataset is generated by two state-of-the-art VLMs: GLM-4.1V-9B-Thinking and Kimi-VL-A3B-Thinking-2506. Each image is accompanied by two pairs of thinking-answer sequences, creating a resource for training and evaluating multimodal reasoning models. We capture the step-by-step reasoning process of VLMs and the final descriptive answers. Our goal with this dataset is to enable the development of more robust VLMs while contributing to the broader understanding of multi-modal reasoning mechanisms. The dataset and evaluation benchmarks will be publicly available to aid research in reasoning/thinking multimodal VLMs. The dataset is available on HuggingFace.

large language model, machine learning, natural language, (20 more...)

2510.01582

Genre: Research Report (1.00)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Halstead, Maura E, Green, Mark A., Jay, Caroline, Kingston, Richard, Topping, David, Singleton, Alexander

From keywords to semantics: Perceptions of large language models in data discovery

This matching requires researchers to know the exact wording that other researchers previously used, creating a challenging process that could lead to missing relevant data. Large Language Models (LLMs) could enhance data discovery by removing this requirement and allowing researchers to ask questions with natural language. However, we do not currently know if researchers would accept LLMs for data discovery. Using a human-centered artificial intelligence (HCAI) focus, we ran focus groups (N = 27) to understand researchers' perspectives towards LLMs for data discovery. Our conceptual model shows that the potential benefits are not enough for researchers to use LLMs instead of current technology. Barriers prevent researchers from fully accepting LLMs, but features around transparency could overcome them. Using our model will allow developers to incorporate features that result in an increased acceptance of LLMs for data discovery.

artificial intelligence, large language model, natural language, (17 more...)

2510.01473

Country:

North America (0.28)
Europe > United Kingdom (0.28)

Genre:

Research Report > Experimental Study (0.95)
Research Report > New Finding (0.70)

Industry:

Health & Medicine (1.00)
Government (1.00)
Information Technology (0.68)
Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Meisenbacher, Stephen, Nestorov, Svetlozar, Norlander, Peter

Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data

Data from online job postings are difficult to access and are not built in a standard or transparent manner. Data included in the standard taxonomy and occupational information database (O*NET) are updated infrequently and based on small survey samples. We adopt O*NET as a framework for building natural language processing tools that extract structured information from job postings. We publish the Job Ad Analysis Toolkit (JAAT), a collection of open-source tools built for this purpose, and demonstrate its reliability and accuracy in out-of-sample and LLM-as-a-Judge testing. We extract more than 10 billion data points from more than 155 million online job ads provided by the National Labor Exchange (NLx) Research Hub, including O*NET tasks, occupation codes, tools, and technologies, as well as wages, skills, industry, and more features. We describe the construction of a dataset of occupation, state, and industry level features aggregated by monthly active jobs from 2015 - 2025. We illustrate the potential for research and future uses in education and workforce development.

large language model, machine learning, natural language, (22 more...)

2510.0147

Country: North America > United States > California (0.45)

Genre: Research Report (1.00)

Industry:

Law > Labor & Employment Law (1.00)
Law Enforcement & Public Safety > Fire & Emergency Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(3 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Aguirre, Nicolás, Caso, Ramiro, Colmeiro, Ramiro Rodríguez, Santelli, Mauro, Calderón, Joaquín Toranzo

A-VERT: Agnostic Verification with Embedding Ranking Targets

The automatic evaluation of Language Model (LM) responses is a critical piece in the development of benchmarks and metrics, both for model training and quality assessment of production model endpoints. Current approaches to response classification rely on methods that are either too expensive (i.e., LLM-as-a-Judge) or far from real-world conditions (string-matching, logprob). In this paper, a structure-free evaluation method is presented. The method makes use of semantic embedding distances to match target candidates with arbitrary LM-generated text, resulting in a robust classification of the response at a relatively low compute cost (embedding models of less than $10B$ parameters). The results show a regression score of ~0.97 and an accuracy of ~96% against human annotators, tested over 3 data sets and 3 different LM architectures.
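The idea of matching target candidates to free-form text by embedding distance can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the candidate labels, and the three-dimensional toy vectors are all assumptions made for illustration, and a real system would obtain the vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(response_vec, candidate_vecs):
    """Return candidate labels ordered from most to least similar to the response."""
    scored = [
        (cosine_similarity(response_vec, vec), label)
        for label, vec in candidate_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored]


# Hypothetical toy embeddings; a real pipeline would embed actual text.
candidates = {
    "correct": [0.9, 0.1, 0.0],
    "wrong": [0.0, 0.2, 0.9],
}
response = [0.8, 0.2, 0.1]

ranking = rank_candidates(response, candidates)
predicted_label = ranking[0]  # the response is classified by its nearest target
```

Because the comparison happens in embedding space, the response does not need to follow any fixed output format, which is the property the abstract calls structure-free.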

large language model, machine learning, natural language, (21 more...)

2510.01469

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.66)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Attia, Ahmed Adel, Liu, Jing, Wilson, Carol Espy

RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines

The scarcity of large-scale classroom speech data has hindered the development of AI-driven speech models for education. Classroom datasets remain limited and not publicly available, and the absence of dedicated classroom noise or Room Impulse Response (RIR) corpora prevents the use of standard data augmentation techniques. In this paper, we introduce a scalable methodology for synthesizing classroom noise and RIRs using game engines, a versatile framework that can extend to other domains beyond the classroom. Building on this methodology, we present RealClass, a dataset that combines a synthesized classroom noise corpus with a classroom speech dataset compiled from publicly available corpora. The speech data pairs a children's speech corpus with instructional speech extracted from YouTube videos to approximate real classroom interactions in clean conditions. Experiments on clean and noisy speech show that RealClass closely approximates real classroom speech, making it a valuable asset in the absence of abundant real classroom speech.
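The standard augmentation the abstract refers to usually means convolving clean speech with a Room Impulse Response and then mixing in noise at a chosen signal-to-noise ratio. The sketch below shows that generic recipe in plain Python; it is not the RealClass pipeline, and the function names and toy sample values are assumptions made for illustration.

```python
import math


def convolve(signal, impulse_response):
    """Full discrete convolution: applies a room's impulse response to a dry signal."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]


# Hypothetical toy data standing in for real audio samples.
clean = [1.0, 0.0, 0.0, -1.0]
rir = [1.0, 0.5]
reverberant = convolve(clean, rir)
noise = [0.1, -0.1, 0.1, -0.1, 0.1]
noisy = mix_at_snr(reverberant, noise, snr_db=10.0)
```

Real audio would use array libraries and FFT-based convolution for speed; the nested loop here only keeps the example dependency-free.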

artificial intelligence, machine learning, speech, (16 more...)

2510.01462

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry:

Education > Educational Setting (1.00)
Information Technology (0.86)
Leisure & Entertainment > Games > Computer Games (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.31)

Sreedharan, Sarath, Sikes, Kelsey, Blanchard, Nathaniel, Mason, Lisa, Krishnaswamy, Nikhil, Zarestky, Jill

On the Role of Domain Experts in Creating Effective Tutoring Systems

The role that highly curated knowledge, provided by domain experts, could play in creating effective tutoring systems is often overlooked within the AI for education community. In this paper, we highlight this topic by discussing two ways such highly curated expert knowledge could help in creating novel educational systems. First, we will look at how one could use explainable AI (XAI) techniques to automatically create lessons. Most existing XAI methods are primarily aimed at debugging AI systems. However, we will discuss how one could use expert-specified rules about solving specific problems, along with novel XAI techniques, to automatically generate lessons that could be provided to learners. Second, we will see how an expert-specified curriculum for learning a target concept can help develop adaptive tutoring systems that not only provide a better learning experience but could also allow us to use more efficient algorithms to create these systems. Finally, we will highlight the importance of such methods using a case study of creating a tutoring system for pollinator identification, where such knowledge could easily be elicited from experts.

learner, machine learning, natural language, (18 more...)

doi: 10.1007/978-3-031-99261-2_5

2510.01432

Country: North America > United States (0.70)

Genre:

Research Report (1.00)
Instructional Material (0.66)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)

Personnat, Gerson, Lin, Tao, Hossain, Safwan, Parkes, David C.

Learning to Play Multi-Follower Bayesian Stackelberg Games

In a multi-follower Bayesian Stackelberg game, a leader plays a mixed strategy over $L$ actions to which $n\ge 1$ followers, each having one of $K$ possible private types, best respond. The leader's optimal strategy depends on the distribution of the followers' private types. We study an online learning version of this problem: a leader interacts for $T$ rounds with $n$ followers with types sampled from an unknown distribution every round. The leader's goal is to minimize regret, defined as the difference between the cumulative utility of the optimal strategy and that of the actually chosen strategies. We design learning algorithms for the leader under different feedback settings. Under type feedback, where the leader observes the followers' types after each round, we design algorithms that achieve $\mathcal O\big(\sqrt{\min\{L\log(nKA T), nK \} \cdot T} \big)$ regret for independent type distributions and $\mathcal O\big(\sqrt{\min\{L\log(nKA T), K^n \} \cdot T} \big)$ regret for general type distributions. Interestingly, those bounds do not grow with $n$ at a polynomial rate. Under action feedback, where the leader only observes the followers' actions, we design algorithms with $\mathcal O( \min\{\sqrt{ n^L K^L A^{2L} L T \log T}, K^n\sqrt{ T } \log T \} )$ regret. We also provide a lower bound of $\Omega(\sqrt{\min\{L, nK\}T})$, almost matching the type-feedback upper bounds.
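The verbal definition of regret in the abstract can be written out as a formula. This is only a sketch in assumed notation: $x^t$ is the leader's mixed strategy in round $t$, $\Delta(L)$ is the set of mixed strategies over the $L$ actions, $U(x)$ is the leader's expected utility under the unknown type distribution when the followers best respond to $x$, and $x^\star$ is a maximizer of $U$. None of these symbols appear in the abstract, and the paper may define regret with realized rather than expected utilities.

$$
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} \big( U(x^\star) - U(x^t) \big), \qquad x^\star \in \arg\max_{x \in \Delta(L)} U(x).
$$

Under this reading, any bound of order $\sqrt{T}$ implies that the average per-round regret $\mathrm{Reg}(T)/T$ vanishes as $T$ grows.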

artificial intelligence, data mining, machine learning, (21 more...)

2510.01387

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Education (0.67)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Ibrahim, Humaid, Rozanov, Nikolai, Rei, Marek

Fine-tuning with RAG for Improving LLM Learning of New Skills

Large language model (LLM) agents deployed for multi-step tasks frequently fail in predictable ways: attempting actions with unmet preconditions, issuing redundant commands, or mishandling environment constraints. While retrieval-augmented generation (RAG) can improve performance by providing runtime guidance, it requires maintaining external knowledge databases and adds computational overhead at every deployment. We propose a simple pipeline that converts inference-time retrieval into learned competence through distillation. Our approach: (1) extracts compact, reusable hints from agent failures, (2) uses these hints to generate improved teacher trajectories via one-shot retrieval at episode start, and (3) trains student models on these trajectories with hint strings removed, forcing internalization rather than memorization. Across two interactive benchmarks, ALFWorld (household tasks) and WebShop (online shopping), distilled students consistently outperform baseline agents, achieving up to 91% success on ALFWorld (vs. The approach generalizes across model scales (7B/14B parameters) and agent architectures (ReAct/StateAct), demonstrating that retrieval benefits can be effectively internalized through targeted fine-tuning without permanent runtime dependencies. Large language models are increasingly deployed as agents that interact with environments to complete multi-step tasks. Success requires not just generating plausible text but maintaining goals across extended interactions, managing state and preconditions, and recovering from errors. Prior work has explored multiple approaches to improve agent performance. Structured prompting methods like ReAct (Yao et al., 2023b) and StateAct (Rozanov & Rei, 2025) provide scaffolding for reasoning and state tracking. Self-reflection approaches such as Reflexion (Shinn et al., 2023) enable learning from mistakes across multiple attempts. Retrieval-augmented methods (Lewis et al., 2021; Zhao et al., 2024; Fu et al., 2024) inject external knowledge to guide decisions.
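Step (3) of the pipeline, training on teacher trajectories with the hint strings removed, can be sketched as a small data-preparation routine. Everything concrete here is an assumption made for illustration: the `<hint>...</hint>` delimiters, the `prompt` and `actions` keys, and the function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical delimiter convention for hints injected into the teacher's prompt.
HINT_PATTERN = re.compile(r"<hint>.*?</hint>", flags=re.DOTALL)


def strip_hints(text):
    """Remove every <hint>...</hint> span and collapse the leftover whitespace."""
    without_hints = HINT_PATTERN.sub("", text)
    return " ".join(without_hints.split())


def build_student_examples(teacher_trajectories):
    """Turn hint-augmented teacher trajectories into hint-free training pairs.

    The student sees the original task prompt without the hint but is trained
    to reproduce the actions the teacher took while the hint was available.
    """
    examples = []
    for trajectory in teacher_trajectories:
        examples.append(
            {
                "prompt": strip_hints(trajectory["prompt"]),
                "completion": trajectory["actions"],
            }
        )
    return examples


# Hypothetical teacher trajectory in the style of a household task.
teacher_trajectories = [
    {
        "prompt": "Task: put a mug on the desk. "
        "<hint>Open the cabinet before taking the mug.</hint> Begin.",
        "actions": "open cabinet 1\ntake mug 1\nput mug 1 on desk 1",
    }
]

student_examples = build_student_examples(teacher_trajectories)
```

Note that this toy version also collapses newlines inside the prompt; a real pipeline would likely preserve the prompt layout and remove only the hint span.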

large language model, machine learning, natural language, (20 more...)

2510.01375

Country: Asia (0.28)

Genre: Research Report (0.84)

Industry: Education > Educational Technology > Educational Software (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Cebrian, Manuel, Kito, Tomomi, Fernandez, Raul Castro

Emergent evaluation hubs in a decentralizing large language model ecosystem

Large language models are proliferating, and so are the benchmarks that serve as their common yardsticks. We ask how the agglomeration patterns of these two layers compare: do they evolve in tandem or diverge? Drawing on two curated proxies for the ecosystem, the Stanford Foundation-Model Ecosystem Graph and the Evidently AI benchmark registry, we find complementary but contrasting dynamics. Model creation has broadened across countries and organizations and diversified in modality, licensing, and access. Benchmark influence, by contrast, displays centralizing patterns: in the inferred benchmark-author-institution network, the top 15% of nodes account for over 80% of high-betweenness paths, three countries produce 83% of benchmark outputs, and the global Gini for inferred benchmark authority reaches 0.89. An agent-based simulation highlights three mechanisms: higher entry of new benchmarks reduces concentration; rapid inflows can temporarily complicate coordination in evaluation; and stronger penalties against over-fitting have limited effect. Taken together, these results suggest that concentrated benchmark influence functions as coordination infrastructure that supports standardization, comparability, and reproducibility amid rising heterogeneity in model production, while also introducing trade-offs such as path dependence, selective visibility, and diminishing discriminative power as leaderboards saturate.
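For readers unfamiliar with the Gini coefficient cited above, the following minimal sketch computes it for a list of non-negative scores using the standard rank-weighted formula. The abstract does not say how benchmark authority is inferred or how the authors compute the statistic, so the function name and the toy inputs are assumptions made for illustration.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal shares."""
    if not values:
        raise ValueError("values must be non-empty")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total == 0:
        return 0.0  # everyone holds the same (zero) share
    ordered = sorted(values)
    n = len(ordered)
    # Rank-weighted sum with ranks 1..n over the ascending values.
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical authority scores for four institutions.
equal_shares = gini([1, 1, 1, 1])
single_hub = gini([0, 0, 0, 1])
```

With this formula the maximum for $n$ items is $(n-1)/n$, reached when one item holds everything, so a value such as 0.89 indicates that authority is concentrated in a small share of nodes.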

benchmark, large language model, machine learning, (20 more...)

2510.01286

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Education (0.67)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)