AITopics

2503.16669

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.69)
Media > Music (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(2 more...)

arXiv.org Artificial IntelligenceMar-20-2025

Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models

Bi, Baolong, Liu, Shenghua, Wang, Yiwei, Xu, Yilong, Fang, Junfeng, Mei, Lingrui, Cheng, Xueqi

Retrieval-Augmented Generation (RAG) mitigates hallucinations in Large Language Models (LLMs) by integrating external knowledge. However, conflicts between parametric knowledge and retrieved context pose challenges, particularly when retrieved information is unreliable or the model's internal knowledge is outdated. In such cases, LLMs struggle to determine whether to rely more on their own parameters or the conflicted context. To address this, we propose **CK-PLUG**, a plug-and-play method for controlling LLMs' reliance on parametric and contextual knowledge. We introduce a novel knowledge consistency metric, Confidence Gain, which detects knowledge conflicts by measuring entropy shifts in token probability distributions after context insertion. CK-PLUG then enables fine-grained control over knowledge preference by adjusting the probability distribution of tokens with negative confidence gain through a single tuning parameter. Experiments demonstrate CK-PLUG's ability to significantly regulate knowledge reliance in counterfactual RAG scenarios while maintaining generation fluency and knowledge accuracy. For instance, on Llama3-8B, memory recall (MR) of RAG response can be adjusted within a broad range (9.9%-71.9%), compared to the baseline of 42.1%. Moreover, CK-PLUG supports adaptive control based on the model's confidence in both internal and external knowledge, achieving consistent performance improvements across various general RAG tasks. Our code is available at: $\href{https://github.com/byronBBL/CK-PLUG}{\text{this https URL}}$.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2503.15888

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England (0.04)
North America > Canada > Quebec > Montreal (0.04)
(17 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

An original E.T. from 1982 movie could fetch 1M at auction

A collection of sci-fi movie memorabilia is heading to auction and includes one of the most iconic film aliens of all time. As part of the upcoming series, "There Are Such Things: 20th Century Horror, Science Fiction, and Fantasy on Screen," Sotheby's is offering an original, screen-used E.T. full body model seen in Steven Spielberg's E.T. the Extra-Terrestrial. Designed by the late, great special effects and makeup artist Carlo Rambaldi, the roughly 3-foot-tall piece of pop culture history was used in the famous "closet scene," and is one of just three manufactured during the making of the 1982 movie. While instantly recognizable today, E.T.'s overall look was completely absent from Melissa Mathison's screenplay. Creating the character from scratch came through a collaboration between Spielberg, storyboard artist Ed Verreaux, and Rambaldi.

artificial intelligence, rambaldi, science fiction, (6 more...)

Popular Science

Country:

North America > United States > New York (0.06)
Europe > Italy (0.06)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Science Fiction (1.00)

FOX NewsMar-19-2025, 15:11:10 GMT

Fox News AI Newsletter: 'America's great industrial comeback'

Vice President JD Vance joined the American Dynamism Summit in Washington, D.C., on Tuesday to discuss the future of AI. 'INDUSTRIAL COMEBACK': Vice President JD Vance knocked recent globalization efforts that use "cheap labor as a crutch" while simultaneously hampering innovation on the global scale during a Tuesday tech and artificial intelligence speech. NO MORE CHORES: Developed by the artificial intelligence company 1X, NEO Gamma isn't your clunky, metallic automaton. It is designed to be a helpful, almost human-like assistant. AI DASHCAM DILEMMA: The trucking industry is in the midst of a technological revolution, thanks to the arrival of artificial intelligence-powered dashcams. These innovative devices promise to make roads safer and operations more efficient, but they also raise some important questions about privacy.

artificial intelligence, great industrial comeback, social media, (5 more...)

FOX News

Country: North America > United States > District of Columbia > Washington (0.27)

Industry:

Information Technology > Security & Privacy (0.80)
Media > News (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

MASS: Mathematical Data Selection via Skill Graphs for Pretraining Large Language Models

Li, Jiazheng, Yu, Lu, Cui, Qing, Zhang, Zhiqiang, Zhou, Jun, Ye, Yanfang, Zhang, Chuxu

High-quality data plays a critical role in the pretraining and fine-tuning of large language models (LLMs), even determining their performance ceiling to some degree. Consequently, numerous data selection methods have been proposed to identify subsets of data that can effectively and efficiently enhance model performance. However, most of these methods focus on general data selection and tend to overlook the specific nuances of domain-related data. In this paper, we introduce MASS, a \textbf{MA}thematical data \textbf{S}election framework using the \textbf{S}kill graph for pretraining LLMs in the mathematical reasoning domain. By taking into account the unique characteristics of mathematics and reasoning, we construct a skill graph that captures the mathematical skills and their interrelations from a reference dataset. This skill graph guides us in assigning quality scores to the target dataset, enabling us to select the top-ranked subset which is further used to pretrain LLMs. Experimental results demonstrate the efficiency and effectiveness of MASS across different model sizes (1B and 7B) and pretraining datasets (web data and synthetic data). Specifically, in terms of efficiency, models trained on subsets selected by MASS can achieve similar performance to models trained on the original datasets, with a significant reduction in the number of trained tokens - ranging from 50\% to 70\% fewer tokens. In terms of effectiveness, when trained on the same amount of tokens, models trained on the data selected by MASS outperform those trained on the original datasets by 3.3\% to 5.9\%. These results underscore the potential of MASS to improve both the efficiency and effectiveness of pretraining LLMs.

large language model, machine learning, natural language, (13 more...)

2503.14917

Country:

Asia > Vietnam (0.04)
North America > United States > Connecticut (0.04)
Europe > United Kingdom (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment (1.00)
Transportation (0.68)
Media > Film (0.46)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Arcuschin, Iván, Janiak, Jett, Krzyzanowski, Robert, Rajamanoharan, Senthooran, Nanda, Neel, Conmy, Arthur

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

Chain-of-Thought (CoT) reasoning has significantly advanced state-of-the-art AI capabilities. However, recent studies have shown that CoT reasoning is not always faithful, i.e. CoT reasoning does not always reflect how models arrive at conclusions. So far, most of these studies have focused on unfaithfulness in unnatural contexts where an explicit bias has been introduced. In contrast, we show that unfaithful CoT can occur on realistic prompts with no artificial bias. Our results reveal non-negligible rates of several forms of unfaithful reasoning in frontier models: Sonnet 3.7 (16.3%), DeepSeek R1 (5.3%) and ChatGPT-4o (7.0%) all answer a notable proportion of question pairs unfaithfully. Specifically, we find that models rationalize their implicit biases in answers to binary questions ("implicit post-hoc rationalization"). For example, when separately presented with the questions "Is X bigger than Y?" and "Is Y bigger than X?", models sometimes produce superficially coherent arguments to justify answering Yes to both questions or No to both questions, despite such responses being logically contradictory. We also investigate restoration errors (Dziri et al., 2023), where models make and then silently correct errors in their reasoning, and unfaithful shortcuts, where models use clearly illogical reasoning to simplify solving problems in Putnam questions (a hard benchmark). Our findings raise challenges for AI safety work that relies on monitoring CoT to detect undesired behavior.

large language model, machine learning, natural language, (21 more...)

2503.08679

Country:

North America > United States > Nevada > Carson City (0.14)
North America > United States > Wisconsin > Sheboygan County > Sheboygan (0.14)
Asia > Middle East > Iraq (0.04)
(28 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.68)
Media > Film (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Challenges and Trends in Egocentric Vision: A Survey

Li, Xiang, Qiu, Heqian, Wang, Lanxiao, Zhang, Hanwen, Qi, Chenghao, Han, Linfeng, Xiong, Huiyu, Li, Hongliang

With the rapid development of artificial intelligence technologies and wearable devices, egocentric vision understanding has emerged as a new and challenging research direction, gradually attracting widespread attention from both academia and industry. Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body, offering a unique perspective that simulates human visual experiences. This paper provides a comprehensive survey of the research on egocentric vision understanding, systematically analyzing the components of egocentric scenes and categorizing the tasks into four main areas: subject understanding, object understanding, environment understanding, and hybrid understanding. We explore in detail the sub-tasks within each category. We also summarize the main challenges and trends currently existing in the field. Furthermore, this paper presents an overview of high-quality egocentric vision datasets, offering valuable resources for future research. By summarizing the latest advancements, we anticipate the broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence, and propose future research directions based on the latest developments in the field.

data mining, large language model, machine learning, (23 more...)

2503.15275

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
(13 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Leisure & Entertainment (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area (0.67)
(3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Hardware (1.00)
(10 more...)

R$^2$: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs

Lin, Zefeng, Xiao, Yi, Mo, Zhiqiang, Zhang, Qifan, Wang, Jie, Chen, Jiayang, Zhang, Jiajing, Zhang, Hui, Liu, Zhengyi, Fang, Xianyong, Xu, Xiaohua

Published as a conference paper at ICLR 2025R 2: A LLM B ASED N OVEL-TO-S CREENPLAYG ENER-ATIONF RAMEWORK WITH C AUSALP LOT G RAPHS Zefeng Lin 1, Yi Xiao 1, Zhiqiang Mo 1, Qifan Zhang 1, Jie Wang 2, Jiayang Chen 2, Jiajing Zhang 2, Hui Zhang 1, Zhengyi Liu 3, Xianyong Fang 3, Xiaohua Xu 1 1 University of Science and Technology of China 2 Anhui Jianzhu University 3 Anhui University A BSTRACT Automatically adapting novels into screenplays is important for the TV, film, or opera industries to promote products with low costs. The strong performances of large language models (LLMs) in long-text generation call us to propose a LLM based framework Reader-Rewriter (R 2) for this task. However, there are two fundamental challenges here. First, the LLM hallucinations may cause inconsistent plot extraction and screenplay generation. Second, the causality-embedded plot lines should be effectively extracted for coherent rewriting. Therefore, two corresponding tactics are proposed: 1) A hallucination-aware refinement method (HAR) to iteratively discover and eliminate the affections of hallucinations; and 2) a causal plot-graph construction method (CPC) based on a greedy cycle-breaking algorithm to efficiently construct plot lines with event causalities. Recruiting those efficient techniques, R 2 utilizes two modules to mimic the human screenplay rewriting process: The Reader module adopts a sliding window and CPC to build the causal plot graphs, while the Rewriter module generates first the scene outlines based on the graphs and then the screenplays. HAR is integrated into both modules for accurate inferences of LLMs.

artificial intelligence, large language model, natural language, (18 more...)

2503.15655

Country:

Asia > China (0.24)
North America > United States > Colorado > Weld County > Evans (0.04)

Genre: Research Report (0.82)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding

Tu, Chongjun, Zhang, Lin, Chen, Pengtao, Ye, Peng, Zeng, Xianfang, Cheng, Wei, Yu, Gang, Chen, Tao

Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in video content understanding but still struggle with fine-grained motion comprehension. To comprehensively assess the motion understanding ability of existing MLLMs, we introduce FAVOR-Bench, comprising 1,776 videos with structured manual annotations of various motions. Our benchmark includes both close-ended and open-ended tasks. For close-ended evaluation, we carefully design 8,184 multiple-choice question-answer pairs spanning six distinct sub-tasks. For open-ended evaluation, we develop both a novel cost-efficient LLM-free and a GPT-assisted caption assessment method, where the former can enhance benchmarking interpretability and reproducibility. Comprehensive experiments with 21 state-of-the-art MLLMs reveal significant limitations in their ability to comprehend and describe detailed temporal dynamics in video motions. To alleviate this limitation, we further build FAVOR-Train, a dataset consisting of 17,152 videos with fine-grained motion annotations. The results of finetuning Qwen2.5-VL on FAVOR-Train yield consistent improvements on motion-related tasks of TVBench, MotionBench and our FAVOR-Bench. Comprehensive assessment results demonstrate that the proposed FAVOR-Bench and FAVOR-Train provide valuable tools to the community for developing more powerful video understanding models. Project page: \href{https://favor-bench.github.io/}{https://favor-bench.github.io/}.

large language model, machine learning, natural language, (21 more...)

2503.14935

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.70)

Industry:

Education (0.49)
Media > Television (0.32)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings

Xu, Austin, Bansal, Srijan, Ming, Yifei, Yavuz, Semih, Joty, Shafiq

The large language model (LLM)-as-judge paradigm has been used to meet the demand for a cheap, reliable, and fast evaluation of model outputs during AI system development and post-deployment monitoring. While judge models -- LLMs finetuned to specialize in assessing and critiquing model outputs -- have been touted as general purpose evaluators, they are typically evaluated only on non-contextual scenarios, such as instruction following. The omission of contextual settings -- those where external information is used as context to generate an output -- is surprising given the increasing prevalence of retrieval-augmented generation (RAG) and summarization use cases. Contextual assessment is uniquely challenging, as evaluation often depends on practitioner priorities, leading to conditional evaluation criteria (e.g., comparing responses based on factuality and then considering completeness if they are equally factual). To address the gap, we propose ContextualJudgeBench, a judge benchmark with 2,000 challenging response pairs across eight splits inspired by real-world contextual evaluation scenarios. We build our benchmark with a multi-pronged data construction pipeline that leverages both existing human annotations and model-based perturbations. Our comprehensive study across 11 judge models and 9 general purpose models, reveals that the contextual information and its assessment criteria present a significant challenge to even state-of-the-art models. For example, OpenAI's o1, the best-performing model, barely reaches 55% consistent accuracy.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2503.1562

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > New Hampshire (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Law Enforcement & Public Safety (0.67)
Media > Film (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)