Google Assistant will stick around a bit longer than expected for some Android users
The transition from Assistant to Gemini will continue into 2026. Google had planned to remove Assistant from most Android phones by the end of 2025 and replace it with Gemini, but the company has now announced that it needs more time to make its AI assistant the default digital helper for most of its users. Google said it is adjusting its previously announced timeline to ensure a seamless transition, and that updates converting Assistant to Gemini on Android devices will continue into next year. The company also said it will share more details in the coming months, so it's possible the transition will go past early 2026. Assistant's retirement was widely expected from the moment Google launched Gemini and began giving it Assistant's capabilities, such as the ability to control smart devices connected to your phone.
ImF: Implicit Fingerprint for Large Language Models
Wu, Jiaxuan, Peng, Wanli, Fu, Hang, Xue, Yiming, Wen, Juan
Training large language models (LLMs) is resource-intensive and expensive, making intellectual property (IP) protection essential. Most existing model fingerprint methods inject fingerprints into LLMs to protect model ownership. These methods create fingerprint pairs with weak semantic correlations, lacking the contextual coherence and semantic relatedness found in normal question-answer (QA) pairs in LLMs. In this paper, we propose a Generation Revision Intervention (GRI) attack that can effectively exploit this flaw to erase fingerprints, highlighting the need for more secure model fingerprint methods. We therefore propose a novel injected-fingerprint paradigm called Implicit Fingerprints (ImF). ImF constructs fingerprint pairs with strong semantic correlations, disguising them as natural QA pairs within LLMs. This ensures the fingerprints are consistent with normal model behavior, making them indistinguishable and robust against detection and removal. Our experiments on multiple LLMs demonstrate that ImF retains high verification success rates under adversarial conditions, offering a reliable solution for protecting LLM ownership.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Security & Privacy (0.69)
- Government > Regional Government (0.68)
Why LLMs Cannot Think and How to Fix It
Jahrens, Marius, Martinetz, Thomas
This paper elucidates that current state-of-the-art Large Language Models (LLMs) are fundamentally incapable of making decisions or developing "thoughts" within the feature space due to their architectural constraints. We establish a definition of "thought" that encompasses traditional understandings of that term and adapt it for application to LLMs. We demonstrate that the architectural design and language modeling training methodology of contemporary LLMs inherently preclude them from engaging in genuine thought processes. Our primary focus is on this theoretical realization rather than practical insights derived from experimental data. Finally, we propose solutions to enable thought processes within the feature space and discuss the broader implications of these architectural modifications.
- Research Report (0.50)
- Instructional Material (0.40)
GRP: Goal-Reversed Prompting for Zero-Shot Evaluation with LLMs
Song, Mingyang, Zheng, Mao, Luo, Xuan
Using Large Language Models (LLMs) to evaluate and compare two answers from different models typically involves having LLM-based judges select the better answer. Humans, however, often approach problem-solving from the reverse perspective, for instance by choosing the worse option instead of the better one in a pairwise comparison. This kind of reverse thinking plays a crucial role in human reasoning and decision-making, and comparing the original and reversed thought processes can reveal differences between them. Motivated by this, we propose a Goal-Reversed Prompting (GRP) approach for pairwise evaluation that shifts the original task from selecting the better answer to choosing the worse one, encouraging LLMs to think in reverse by prompting them to identify the worse response. Experiments on closed-source models demonstrate that GRP significantly enhances evaluation capabilities, outperforming the prompt template with the original goal.
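The reversal is mechanically simple: ask the judge which answer is worse, then report the other one as better. A minimal sketch of the idea, where the prompt wording, function names, and the stand-in judge are illustrative assumptions rather than the paper's actual prompts:

```python
def goal_reversed_prompt(question, answer_a, answer_b):
    """Build a pairwise-evaluation prompt that asks the judge to pick the
    WORSE answer, reversing the usual "pick the better one" goal."""
    return (
        "You are comparing two answers to the same question.\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Which answer is WORSE? Reply with exactly 'A' or 'B'."
    )

def judge_pair(llm, question, answer_a, answer_b):
    """Run the reversed judgment, then map it back: whichever answer was
    NOT picked as worse is reported as the better one."""
    worse = llm(goal_reversed_prompt(question, answer_a, answer_b)).strip()
    return "B" if worse == "A" else "A"

# Stand-in for a real LLM call, used here only to exercise the wiring.
toy_judge = lambda prompt: "A"
print(judge_pair(toy_judge, "What is 2+2?", "5", "4"))  # -> B
```

In practice `llm` would wrap a real model API call; the key design point is that the caller still receives a "better answer" verdict, so GRP can be swapped in behind an existing pairwise-evaluation interface.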
No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding
Krumdick, Michael, Lovering, Charles, Reddy, Varshini, Ebner, Seth, Tanner, Chris
LLM-as-a-Judge is a framework that uses a large language model (LLM) to evaluate the quality of natural language text, typically text that is itself generated by an LLM. The framework holds great promise due to its relatively low cost, ease of use, and strong correlations with human stylistic preferences. However, LLM judges have been shown to exhibit biases that can distort their judgments. We evaluate how well LLM judges can grade whether a given response to a conversational question is correct, an ability crucial to soundly estimating overall response quality. To do so, we create and publicly release a human-annotated dataset with correctness labels for 1,200 LLM responses. We source questions from a combination of existing datasets and a novel, challenging benchmark (BFF-Bench) created for this analysis. We demonstrate a strong connection between an LLM's ability to correctly answer a question and to grade responses to that question: although aggregate-level statistics might imply a judge has high agreement with human annotators, it will struggle on the subset of questions it could not answer itself. To address this issue, we recommend a simple solution: provide the judge with a correct, human-written reference answer. We perform an in-depth analysis of how reference quality affects the performance of an LLM judge, and show that providing a weaker judge (e.g. Qwen 2.5 7B) with higher-quality references yields better agreement with human annotators than a stronger judge (e.g. GPT-4o) with synthetic references.
- North America > United States > Massachusetts (0.14)
- North America > Mexico > Mexico City (0.14)
- Europe > Spain (0.14)
- Asia > Thailand (0.14)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
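The recommended fix, grounding the judge with a human-written reference answer, amounts to one extra field in the judge prompt. A minimal sketch under that assumption; the prompt text, function names, and the stub judge are illustrative, not the paper's actual setup:

```python
def grading_prompt(question, response, reference=None):
    """Ask a judge LLM whether `response` correctly answers `question`,
    optionally grounding it with a human-written reference answer."""
    parts = [
        "Decide whether the response correctly answers the question.",
        f"Question: {question}",
    ]
    if reference is not None:
        parts.append(f"Reference answer (human-written): {reference}")
    parts += [
        f"Response to grade: {response}",
        "Reply with exactly 'correct' or 'incorrect'.",
    ]
    return "\n".join(parts)

def grade(llm, question, response, reference=None):
    """Return True iff the judge deems the response correct."""
    verdict = llm(grading_prompt(question, response, reference))
    return verdict.strip().lower() == "correct"

# Stub judge for testing the wiring: calls the response correct only when a
# reference is present and appears verbatim in it (a real judge is an LLM call).
def stub_judge(prompt):
    lines = dict(l.split(": ", 1) for l in prompt.splitlines() if ": " in l)
    ref = lines.get("Reference answer (human-written)")
    return "correct" if ref and ref in lines["Response to grade"] else "incorrect"
```

Keeping `reference` optional makes the paper's comparison easy to reproduce: the same `grade` call can be run with and without the reference to measure how much grounding changes agreement with human annotators.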
Static Vs. Agentic Game Master AI for Facilitating Solo Role-Playing Experiences
Jørgensen, Nicolai Hejlesen, Tharmabalan, Sarmilan, Aslan, Ilhan, Hansen, Nicolai Brodersen, Merritt, Timothy
This paper presents a game master AI for single-player role-playing games. The AI is designed to deliver interactive text-based narratives and experiences typically associated with multiplayer tabletop games like Dungeons & Dragons. We report on the design process and a series of experiments to improve the functionality and experience design, resulting in two functional versions of the system. While v1 of our system uses simplified prompt engineering, v2 leverages a multi-agent architecture and the ReAct framework to include reasoning and action. A comparative evaluation demonstrates that v2, as an agentic system, maintains playability while significantly improving modularity and game experience, including immersion and curiosity. Our findings contribute to the evolution of AI-driven interactive fiction, highlighting new avenues for enhancing solo role-playing experiences.
- North America > United States > New York (0.28)
- Europe > Denmark (0.14)
- South America > Brazil (0.14)
- Asia > Thailand (0.14)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.68)
- Personal > Interview (0.67)
- Research Report > New Finding (0.48)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Health & Medicine > Therapeutic Area (0.92)
OWLViz: An Open-World Benchmark for Visual Question Answering
Nguyen, Thuy, Nguyen, Dang, Nguyen, Hoang, Luong, Thuan, Dang, Long Hoang, Lai, Viet Dac
We present a challenging benchmark for the Open WorLd VISual question answering (OWLViz) task. OWLViz presents concise, unambiguous queries that require integrating multiple capabilities, including visual understanding, web exploration, and specialized tool usage. While humans achieve 69.2% accuracy on these intuitive tasks, even state-of-the-art VLMs struggle, with the best model, Gemini 2.0, achieving only 26.6% accuracy. Current agentic VLMs, which rely on limited vision and vision-language models as tools, perform even worse. This performance gap reveals significant limitations in multimodal systems' ability to select appropriate tools and execute complex reasoning sequences, establishing new directions for advancing practical AI research.
- North America > United States > Oregon > Lane County > Eugene (0.15)
- Europe > Austria > Vienna (0.14)
- Asia > Thailand (0.14)
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles
Wang, Kuang, Li, Xianfei, Yang, Shenghao, Zhou, Li, Jiang, Feng, Li, Haizhou
User simulators are crucial for replicating human interactions with dialogue systems, supporting both collaborative training and automatic evaluation, especially for large language models (LLMs). However, existing simulators often rely solely on text utterances, missing implicit user traits such as personality, speaking style, and goals. In contrast, persona-based methods lack generalizability, as they depend on predefined profiles of famous individuals or archetypes. To address these challenges, we propose User Simulator with implicit Profiles (USP), a framework that infers implicit user profiles from human-machine conversations and uses them to generate more personalized and realistic dialogues. We first develop an LLM-driven extractor with a comprehensive profile schema. Then, we refine the simulation through conditional supervised fine-tuning and reinforcement learning with cycle consistency, optimizing it at both the utterance and conversation levels. Finally, we adopt a diverse profile sampler to capture the distribution of real-world user profiles. Experimental results demonstrate that USP outperforms strong baselines in terms of authenticity and diversity while achieving comparable performance in consistency. Furthermore, dynamic multi-turn evaluations based on USP strongly align with mainstream benchmarks, demonstrating its effectiveness in real-world applications.
- Asia > China (0.28)
- Asia > Thailand (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
Eeyore: Realistic Depression Simulation via Supervised and Preference Optimization
Liu, Siyang, Brie, Bianca, Li, Wenda, Biester, Laura, Lee, Andrew, Pennebaker, James, Mihalcea, Rada
Large Language Models (LLMs) have previously been explored for mental healthcare training and therapy client simulation, but they still fall short in authentically capturing diverse client traits and psychological conditions. We introduce Eeyore, an 8B model optimized for realistic depression simulation through a structured alignment framework, incorporating expert input at every stage. First, we systematically curate real-world depression-related conversations, extracting depressive traits to guide data filtering and psychological profile construction, and use this dataset to instruction-tune Eeyore for profile adherence. Next, to further enhance realism, Eeyore undergoes iterative preference optimization: first leveraging model-generated preferences, then calibrating with a small set of expert-annotated preferences. Throughout the pipeline, we actively collaborate with domain experts, developing interactive interfaces to validate trait extraction and iteratively refine structured psychological profiles for clinically meaningful role-play customization. Despite its smaller model size, the Eeyore depression simulation outperforms GPT-4o with SOTA prompting strategies in both linguistic authenticity and profile adherence.
- North America > United States > Texas (0.14)
- North America > United States > Michigan (0.14)
- Europe > Czechia (0.14)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Personal > Interview (0.67)