AITopics | Hakimov, Sherzod

Collaborating Authors

Hakimov, Sherzod

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models

Hakimov, Sherzod, Pfennigschmidt, Lara, Schlangen, David

arXiv.org Artificial IntelligenceFeb-17-2025

This study utilizes the game Codenames as a benchmarking tool to evaluate large language models (LLMs) with respect to specific linguistic and cognitive skills. LLMs play each side of the game, where one side generates a clue word covering several target words and the other guesses those target words. We designed various experiments by controlling the choice of words (abstract vs. concrete words, ambiguous vs. monosemic) or the opponent (programmed to be faster or slower in revealing words). Recent commercial and open-weight models were compared side-by-side to find out factors affecting their performance. The evaluation reveals details about their strategies, challenging cases, and limitations of LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.11707

Country:

North America > United States (0.46)
Europe > Austria (0.29)
Europe > Germany (0.28)
North America > Mexico (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Plant in Cupboard, Orange on Table, Book on Shelf. Benchmarking Practical Reasoning and Situation Modelling in a Text-Simulated Situated Environment

Jordan, Jonathan, Hakimov, Sherzod, Schlangen, David

arXiv.org Artificial IntelligenceFeb-17-2025

Large language models (LLMs) have risen to prominence as 'chatbots' for users to interact via natural language. However, their abilities to capture common-sense knowledge make them seem promising as language-based planners of situated or embodied action as well. We have implemented a simple text-based environment -- similar to others that have before been used for reinforcement-learning of agents -- that simulates, very abstractly, a household setting. We use this environment and the detailed error-tracking capabilities we implemented for targeted benchmarking of LLMs on the problem of practical reasoning: Going from goals and observations to actions. Our findings show that environmental complexity and game restrictions hamper performance, and concise action planning is demanding for current LLMs.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.11733

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Free-text Rationale Generation under Readability Level Control

Hsu, Yi-Sheng, Feldhus, Nils, Hakimov, Sherzod

arXiv.org Artificial IntelligenceJul-1-2024

Free-text rationales justify model decisions in natural language and thus become likable and accessible among approaches to explanation across many tasks. However, their effectiveness can be hindered by misinterpretation and hallucination. As a perturbation test, we investigate how large language models (LLMs) perform the task of natural language explanation (NLE) under the effects of readability level control, i.e., being prompted for a rationale targeting a specific expertise level, such as sixth grade or college. We find that explanations are adaptable to such instruction, but the requested readability is often misaligned with the measured text complexity according to traditional readability metrics. Furthermore, the quality assessment shows that LLMs' ratings of rationales across text complexity exhibit a similar pattern of preference as observed in natural language generation (NLG). Finally, our human evaluation suggests a generally satisfactory impression on rationales at all readability levels, with high-school-level readability being most commonly perceived and favored.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2407.01384

Country:

North America > United States (1.00)
Europe (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government (0.93)
Education > Educational Setting > K-12 Education > Middle School (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft

Kranti, Chalamalasetti, Hakimov, Sherzod, Schlangen, David

arXiv.org Artificial IntelligenceJun-25-2024

In the Minecraft Collaborative Building Task, two players collaborate: an Architect (A) provides instructions to a Builder (B) to assemble a specified structure using 3D blocks. In this work, we investigate the use of large language models (LLMs) to predict the sequence of actions taken by the Builder. Leveraging LLMs' in-context learning abilities, we use few-shot prompting techniques, that significantly improve performance over baseline methods. Additionally, we present a detailed analysis of the gaps in performance for future work

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.17553

Country:

Europe (1.00)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics

Bhavsar, Nidhir, Jordan, Jonathan, Hakimov, Sherzod, Schlangen, David

arXiv.org Artificial IntelligenceJun-20-2024

What makes a good Large Language Model (LLM)? That it performs well on the relevant benchmarks -- which hopefully measure, with some validity, the presence of capabilities that are also challenged in real application. But what makes the model perform well? What gives a model its abilities? We take a recently introduced type of benchmark that is meant to challenge capabilities in a goal-directed, agentive context through self-play of conversational games, and analyse how performance develops as a function of model characteristics like number of parameters, or type of training. We find that while there is a clear relationship between number of parameters and performance, there is still a wide spread of performance points within a given size bracket, which is to be accounted for by training parameters such as fine-tuning data quality and method. From a more practical angle, we also find a certain degree of unpredictability about performance across access methods, possible due to unexposed sampling parameters, and a, very welcome, performance stability against at least moderate weight quantisation during inference.

huggingface, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2406.14051

Country: Europe > Germany (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.78)

Add feedback

Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models

Hakimov, Sherzod, Abdullayeva, Yerkezhan, Koshti, Kushal, Schmidt, Antonia, Weiser, Yan, Beyer, Anne, Schlangen, David

arXiv.org Artificial IntelligenceJun-20-2024

While the situation has improved for text-only models, it again seems to be the case currently that multimodal (text and image) models develop faster than ways to evaluate them. In this paper, we bring a recently developed evaluation paradigm from text models to multimodal models, namely evaluation through the goal-oriented game (self) play, complementing reference-based and preference-based evaluation. Specifically, we define games that challenge a model's capability to represent a situation from visual information and align such representations through dialogue. We find that the largest closed models perform rather well on the games that we define, while even the best open-weight models struggle with them. On further analysis, we find that the exceptional deep captioning capabilities of the largest models drive some of the performance. There is still room to grow for both kinds of models, ensuring the continued relevance of the benchmark.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2406.14035

Country:

North America > United States (0.46)
Europe > Germany (0.28)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets

Thakkar, Gaurish, Hakimov, Sherzod, Tadić, Marko

arXiv.org Artificial IntelligenceJun-12-2024

In recent years, multimodal natural language processing, aimed at learning from diverse data types, has garnered significant attention. However, there needs to be more clarity when it comes to analysing multimodal tasks in multi-lingual contexts. While prior studies on sentiment analysis of tweets have predominantly focused on the English language, this paper addresses this gap by transforming an existing textual Twitter sentiment dataset into a multimodal format through a straightforward curation process. Our work opens up new avenues for sentiment-related research within the research community. Additionally, we conduct baseline experiments utilising this augmented dataset and report the findings. Notably, our evaluations reveal that when comparing unimodal and multimodal configurations, using a sentiment-tuned large language model as a text encoder performs exceptionally well.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.01753

Country:

Europe (1.00)
North America > United States > Minnesota (0.14)
North America > United States > Michigan (0.14)
North America > United States > Colorado (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Add feedback

clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

Beyer, Anne, Chalamalasetti, Kranti, Hakimov, Sherzod, Madureira, Brielen, Sadler, Philipp, Schlangen, David

arXiv.org Artificial IntelligenceMay-31-2024

It has been established in recent work that Large Language Models (LLMs) can be prompted to "self-play" conversational games that probe certain capabilities (general instruction following, strategic goal orientation, language understanding abilities), where the resulting interactive game play can be automatically scored. In this paper, we take one of the proposed frameworks for setting up such game-play environments, and further test its usefulness as an evaluation instrument, along a number of dimensions: We show that it can easily keep up with new developments while avoiding data contamination, we show that the tests implemented within it are not yet saturated (human performance is substantially higher than that of even the best models), and we show that it lends itself to investigating additional questions, such as the impact of the prompting language on performance. We believe that the approach forms a good basis for making decisions on model choice for building applied interactive systems, and perhaps ultimately setting up a closed-loop development environment of system and simulated evaluator.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2405.20859

Country: Europe > Germany (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies

Sadler, Philipp, Hakimov, Sherzod, Schlangen, David

arXiv.org Artificial IntelligenceMar-26-2024

In collaborative goal-oriented settings, the participants are not only interested in achieving a successful outcome, but do also implicitly negotiate the effort they put into the interaction (by adapting to each other). In this work, we propose a challenging interactive reference game that requires two players to coordinate on vision and language observations. The learning signal in this game is a score (given after playing) that takes into account the achieved goal and the players' assumed efforts during the interaction. We show that a standard Proximal Policy Optimization (PPO) setup achieves a high success rate when bootstrapped with heuristic partner behaviors that implement insights from the analysis of human-human interactions. And we find that a pairing of neural partners indeed reduces the measured joint effort when playing together repeatedly. However, we observe that in comparison to a reasonable heuristic pairing there is still room for improvement -- which invites further research in the direction of cost-sharing in collaborative interactions.

artificial intelligence, follower, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2403.17497

Country:

North America > Canada (0.46)
North America > United States (0.46)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.40)

Add feedback

Learning Communication Policies for Different Follower Behaviors in a Collaborative Reference Game

Sadler, Philipp, Hakimov, Sherzod, Schlangen, David

arXiv.org Artificial IntelligenceFeb-7-2024

Albrecht and Stone (2018) state that modeling of changing behaviors remains an open problem "due to the essentially unconstrained nature of what other agents may do". In this work we evaluate the adaptability of neural artificial agents towards assumed partner behaviors in a collaborative reference game. In this game success is achieved when a knowledgeable Guide can verbally lead a Follower to the selection of a specific puzzle piece among several distractors. We frame this language grounding and coordination task as a reinforcement learning problem and measure to which extent a common reinforcement training algorithm (PPO) is able to produce neural agents (the Guides) that perform well with various heuristic Follower behaviors that vary along the dimensions of confidence and autonomy. We experiment with a learning signal that in addition to the goal condition also respects an assumed communicative effort. Our results indicate that this novel ingredient leads to communicative strategies that are less verbose (staying silent in some of the steps) and that with respect to that the Guide's strategies indeed adapt to the partner's level of confidence and autonomy. Figure 1: An exemplary interaction between a Guide and a Follower that controls the gripper (the black dot).

machine learning, natural language, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2402.04824

Country:

North America > United States > Louisiana (0.14)
North America > United States > Maryland (0.14)
Europe > United Kingdom > England (0.14)

Genre:

Research Report (0.70)
Overview (0.46)

Industry:

Education (0.66)
Leisure & Entertainment > Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback