AITopics | heuristic evaluation

heuristic evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Catching UX Flaws in Code: Leveraging LLMs to Identify Usability Flaws at the Development Stage

Platt, Nolan, Luchs, Ethan, Nizamani, Sehrish

arXiv.org Artificial Intelligence, Dec-5-2025

Usability evaluations are essential for ensuring that modern interfaces meet user needs, yet traditional heuristic evaluations by human experts can be time-consuming and subjective, especially early in development. This paper investigates whether large language models (LLMs) can provide reliable and consistent heuristic assessments at the development stage. Applying Jakob Nielsen's ten usability heuristics to thirty open-source websites, we generated over 850 heuristic evaluations across three independent evaluations per site, using a pipeline built on OpenAI's GPT-4o. For issue detection, the model demonstrated moderate consistency, with an average pairwise Cohen's Kappa of 0.50 and an exact agreement of 84%. Severity judgments showed more variability: weighted Cohen's Kappa averaged 0.63, but exact agreement was just 56%, and Krippendorff's Alpha was near zero. These results suggest that while GPT-4o can produce internally consistent evaluations, especially for identifying the presence of usability issues, its severity judgments vary and require human oversight in practice. Our findings highlight the feasibility and limitations of using LLMs for early-stage, automated usability testing and offer a foundation for improving consistency in automated User Experience (UX) evaluation. To the best of our knowledge, our work provides one of the first quantitative inter-rater reliability analyses of automated heuristic evaluation and highlights methods for improving model consistency.
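As a hedged illustration of the agreement statistics in this abstract, the Python sketch below computes exact agreement, unweighted Cohen's kappa for binary "issue present" labels, and a linearly weighted kappa for ordinal severity ratings between two runs. The labels are made up, not data from the paper, and linear weighting is an assumption (the abstract does not say which weights were used); Krippendorff's alpha is not shown. The usage lines show how, when one label dominates, exact agreement can be high while kappa stays much lower, because kappa discounts the agreement expected by chance.

```python
from collections import Counter


def exact_agreement(a, b):
    """Fraction of items on which the two runs assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("runs must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = exact_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement, estimated from each run's own label frequencies.
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def weighted_kappa(a, b, categories):
    """Linearly weighted Cohen's kappa for ordinal labels such as severity."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[idx[x]][idx[y]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # 0 for identical categories, 1 for the two extremes of the scale.
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


# Hypothetical "issue present?" labels from two runs over ten heuristic checks.
run1 = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
run2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(exact_agreement(run1, run2))         # 0.8
print(round(cohens_kappa(run1, run2), 3))  # 0.375
```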

large language model, machine learning, natural language

doi: 10.1109/VL-HCC65237.2025.00024

arXiv:2512.04262

Country: North America > United States > Virginia (0.16)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Synthetic Heuristic Evaluation: A Comparison between AI- and Human-Powered Usability Evaluation

Zhong, Ruican, McDonald, David W., Hsieh, Gary

arXiv.org Artificial Intelligence, Jul-4-2025

Usability evaluation is crucial in human-centered design but can be costly, requiring expert time and user compensation. In this work, we developed a method for synthetic heuristic evaluation that uses multimodal LLMs' ability to analyze images and provide design feedback. Comparing our synthetic evaluations to those by experienced UX practitioners across two apps, we found that our evaluation identified 73% and 77% of usability issues, respectively, exceeding the performance of 5 experienced human evaluators (57% and 63%). Compared to human evaluators, the synthetic evaluation maintained consistent performance across tasks and excelled at detecting layout issues, highlighting potential attentional and perceptual strengths of synthetic evaluation. However, synthetic evaluation struggled to recognize some UI components and design conventions, as well as to identify cross-screen violations. Additionally, testing synthetic evaluations across time and across accounts revealed stable performance. Overall, our work highlights the performance differences between human and LLM-driven evaluations, informing the design of synthetic heuristic evaluations.
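Detection rates such as 73% or 57% are coverage proportions over a reference list of issues. Below is a minimal Python sketch, assuming each evaluation has already been reduced to a set of issue identifiers matched against that reference list. Pooling a panel of evaluators by set union is only one possible aggregation and is an assumption here, since the abstract does not say how the human figures were combined. All identifiers are hypothetical.

```python
def coverage(found, reference):
    """Share of reference issues that an evaluation detected."""
    reference = set(reference)
    return len(set(found) & reference) / len(reference)


def panel_coverage(evaluations, reference):
    """Coverage of a panel, counting an issue once if any member found it."""
    return coverage(set().union(*evaluations), reference)


# Hypothetical issue identifiers, not data from the study.
reference = {"I1", "I2", "I3", "I4"}
synthetic = {"I1", "I2", "I3", "X9"}      # "X9" is not a reference issue
humans = [{"I1"}, {"I1", "I4"}]
print(coverage(synthetic, reference))      # 0.75
print(panel_coverage(humans, reference))   # 0.5
```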

large language model, machine learning, natural language

arXiv:2507.02306

Country:

North America > United States > Washington > King County > Seattle (0.14)
Oceania > Australia > Queensland (0.04)
North America > United States > Virginia > Alexandria County > Alexandria (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.48)
Government (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Generative Students: Using LLM-Simulated Student Profiles to Support Question Item Evaluation

Lu, Xinyi, Wang, Xu

arXiv.org Artificial Intelligence, May-19-2024

Evaluating the quality of automatically generated question items has been a long-standing challenge. In this paper, we leverage LLMs to simulate student profiles and generate responses to multiple-choice questions (MCQs). The generative students' responses to MCQs can further support question item evaluation. We propose Generative Students, a prompt architecture designed based on the KLI framework. A generative student profile is defined by the knowledge components the student has mastered, those the student is confused about, and those for which there is no evidence of knowledge. We instantiate the Generative Students concept in the subject domain of heuristic evaluation. We created 45 generative students using GPT-4 and had them respond to 20 MCQs. We found that the generative students produced logical and believable responses that were aligned with their profiles. We then compared the generative students' responses to real students' responses on the same set of MCQs and found a high correlation. Moreover, there was considerable overlap in the difficult questions identified by generative students and real students. A subsequent case study demonstrated that an instructor could improve question quality based on the signals provided by Generative Students.
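The three-part profile described in this abstract maps naturally onto a small data structure. The Python sketch below is hypothetical throughout: the class and field names, the prompt wording, and the example knowledge components (three of Nielsen's heuristics used as stand-ins) are illustrations, not the paper's prompt architecture, and no LLM is called.

```python
from dataclasses import dataclass, field


@dataclass
class StudentProfile:
    """Hypothetical generative-student profile over knowledge components (KCs)."""
    mastered: list = field(default_factory=list)
    confused: list = field(default_factory=list)
    no_evidence: list = field(default_factory=list)

    def __post_init__(self):
        # Each KC should appear in at most one of the three lists.
        kcs = self.mastered + self.confused + self.no_evidence
        if len(kcs) != len(set(kcs)):
            raise ValueError("a knowledge component appears in more than one list")


def build_prompt(profile, question, options):
    """Render an illustrative role-play prompt for one MCQ (no LLM is called)."""
    def fmt(kcs):
        return ", ".join(kcs) if kcs else "none"

    lines = [
        "You are a student learning about heuristic evaluation.",
        f"Concepts you have mastered: {fmt(profile.mastered)}.",
        f"Concepts you are confused about: {fmt(profile.confused)}.",
        f"Concepts you show no evidence of knowing: {fmt(profile.no_evidence)}.",
        "Answer as this student would by giving a single option letter.",
        f"Question: {question}",
    ]
    lines += [f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)


profile = StudentProfile(
    mastered=["visibility of system status"],
    confused=["error prevention"],
    no_evidence=["help and documentation"],
)
print(build_prompt(profile, "Which heuristic does a progress bar support?",
                   ["Visibility of system status", "Error prevention"]))
```

Keeping the three lists disjoint, as the validation in `__post_init__` enforces, mirrors the idea that each knowledge component sits in exactly one state for a given simulated student.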

confusion, generative student, student

doi: 10.1145/3657604.3662031

arXiv:2405.11591

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.34)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

UX Heuristics and Checklist for Deep Learning powered Mobile Applications with Image Classification

von Wangenheim, Christiane Gresse, Dirschnabel, Gustavo

arXiv.org Artificial Intelligence, Jul-5-2023

Advances in mobile applications that provide image classification enabled by Deep Learning require innovative User Experience solutions to ensure that users can use them adequately. To aid the design process, usability heuristics are typically customized for a specific kind of application. Therefore, based on a literature review and an analysis of existing mobile applications with image classification, we propose an initial set of AIX heuristics for Deep Learning powered mobile applications with image classification, decomposed into a checklist. To facilitate use of the checklist, we also developed an online course presenting the concepts and heuristics, as well as a web-based tool to support evaluations using these heuristics. The results of this research can be used to guide the design of the interfaces of such applications and to support heuristic evaluations, helping practitioners develop image classification apps that people can understand, trust, and engage with effectively.
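A checklist that decomposes heuristics into items can be tallied mechanically. This Python sketch assumes yes/no/not-applicable answers and reports, per heuristic, the share of applicable items satisfied. The heuristic and item names are placeholders rather than the proposed AIX heuristics, and the scoring rule is an assumption, not necessarily what the authors' web-based tool does.

```python
# Hypothetical checklist: placeholder heuristics, each decomposed into items.
CHECKLIST = {
    "H1 (placeholder)": ["item 1.1", "item 1.2", "item 1.3"],
    "H2 (placeholder)": ["item 2.1", "item 2.2"],
}


def score_checklist(answers, checklist=CHECKLIST):
    """Per heuristic, fraction of applicable items answered 'yes'.

    answers maps item -> True (yes), False (no) or None (not applicable);
    items missing from answers are treated as not applicable. A heuristic
    with no applicable items scores None.
    """
    result = {}
    for heuristic, items in checklist.items():
        applicable = [answers[i] for i in items if answers.get(i) is not None]
        result[heuristic] = sum(applicable) / len(applicable) if applicable else None
    return result


answers = {"item 1.1": True, "item 1.2": False, "item 1.3": None,
           "item 2.1": True, "item 2.2": True}
print(score_checklist(answers))
# {'H1 (placeholder)': 0.5, 'H2 (placeholder)': 1.0}
```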

artificial intelligence, classification, machine learning

arXiv:2307.05513

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
South America > Brazil > Santa Catarina > Florianópolis (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (0.94)
Education > Educational Setting > Online (0.88)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Boosting Search Guidance in Problems with Semantic Attachments

Bernardini, Sara (Royal Holloway, University of London) | Fox, Maria (King's College London) | Long, Derek (King's College London) | Piacentini, Chiara (University of Toronto)

AAAI Conferences, Jun-14-2017

Most applications of planning to real problems involve complex and often non-linear equations, including matrix operations. PDDL is ill-suited to express such calculations since it only allows basic operations between numeric fluents. To remedy this restriction, a generic PDDL planner can be connected to a specialised advisor, which equips the planner with the ability to carry out sophisticated mathematical operations. Unlike related techniques based on semantic attachment, our planner is able to exploit an approximation of the numeric information calculated by the advisor to compute informative heuristic estimators. Guided by both causal and numeric information, our planning framework outperforms traditional approaches, especially on problems with numeric goals. We provide evidence of the power of our solution by successfully solving four completely different problems.
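The idea of deriving a heuristic from an approximation of the advisor's output can be pictured with a toy example. In the Python sketch below, which is not the authors' planner or their approximation scheme, a hypothetical advisor computes a dependent quantity from independent numeric variables with a non-linear formula, actions are assumed to increment one variable by 1, and the heuristic estimates the remaining steps from a finite-difference linearisation of the advisor around the current state.

```python
import math


def advisor(x):
    """Hypothetical external advisor: a non-linear function of the
    independent variables x (needs at least two of them)."""
    return sum(v * v for v in x) + x[0] * x[1]


def estimate_steps(x, goal, eps=1e-3):
    """Toy heuristic: unit increments needed to reach advisor(x) >= goal.

    The advisor is linearised around x by forward finite differences, and
    each remaining step is assumed to raise the most favourable variable by 1.
    Returns math.inf if no variable appears to help.
    """
    current = advisor(x)
    if current >= goal:
        return 0
    gradient = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        gradient.append((advisor(bumped) - current) / eps)
    best = max(gradient)
    if best <= 0:
        return math.inf
    return math.ceil((goal - current) / best)


print(estimate_steps([1.0, 1.0], 10.0))  # 3
```

On this convex toy function the estimate is 3, although two increments (to [2.0, 2.0]) already reach the goal. An estimate of this kind therefore need not be admissible, which is acceptable for satisficing search but worth keeping in mind.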

approximation, external advisor, ind variable

Twenty-Seventh International Conference on Automated Planning and Scheduling

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New Jersey (0.04)
Europe > United Kingdom > Scotland (0.04)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)

Planning with Numeric Timed Initial Fluents

Piacentini, Chiara (King's College London) | Fox, Maria (King's College London) | Long, Derek (King's College London)

AAAI Conferences, Mar-6-2015

Numeric Timed Initial Fluents represent a new feature in PDDL that extends the concept of Timed Initial Literals to numeric fluents. They are particularly useful to model independent functions that change through time and influence the actions to be applied. Although they are very useful for modelling real-world problems, they are not systematically defined in the family of PDDL languages and are not implemented in any generic PDDL planner except POPF2 and UPMurphi. In this paper we present an extension of the planner POPF2 (POPF-TIF) to handle problems with numeric Timed Initial Fluents. We propose and evaluate two contributions: the first is based on improvements to the heuristic evaluation, while the second considers alternative search algorithms based on a mixture of Enforced Hill Climbing and Best First Search.
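A numeric timed initial fluent can be thought of as a schedule of value assignments that occur at known times whatever the plan does. The Python sketch below assumes piecewise-constant values between scheduled updates and at most one update per time point. The class, the fluent, and its schedule are hypothetical, and this is not POPF-TIF's internal representation. The `earliest_time` helper shows the kind of query a temporal planner needs when an action's numeric precondition depends on such a fluent.

```python
import bisect
from typing import Callable, Optional


class TimedNumericFluent:
    """Hypothetical numeric fluent whose value is reassigned at fixed times,
    independently of the plan, and held constant in between."""

    def __init__(self, initial: float, schedule: list):
        # schedule: (time, new_value) pairs, assumed one per time point;
        # sorted so lookups can bisect.
        self.initial = initial
        self.events = sorted(schedule)
        self.times = [t for t, _ in self.events]

    def value_at(self, t: float) -> float:
        """Value in force at time t: the latest assignment at or before t."""
        i = bisect.bisect_right(self.times, t)
        return self.initial if i == 0 else self.events[i - 1][1]

    def earliest_time(self, start: float,
                      condition: Callable[[float], bool]) -> Optional[float]:
        """Earliest time >= start at which condition(value) holds, or None.

        Because the value only changes at scheduled times, it suffices to
        check start itself and every later event time.
        """
        if condition(self.value_at(start)):
            return start
        for t in self.times:
            if t > start and condition(self.value_at(t)):
                return t
        return None


# Hypothetical price that rises at t=8 and falls at t=18.
price = TimedNumericFluent(10.0, [(8.0, 25.0), (18.0, 12.0)])
print(price.value_at(5.0), price.value_at(8.0), price.value_at(20.0))  # 10.0 25.0 12.0
print(price.earliest_time(9.0, lambda v: v < 15))                       # 18.0
```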

algorithm, artificial intelligence, planning & scheduling

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Europe > United Kingdom > England > Greater London > London (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Improving Delete Relaxation Heuristics Through Explicitly Represented Conjunctions

Keyder, E., Hoffmann, J., Haslum, P.

Journal of Artificial Intelligence Research, Jun-30-2014

Heuristic functions based on the delete relaxation compute upper and lower bounds on the optimal delete-relaxation heuristic h+, and are of paramount importance in both optimal and satisficing planning. Here we introduce a principled and flexible technique for improving h+, by augmenting delete-relaxed planning tasks with a limited amount of delete information. This is done by introducing special fluents that explicitly represent conjunctions of fluents in the original planning task, rendering h+ the perfect heuristic h* in the limit. Previous work has introduced a method in which the growth of the task is potentially exponential in the number of conjunctions introduced. We formulate an alternative technique relying on conditional effects, limiting the growth of the task to be linear in this number. We show that this method still renders h+ the perfect heuristic h* in the limit. We propose techniques to find an informative set of conjunctions to be introduced in different settings, and analyze and extend existing methods for lower-bounding and upper-bounding h+ in the presence of conditional effects. We evaluate the resulting heuristic functions empirically on a set of IPC benchmarks, and show that they are sometimes much more informative than standard delete-relaxation heuristics.
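The bounds on h+ mentioned in this abstract can be illustrated on a tiny STRIPS-style task with all delete effects ignored. The Python sketch computes three quantities. The first is h_max, a lower bound on h+. The second is h_add, which is not a bound because it can double-count shared subgoals. The third is the cost of a relaxed plan extracted from h_add best supporters, which is an upper bound on h+ since any relaxed plan costs at least h+. It does not implement the paper's conjunction fluents or conditional effects, and the task is hypothetical.

```python
import math


def relaxed_costs(state, actions, combine):
    """Fixpoint fact costs in the delete relaxation (delete lists ignored).

    combine aggregates precondition costs: max yields h_max, sum yields h_add.
    Also records, for each reached fact, its best (cheapest) supporter.
    """
    cost = {f: 0.0 for f in state}
    supporter = {}
    changed = True
    while changed:
        changed = False
        for act in actions:
            name, pre, add, c = act
            if all(p in cost for p in pre):
                new = (combine(cost[p] for p in pre) if pre else 0.0) + c
                for f in add:
                    if new < cost.get(f, math.inf):
                        cost[f], supporter[f] = new, act
                        changed = True
    return cost, supporter


def h_max(state, goal, actions):
    """Admissible lower bound on h+."""
    cost, _ = relaxed_costs(state, actions, max)
    return max((cost.get(g, math.inf) for g in goal), default=0.0)


def h_add(state, goal, actions):
    """Additive estimate; may overcount shared subgoals, so not a bound on h+."""
    cost, _ = relaxed_costs(state, actions, sum)
    return sum(cost.get(g, math.inf) for g in goal)


def relaxed_plan_cost(state, goal, actions):
    """Cost of a relaxed plan extracted via h_add best supporters (>= h+)."""
    cost, supporter = relaxed_costs(state, actions, sum)
    if any(g not in cost for g in goal):
        return math.inf
    plan, open_facts = set(), list(goal)
    while open_facts:
        f = open_facts.pop()
        if f in state:
            continue
        act = supporter[f]
        if act not in plan:
            plan.add(act)
            open_facts.extend(act[1])
    return sum(act[3] for act in plan)


# Hypothetical task: a -> b, then b enables both goals c and d.
# Each action is (name, preconditions, add effects, cost).
A = [("A1", frozenset({"a"}), frozenset({"b"}), 1.0),
     ("A2", frozenset({"b"}), frozenset({"c"}), 1.0),
     ("A3", frozenset({"b"}), frozenset({"d"}), 1.0)]
s0, goal = {"a"}, {"c", "d"}
print(h_max(s0, goal, A), h_add(s0, goal, A), relaxed_plan_cost(s0, goal, A))
# 2.0 4.0 3.0   (here h+ = 3: A1, A2, A3)
```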

compilation, conditional effect, landmark

doi: 10.1613/jair.4277

AI Access Foundation

10890

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)
South America > Brazil > São Paulo (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups

Lanctot, Marc, Winands, Mark H. M., Pepels, Tom, Sturtevant, Nathan R.

arXiv.org Artificial Intelligence, Jun-19-2014

Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search with MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during the MCTS simulations. These minimax values are then used to guide future simulations. We show that using implicit minimax backups leads to stronger play performance in Kalah, Breakthrough, and Lines of Action.
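The Python sketch below keeps the two sources of information separately in each node, as the abstract describes: playout statistics, and an implicit minimax value initialised from a heuristic evaluation and backed up as a negamax maximum over children. Selection combines them as `(1 - alpha) * mean + alpha * minimax` inside a UCT formula. The linear mix, the weight `alpha`, the exploration constant `c`, and the negamax sign convention are assumptions of this sketch. Expansion, playouts, and the game itself are omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """MCTS node. Values are from the viewpoint of the player who moved into it."""
    heuristic: float                          # static evaluation, e.g. in [-1, 1]
    visits: int = 0
    total_reward: float = 0.0                 # sum of playout outcomes
    children: list = field(default_factory=list)
    minimax: float = field(default=0.0, init=False)

    def __post_init__(self):
        # The implicit minimax value starts as the node's heuristic evaluation.
        self.minimax = self.heuristic


def score(parent, child, alpha=0.5, c=1.0):
    """UCT score on a mix of the playout mean and the implicit minimax value."""
    if child.visits == 0:
        return math.inf
    mean = child.total_reward / child.visits
    mixed = (1 - alpha) * mean + alpha * child.minimax
    return mixed + c * math.sqrt(math.log(parent.visits) / child.visits)


def select_child(parent, alpha=0.5, c=1.0):
    return max(parent.children, key=lambda ch: score(parent, ch, alpha, c))


def backup(path, reward):
    """Back up one playout along path (root ... leaf).

    reward is from the leaf mover's viewpoint and flips sign at each ply.
    Playout statistics and implicit minimax values are updated separately.
    """
    for node in reversed(path):
        node.visits += 1
        node.total_reward += reward
        if node.children:
            # Negamax: the opponent picks its best child, which is worst for us.
            node.minimax = -max(ch.minimax for ch in node.children)
        reward = -reward


root = Node(heuristic=0.0)
root.children = [Node(heuristic=0.5), Node(heuristic=-0.2)]
backup([root, root.children[0]], 1.0)           # one playout through the first child
print(root.minimax)                             # -0.5
print(select_child(root) is root.children[1])   # True (unvisited child first)
```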

breakthrough, evaluation function, mct

arXiv:1406.0486

Country:

Europe > Netherlands > Limburg > Maastricht (0.04)
Asia > Japan (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Heuristic Evaluation Based on Lifted Relaxed Planning Graphs

Ridder, Bram (King's College London) | Fox, Maria (King's College London)

AAAI Conferences, Jun-9-2014

In previous work we have shown that grounding, while used by most (if not all) modern state-of-the-art planners, is not necessary and is sometimes even undesirable. In this paper we extend this work and present a novel forward-chaining planner that does not require grounding and can solve problem instances that are too large for current planners to handle. We achieve this by exploiting equivalence relationships between objects whilst constructing a lifted version of the relaxed planning graph (RPG) and extracting a relaxed plan. We compare our planner to FF and show that our approach consumes far less memory whilst still being competitive. In addition, we show that by not having to ground the domain we can solve much larger problem instances.

heuristic evaluation, lifted relaxed planning graph

Twenty-Fourth International Conference on Automated Planning and Scheduling

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.40)

Defeasible Decisions: What the Proposal is and isn't

Loui, Ronald P.

arXiv.org Artificial Intelligence, Mar-27-2013

In two recent papers, I have proposed a description of decision analysis that differs from the Bayesian picture painted by Savage, Jeffrey and other classic authors. Response to this view has been either overly enthusiastic or unduly pessimistic. In this paper I try to put the idea in its proper place, which must be somewhere in between. Looking at decision analysis as defeasible reasoning produces a framework in which planning and decision theory can be integrated, but work on the details has barely begun. It also produces a framework in which the meta-decision regress can be stopped in a reasonable way, but it does not allow us to ignore meta-level decisions. The heuristics for producing arguments that I have presented are only meant to be suggestive, but they are not open to the egregious errors about which some have worried. And though the idea is familiar to those who have studied heuristic search, it is somewhat richer because the control of dialectic is more interesting than the deepening of search.

argument, artificial intelligence, decision support system

arXiv:1304.1518

Genre: Research Report (0.40)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.91)
