AITopics | Josang, Audun

Collaborating Authors

Josang, Audun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence

Tihanyi, Norbert, Bisztray, Tamas, Dubniczky, Richard A., Toth, Rebeka, Borsos, Bertalan, Cherif, Bilel, Ferrag, Mohamed Amine, Muzsai, Lajos, Jain, Ridhi, Marinelli, Ryan, Cordeiro, Lucas C., Debbah, Merouane, Mavroeidis, Vasileios, Josang, Audun

arXiv.org Artificial IntelligenceNov-22-2024

As machine intelligence evolves, the need to test and compare the problem-solving abilities of different AI models grows. However, current benchmarks are often simplistic, allowing models to perform uniformly well and making it difficult to distinguish their capabilities. Additionally, benchmarks typically rely on static question-answer pairs that the models might memorize or guess. To address these limitations, we introduce Dynamic Intelligence Assessment (DIA), a novel methodology for testing AI models using dynamic question templates and improved metrics across multiple disciplines such as mathematics, cryptography, cybersecurity, and computer science. The accompanying dataset, DIA-Bench, contains a diverse collection of challenge templates with mutable parameters presented in various formats, including text, PDFs, compiled binaries, visual puzzles, and CTF-style cybersecurity challenges. Our framework introduces four new metrics to assess a model's reliability and confidence across multiple attempts. These metrics revealed that even simple questions are frequently answered incorrectly when posed in varying forms, highlighting significant gaps in models' reliability. Notably, API models like GPT-4o often overestimated their mathematical capabilities, while ChatGPT-4o demonstrated better performance due to effective tool usage. In self-assessment, OpenAI's o1-mini proved to have the best judgement on what tasks it should attempt to solve. We evaluated 25 state-of-the-art LLMs using DIA-Bench, showing that current models struggle with complex tasks and often display unexpectedly low confidence, even with simpler questions. The DIA framework sets a new standard for assessing not only problem-solving but also a model's adaptive intelligence and ability to assess its limitations. The dataset is publicly available on the project's page: https://github.com/DIA-Bench.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.1549

Country: Europe (1.00)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)
Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Add feedback

Active Learning on Neural Networks through Interactive Generation of Digit Patterns and Visual Representation

Jeong, Dong H., Cho, Jin-Hee, Chen, Feng, Josang, Audun, Ji, Soo-Yeon

arXiv.org Artificial IntelligenceOct-2-2023

Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying computation and meaning are hidden. Due to this nature, users often face difficulties in interpreting the underlying mechanism of the NNs and the benefits of using them. In this paper, to improve users' learning and understanding of NNs, an interactive learning system is designed to create digit patterns and recognize them in real time. To help users clearly understand the visual differences of digit patterns (i.e., 0 ~ 9) and their results with an NN, integrating visualization is considered to present all digit patterns in a two-dimensional display space with supporting multiple user interactions. An evaluation with multiple datasets is conducted to determine its usability for active learning. In addition, informal user testing is managed during a summer workshop by asking the workshop participants to use the system.

artificial intelligence, deep learning, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2310.0158

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (0.68)
Education > Educational Setting > Higher Education (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Cumulative and Averaging Fission of Beliefs

Josang, Audun

arXiv.org Artificial IntelligenceDec-7-2007

Belief fusion is the principle of combining separate beliefs or bodies of evidence originating from different sources. Depending on the situation to be modelled, different belief fusion methods can be applied. Cumulative and averaging belief fusion is defined for fusing opinions in subjective logic, and for fusing belief functions in general. The principle of fission is the opposite of fusion, namely to eliminate the contribution of a specific belief from an already fused belief, with the purpose of deriving the remaining belief. This paper describes fission of cumulative belief as well as fission of averaging belief in subjective logic. These operators can for example be applied to belief revision in Bayesian belief networks, where the belief contribution of a given evidence source can be determined as a function of a given fused belief and its other contributing beliefs.

artificial intelligence, bayesian inference, operator, (18 more...)

arXiv.org Artificial Intelligence

0712.1182

Country: North America > United States (0.94)

Genre: Research Report (0.40)

Industry: Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)

Add feedback