AITopics | Kantor, Paul

Collaborating Authors

Kantor, Paul

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models

Wang, Wentian, Kantor, Paul, Feldman, Jacob, Gallos, Lazaros, Wang, Hao

arXiv.org Artificial IntelligenceJun-15-2024

We propose MMLU-SR, a novel dataset designed to measure the true comprehension abilities of Large Language Models (LLMs) by challenging their performance in question-answering tasks with modified terms. We reasoned that an agent that ``truly'' understands a concept can still evaluate it when key terms are replaced by suitably defined alternate terms, and sought to differentiate such comprehension from mere text replacement. In our study, we modified standardized test questions by replacing a key term with a dummy word along with its definition. The key term could be in the context of questions, answers, or both questions and answers. Notwithstanding the high scores achieved by recent popular LLMs on the MMLU leaderboard, we found a substantial reduction in model performance after such replacement, suggesting poor comprehension. This new benchmark provides a rigorous benchmark for testing true model comprehension, and poses a challenge to the broader scientific community.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.15468

Country: North America (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (1.00)
Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Comparing Reinforcement Learning and Human Learning using the Game of Hidden Rules

Pulick, Eric, Menkov, Vladimir, Mintz, Yonatan, Kantor, Paul, Bier, Vicki

arXiv.org Artificial IntelligenceJun-30-2023

Reliable real-world deployment of reinforcement learning (RL) methods requires a nuanced understanding of their strengths and weaknesses and how they compare to those of humans. Human-machine systems are becoming more prevalent and the design of these systems relies on a task-oriented understanding of both human learning (HL) and RL. Thus, an important line of research is characterizing how the structure of a learning task affects learning performance. While increasingly complex benchmark environments have led to improved RL capabilities, such environments are difficult to use for the dedicated study of task structure. To address this challenge we present a learning environment built to support rigorous study of the impact of task structure on HL and RL. We demonstrate the environment's utility for such study through example experiments in task structure that show performance differences between humans and RL algorithms.

experiment, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2306.17766

Country:

North America > United States > Wisconsin (0.14)
North America > United States > Massachusetts (0.14)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry:

Leisure & Entertainment > Games > Computer Games (0.68)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Game of Hidden Rules: A New Kind of Benchmark Challenge for Machine Learning

Pulick, Eric, Bharti, Shubham, Chen, Yiding, Menkov, Vladimir, Mintz, Yonatan, Kantor, Paul, Bier, Vicki M.

arXiv.org Artificial IntelligenceJul-20-2022

As machine learning (ML) is more tightly woven into society, it is imperative that we better characterize ML's strengths and limitations if we are to employ it responsibly. Existing benchmark environments for ML, such as board and video games, offer well-defined benchmarks for progress, but constituent tasks are often complex, and it is frequently unclear how task characteristics contribute to overall difficulty for the machine learner. Likewise, without a systematic assessment of how task characteristics influence difficulty, it is challenging to draw meaningful connections between performance in different benchmark environments. We introduce a novel benchmark environment that offers an enormous range of ML challenges and enables precise examination of how task elements influence practical difficulty. The tool frames learning tasks as a "board-clearing game," which we call the Game of Hidden Rules (GOHR). The environment comprises an expressive rule language and a captive server environment that can be installed locally. We propose a set of benchmark rule-learning tasks and plan to support a performance leader-board for researchers interested in attempting to learn our rules. GOHR complements existing environments by allowing fine, controlled modifications to tasks, enabling experimenters to better understand how each facet of a given learning task contributes to its practical difficulty for an arbitrary ML algorithm.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2207.10218

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback