AITopics

2505.19915

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.69)
Government > Military (0.67)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.51)

arXiv.org Artificial Intelligence Feb-26-2025

GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration

Sung, Yoo Yeon, Fleisig, Eve, Hou, Yu, Upadhyay, Ishan, Boyd-Graber, Jordan Lee

Language models are often miscalibrated, leading to confidently incorrect answers. We introduce GRACE, a benchmark for language model calibration that incorporates comparison with human calibration. GRACE consists of question-answer pairs, in which each question contains a series of clues that gradually become easier, all leading to the same answer; models must answer correctly as early as possible as the clues are revealed. This setting permits granular measurement of model calibration based on how early, accurately, and confidently a model answers. After collecting these questions, we host live human vs. model competitions to gather 1,749 data points on human and model teams' timing, accuracy, and confidence. We propose a metric, CalScore, that uses GRACE to analyze model calibration errors and identify types of model miscalibration that differ from human behavior. We find that although humans are less accurate than models, humans are generally better calibrated. Since state-of-the-art models struggle on GRACE, it effectively evaluates progress on improving model calibration.
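The abstract above does not define CalScore, so the sketch below does not attempt to reproduce it. As a rough illustration of what "calibration" means for confidence-scored answers, it computes the standard expected calibration error (ECE), a swapped-in, widely used measure. The function name, the bin count, and the sample data are assumptions made for illustration only and are not taken from the GRACE paper.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Standard ECE: weighted mean of |confidence - accuracy| over equal-width bins.

    This is NOT the CalScore metric from the paper; it is a generic
    calibration measure used here only as an illustration.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group each (confidence, correctness) pair into an equal-width bin.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical example: four answers given at 90% confidence, three of them right.
sample_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A well-calibrated answerer has an ECE near zero; a confidently wrong one approaches one. A benchmark such as GRACE additionally accounts for how early an answer is given, which this sketch ignores.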

calibration, computational linguistic, probability, (15 more...)

2502.19684

Country:

Asia > Middle East > Jordan (0.05)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.67)
Leisure & Entertainment > Games (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)

Soares, Paulo, Pyarelal, Adarsh, Barnard, Kobus

Probabilistic Modeling of Human Teams to Infer False Beliefs

arXiv.org Artificial Intelligence Oct-19-2023

We develop a probabilistic graphical model (PGM) for artificially intelligent (AI) agents to infer human beliefs during a simulated urban search and rescue (USAR) scenario executed in a Minecraft environment with a team of three players. The PGM approach makes observable states and actions explicit, as well as beliefs and intentions grounded by evidence about what players see and do over time. This approach also supports inferring the effect of interventions, which are vital if AI agents are to assist human teams. The experiment incorporates manipulations of players' knowledge, and the virtual Minecraft-based testbed provides access to several streams of information, including the objects in the players' field of view. The participants are equipped with a set of marker blocks that can be placed near room entrances to signal the presence or absence of victims in the rooms to their teammates. In each team, one of the members is given a different legend for the markers than the other two, which may mislead them about the state of the rooms; that is, they will hold a false belief. We extend previous works in this field by introducing ToMCAT, an AI agent that can reason about individual and shared mental states. We find that the players' behaviors are affected by what they see in their in-game field of view, their beliefs about the meaning of the markers, and their beliefs about which meaning the team decided to adopt. In addition, we show that ToMCAT's beliefs are consistent with the players' actions and that it can infer false beliefs with accuracy significantly better than chance and comparable to inferences made by human observers.

human team, infer false belief, probabilistic modeling

2310.12929

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.73)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.40)

#artificialintelligence Dec-27-2022, 00:30:28 GMT

Yes, ChatGPT Is Sentient -- Because It's Really Humans in the Loop

OpenAI recently released a new AI program called ChatGPT. It left the internet gobsmacked, though some were skeptical and concerned about its abilities. The really amazing thing is ChatGPT's humanlike responses. They give an observer an unnerving suspicion that the AI is actually sentient. Maybe it is actually sentient.

chatbot, chatgpt, conversation history, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence May-27-2022, 01:51:03 GMT

Why AI and autonomous response are crucial for cybersecurity (VB On-Demand)

Today, cybersecurity is in a state of continuous growth and improvement. In this on-demand webinar, learn how two organizations use a continuous AI feedback loop to identify vulnerabilities, harden defenses and improve the outcomes of their cybersecurity programs. The security risk landscape is in tremendous flux, and the traditional on-premises approach to cybersecurity is no longer enough. Remote work has become the norm, and outside the office walls, employees are letting down their personal security defenses. Cyber risks introduced by the supply chain via third parties are still a major vulnerability, so organizations need to think about not only their defenses but those of their suppliers to protect their priority assets and information from infiltration and exploitation.

autonomous response, cybersecurity, lorimer, (14 more...)

Country:

Europe > Ukraine (0.15)
Asia > Russia (0.15)
Europe > Russia (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.35)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Collaboration (0.35)

#artificialintelligence Oct-29-2021, 00:50:51 GMT

Inside the Air Force Training Program that Will Pit Human Pilots Against AI

Air Force fighter pilots will soon face new opponents in their training: artificial intelligence-based enemy pilots that can match humans based on their personal learning needs. After steering the production of numerous AI-enabled pilot agents for years, Aptima, Inc. confirmed it landed a four-year contract with the Air Force Research Laboratory to build an "automated librarian" that will categorize those AI pilots and pair them with military trainees in scenarios that are right to advance their skillsets. "The best case outcome is that AFRL determines that the products of this research are so promising that they create a library into which AI training technologies are shelved like books are shelved and they refine the sort of librarian that we're trying to build here so that it can sweep through that enormous library of AI, sweep through a library of scenarios--and for each individual student--pick out just the right pairing to advance them to expertise reliably and more quickly than we can do today," Aptima's Chief Scientist Jared Freeman told Nextgov during an interview on Tuesday. Freeman joined the company in 1999, four years after its launch. Aptima's project portfolio has grown increasingly diverse since then, he noted. Now, much of it concerns AI support for human teams, like forming and measuring them, and helping people and AI to manage those groups.

air force training program, aptima, freeman, (14 more...)

Genre: Instructional Material (0.36)

Industry: Government > Military > Air Force (1.00)

Technology: Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.40)

Scheikl, Paul Maria, Gyenes, Balázs, Davitashvili, Tornike, Younis, Rayan, Schulze, André, Müller-Stich, Beat P., Neumann, Gerhard, Wagner, Martin, Mathis-Ullrich, Franziska

Cooperative Assistance in Robotic Surgery through Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence Oct-10-2021

Cognitive cooperative assistance in robot-assisted surgery holds the potential to increase quality of care in minimally invasive interventions. Automation of surgical tasks promises to reduce the mental exertion and fatigue of surgeons. In this work, multi-agent reinforcement learning is demonstrated to be robust to the distribution shift introduced by pairing a learned policy with a human team member. Multi-agent policies are trained directly from images in simulation to control multiple instruments in a sub task of the minimally invasive removal of the gallbladder. These agents are evaluated individually and in cooperation with humans to demonstrate their suitability as autonomous assistants. Compared to human teams, the hybrid teams with artificial agents perform better considering completion time (44.4% to 71.2% shorter) as well as number of collisions (44.7% to 98.0% fewer). Path lengths, however, increase under control of an artificial agent (11.4% to 33.5% longer). A multi-agent formulation of the learning problem was favored over a single-agent formulation on this surgical sub task, due to the sequential learning of the two instruments. This approach may be extended to other tasks that are difficult to formulate within the standard reinforcement learning framework. Multi-agent reinforcement learning may shift the paradigm of cognitive robotic surgery towards seamless cooperation between surgeons and assistive technologies.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

doi: 10.1109/IROS51168.2021.9636193

2110.04857

Country:

North America > United States (0.15)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)
Leisure & Entertainment > Games > Computer Games (0.47)
Health & Medicine > Therapeutic Area > Gastroenterology (0.38)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence May-30-2021, 14:15:33 GMT

Can AI be used in cybersecurity? You asked, we answered!

How AI enhances security for IoT environments. Elon Musk's prediction that AI will outsmart humans in less than 5 years is a bold statement, predicting that machines will possess super-human qualities which help boost organizations' profits and goals. For many, these ideas belong in sci-fi fantasies rather than as a future fixture of working practices. In the broadest sense, there are no signs that AI comes close to human consciousness or sentience. When we talk about the power of AI, it's more helpful to consider the specific use cases and sectors where it will have, and is having, a transformative effect – and there is one area in particular where AI has been seen to mimic the capabilities of complex human thought processes: cyber security.

investigation, security team, threat, (11 more...)

Country: Europe (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.51)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Issues (0.50)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

#artificialintelligence Nov-10-2020, 20:30:29 GMT

Can AI be used in cybersecurity? You asked, we answered!

Elon Musk's prediction that AI will outsmart humans in less than 5 years is a bold statement, predicting that machines will possess super-human qualities which help boost organizations' profits and goals. For many, these ideas belong in sci-fi fantasies rather than as a future fixture of working practices. In the broadest sense, there are no signs that AI comes close to human consciousness or sentience. When we talk about the power of AI, it's more helpful to consider the specific use cases and sectors where it will have, and is having, a transformative effect – and there is one area in particular where AI has been seen to mimic the capabilities of complex human thought processes: cyber security. For organizations seeing more and more attacks against their digital infrastructure, cyber security is a top priority.

investigation, security team, threat, (11 more...)

Country: Europe (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.51)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Issues (0.50)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

#artificialintelligence Jun-13-2020, 00:21:02 GMT

AI Emerges As A Major Player In The Race To Find Covid-19 Therapies And Vaccines

Covid-19 is the new Manhattan Project, and AI is emerging as a major player in it. Covid-19 research has quickly created unprecedented amounts of publicly available research data from federal governments, industry, and university research labs at record rates. For example, the Covid-19 Open Research Dataset (CORD-19), created by the Allen Institute for AI in collaboration with government agencies, universities, and industry partners, started with 13,000 Covid-19 scholarly articles. Two months later, it had grown to over 128,000 articles. Research data on a topic normally takes years, not months, to grow that large.

artificial intelligence, covid-19 research, data mining, (16 more...)

Country: Europe > Switzerland (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Security & Privacy (0.99)
Information Technology > Data Science > Data Mining (0.33)