
Collaborating Authors

roscoe


Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction

Sheng, Huanxin, Liu, Xinyi, He, Hangfeng, Zhao, Jieyu, Kang, Jian

arXiv.org Artificial Intelligence

LLM-as-a-judge has become a promising paradigm for using large language models (LLMs) to evaluate natural language generation (NLG), but the uncertainty of its evaluations remains underexplored. This lack of reliability may limit its deployment in many applications. This work presents the first framework to analyze that uncertainty by offering a prediction interval for LLM-based scoring via conformal prediction. Conformal prediction constructs continuous prediction intervals from a single evaluation run, and we design an ordinal boundary adjustment for discrete rating tasks. We also suggest a midpoint-based score within the interval as a low-bias alternative to the raw model score and the weighted average. We perform extensive experiments and analysis, which show that conformal prediction can provide valid prediction intervals with coverage guarantees. We also explore the usefulness of the interval midpoint and of judge reprompting for better judgments.
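The abstract's core recipe can be illustrated with split conformal prediction: calibrate an absolute-residual quantile on held-out (judge score, reference score) pairs, then widen each new judge score into an interval. This is a minimal sketch, not the paper's implementation; the snap-to-scale "ordinal adjustment" and the example numbers are assumptions of this sketch.

```python
import math

def split_conformal_interval(cal_model, cal_human, new_score,
                             alpha=0.1, scale=(1, 5)):
    """Split conformal interval for an LLM judge score.

    cal_model / cal_human: calibration-set judge scores and reference
    scores. Nonconformity is the absolute residual |human - model|.
    Returns (lower, upper, midpoint).
    """
    n = len(cal_model)
    residuals = sorted(abs(h - m) for m, h in zip(cal_model, cal_human))
    # Finite-sample-corrected quantile index: ceil((n+1)(1-alpha)) - 1.
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = residuals[k]
    lo, hi = new_score - q, new_score + q
    # Ordinal boundary adjustment (an assumption of this sketch):
    # snap the continuous bounds onto the discrete rating scale.
    lo = max(scale[0], math.floor(lo))
    hi = min(scale[1], math.ceil(hi))
    return lo, hi, (lo + hi) / 2  # midpoint as a low-bias point score

cal_m = [3, 4, 2, 5, 3, 4, 1, 2, 4, 3]
cal_h = [3, 5, 2, 4, 3, 3, 1, 3, 4, 4]
print(split_conformal_interval(cal_m, cal_h, 4, alpha=0.2))  # (3, 5, 4.0)
```

The midpoint in the last position is the interval-based point score the abstract proposes as an alternative to the raw judge score.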


This Talking Pet Collar Is Like a Chatbot for Your Dog

WIRED

Humans have been trying to talk to animals ever since we figured out how to form words. In modern times, we turn to technology for the solution: giving our dogs talking buttons to paw at, or trying to use artificial intelligence to help us understand whales. The latest and perhaps most direct approach to human-animal communication is a voice-activated collar that gives your pet the power to talk back to you. John McHale, a self-described "tech guy" based in Austin, Texas, has a company called Personifi AI. The startup's goal, as the name implies, is to create tech that will "personify everything," as McHale puts it.


Suspect shoots robotic police dog in Massachusetts standoff; manufacturer says it's a first

FOX News

A robotic dog is being thanked by state police in Massachusetts for helping avert a tragedy involving a person barricaded in a home. The dog, named Roscoe, was part of the Massachusetts State Police Bomb Squad and was deployed on March 6 at a Barnstable house after police were fired upon. Police sent two other robots, often used for bomb disposal, into the house along with Roscoe to find the suspect.


ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness

Prasad, Archiki, Saha, Swarnadeep, Zhou, Xiang, Bansal, Mohit

arXiv.org Artificial Intelligence

Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear what constitutes a good reasoning chain and how to evaluate one. Most existing methods focus solely on whether the reasoning chain leads to the correct conclusion, but this answer-oriented view may confound reasoning quality with other spurious shortcuts to predict the answer. To bridge this gap, we evaluate reasoning chains by viewing them as informal proofs that derive the final answer. Specifically, we propose ReCEval (Reasoning Chain Evaluation), a framework that evaluates reasoning chains via two key properties: (1) correctness, i.e., each step makes a valid inference based on information contained within the step, preceding steps, and input context, and (2) informativeness, i.e., each step provides new information that is helpful towards deriving the generated answer. We evaluate these properties by developing metrics using natural language inference models and V-Information. On multiple datasets, we show that ReCEval effectively identifies various error types and yields notable improvements compared to prior methods. We analyze the impact of step boundaries and previous steps on evaluating correctness, and demonstrate that our informativeness metric captures the expected flow of information in high-quality reasoning chains. Finally, we show that scoring reasoning chains based on ReCEval improves downstream task performance. Our code is publicly available at: https://github.com/archiki/ReCEval
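The step-level framing above lends itself to a weakest-link aggregation: a chain is only as good as its least correct or least informative step. The sketch below assumes per-step scores have already been produced (in the paper these come from NLI and V-Information models); the min-based combination is an illustrative assumption, not ReCEval's exact formula.

```python
def chain_score(correctness, informativeness):
    """Aggregate per-step correctness and informativeness into one
    chain-level score by taking the weakest step on either property.

    A single invalid or uninformative step breaks the informal proof,
    so it caps the whole chain's score.
    """
    assert len(correctness) == len(informativeness)
    return min(min(correctness), min(informativeness))

# Step 2 makes an unsupported inference (correctness 0.2),
# so it determines the chain score.
print(chain_score([0.9, 0.2, 0.8], [0.7, 0.6, 0.9]))  # 0.2
```

This captures why answer-only evaluation can be misleading: a chain could end at the right answer even though one step (here, step 2) is invalid.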


Can Language Models Laugh at YouTube Short-form Videos?

Ko, Dayoon, Lee, Sangho, Kim, Gunhee

arXiv.org Artificial Intelligence

As short-form funny videos on social networks gain popularity, there is growing demand for AI models that can understand them for better communication with humans. Unfortunately, previous video humor datasets target specific domains, such as speeches or sitcoms, and mostly focus on verbal cues. We curate a user-generated dataset of 10K multimodal funny videos from YouTube, called ExFunTube. Using a video filtering pipeline with GPT-3.5, we verify both verbal and visual elements contributing to humor. After filtering, we annotate each video with timestamps and text explanations for funny moments. ExFunTube is unique among existing datasets in that its videos cover a wide range of domains with various types of humor that necessitate a multimodal understanding of the content. Also, we develop a zero-shot video-to-text prompting method to maximize video humor understanding by large language models (LLMs). With three different evaluation methods using automatic scores, rationale quality experiments, and human evaluations, we show that our prompting significantly improves LLMs' ability to explain humor.


[2212.07919] ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning

#artificialintelligence

Large language models show improved downstream task performance when prompted to generate step-by-step reasoning to justify their final answers. These reasoning steps greatly improve model interpretability and verification, but objectively studying their correctness (independent of the final answer) is difficult without reliable methods for automatic evaluation. We simply do not know how often the stated reasoning steps actually support the final end task predictions. In this work, we present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics. To evaluate ROSCOE against baseline metrics, we design a typology of reasoning errors and collect synthetic and human evaluation scores on commonly used reasoning datasets. In contrast with existing metrics, ROSCOE can measure semantic consistency, logicality, informativeness, fluency, and factuality - among other traits - by leveraging properties of step-by-step rationales. We empirically verify the strength of our metrics on five human-annotated and six programmatically perturbed diagnostic datasets covering a diverse set of tasks that require reasoning skills, and show that ROSCOE consistently outperforms baseline metrics.
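One family of ROSCOE-style scores measures semantic alignment between each reasoning step and the source context, flagging steps that introduce unsupported content. ROSCOE computes alignment with sentence embeddings; to stay self-contained, this sketch substitutes plain word-count vectors with cosine similarity, so the example strings and threshold behavior are illustrative assumptions, not the paper's metric.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def faithfulness(steps, source):
    """Score each reasoning step by its alignment with the source
    context. Low-scoring steps are candidates for hallucinated or
    unsupported reasoning."""
    src = Counter(source.lower().split())
    return [cosine(Counter(s.lower().split()), src) for s in steps]

scores = faithfulness(
    ["Tom has 3 apples", "He buys 2 more", "The moon is made of cheese"],
    "Tom has 3 apples and buys 2 more apples")
print(min(scores))  # → 0.0: the off-topic third step scores lowest
```

The per-step scores make the metric interpretable: rather than a single chain-level number, each step carries its own alignment evidence, which is the property that lets such scores localize reasoning errors.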


Donkey: Building Self Driving Cars with Will Roscoe - Episode 132

#artificialintelligence

Do you wish that you had a self-driving car of your own? With Donkey you can make that dream a reality. This week Will Roscoe shares the story of how he got involved in the arena of self-driving car hobbyists and ended up building a Python library to act as his pilot. We talked about the hardware involved, how he has evolved the code to meet unexpected challenges, and how he plans to improve it in the future. So go build your own self-driving car and take it for a spin!


The near-futurism of Disney Channel original movies -- does it hold up?

#artificialintelligence

Does It Hold Up is a chance to re-experience childhood favorites of books, movies, TV shows, video games, and other cultural phenomena decades later. Have they gotten better like a fine wine, or are we drinking cork? A cornerstone of any pre-teen's life between 1998 and 2007 was the Disney Channel original movie. If you grew up during that time you do not need a refresher on why movies like Halloweentown or Zenon: Girl of the 21st Century were popular -- they were your main option for entertainment because you were constantly at home! (That is what it is like to not have a driver's license.) But you may need a refresher on their content, because I just revisited a bunch of them and they are not what I thought.


The Wager

Cherniak, Christopher

AI Magazine

The Portrait Programs Project grew out of hyperinterdisciplinarianism of the famed Gigabase Sculpture Group, in turn stimulated by recent cutbacks in government support for the arts. The National Endowment for the Humanities and the National Science Foundation had jointly funded the Gigabase Sculpture Project to foster the literary/musical genre of composing genetic codes for novel organisms. Later, artists trained in recombinant DNA technology designed massive Brancusi-esque statues of living cytoplasmic jelly. However, Art For Art's Sake objectives of these giblet sculptors were compromised by precautions necessary after discovery of the "Gogol's-Theorem Bomb" that threatened to get loose and jam all DNA replication in the biosphere; not even viruses would have survived.