Jeopardy!'s Most Infamous Moment Haunted the Show's Fans, Its Stars, and Even Alex Trebek. It's Clear Why Now.
Jeopardy!'s most controversial moment was years in the making. It took many more for the fallout to come into full view.

One morning in 2010, Alex Trebek walked onto the IBM campus not far outside New York City and prepared to inspect what would become the most unusual player in Jeopardy!'s history. The trip, clear across the country from the show's Culver City set, had been carefully planned. David Ferrucci, a computer scientist at IBM, had spent years leading a team to develop what would become the first and, so far, last nonhuman ever to compete on Jeopardy! Longtime host Trebek would watch three practice games played with "Watson," as the system was named, and two human contestants. Then the team would be taken to lunch nearby, and Trebek would take the stage and host two more Watson practice games himself. By then the preparations for a future televised contest with IBM's creation were well underway, but this was the first time Trebek would encounter the technology in person, and his approval was crucial.

Ferrucci was eager to show off one element in particular: the display, which had been rigged to show Watson's top three guesses whenever it answered, along with the numerical confidence it had in each one. For Ferrucci, this feature was central to demonstrating the computer's language-processing capabilities, because it showed that Watson wasn't just spitting out answers; it was reasoning. If Watson were ever going to be deployed in industries like health care, its human users wouldn't just want to know its best guess. It would be infinitely more valuable to know whether Watson was 95 percent confident or just 30 percent, and whether those confidence levels were in line with its actual accuracy rate. It also made for better viewing. Ferrucci had brought his young daughter to the lab earlier in the process and shown her Watson as it played against human opponents. When Watson declined to ring in, Ferrucci's daughter turned to him and asked if the computer had crashed. He struggled to explain that it hadn't; it just wasn't confident enough to hazard a guess.
- North America > United States > California > Los Angeles County > Culver City (0.24)
- North America > United States > New York > Westchester County (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
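The display behavior described above, ranked top guesses plus a decision about whether to ring in, can be sketched in a few lines. This is a hypothetical illustration of the idea, not IBM's implementation; the threshold value and function names are assumptions:

```python
def confidence_display(candidates, top_n=3, buzz_threshold=0.5):
    """Given (answer, confidence) pairs, return the top-N guesses
    and whether the system is confident enough to ring in."""
    ranked = sorted(candidates, key=lambda ac: ac[1], reverse=True)[:top_n]
    should_buzz = bool(ranked) and ranked[0][1] >= buzz_threshold
    return ranked, should_buzz

# Example: the best guess clears the threshold, so the system buzzes.
top, buzz = confidence_display([("Toronto", 0.30), ("Chicago", 0.55), ("Boston", 0.10)])
# top == [("Chicago", 0.55), ("Toronto", 0.30), ("Boston", 0.10)], buzz == True
```

The point of the two-part return value is exactly what Ferrucci's daughter ran into: "no buzz" is a deliberate output of the system, not a crash.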
Fox News AI Newsletter: The school where AI runs the classroom
Alpha School co-founder Mackenzie Price and Elle Kristine, a junior at the school, join 'Fox & Friends' to discuss the benefits of incorporating artificial intelligence into the classroom. Alpha School uses AI-powered software and devices to deliver personalized instruction in just two hours of classroom time per day. TOP OF THE CLASS: At a time when many American students are struggling to keep up, a private school in Texas is doing more with less, much less. At Alpha School, students spend just two hours a day in class, guided by an artificial intelligence (AI) tutor. But the results are impressive: students are testing in the top 1 to 2% nationally.
- North America > United States > Texas (0.30)
- North America > United States > Arizona (0.06)
'Jeopardy' host Ken Jennings 'deeply skeptical' of AI, years after losing to supercomputer
"Jeopardy!" host Ken Jennings tells Fox News Digital he wants to know a human is behind any creative projects, not AI. "I'm deeply skeptical of AI," Jennings told Fox News Digital at the TCM Classic Film Festival. "Obviously, these current iterations of LLMs [large language models] would clean Watson's clock at 'Jeopardy!' The technology has moved on. I've played with chatbots and 'Jeopardy!' clues, and they're very hard to stump," he said.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > Canada > Ontario > Toronto (0.05)
A Russian Jeopardy! Data Set for Question-Answering Systems
Question answering (QA) is one of the most common NLP tasks, related to named entity recognition, fact extraction, semantic search, and other fields. In industry, it is widely used in chatbots and corporate information systems. It is also a challenging task that attracted the attention of a very general audience through the quiz show Jeopardy! In this article we describe a Jeopardy!-like Russian QA data set collected from Chgk (che ge ka), the official Russian quiz database. The data set includes 379,284 quiz-like questions, 29,375 of which come from "Own Game," the Russian analogue of Jeopardy! We analyze its linguistic features and the associated QA task, and discuss the prospects of a QA competition based on this data set.
- Europe > Russia (0.14)
- Asia > Russia > Ural Federal District > Tyumen Oblast > Tyumen (0.05)
- Europe > United Kingdom > England (0.04)
- (8 more...)
PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering
Li, Zongxia, Mondal, Ishani, Liang, Yijun, Nghiem, Huy, Boyd-Graber, Jordan Lee
Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current efficient answer correctness (AC) metrics do not align with human judgments, particularly for verbose, free-form answers from large language models (LLMs). There are two challenges: a lack of diverse evaluation data, and models that are too big and non-transparent; LLM-based scorers correlate better with humans, but this expensive approach has only been tested on limited QA datasets. We rectify these issues by providing guidelines and datasets for evaluating machine QA adopted from the human QA community. We also propose an efficient, low-resource, and interpretable QA evaluation method that is more stable than exact match and neural methods.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > France (0.04)
- Asia > Middle East > Jordan (0.04)
- (12 more...)
- Transportation > Air (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Leisure & Entertainment > Games > Jeopardy! (0.93)
- (2 more...)
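The baselines this abstract measures against, exact match and token-overlap scoring, are simple to state concretely. A sketch in the style of standard QA evaluation (the normalization rules below are the common conventions, not PEDANTS itself):

```python
import re
import string

def normalize(text):
    """Common QA normalization: lowercase, drop punctuation,
    remove articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def token_f1(pred, gold):
    """Token-overlap F1 between a prediction and a gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p) & set(g))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)
```

These metrics break down on exactly the cases the paper targets: a verbose but correct LLM answer shares few tokens with a terse gold answer, so F1 penalizes it even though a human judge would accept it.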
CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering
Li, Zongxia, Mondal, Ishani, Liang, Yijun, Nghiem, Huy, Boyd-Graber, Jordan
Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current evaluation metrics for determining answer equivalence (AE) often do not align with human judgments, particularly for more verbose, free-form answers from large language models (LLMs). There are two challenges: a lack of data, and models that are too big: LLM-based scorers can correlate better with human judges, but this approach has only been tested on limited QA datasets, and even when it works, updating the model is difficult because LLMs are large and often expensive. We rectify both of these issues by providing clear and consistent guidelines for evaluating AE in machine QA, adopted from professional human QA contests. We also introduce a combination of standard evaluation and a more efficient, robust, and lightweight discriminative AE classifier-based matching method (CFMatch, smaller than 1 MB), trained and validated to evaluate answer correctness more accurately, in accordance with adopted expert AE rules that are better aligned with human judgments.
- Asia > Middle East > Jordan (0.04)
- Asia > South Korea (0.04)
- Africa > Eritrea > Maekel > Asmara (0.04)
- (15 more...)
- Health & Medicine (1.00)
- Education (0.67)
- Transportation > Air (0.67)
- (2 more...)
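The "combination of standard evaluation and a lightweight classifier" can be illustrated as a two-stage check: accept cheap rule-based matches immediately, and only consult a learned scorer for the ambiguous remainder. This is a sketch of that idea only; the normalization and the classifier interface are assumptions, not CFMatch's actual implementation:

```python
import string

def _norm(text):
    """Minimal normalization: lowercase, drop punctuation, collapse spaces."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(text.split())

def answer_equivalent(pred, gold, classifier=None, threshold=0.5):
    """Two-stage answer-equivalence check.
    Stage 1: normalized exact match (fast, covers the easy cases).
    Stage 2: an optional lightweight classifier, any callable
    (pred, gold) -> score in [0, 1], decides the ambiguous cases."""
    if _norm(pred) == _norm(gold):
        return True
    if classifier is not None:
        return classifier(pred, gold) >= threshold
    return False
```

The design point is cost: the classifier only runs on pairs the rules cannot settle, which is how a sub-1 MB model can stay cheap while still handling verbose free-form answers.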
DsDm: Model-Aware Dataset Selection with Datamodels
Engstrom, Logan, Feldmann, Axel, Madry, Aleksander
When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior. However, in practice the opposite can often happen: we find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data. To develop better methods for selecting data, we start by framing dataset selection as an optimization problem that we can directly solve for: given target tasks, a learning algorithm, and candidate data, select the subset that maximizes model performance. This framework thus avoids handpicked notions of data quality, and instead explicitly models how the learning process uses training datapoints to predict on the target tasks. Our resulting method greatly improves language model (LM) performance on both pre-specified tasks and previously unseen tasks. Specifically, choosing target tasks representative of standard LM problems and evaluating on diverse held-out benchmarks, our selected datasets provide a 2x compute multiplier over baseline methods.
- North America > United States > New York > Albany County > Albany (0.14)
- Europe > Ireland (0.05)
- Europe > Russia (0.04)
- (67 more...)
- Leisure & Entertainment (1.00)
- Education (1.00)
- Media (0.92)
- (4 more...)
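The optimization framing, select the candidate subset that maximizes estimated target-task performance, can be sketched in miniature. Here the `utility` callable stands in for a datamodel-based estimate of each datapoint's contribution; this illustrates the framing only, not the paper's actual estimator or selection procedure:

```python
def select_training_subset(candidates, utility, k):
    """Model-aware data selection, sketched: score each candidate
    datapoint by an estimated contribution to target-task performance
    (here an arbitrary callable datapoint -> float standing in for a
    datamodel), then keep the top-k highest-scoring datapoints."""
    scored = sorted(candidates, key=utility, reverse=True)
    return scored[:k]
```

The contrast with quality filtering is in what `utility` measures: not resemblance to a "clean" reference corpus, but the predicted effect of training on that datapoint, which is why the two approaches can disagree so sharply.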
Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration
Mehta, Viraj, Das, Vikramjeet, Neopane, Ojash, Dai, Yijia, Bogunovic, Ilija, Schneider, Jeff, Neiswanger, Willie
Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that one can often choose the contexts at which to obtain human feedback in order to identify a good policy most efficiently, and formalize this as an offline contextual dueling bandit problem. We give an upper-confidence-bound style algorithm for this problem and prove a polynomial worst-case regret bound. We then provide empirical confirmation in a synthetic setting that our approach outperforms existing methods. Finally, we extend the setting and methodology for practical use in RLHF training of large language models, where our method reaches better performance with fewer samples of human preferences than multiple baselines on three real-world datasets.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- (2 more...)
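The core idea, spend scarce human-preference queries on the contexts where uncertainty is highest, can be sketched with a UCB-flavored bonus. This illustrates the flavor of upper-confidence-bound context selection under assumed bookkeeping (per-context query counts); it is not the paper's exact algorithm:

```python
import math

def pick_query_context(contexts, counts, total_queries, c=1.0):
    """UCB-flavored active querying, sketched: pick the context with the
    largest exploration bonus, i.e. the one queried least relative to the
    total budget spent. counts[i] is the number of human-preference
    queries already spent on contexts[i]."""
    def bonus(i):
        return c * math.sqrt(math.log(max(total_queries, 2)) / (counts[i] + 1))
    best = max(range(len(contexts)), key=bonus)
    return contexts[best]
```

A context that has never been queried gets the largest bonus and is asked about first, which is what makes the scheme sample-efficient: feedback goes where it reduces uncertainty most, rather than being spread uniformly.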
'Jeopardy!' fans furious over 'petty' ruling that ended contestant's 9-day winning streak
Fox Nation's 'Who Can Forget 2021?' revisits the year's biggest headlines. To watch the full program, visit foxnation.com. "Jeopardy!" fans are angry on behalf of nine-day champion Ben Chan after a spelling error caused his winning streak to come to an end. On Tuesday night's episode, Chan reached the Final Jeopardy! category after a rocky start with a Daily Double loss that put him in a close contest with his opponents, Lynn Di Vito and Danny Lesserman. The category was "Shakespeare's Characters," and the clue was "Both of the names of these 2 lovers in a Shakespeare play come from Latin words for 'blessed.'"
Why do men dominate on 'Jeopardy!'?
'Gutfeld!' guests discuss a 'Jeopardy!' question that used alleged murderer Brian Laundrie as the clue. My husband and I watch "Jeopardy!" I've noticed that the strongest players are almost always men. Last week, the show hosted its Masters Tournament, and, consistent with my observation, four of the six "masters" are male. The sixth master contestant is a woman, Mattea Roach, of Canada.