 Jeopardy!


PEDANTS (Precise Evaluations of Diverse Answer Nominee Text for Skinflints): Efficient Evaluation Analysis and Benchmarking for Open-Domain Question Answering

arXiv.org Artificial Intelligence

Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current efficient answer correctness (AC) metrics do not align with human judgments, particularly on verbose, free-form answers from large language models (LLMs). There are two challenges: a lack of diverse evaluation data, and models that are too big and non-transparent; LLM-based scorers correlate better with humans, but this expensive approach has only been tested on limited QA datasets. We rectify these issues by providing guidelines and datasets for evaluating machine QA, adopted from the human QA community. We also propose an efficient, low-resource, and interpretable QA evaluation method that is more stable than exact match and neural methods.
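This is not the PEDANTS method itself, but for reference, here is a minimal sketch of the kind of lightweight, rule-based answer-correctness check that such metrics aim to improve on: SQuAD-style answer normalization plus a token-overlap score. The 0.6 threshold is an arbitrary assumption for illustration.

```python
import re
import string

def normalize(ans: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace
    (the standard SQuAD-style answer normalization)."""
    ans = ans.lower()
    ans = "".join(ch for ch in ans if ch not in string.punctuation)
    ans = re.sub(r"\b(a|an|the)\b", " ", ans)
    return " ".join(ans.split())

def token_f1(pred: str, gold: str) -> float:
    """Bag-of-tokens F1 between a predicted and a gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)
    gold_counts: dict[str, int] = {}
    for t in g:
        gold_counts[t] = gold_counts.get(t, 0) + 1
    common = 0
    for t in p:
        if gold_counts.get(t, 0) > 0:
            common += 1
            gold_counts[t] -= 1
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

def is_correct(pred: str, golds: list[str], threshold: float = 0.6) -> bool:
    """Accept if the prediction exactly matches or sufficiently overlaps
    any of the gold answers."""
    return any(normalize(pred) == normalize(g) or token_f1(pred, g) >= threshold
               for g in golds)
```

For example, `is_correct("the Eiffel Tower in Paris", ["Eiffel Tower"])` passes (token F1 of roughly 0.67), while plain exact match would reject it; verbose LLM answers are exactly where the two diverge.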


CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering

arXiv.org Artificial Intelligence

Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current evaluation metrics for answer equivalence (AE) often do not align with human judgments, particularly on more verbose, free-form answers from large language models (LLMs). There are two challenges: a lack of data, and models that are too big. LLM-based scorers can correlate better with human judges, but this task has only been tested on limited QA datasets, and even when such scorers are available, updating them is difficult because LLMs are large and often expensive. We rectify both of these issues by providing clear and consistent guidelines for evaluating AE in machine QA, adopted from professional human QA contests. We also introduce a combination of standard evaluation and a more efficient, robust, and lightweight discriminative AE classifier-based matching method (CFMatch, smaller than 1 MB), trained and validated to evaluate answer correctness more accurately, in accordance with the adopted expert AE rules and in closer alignment with human judgments.
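The released CFMatch classifier is not reproduced here; the following is only a sketch of what a sub-megabyte, feature-based AE matcher can look like. The features and training pairs are invented for illustration and are not the paper's actual feature set or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def featurize(pred: str, gold: str) -> np.ndarray:
    """A few cheap, interpretable answer-pair features (deliberately crude:
    no punctuation stripping or aliasing, unlike a real AE matcher)."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = len(set(p) & set(g))
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(g) if g else 0.0
    return np.array([
        float(pred.lower().strip() == gold.lower().strip()),  # exact match flag
        precision,
        recall,
        min(len(p), len(g)) / max(len(p), len(g), 1),          # length ratio
    ])

# Hypothetical training pairs: (candidate, reference, human label 0/1).
train = [
    ("Barack Obama", "Obama", 1),
    ("the 44th president, Barack Obama", "Barack Obama", 1),
    ("George Bush", "Barack Obama", 0),
    ("blue", "the color blue", 1),
    ("red", "the color blue", 0),
]
X = np.stack([featurize(p, g) for p, g, _ in train])
y = np.array([label for _, _, label in train])

clf = LogisticRegression().fit(X, y)  # the whole model is a handful of weights
print(clf.predict([featurize("Obama, Barack", "Barack Obama")]))
```

A linear model over a few features amounts to a few dozen bytes of weights, which is how a discriminative matcher can stay far under 1 MB while still correcting the failure modes of exact match.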


'Jeopardy!' contestant torn apart by fans after huge mistake: 'Such a buffoon'

FOX News

A "Jeopardy!" contestant is going viral this week after making what many fans are considering one of the biggest blunders in the show's history. On Wednesday's episode, a woman named Karen had a huge lead over the other two contestants as they neared the end of the second round: she had earned $21,800, while her competitors had earned $7,100 and $6,400. When there were only a few clues left on the Double Jeopardy board, Karen found a Daily Double in the "Hans, Solo" category. If she had made a modest bet, she would have been sure to win the entire game after Final Jeopardy, as the other players couldn't possibly catch up to her lead.
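The claim rests on simple arithmetic: a leader going into Final Jeopardy with more than double the second-place score cannot be caught. A quick check with the scores from the recap (ignoring the few clues still on the board, a simplifying assumption):

```python
# Scores at the time of the Daily Double, per the recap above.
karen, second, third = 21_800, 7_100, 6_400

def is_lock(leader: int, runner_up: int) -> bool:
    """A 'lock game': even if the runner-up doubles up in Final Jeopardy
    and the leader wagers nothing, the leader still wins."""
    return leader > 2 * runner_up

print(is_lock(karen, second))  # True: 21,800 > 2 * 7,100 = 14,200
# Any Daily Double wager strictly below this margin preserves the lock
# even on a miss (again ignoring the remaining clues):
print(karen - 2 * second)      # 7,600
```

In other words, any wager under $7,600 would have left her mathematically uncatchable, which is why fans called the bet a blunder.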


A Decade Of Advancements As We Enter A New Age Of AI

#artificialintelligence

As we embark on the next decade of innovations in AI, Daniel Pitchford looks back at the five biggest industry milestones of the 2010s, how they impacted investment in the sector and how they've shaped the advance of technology. The 2010s will be known for the advent of one of the most powerful technologies on the planet: Artificial Intelligence. Over the next decade, as more funding is made available for its development and it becomes more accepted by companies and consumers alike, it is worth reviewing some of the major milestones over the last decade that have made this advancement possible.

The game is on, Watson: IBM's Jeopardy triumph

The first major milestone of AI hitting the mainstream was when IBM's "super-computer" Watson beat long-standing Jeopardy champions Ken Jennings and Brad Rutter in 2011. Watson won the $1m TV game show with $77,147, leaving Jennings and Rutter far behind at $24,000 and $21,600 respectively.


Playing Games with AI

#artificialintelligence

"The challenges of machine learning have long been tied to games as a testbed for computer intelligence." Jeopardy Champion Emma Boettcher's Master's paper on using text mining to predict how hard a Jeopardy clue might be didn't win her a title on its own, but it is an interesting thought experiment. Futurism's mission is to empower our readers and drive the development of transformative technologies towards maximizing human potential.



The Secret Farm Team for Jeopardy! Players

Slate

As she met her fellow captains and competitors, all multiweek winners on the game show (including me), she was surprised how familiar everyone seemed to be with each other. Back in 2014, when she made her first appearance, "I didn't know a single person who had ever been on the show," Julia told me. But this time, she marveled, "everyone else seems to have known each other, either personally or by reputation, for decades." They shared years of experience on Jeopardy's secret farm team: quiz bowl. Of the 18 "All-Stars" in the tourney, all but Julia and two others had played the academic competition known as quiz bowl in high school or college.


Eric Trump got a 'Jeopardy!' question correct, but that didn't convince people he was smart

Mashable

Last night was full of surprises. Surprise number two: Eric Trump can successfully answer a Jeopardy! question. Not only does this famously intelligent person get the answer correct (brother-in-law), he also answers the question in the form of a question. He goes on to add a suggestive emoji of a fist punching the American flag.


Building Watson: An Overview of the DeepQA Project

AI Magazine

IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy. The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy Challenge helped us address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After three years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence, and speed at the Jeopardy quiz show. Our results strongly suggest that DeepQA is an effective and extensible architecture that can be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of question answering (QA).
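The DeepQA architecture itself spans hundreds of components; the sketch below is a drastically simplified illustration of the generate-score-merge shape the abstract describes, with placeholder names rather than IBM's actual interfaces:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Candidate:
    answer: str
    scores: dict = field(default_factory=dict)

# Each evidence scorer maps (question, candidate answer) to a feature value;
# DeepQA combined many such scorers into a single confidence.
Scorer = Callable[[str, str], float]

def run_pipeline(question: str,
                 generate: Callable[[str], list[str]],
                 scorers: dict[str, Scorer],
                 weights: dict[str, float]) -> Candidate:
    """Generate candidate answers, score each with every evidence scorer,
    then merge the scores into one confidence and return the best candidate."""
    candidates = [Candidate(a) for a in generate(question)]
    for cand in candidates:
        for name, scorer in scorers.items():
            cand.scores[name] = scorer(question, cand.answer)

    def confidence(c: Candidate) -> float:
        return sum(weights[n] * s for n, s in c.scores.items())

    return max(candidates, key=confidence)
```

In the real system the confidence-merging model was trained on prior Jeopardy questions rather than hand-weighted as here, and the calibrated confidence also governed whether Watson buzzed in at all.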


Can Watson, the Jeopardy champion, solve Parkinson's?

Toronto Star

Of course this is the Watson that was built by IBM to understand answers on Jeopardy and come up with the right questions. Since his appearance on the game show in 2011, IBM has expanded Watson's talents, building on the algorithms that allow him to read and derive meaning from natural language. And among other functions, IBM adapted Watson for use in medicine. Toronto Western, part of the University Health Network, is the first hospital in Canada to use Watson for research in Parkinson's, a neurological disorder. The centre has a track record of running clinical trials for off-label drug use, which means taking a drug approved for treatment of one condition and repurposing it for another.