winograd schema
Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning
Commonsense reasoning is an important aspect of natural language understanding, and several benchmarks have been developed to evaluate it. However, only a few of these benchmarks are available in languages other than English. Developing parallel benchmarks facilitates cross-lingual evaluation, enabling a better understanding of different languages. This research introduces a collection of Winograd Schemas in Thai, a novel dataset designed to evaluate commonsense reasoning capabilities in the context of the Thai language. Through a methodology involving native speakers, professional translators, and thorough validation, the schemas aim to closely reflect Thai language nuances, idioms, and cultural references while maintaining ambiguity and commonsense challenges. We evaluate the performance of popular large language models on this benchmark, revealing their strengths and limitations and providing insights into the current state of the art. Results indicate that while models like GPT-4 and Claude-3-Opus achieve high accuracy in English, their performance drops significantly in Thai, highlighting the need for further advancements in multilingual commonsense reasoning.
- North America > United States > New York (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)
A Human-Machine Collaboration Framework for the Development of Schemas
The Winograd Schema Challenge (WSC), a seemingly well-thought-out test for machine intelligence, has been proposed to shed light on developing systems that exhibit human behavior. Since its introduction, it has aimed to pivot the focus of the AI community from the technology to the science of AI. Although the task is trivial for humans, studies show that it remains challenging for machines, especially when they must deal with novel schemas, that is, well-designed sentences that require resolving definite pronouns. As researchers have become increasingly interested in the challenge, an extensive collection of Winograd schemas is presumably needed, more than human experts can reasonably develop themselves, especially now that schemas have been proposed as a novel form of CAPTCHA. To address this need, we propose a novel framework that explicitly focuses on how humans and machines can collaborate as teammates to design novel schemas from scratch. This is accomplished by combining two recent studies from the literature: i) Winventor, a machine-driven approach for the development of large amounts of Winograd schemas, albeit not of high quality, and ii) WinoFlexi, an online crowdsourcing system that allows crowd workers to develop a limited number of schemas, often of similar quality to that of experts. Our proposal crafts a new road map toward developing a novel collaborative platform that amplifies human and machine intelligence by combining their complementary strengths.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)
- Europe > Middle East > Cyprus (0.04)
Developments in Sheaf-Theoretic Models of Natural Language Ambiguities
Lo, Kin Ian, Sadrzadeh, Mehrnoosh, Mansfield, Shane
Sheaves are mathematical objects consisting of a base, which constitutes a topological space, and the data associated with each open set thereof, e.g. continuous functions defined on the open sets. Sheaves were originally used in algebraic topology and logic. Recently, they have also been used to model events such as physical experiments and natural language disambiguation processes. We extend the latter models from lexical ambiguities to discourse ambiguities arising from anaphora. To begin, we calculate a new measure of contextuality for a dataset of basic anaphoric discourses, resulting in a higher proportion of contextual models (82.9%) than previous work, which yielded only 3.17% contextual models. Then, we show how an extension of the natural language processing challenge known as the Winograd Schema, which involves anaphoric ambiguities, can be modelled on the Bell-CHSH scenario with a contextual fraction of 0.096.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Netherlands (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Generalised Winograd Schema and its Contextuality
Lo, Kin Ian, Sadrzadeh, Mehrnoosh, Mansfield, Shane
Ambiguities in natural language give rise to probability distributions over interpretations. The distributions are often over multiple ambiguous words at a time, a multiplicity which makes them a suitable topic for sheaf-theoretic models of quantum contextuality. Previous research showed that different quantitative measures of contextuality correlate well with psycholinguistic research on lexical ambiguities. In this work, we focus on coreference ambiguities and investigate the Winograd Schema Challenge (WSC), a test proposed by Levesque in 2011 to evaluate the intelligence of machines. The WSC consists of a collection of multiple-choice questions that require disambiguating pronouns in sentences structured according to the Winograd schema, in a way that makes it difficult for machines to determine the correct referents but remains intuitive for human comprehension. In this study, we propose an approach that models the Winograd schema as an analogue of an experiment in quantum physics. However, we argue that the original Winograd Schema is inherently too simplistic to facilitate contextuality. We introduce a novel mechanism for generalising the schema, rendering it analogous to a Bell-CHSH measurement scenario. We report an instance of this generalised schema, complemented by the human judgements we gathered via a crowdsourcing platform. The resulting model violates the Bell-CHSH inequality by 0.192, thus exhibiting contextuality in a coreference resolution setting.
- North America > United States > Indiana (0.04)
- North America > Dominican Republic (0.04)
- North America > Canada (0.04)
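For readers unfamiliar with the Bell-CHSH inequality mentioned in the abstract above, the following is a minimal sketch of how a violation is usually measured. It assumes the standard form of the inequality, in which four correlators (one per pair of measurement settings) are combined with exactly one sign flipped and the result cannot exceed 2 in absolute value for any non-contextual model. The function names and the example correlator values are illustrative assumptions, not the authors' code or data.

```python
from typing import Sequence


def chsh_value(correlators: Sequence[float]) -> float:
    """Largest CHSH expression over the four placements of the minus sign.

    `correlators` holds E(a0,b0), E(a0,b1), E(a1,b0), E(a1,b1),
    each in the range [-1, 1].
    """
    if len(correlators) != 4:
        raise ValueError("expected exactly four correlators")
    best = 0.0
    for flipped in range(4):
        # Flip the sign of one correlator and sum all four.
        total = sum(-e if i == flipped else e for i, e in enumerate(correlators))
        best = max(best, abs(total))
    return best


def chsh_violation(correlators: Sequence[float]) -> float:
    """Amount by which the classical bound of 2 is exceeded (0 if it is not)."""
    return max(0.0, chsh_value(correlators) - 2)


# Hypothetical correlators, for illustration only.
print(chsh_violation([1, 1, 1, 1]))   # deterministic, non-contextual model
print(chsh_violation([1, 1, 1, -1]))  # maximally contextual (PR-box-like) model
```

Under this convention, a reported violation of 0.192 would correspond to a CHSH value of 2.192.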
So, Can a Computer Really Be Irrational?
In a recent Mind Matters News podcast episode, "Can a computer be a person?": Wesley J. Smith: Let me ask the question in a different way. Can an AI ever be irrational? A classic example, and this happened a number of years ago, was that the Soviets during the Cold War developed a high technology to decide whether the US was being attacked by… I'm sorry, whether the Soviet Union was being attacked by the United States. And so they had these missile detectors.
- Europe > Russia (0.27)
- Asia > Russia (0.27)
- North America > United States (0.26)
Levesque
In this paper, we present an alternative to the Turing Test that has some conceptual and practical advantages. A Winograd schema is a pair of sentences that differ only in one or two words and that contain a referential ambiguity that is resolved in opposite directions in the two sentences. We have compiled a collection of Winograd schemas, designed so that the correct answer is obvious to the human reader, but cannot easily be found using selectional restrictions or statistical techniques over text corpora. A contestant in the Winograd Schema Challenge is presented with a collection of one sentence from each pair, and required to achieve human-level accuracy in choosing the correct disambiguation.
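The definition in the abstract above can be made concrete with a small sketch. It represents a schema as a sentence template with one slot for the special word, two candidate referents, and the correct referent for each word, using the well-known trophy and suitcase example. The class, field, and function names are illustrative assumptions. As a simplification, the scorer below grades both sentences of each pair, whereas the challenge as described presents only one sentence per pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    template: str                # sentence with a "{word}" slot
    pronoun: str                 # the ambiguous pronoun
    candidates: Tuple[str, str]  # the two possible referents
    words: Tuple[str, str]       # special word and its alternate
    answers: Tuple[str, str]     # correct referent for each word, in order

    def sentences(self) -> List[str]:
        """The two sentences of the pair, differing only in the slot word."""
        return [self.template.format(word=w) for w in self.words]

    def is_well_formed(self) -> bool:
        """The two answers must be the two candidates, resolved in opposite directions."""
        return (self.answers[0] != self.answers[1]
                and set(self.answers) == set(self.candidates))


TROPHY = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it is too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    words=("big", "small"),
    answers=("the trophy", "the suitcase"),
)

Resolver = Callable[[str, str, Tuple[str, str]], str]


def accuracy(resolver: Resolver, schemas: Sequence[WinogradSchema]) -> float:
    """Fraction of sentences (both halves of every pair) resolved correctly."""
    total = correct = 0
    for schema in schemas:
        for sentence, gold in zip(schema.sentences(), schema.answers):
            total += 1
            correct += resolver(sentence, schema.pronoun, schema.candidates) == gold
    return correct / total if total else 0.0


def first_candidate(sentence: str, pronoun: str, candidates: Tuple[str, str]) -> str:
    """Trivial baseline that ignores the sentence entirely."""
    return candidates[0]


print(accuracy(first_candidate, [TROPHY]))
```

Because the two sentences of a pair have opposite answers, any resolver that ignores the special word, like `first_candidate`, gets exactly one of the two sentences right.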
What Does It Mean for AI to Understand?
Remember IBM's Watson, the AI Jeopardy! champion? A 2010 promotion proclaimed, "Watson understands natural language with all its ambiguity and complexity." However, as we saw when Watson subsequently failed spectacularly in its quest to "revolutionize medicine with artificial intelligence," a veneer of linguistic facility is not the same as actually comprehending human language. Natural language understanding has long been a major goal of AI research. At first, researchers tried to manually program everything a machine would need to make sense of news stories, fiction or anything else humans might write.
- Information Technology (0.50)
- Leisure & Entertainment (0.30)
Tackling Domain-Specific Winograd Schemas with Knowledge-Based Reasoning and Machine Learning
Hong, Suk Joon, Bennett, Brandon
The Winograd Schema Challenge (WSC) is a common-sense reasoning task that requires background knowledge. In this paper, we contribute to tackling WSC in four ways. Firstly, we suggest a keyword method to define a restricted domain where distinctive high-level semantic patterns can be found. A thanking domain was defined by keywords, and the data set in this domain is used in our experiments. Secondly, we develop a high-level knowledge-based reasoning method using semantic roles, based on the method of Sharma [2019]. Thirdly, we propose an ensemble method to combine knowledge-based reasoning and machine learning, which shows the best performance in our experiments. As a machine learning method, we used Bidirectional Encoder Representations from Transformers (BERT) [Kocijan et al., 2019]. Lastly, in terms of evaluation, we suggest a "robust" accuracy measurement by modifying that of Trichelair et al. [2018]. As with their switching method, we evaluate a model by considering its performance on trivial variants of each sentence in the test set.
- North America > United States (0.04)
- Europe > Italy (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
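The evaluation idea in the abstract above, scoring a model on trivial variants of each test sentence, can be sketched as follows. This is one plausible reading and not the paper's exact definition: it assumes that a test item counts as correct only if the model answers the original sentence and every variant correctly. The function names, the toy sentences from a thanking domain, and the two toy models are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

Item = Tuple[str, str]   # (sentence, gold referent)
Group = Sequence[Item]   # an original sentence followed by its trivial variants


def plain_accuracy(model: Callable[[str], str], groups: Sequence[Group]) -> float:
    """Ordinary accuracy over every sentence, variants included."""
    items = [item for group in groups for item in group]
    if not items:
        return 0.0
    return sum(model(s) == gold for s, gold in items) / len(items)


def robust_accuracy(model: Callable[[str], str], groups: Sequence[Group]) -> float:
    """Fraction of groups in which the original and all variants are answered correctly."""
    if not groups:
        return 0.0
    return sum(all(model(s) == gold for s, gold in g) for g in groups) / len(groups)


# One toy group: the original sentence and its candidate-switched variant.
GROUPS = [[
    ("Ann thanked Beth because she had helped.", "Beth"),
    ("Beth thanked Ann because she had helped.", "Ann"),
]]


def always_beth(sentence: str) -> str:
    """Toy model that has memorised a surface answer."""
    return "Beth"


def object_of_thanked(sentence: str) -> str:
    """Toy rule: the person being thanked is the one who helped."""
    words = sentence.split()
    return words[words.index("thanked") + 1]


print(plain_accuracy(always_beth, GROUPS), robust_accuracy(always_beth, GROUPS))
print(plain_accuracy(object_of_thanked, GROUPS), robust_accuracy(object_of_thanked, GROUPS))
```

The memorising model scores 0.5 on plain accuracy but 0.0 on the robust measure, which is the kind of gap such a measurement is meant to expose.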
Turing Test: Why It Still Matters - Liwaiwai
And as AI programs get better and better at acting like humans, we will increasingly be faced with the question of whether there's really anything that special about our own intelligence, or if we are just machines of a different kind. Could everything we know and do one day be reproduced by a complicated enough computer program installed in a complicated enough robot? In 1950, computer pioneer and wartime codebreaker Alan Turing made one of the most influential attempts to tackle this issue. In a landmark paper, he suggested that the vagueness could be taken out of the question of human and machine intelligence with a simple test. This "Turing Test" assesses the ability of a computer to mimic a human, as judged by another human who could not see the machine but could ask it written questions.