
DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module

Sharma, Krish, Barman, Niyar R, Chaturvedi, Akshay, Asher, Nicholas

arXiv.org Artificial Intelligence

We examine reasoning on GSM8k, a dataset of short texts presenting primary-school math problems. We find, with Mirzadeh et al. (2024), that current LLM progress on the dataset may be explained not by better reasoning but by exposure to a broader pretraining data distribution. We then introduce a novel information source for helping models with less data or inferior training reason better: discourse structure. We show that discourse structure improves performance for models like Llama2 13b by up to 160%. Even for models that have most likely memorized the dataset, adding discourse structural information still improves predictions and dramatically improves large-model performance on out-of-distribution examples.


A Semantic Parsing Algorithm to Solve Linear Ordering Problems

Alkhairy, Maha, Homer, Vincent, O'Connor, Brendan

arXiv.org Artificial Intelligence

We develop an algorithm to semantically parse linear ordering problems, which require a model to arrange entities using deductive reasoning. Our method takes as input a set of premises and candidate statements, parses them into a first-order logic of an ordering domain, and then utilizes constraint logic programming to infer the truth of proposed statements about the ordering. Our semantic parser transforms Heim and Kratzer's syntax-based compositional formal semantic rules into a computational algorithm. This transformation involves introducing abstract types and templates based on their rules, and introducing a dynamic component to interpret entities within a contextual framework. Our symbolic system, the Formal Semantic Logic Inferer (FSLI), is applied to answer multiple choice questions in BIG-bench's logical_deduction multiple choice problems, achieving perfect accuracy, compared to 67.06% for the best-performing LLM (GPT-4) and 87.63% for the hybrid system Logic-LM. These promising results demonstrate the benefit of developing a semantic parsing algorithm driven by first-order logic constructs.
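The core idea of parsing premises into logical constraints and then inferring which candidate statements must hold can be sketched in miniature. The code below is an illustrative toy, not the authors' FSLI system: premises are hand-encoded as predicates over a candidate arrangement, and exhaustive search over permutations stands in for the constraint logic programming solver. All entity names and premises are invented for the example.

```python
from itertools import permutations

entities = ["robin", "crow", "hawk"]

# Hand-encoded premises (in FSLI these would come from the semantic parser):
# "The crow is to the left of the hawk." / "The robin is leftmost."
premises = [
    lambda order: order.index("crow") < order.index("hawk"),
    lambda order: order.index("robin") == 0,
]

def consistent_orders(entities, premises):
    """Return every arrangement satisfying all premises."""
    return [p for p in permutations(entities)
            if all(prem(p) for prem in premises)]

def entails(statement, entities, premises):
    """A statement is entailed if it holds in every consistent order."""
    orders = consistent_orders(entities, premises)
    return bool(orders) and all(statement(o) for o in orders)

# Candidate statement: "The hawk is rightmost."
print(entails(lambda o: o.index("hawk") == 2, entities, premises))  # True
```

Because the logical form fixes the answer deterministically, a symbolic system of this shape either answers correctly or reports the premises inconsistent, which is why FSLI can reach perfect accuracy where statistical models do not.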


GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Mirzadeh, Iman, Alizadeh, Keivan, Shahrokhi, Hooman, Tuzel, Oncel, Bengio, Samy, Farajtabar, Mehrdad

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics. To address these concerns, we conduct a large-scale study on several SOTA open and closed models. To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models. Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question. Specifically, the performance of all models declines when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause doesn't contribute to the reasoning chain needed for the final answer. Overall, our work offers a more nuanced understanding of LLMs' capabilities and limitations in mathematical reasoning.
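The symbolic-template idea can be illustrated with a minimal sketch (this is not the GSM-Symbolic code; the template, names, and number ranges are invented): a question is a template whose names and numerals are variables, so many distinct but logically equivalent instances can be generated, each with a ground-truth answer derived from the template's symbolic form rather than from any single surface string.

```python
import random

# Hypothetical one-step template; GSM-Symbolic templates are richer.
TEMPLATE = ("{name} has {x} apples and buys {y} more. "
            "How many apples does {name} have now?")

def instantiate(seed):
    """Generate one instance of the template plus its symbolic answer."""
    rng = random.Random(seed)
    x, y = rng.randint(2, 20), rng.randint(2, 20)
    name = rng.choice(["Ava", "Liam", "Noah", "Mia"])
    question = TEMPLATE.format(name=name, x=x, y=y)
    answer = x + y  # ground truth follows from the template, not the text
    return question, answer

# Evaluating a model on many seeds exposes variance across instantiations
# of the *same* underlying problem.
samples = [instantiate(seed) for seed in range(5)]
```

A model that truly reasons should be invariant across such instances; the paper's finding is that performance varies, and drops, when only the numerical values change.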


AgNext, NAFED apply AI-based traceability in Arunachal's kiwi supply chain - Agriculture Post

#artificialintelligence

AgNext Technologies, a leading agritech company, in association with NAFED and Arunachal Pradesh Agriculture Marketing Board (APAMB) is using Artificial Intelligence (AI)-based rapid quality assessment of organic kiwi from the eastern Himalayan state. Through its quality monitoring platform 'Qualix', AgNext has also enabled deep-tech based comprehensive traceability. Using QR code mapping, the end-consumers can trace all the steps in the supply chain down to the origin. These organic kiwis were launched in New Delhi last week by APAMB along with the National Agricultural Cooperative Marketing Federation of India (NAFED) as the marketing partner, under the 'One District One Product' scheme. Speaking on this partnership, Taranjeet Singh Bhamra, CEO & Founder, AgNext said, "It is an honour to work with NAFED on India's first AI-based initiative to bring complete quality and traceability management solutions for Arunachal's renowned kiwi trade. To date, over 6000 kgs of kiwi have been assessed and moved through our 'Qualix' platform, with proven benefits for the farmers, NAFED and the end-consumers. The impact of facilitating quality-based and traceability-enabled trade for kiwis will not only improve domestic demand but also help in boosting export potential of the fruit."


Artificial Intelligence helping Kiwis every day: NZTech

#artificialintelligence

NZTech CEO Graeme Muller says New Zealanders are engaging with artificial intelligence (AI) every day, probably without even realising it. Muller made the comments on the eve of the launch of the New Zealand AI Forum in Wellington, describing AI as "the fastest growing impactful technology spreading the globe." The forum was announced in February, with further details provided in late May, including the naming of Stu Christie, investment manager at the New Zealand Venture Investment Fund, as its chairman. NZTech says dozens of New Zealand's leading tech companies are joining the forum, which has been initiated via a collaboration between NZTech, the government and AI tech leaders. According to Muller, one of the most recent examples of AI in New Zealand is the chat function on the Air New Zealand website that helps with ticket bookings: users of that function are chatting with AI, not a human, Muller said.


Machine Learning: The Bigger Picture, Part II - DZone Big Data

#artificialintelligence

This is the second part of Tamis van der Laan's article featured in the new DZone Guide to Big Data Processing, Volume III. Get your free copy for more insightful articles, industry statistics, and more. So far we have assumed we only have a machine learning model, a training set of samples, and an optimization algorithm to learn from these examples. The next thing we will talk about is the problem of overfitting. If we take our example of a discriminative classifier, we see that it splits the space into two distinct regions, one for each class.
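The picture of a discriminative classifier splitting the space into two regions can be made concrete with a minimal sketch. This is not code from the article; it is a one-dimensional toy where the boundary is placed midway between the class means, and all data points are invented for illustration.

```python
def fit_threshold(points_a, points_b):
    """Place the decision boundary midway between the two class means."""
    mean_a = sum(points_a) / len(points_a)
    mean_b = sum(points_b) / len(points_b)
    return (mean_a + mean_b) / 2

def classify(x, threshold):
    """Everything left of the threshold is class A, the rest class B."""
    return "A" if x < threshold else "B"

class_a = [0.1, 0.4, 0.35]  # training samples for class A
class_b = [1.2, 0.9, 1.1]   # training samples for class B
t = fit_threshold(class_a, class_b)
```

A more flexible model could carve the line into many small regions that fit each training point exactly; that extra flexibility is precisely what opens the door to the overfitting discussed next.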