AITopics | name1

Collaborating Authors

name1

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

Chen, Xiaojun, Wang, Tianle, Qiu, Tianhao, Qin, Jianbin, Yang, Min

arXiv.org Artificial IntelligenceMay-4-2024

Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present \ours, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the \openprompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the \openexample method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as \textbf{Variable-length Open DB Schema}, \textbf{Target Column Truncation}, and \textbf{Example Column Truncation}, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54\% to 41.04\% and Code Llama-7B from 14.54\% to 48.24\% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35\%) on the BIRD-Dev dataset.

code llama, fine-tuning, sqlite sql table, (10 more...)

arXiv.org Artificial Intelligence

2405.06674

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

Huang, Yufei, Xiong, Deyi

arXiv.org Artificial IntelligenceJun-28-2023

Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process contains 4 essential steps: bias identification via extensive literature review, ambiguous context generation, AI-assisted disambiguous context generation, snd manual review \& recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories. Additionally, we observe from our experiments that fine-tuned models could, to a certain extent, heed instructions and avoid generating outputs that are morally harmful in some types, in the way of "moral self-correction". Our dataset and results are publicly available at \href{https://github.com/YFHuangxxxx/CBBQ}{https://github.com/YFHuangxxxx/CBBQ}, offering debiasing research opportunities to a widened community.

large language model, machine learning, name1, (18 more...)

arXiv.org Artificial Intelligence

2306.16244

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Germany (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(36 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI

Tarunesh, Ishan, Aditya, Somak, Choudhury, Monojit

arXiv.org Artificial IntelligenceDec-4-2021

Natural Language Inference (NLI) is considered a representative task to test natural language understanding (NLU). In this work, we propose an extensible framework to collectively yet categorically test diverse Logical reasoning capabilities required for NLI (and by extension, NLU). Motivated by behavioral testing, we create a semi-synthetic large test-bench (363 templates, 363k examples) and an associated framework that offers following utilities: 1) individually test and analyze reasoning capabilities along 17 reasoning dimensions (including pragmatic reasoning), 2) design experiments to study cross-capability information content (leave one out or bring one in); and 3) the synthetic nature enable us to control for artifacts and biases. The inherited power of automated test case instantiation from free-form natural language templates (using CheckList), and a well-defined taxonomy of capabilities enable us to extend to (cognitively) harder test cases while varying the complexity of natural language. Through our analysis of state-of-the-art NLI systems, we observe that our benchmark is indeed hard (and non-trivial even with training on additional resources). Some capabilities stand out as harder. Further fine-grained analysis and fine-tuning experiments reveal more insights about these capabilities and the models -- supporting and extending previous observations. Towards the end we also perform an user-study, to investigate whether behavioral information can be utilised to generalize much better for some models compared to others.

machine learning, natural language, template, (20 more...)

arXiv.org Artificial Intelligence

2112.02333

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(17 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Trusting RoBERTa over BERT: Insights from CheckListing the Natural Language Inference Task

Tarunesh, Ishan, Aditya, Somak, Choudhury, Monojit

arXiv.org Artificial IntelligenceJul-15-2021

The recent state-of-the-art natural language understanding (NLU) systems often behave unpredictably, failing on simpler reasoning examples. Despite this, there has been limited focus on quantifying progress towards systems with more predictable behavior. We think that reasoning capability-wise behavioral summary is a step towards bridging this gap. We create a CheckList test-suite (184K examples) for the Natural Language Inference (NLI) task, a representative NLU task. We benchmark state-of-the-art NLI systems on this test-suite, which reveals fine-grained insights into the reasoning abilities of BERT and RoBERTa. Our analysis further reveals inconsistencies of the models on examples derived from the same template or distinct templates but pertaining to same reasoning capability, indicating that generalizing the models' behavior through observations made on a CheckList is non-trivial. Through an user-study, we find that users were able to utilize behavioral information to generalize much better for examples predicted from RoBERTa, compared to that of BERT.

name1, name2, template, (14 more...)

arXiv.org Artificial Intelligence

2107.07229

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia (0.05)
Asia > Russia (0.05)
(15 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)

Add feedback