
Location Aware Modular Biencoder for Tourism Question Answering

arXiv.org Artificial Intelligence

Answering real-world tourism questions that seek Point-of-Interest (POI) recommendations is challenging, as it requires both spatial and non-spatial reasoning over a large candidate pool. The traditional approach of jointly encoding each question-POI pair becomes inefficient as the number of candidates grows, making it infeasible for real-world applications. To overcome this, we propose treating the QA task as a dense vector retrieval problem: we encode questions and POIs separately, and retrieve the most relevant POIs for a question via embedding-space similarity. We use pretrained language models (PLMs) to encode textual information, and train a location encoder to capture the spatial information of POIs. Experiments on a real-world tourism QA dataset demonstrate that our approach is effective, efficient, and outperforms previous methods across all metrics. Enabled by the dense retrieval architecture, we further build a global evaluation baseline, expanding the search space by 20 times compared to previous work. We also explore several factors that affect the model's performance through follow-up experiments. Our code and model are publicly available at https://github.com/haonan-li/LAMB.
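
To make the biencoder formulation concrete, here is a minimal sketch (not the LAMB implementation; the encoders below are toy stand-ins for the paper's PLM text encoder and trained location encoder):

```python
# Sketch of biencoder-style dense retrieval over POIs. encode_text() stands in
# for a PLM; encode_location() is a toy sinusoidal encoding of (lat, lon).
import numpy as np

DIM = 64

def encode_text(texts):
    # Stand-in for a PLM text encoder: hashed bag-of-words, unit-normalized.
    vecs = np.zeros((len(texts), DIM))
    for i, t in enumerate(texts):
        for w in t.lower().split():
            vecs[i, hash(w.strip("?.,!")) % DIM] += 1.0
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def encode_location(latlon):
    # Toy location encoder: sinusoidal features of latitude/longitude.
    lat, lon = latlon
    freqs = 2.0 ** np.arange(DIM // 4)
    feats = np.concatenate([np.sin(freqs * lat), np.cos(freqs * lat),
                            np.sin(freqs * lon), np.cos(freqs * lon)])
    return feats / np.linalg.norm(feats)

pois = [("Harbour cafe with sea views", (-33.86, 151.21)),
        ("Quiet hilltop lookout", (-33.85, 151.22))]
poi_vecs = np.hstack([encode_text([t for t, _ in pois]),
                      np.stack([encode_location(ll) for _, ll in pois])])

question = "Where can I get coffee near the harbour?"
q_vec = np.hstack([encode_text([question])[0],
                   encode_location((-33.86, 151.21))])  # area implied by the question

# One matrix product scores the question against every POI at once -- the
# efficiency gain of a biencoder over cross-encoding each question-POI pair.
scores = poi_vecs @ q_vec
print(pois[int(np.argmax(scores))][0])
```

Because POI vectors are computed once offline, scaling the candidate pool (e.g., the 20x larger global search space above) only adds rows to the matrix product rather than extra encoder passes.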


Psychometric Predictive Power of Large Language Models

arXiv.org Artificial Intelligence

Next-word probabilities from language models have been shown to successfully simulate human reading behavior. Building on this, we show that, interestingly, instruction-tuned large language models (LLMs) yield worse psychometric predictive power (PPP) for human reading behavior than base LLMs with equivalent perplexities. In other words, instruction tuning, which helps LLMs provide human-preferred responses, does not always make them more human-like from a computational psycholinguistics perspective. In addition, we explore prompting methodologies for simulating human reading behavior with LLMs, showing that prompts reflecting a particular linguistic hypothesis lead LLMs to exhibit better PPP, though still worse than that of base LLMs. These findings highlight that recent instruction tuning and prompting do not offer better estimates than direct probability measurements from base LLMs in cognitive modeling.
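
The core quantity in PPP studies is per-word surprisal from the LM. A minimal sketch of computing it, using GPT-2 as a stand-in for the base/instruction-tuned models compared in the paper:

```python
# Per-token surprisal -log2 p(token_t | tokens_<t) from a base LM; in PPP
# studies these values enter a regression against human reading times, and a
# better fit (higher log-likelihood gain) means higher PPP.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The old man the boats.", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits

logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -logprobs[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
surprisal = surprisal / torch.log(torch.tensor(2.0))  # nats -> bits
for t, s in zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()), surprisal):
    print(f"{t:>10} {s:6.2f} bits")
```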


LM-Polygraph: Uncertainty Estimation for Language Models

arXiv.org Artificial Intelligence

Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields. However, a significant challenge arises as these models often "hallucinate", i.e., fabricate facts without providing users an apparent means to discern the veracity of their statements. Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of LLMs. However, to date, research on UE methods for LLMs has focused primarily on theoretical rather than engineering contributions. In this work, we tackle this issue by introducing LM-Polygraph, a framework implementing a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. Additionally, it introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores, empowering end-users to discern unreliable responses. LM-Polygraph is compatible with the most recent LLMs, including BLOOMz, LLaMA-2, ChatGPT, and GPT-4, and is designed to support future releases of similarly-styled LMs.
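
To give a flavor of one simple white-box UE baseline of the kind such frameworks implement (this is an illustrative sketch, not LM-Polygraph's actual API; see its repository for that), here is mean token entropy over a greedy generation:

```python
# Mean entropy of the model's predictive distribution at each generation step;
# higher mean entropy suggests a less confident (potentially less reliable)
# response. GPT-2 stands in for the LLMs named above.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The capital of Australia is", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=8, do_sample=False,
                     output_scores=True, return_dict_in_generate=True,
                     pad_token_id=tok.eos_token_id)

entropies = []
for step_logits in out.scores:          # one logits vector per generated token
    p = torch.softmax(step_logits[0], dim=-1)
    entropies.append(-(p * torch.log(p + 1e-12)).sum().item())

print(tok.decode(out.sequences[0, ids.shape[1]:]))
print(f"mean token entropy: {sum(entropies) / len(entropies):.2f} nats")
```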


Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval

arXiv.org Artificial Intelligence

We present Multi-EuP, a new multilingual benchmark dataset comprising 22K documents collected from the European Parliament and spanning 24 languages. The dataset is designed to investigate fairness in multilingual information retrieval (IR), supporting analysis of both language and demographic bias in a ranking context. It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages, as well as cross-lingual relevance judgments. Furthermore, it offers rich demographic information associated with its documents, facilitating the study of demographic bias. We report the effectiveness of Multi-EuP for benchmarking both monolingual and multilingual IR. We also conduct a preliminary experiment on language bias caused by the choice of tokenization strategy.
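
As a taste of the tokenization effect in question, the sketch below shows how the same content yields different token counts per language under one multilingual tokenizer, which skews term statistics and scoring (sentences are illustrative, not drawn from Multi-EuP):

```python
# Compare subword "fertility" across languages: languages that fragment into
# more pieces per sentence are represented differently in the index, one
# mechanism behind language bias in multilingual IR.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
parallel = {
    "en": "The committee approved the proposal.",
    "de": "Der Ausschuss hat den Vorschlag gebilligt.",
    "fi": "Valiokunta hyväksyi ehdotuksen.",
}
for lang, sent in parallel.items():
    pieces = tok.tokenize(sent)
    print(f"{lang}: {len(pieces):2d} tokens  {pieces}")
```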


Unsupervised Lexical Simplification with Context Augmentation

arXiv.org Artificial Intelligence

We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models. Given a target word and its context, our method generates substitutes based on the target context and also additional contexts sampled from monolingual data. We conduct experiments in English, Portuguese, and Spanish on the TSAR-2022 shared task, and show that our model substantially outperforms other unsupervised systems across all languages. We also establish a new state-of-the-art by ensembling our model with GPT-3.5. Lastly, we evaluate our model on the SWORDS lexical substitution dataset, achieving a state-of-the-art result.
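
A hedged sketch of the core idea: propose substitutes with a masked LM and aggregate predictions over the original context plus additional contexts containing the target word (the extra contexts here are hard-coded stand-ins for ones sampled from monolingual data, and the scoring is not the paper's exact procedure):

```python
# Masked-LM substitute generation with context augmentation: each context
# votes for candidates, weighted by the model's fill-mask probability.
from collections import Counter
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
target = "compulsory"
contexts = [
    "Attendance at the meeting is compulsory for all staff.",   # target context
    "Wearing a helmet is compulsory when riding a motorbike.",  # sampled context
    "The course includes three compulsory modules.",            # sampled context
]

votes = Counter()
for ctx in contexts:
    masked = ctx.replace(target, fill.tokenizer.mask_token, 1)
    for cand in fill(masked, top_k=10):
        word = cand["token_str"].strip()
        if word != target:
            votes[word] += cand["score"]

print(votes.most_common(5))  # simpler substitutes, e.g. "mandatory", "required"
```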


Robustness Tests for Automatic Machine Translation Metrics with Adversarial Attacks

arXiv.org Artificial Intelligence

We investigate the performance of MT evaluation metrics on adversarially-synthesized texts, to shed light on metric robustness. We experiment with word- and character-level attacks on three popular machine translation metrics: BERTScore, BLEURT, and COMET. Our human experiments validate that automatic metrics tend to overpenalize adversarially-degraded translations. We also identify inconsistencies in BERTScore ratings: it judges the original and adversarially-degraded sentences as similar to each other, while judging the degraded translation as notably worse than the original with respect to the reference. We identify patterns of brittleness that motivate more robust metric development.
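
To illustrate the experimental setup, here is a sketch of one possible character-level attack (an adjacent-character swap; not necessarily the paper's exact perturbations) scored with the public bert-score package:

```python
# Perturb a hypothesis at the character level and compare metric scores for
# the clean vs. degraded version against the same reference.
import random
from bert_score import score

def char_swap(sentence, rng):
    # Swap two adjacent characters in one randomly chosen longer word.
    words = sentence.split()
    i = rng.choice([k for k, w in enumerate(words) if len(w) > 3])
    w = list(words[i])
    j = rng.randrange(len(w) - 1)
    w[j], w[j + 1] = w[j + 1], w[j]
    words[i] = "".join(w)
    return " ".join(words)

rng = random.Random(0)
ref = ["The ministers discussed the new climate agreement."]
hyp = ["The ministers discussed the new climate agreement."]
adv = [char_swap(hyp[0], rng)]

for name, cand in [("clean", hyp), ("attacked", adv)]:
    _, _, f1 = score(cand, ref, lang="en", verbose=False)
    print(f"{name}: F1 = {f1.item():.4f}  {cand[0]}")
```

Comparing the size of the score drop against human judgments of the same degradation is what reveals overpenalization.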


Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU

arXiv.org Artificial Intelligence

Although large language models (LLMs) are often pre-trained on large-scale multilingual texts, their reasoning abilities and real-world knowledge are mainly evaluated on English datasets. Assessing LLM capabilities beyond English is increasingly vital but hindered by the lack of suitable datasets. In this work, we introduce IndoMMLU, the first multi-task language understanding benchmark for Indonesian culture and languages, consisting of questions from primary school through university entrance exams in Indonesia. By employing professional teachers, we obtain 14,981 questions across 64 tasks and education levels, with 46% of the questions focusing on proficiency in the Indonesian language and knowledge of nine local languages and cultures in Indonesia. Our empirical evaluations show that GPT-3.5 only manages to pass the Indonesian primary school level, with limited knowledge of local Indonesian languages and culture. Other, smaller models such as BLOOMZ and Falcon perform at even lower levels.


Bactrian-X: Multilingual Replicable Instruction-Following Models with Low-Rank Adaptation

arXiv.org Artificial Intelligence

Instruction tuning has shown great promise in improving the performance of large language models. However, research on multilingual instruction tuning has been limited due to the scarcity of high-quality instruction-response datasets across different languages. To bridge this gap, we present Bactrian-X, a comprehensive multilingual parallel dataset of 3.4 million instruction-response pairs across 52 languages. Leveraging this dataset, we train a set of adapters using low-rank adaptation (LoRA), which are lightweight components that seamlessly integrate with large language models. These adapters have a substantially lower parameter count than the base model, making them easily replaceable and usable as plug-ins for different languages or language groups. Extensive experiments in various multilingual evaluation settings demonstrate that models derived from LoRA-based training over Bactrian-X outperform both the vanilla models and existing instruction-tuned models. The code and models are publicly available at https://github.com/mbzuai-nlp/bactrian-x
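
To illustrate why the adapters are easily replaceable plug-ins, here is a minimal sketch using the peft library; the base model (GPT-2) and hyperparameters are stand-ins, not the Bactrian-X configuration:

```python
# Attach a LoRA adapter to a base causal LM: only the low-rank matrices are
# trainable, so one small adapter can be trained (and shipped) per language
# or language group and swapped in at load time.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16,
                    lora_dropout=0.05)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # trainable params are a tiny fraction of the base
```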


Factuality Challenges in the Era of Large Language Models

arXiv.org Artificial Intelligence

The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention. These incredibly useful, natural-sounding tools mark significant advances in natural language generation, yet they exhibit a propensity to generate false, erroneous, or misleading content -- commonly referred to as "hallucinations." Moreover, LLMs can be exploited for malicious applications, such as generating false but credible-sounding content and profiles at scale. This poses a significant challenge to society in terms of the potential deception of users and the increasing dissemination of inaccurate information. In light of these risks, we explore the kinds of technological innovations, regulatory reforms, and AI literacy initiatives needed from fact-checkers, news organizations, and the broader research and policy communities. By identifying the risks, the imminent threats, and some viable solutions, we seek to shed light on navigating various aspects of veracity in the era of generative AI.


Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

arXiv.org Artificial Intelligence

We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. With 13 billion parameters, they demonstrate better knowledge and reasoning capabilities in Arabic than any existing open Arabic or multilingual model by a sizable margin, based on extensive evaluation. Moreover, the models are competitive in English compared to English-centric open models of similar size, despite being trained on much less English data. We provide a detailed description of the training, the tuning, the safety alignment, and the evaluation of the models. We release two open versions of the model -- the foundation Jais model, and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs. Available at https://huggingface.co/inception-mbzuai/jais-13b-chat
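
For reference, a minimal sketch of loading the released checkpoint above with Hugging Face transformers; the prompt string is purely illustrative (consult the model card for the recommended chat template), and to our understanding the custom architecture requires trust_remote_code:

```python
# Load the open Jais-chat checkpoint and generate a short response.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "inception-mbzuai/jais-13b-chat"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto",
                                             trust_remote_code=True)

# Illustrative prompt only; see the model card for the official format.
ids = tok("### Instruction: What is the capital of the UAE?\n### Response:",
          return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```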