 Tuvalu


A Appendix

Neural Information Processing Systems

The complete list may be seen in Table 8. Here are a few general notes about these strings: 1. Based on their recommendations, we did the following: 1. zh, zh_Latn: This resulted in the special filters described below. URLs) the corpora were in languages different from the LangID predictions. This is mainly mis-rendered PDFs and may have practical applications for denoising, or for decoding such garbled PDFs.


Belgian police arrest three for plotting drone attack on prime minister

Al Jazeera

Belgian authorities say they have arrested three people in connection with a plot to attack Prime Minister Bart De Wever and other politicians using drone-mounted explosives. Federal prosecutor Ann Fransen announced the arrests on Thursday and said the group were under investigation for an "attempted terrorist murder and participation in the activities of a terrorist group", according to Belgian public broadcaster RTBF. "There are also indications that the suspects aimed to construct a drone to which a payload could be attached," she added. Fransen did not name their intended targets, but social media posts from senior figures in De Wever's government indicate that he was on the list. "The news of a planned attack targeting Prime Minister Bart De Wever is deeply shocking," wrote Deputy Prime Minister Maxime Prevot in a post on X. "I express my full support to the Prime Minister, his wife, and his family, as well as my gratitude to the security and justice services whose swift action prevented the worst."


Hell is not other people – it's being stuck in the ninth circle of an automated telephone service Hilary Freeman

The Guardian

Life is about to change on the remote island nation of Tuvalu. To great fanfare, Tuvalu – an entirely cash-based society – has unveiled its first ever ATM, marking its move towards financial modernisation. But while the 10,000 people living in that country may be celebrating no longer having to queue at the bank, I fear their happiness will be short-lived. The world's first ATM was introduced in Britain in 1967, but for me the tyranny of machines that promise convenience but erode human contact really began about 20 years ago, in the form of self-checkouts in our local Sainsbury's. Having watched the Terminator movie franchise during my formative years, I railed prophetically against them, aware that it was just a small slippery slope from "unexpected item in the bagging area" to the extinction of the human race.


An Expanded Massive Multilingual Dataset for High-Performance Language Technologies

arXiv.org Artificial Intelligence

Training state-of-the-art large language models requires vast amounts of clean and diverse textual data. However, building suitable multilingual datasets remains a challenge. In this work, we present HPLT v2, a collection of high-quality multilingual monolingual and parallel corpora. The monolingual portion of the data contains 8T tokens covering 193 languages, while the parallel data contains 380M sentence pairs covering 51 languages. We document the entire data pipeline and release the code to reproduce it. We provide extensive analysis of the quality and characteristics of our data. Finally, we evaluate the performance of language models and machine translation systems trained on HPLT v2, demonstrating its value.


CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries

arXiv.org Artificial Intelligence

Vision-language models (VLMs) have advanced human-AI interaction but struggle with cultural understanding, often misinterpreting symbols, gestures, and artifacts due to biases in predominantly Western-centric training data. In this paper, we construct CultureVerse, a large-scale multimodal benchmark covering 19,682 cultural concepts, 188 countries/regions, 15 cultural concepts, and 3 question types, with the aim of characterizing and improving VLMs' multicultural understanding capabilities. Then, we propose CultureVLM, a series of VLMs fine-tuned on our dataset to achieve significant performance improvement in cultural understanding. Our evaluation of 16 models reveals significant disparities, with a stronger performance in Western concepts and weaker results in African and Asian contexts. Fine-tuning on our CultureVerse enhances cultural perception, demonstrating cross-cultural, cross-continent, and cross-dataset generalization without sacrificing performance on models' general VLM benchmarks. We further present insights on cultural generalization and forgetting. We hope that this work could lay the foundation for more equitable and culturally aware multimodal AI systems.


Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains

arXiv.org Artificial Intelligence

Hate speech relies heavily on cultural influences, leading to varying individual interpretations. For that reason, we propose a Semantic Componential Analysis (SCA) framework for a cross-cultural and cross-domain analysis of hate speech definitions. We create the first dataset of definitions derived from five domains: online dictionaries, research papers, Wikipedia articles, legislation, and online platforms, which are later analyzed into semantic components. Our analysis reveals that the components differ from definition to definition, yet many domains borrow definitions from one another without taking into account the target culture. We conduct zero-shot model experiments using our proposed dataset, employing three popular open-sourced LLMs to understand the impact of different definitions on hate speech detection. Our findings indicate that LLMs are sensitive to definitions: responses for hate speech detection change according to the complexity of definitions used in the prompt.


How AI Is Being Used to Respond to Natural Disasters in Cities

TIME - Tech

The number of people living in urban areas has tripled in the last 50 years, meaning when a major natural disaster such as an earthquake strikes a city, more lives are in danger. Meanwhile, the strength and frequency of extreme weather events have increased, a trend set to continue as the climate warms. That is spurring efforts around the world to develop a new generation of earthquake monitoring and climate forecasting systems to make detecting and responding to disasters quicker, cheaper, and more accurate than ever. On Nov. 6, at the Barcelona Supercomputing Center in Spain, the Global Initiative on Resilience to Natural Hazards through AI Solutions will meet for the first time. The new United Nations initiative aims to guide governments, organizations, and communities in using AI for disaster management.


Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic

arXiv.org Artificial Intelligence

Formal logic has long been applied to natural language reasoning, but this approach can sometimes lead to conclusions that, while logically entailed, are factually inconsistent with the premises or are not typically inferred by humans. This study introduces the concept of "rulebreakers", which refers to instances where logical entailment diverges from factually acceptable inference. We present RULEBREAKERS, a novel dataset for evaluating Large Language Models' (LLMs) ability to distinguish between rulebreakers and non-rulebreakers. Focusing on modus tollens and disjunctive syllogism, we assess six state-of-the-art LLMs using RULEBREAKERS, measuring their performance in terms of token-level exact accuracy and model confidence. Our findings reveal that while most models perform poorly to moderately in recognizing rulebreakers, they demonstrate a latent ability to distinguish rulebreakers when assessed by their confidence levels. Further analysis suggests that the failure to recognize rulebreakers is potentially associated with the models' world knowledge and their attention distribution patterns. This research highlights the limitation of LLMs' reasoning capabilities, and contributes to the ongoing discussion on reasoning in LLMs.


MIRAI: Evaluating LLM Agents for Event Forecasting

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, there is increasing interest in employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite this growing interest, there is a lack of a rigorous benchmark of LLM agents' forecasting capability and reliability. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs to enable LLM agents to utilize different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously source and integrate critical information from large global databases; 2) write code using domain-specific APIs and libraries for tool-use; and 3) jointly reason over historical knowledge from diverse formats and time to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.