 Indian Ocean


U.S. Strikes Iran-Linked Facility in Syria in Round of Retaliation

NYT > Middle East

For the second time in nearly two weeks, the United States carried out airstrikes against a facility used by Iran's Islamic Revolutionary Guards Corps and its proxies in eastern Syria early Thursday, ratcheting up retaliation for a steady stream of rocket and drone attacks against American forces in Iraq and Syria. The strikes by two Air Force F-15E jets against a weapons warehouse in Deir al Zour Province, Syria, came after U.S. airstrikes on Oct. 27 against similar targets in eastern Syria failed to deter Iran or its proxies in Syria and Iraq, which the Biden administration has blamed for the attacks. Not only have the attacks continued -- there have been at least 22 more since the American retaliatory strikes last month -- but Pentagon officials said they have become more dangerous. Iran-backed militias have packed even larger loads of explosives -- more than 80 pounds -- onto drones launched at American bases, U.S. officials said. "This precision self-defense strike is a response to a series of attacks against U.S. personnel in Iraq and Syria by I.R.G.C.-Quds Force affiliates," Defense Secretary Lloyd J. Austin III said in a statement.


Houthi Rebels Shot Down a U.S. Drone Off Yemen's Coast, Pentagon Says

NYT > Middle East

A U.S. military surveillance drone was shot down off the coast of Yemen on Wednesday by Iran-backed Houthi rebels, the Pentagon said. Pentagon officials, speaking on the condition of anonymity to discuss operational matters, confirmed that the drone, an MQ-9 Reaper, had been shot down. But they would not say if the aircraft was armed, where it was flying from or other details. The downing of a Reaper drone, the mainstay of the American military's aerial surveillance fleet, was the latest escalation of violence between the United States and Iran-backed groups in Yemen, Iraq and Syria. The episodes have underscored the risks that the conflict between Israel and the Palestinian group Hamas could spiral into a wider war.


M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models

arXiv.org Artificial Intelligence

Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of evaluating general intelligence for large language models (LLMs), as they inherently demand a much wider range of abilities such as language understanding, domain knowledge, and problem-solving skills. To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. M3Exam exhibits three unique characteristics: (1) multilingualism, encompassing questions from multiple countries that require strong multilingual proficiency and cultural knowledge; (2) multimodality, accounting for the multimodal nature of many exam questions to test the model's multimodal understanding capability; and (3) multilevel structure, featuring exams from three critical educational periods to comprehensively assess a model's proficiency at different levels. In total, M3Exam contains 12,317 questions in 9 diverse languages with three educational levels, where about 23% of the questions require processing images for successful solving. We assess the performance of top-performing LLMs on M3Exam and find that current models, including GPT-4, still struggle with multilingual text, particularly in low-resource and non-Latin script languages. Multimodal LLMs also perform poorly with complex multimodal questions. We believe that M3Exam can be a valuable resource for comprehensively evaluating LLMs by examining their multilingual and multimodal abilities and tracking their development. Data and evaluation code are available at https://github.com/DAMO-NLP-SG/M3Exam.


Responsible Emergent Multi-Agent Behavior

arXiv.org Artificial Intelligence

Responsible AI has risen to the forefront of the AI research community. As neural network-based learning algorithms continue to permeate real-world applications, the field of Responsible AI has played a large role in ensuring that such systems maintain a high level of human compatibility. Despite this progress, the state of the art in Responsible AI has ignored one crucial point: human problems are multi-agent problems. Predominant approaches largely consider the performance of a single AI system in isolation, but human problems are, by their very nature, multi-agent. From driving in traffic to negotiating economic policy, human problem-solving involves interaction and the interplay of the actions and motives of multiple individuals. This dissertation develops the study of responsible emergent multi-agent behavior, illustrating how researchers and practitioners can better understand and shape multi-agent learning with respect to three pillars of Responsible AI: interpretability, fairness, and robustness. First, I investigate multi-agent interpretability, presenting novel techniques for understanding emergent multi-agent behavior at multiple levels of granularity. With respect to low-level interpretability, I examine the extent to which implicit communication emerges as an aid to coordination in multi-agent populations. I introduce a novel curriculum-driven method for learning high-performing policies in difficult, sparse reward environments and show through a measure of position-based social influence that multi-agent teams that learn sophisticated coordination strategies exchange significantly more information through implicit signals than lesser-coordinated agents. Then, at a high level, I study concept-based interpretability in the context of multi-agent learning. I propose a novel method for learning intrinsically interpretable, concept-based policies and show that it enables...


Yemen's Houthi Militia Says It Launched Missiles and Drones Toward Israel

NYT > Middle East

Yemen's Houthi militia claimed an attempted attack on southern Israel on Tuesday, saying it had launched a "large batch" of ballistic and cruise missiles as well as drones toward Israeli targets. The Iran-backed militia carried out the attempted assault in response to what it called "brutal Israeli-American aggression" in Gaza, the Houthi military spokesman, Yahya Sarea, said on the social media platform X. Mr. Sarea said the attack was the third operation conducted by the Houthis "in support of our persecuted brothers in Palestine," and threatened further missile and drone assaults. The Times could not independently verify the Houthi claims. On Tuesday, the Israeli military said its aerial defense system had intercepted a surface-to-surface missile fired toward Israel "from the area of the Red Sea."


A Review and Roadmap of Deep Causal Model from Different Causal Structures and Representations

arXiv.org Artificial Intelligence

The fusion of causal models with deep learning, which introduces increasingly intricate data sets such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere that we infer from the progression of data forms. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings and present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.


Forecasting Tropical Cyclones with Cascaded Diffusion Models

arXiv.org Artificial Intelligence

As cyclones become more intense due to climate change, the rise of AI-based modelling provides a more affordable and accessible approach compared to traditional methods based on mathematical models. This work leverages diffusion models to forecast cyclone trajectories and precipitation patterns by integrating satellite imaging, remote sensing, and atmospheric data, employing a cascaded approach that incorporates forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins. Experiments demonstrate that the final forecasts from the cascaded models show accurate predictions up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks. This work also highlights the promising efficiency of AI methods such as diffusion models for high-performance needs, such as cyclone forecasting, while remaining computationally affordable, making them ideal for highly vulnerable regions with critical forecasting needs and financial limitations.
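For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of how peak signal-to-noise ratio is commonly computed between a reference field and a forecast. It is not the authors' evaluation code: the function name `psnr`, the nested-list grid format, and the assumption that values are scaled to the range 0 to `max_value` are all illustrative.

```python
import math


def psnr(reference, forecast, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D grids.

    Hypothetical helper: assumes values lie in [0, max_value].
    """
    flat_ref = [v for row in reference for v in row]
    flat_fc = [v for row in forecast for v in row]
    if len(flat_ref) != len(flat_fc) or not flat_ref:
        raise ValueError("grids must be non-empty and have the same size")

    # Mean squared error over all grid cells.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_fc)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical grids: PSNR is unbounded

    return 10 * math.log10(max_value ** 2 / mse)


# Toy data (not from the paper): a uniform error of 0.1 on a unit-range field.
reference = [[0.0, 0.0], [0.0, 0.0]]
forecast = [[0.1, 0.1], [0.1, 0.1]]
print(round(psnr(reference, forecast), 2))
```

Under these assumptions, a uniform error of 0.1 on a unit-range field gives a PSNR of about 20 dB, which is the threshold the abstract reports exceeding.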


Disentangling Structure and Style: Political Bias Detection in News by Inducing Document Hierarchy

arXiv.org Artificial Intelligence

We address an important gap in detecting political bias in news articles. Previous works that perform document classification can be influenced by the writing style of each news outlet, leading to overfitting and limited generalizability. Our approach overcomes this limitation by considering both the sentence-level semantics and the document-level rhetorical structure, resulting in a more robust and style-agnostic approach to detecting political bias in news articles. We introduce a novel multi-head hierarchical attention model that effectively encodes the structure of long documents through a diverse ensemble of attention heads. While journalism follows a formalized rhetorical structure, the writing style may vary by news outlet. We demonstrate that our method overcomes this domain dependency and outperforms previous approaches for robustness and accuracy. Further analysis and human evaluation demonstrate the ability of our model to capture common discourse structures in journalism. Our code is available at: https://github.com/xfactlab/emnlp2023-Document-Hierarchy


U.S. Shoots Down Several Missiles and Drones Launched From Yemen

NYT > Middle East

A U.S. Navy warship in the northern Red Sea on Thursday shot down three cruise missiles and several drones launched from Yemen that the Pentagon said might have been headed toward Israel. "We cannot say for certain what these missiles and drones were targeting, but they were launched from Yemen heading north along the Red Sea, potentially towards targets in Israel," Brig. Gen. Patrick Ryder, the Pentagon spokesman, told reporters. The missiles and drones were launched by pro-Iranian Houthi rebels in Yemen amid a flurry of drone attacks against American troops in Iraq and Syria over the past three days, General Ryder said. The incidents underscored the risks that the conflict between Israel and the Palestinian group Hamas could spiral into a wider war.


Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification

arXiv.org Artificial Intelligence

Automatic Arabic Dialect Identification (ADI) of text has gained great popularity since it was introduced in the early 2010s. Multiple datasets were developed, and yearly shared tasks have been running since 2018. However, ADI systems are reported to fail in distinguishing between the micro-dialects of Arabic. We argue that the currently adopted framing of the ADI task as a single-label classification problem is one of the main reasons for that. We highlight the incompleteness of the dialect labels as a limitation and demonstrate how it impacts the evaluation of ADI systems. A manual error analysis of the predictions of an ADI system, performed by 7 native speakers of different Arabic dialects, revealed that approximately 66% of the validated errors are not true errors. Consequently, we propose framing ADI as a multi-label classification task and give recommendations for designing new ADI datasets.
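The effect described in this abstract can be illustrated with a small sketch that scores the same predictions under single-label and multi-label evaluation. This is an assumption-laden toy, not the paper's evaluation protocol: the function names, the label strings, and the sets of valid dialects are hypothetical placeholders rather than real annotations.

```python
def single_label_accuracy(predictions, gold_labels):
    """Fraction of predictions that exactly match the one gold label."""
    hits = sum(p == g for p, g in zip(predictions, gold_labels))
    return hits / len(predictions)


def multi_label_accuracy(predictions, valid_label_sets):
    """Fraction of predictions that fall in the set of acceptable labels."""
    hits = sum(p in valid for p, valid in zip(predictions, valid_label_sets))
    return hits / len(predictions)


# Hypothetical toy data: three sentences, one predicted dialect each.
predictions = ["dialect_A", "dialect_C", "dialect_D"]

# Single-label gold: only one dialect is recorded per sentence.
gold_labels = ["dialect_A", "dialect_B", "dialect_E"]

# Multi-label gold: sentence 2 is assumed valid in both dialect_B and dialect_C.
valid_label_sets = [{"dialect_A"}, {"dialect_B", "dialect_C"}, {"dialect_E"}]

print(single_label_accuracy(predictions, gold_labels))
print(multi_label_accuracy(predictions, valid_label_sets))
```

In this toy setup the second prediction counts as an error under single-label scoring but as correct once the sentence is allowed more than one valid dialect, which is the kind of "error that is not a true error" the abstract refers to.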