Song, Yueqi
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Cahyawijaya, Samuel, Lovenia, Holy, Moniz, Joel Ruben Antony, Wong, Tack Hwa, Farhansyah, Mohammad Rifqi, Maung, Thant Thiri, Hudi, Frederikus, Anugraha, David, Habibi, Muhammad Ravi Shulthan, Qorib, Muhammad Reza, Agarwal, Amit, Imperial, Joseph Marvin, Patel, Hitesh Laxmichand, Feliren, Vicky, Nasution, Bahrul Ilmi, Rufino, Manuel Antonio, Winata, Genta Indra, Rajagede, Rian Adam, Catalan, Carlos Rafael, Imam, Mohamed Fazli, Pattnayak, Priyaranjan, Pranida, Salsabila Zahirah, Pratama, Kevin, Bangera, Yeshil, Na-Thalang, Adisai, Monderin, Patricia Nicole, Song, Yueqi, Simon, Christian, Ng, Lynnette Hui Xian, Sapan, Richardy Lobo', Rafi, Taki Hasan, Wang, Bin, Supryadi, Veerakanjana, Kanyakorn, Ittichaiwong, Piyalitt, Roque, Matthew Theodore, Vincentio, Karissa, Kreangphet, Takdanai, Artkaew, Phakphum, Palgunadi, Kadek Hendrawan, Yu, Yanzhi, Hastuti, Rochana Prih, Nixon, William, Bangera, Mithil, Lim, Adrian Xuan Wei, Khine, Aye Hninn, Zhafran, Hanif Muhammad, Ferdinan, Teddy, Izzani, Audra Aurora, Singh, Ayushman, Evan, Krito, Jauza Akbar, Anugraha, Michael, Ilasariya, Fenal Ashokbhai, Li, Haochen, Daniswara, John Amadeo, Tjiaranata, Filbert Aurelian, Yulianrifat, Eryawan Presma, Udomcharoenchaikit, Can, Ansori, Fadil Risdian, Ihsani, Mahardika Krisna, Nguyen, Giang, Barik, Anab Maulana, Velasco, Dan John, Genadi, Rifo Ahmad, Saha, Saptarshi, Wei, Chengwei, Flores, Isaiah, Chen, Kenneth Ko Han, Santos, Anjela Gail, Lim, Wan Shen, Phyo, Kaung Si, Santos, Tim, Dwiastuti, Meisyarah, Luo, Jiayun, Cruz, Jan Christian Blaise, Hee, Ming Shan, Hanif, Ikhlasul Akmal, Hakim, M. Alif Al, Sya'ban, Muhammad Rizky, Kerdthaisong, Kun, Miranda, Lester James V., Koto, Fajri, Fatyanosa, Tirana Noor, Aji, Alham Fikri, Rosal, Jostin Jerico, Kevin, Jun, Wijaya, Robert, Kampman, Onno P., Zhang, Ruochen, Karlsson, Börje F., Limkonchotiwat, Peerat
Southeast Asia (SEA) is a region of extraordinary linguistic and cultural diversity, yet it remains significantly underrepresented in vision-language (VL) research. This often results in artificial intelligence (AI) models that fail to capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing high-quality, culturally relevant data for SEA languages. By involving contributors from SEA countries, SEA-VL aims to ensure better cultural relevance and diversity, fostering greater inclusivity of underrepresented languages in VL research. Beyond crowdsourcing, our initiative goes a step further by exploring the automatic collection of culturally relevant images through crawling and image generation. First, we find that image crawling achieves approximately 85% cultural relevance while being more cost- and time-efficient than crowdsourcing. Second, despite substantial progress in generative vision models, synthetic images remain unreliable in accurately reflecting SEA cultures: the generated images often fail to capture the nuanced traditions and cultural contexts of the region. Collectively, we gather 1.28M culturally relevant SEA images, a collection more than 50 times larger than existing datasets. Through SEA-VL, we aim to bridge the representation gap in SEA, fostering the development of more inclusive AI systems that authentically represent diverse cultures across the region.
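As a rough illustration of how crawled images might be screened for cultural relevance, the sketch below scores an image with an off-the-shelf CLIP model against a culture-specific prompt versus a generic one. This is not the SEA-VL pipeline; the prompts, threshold, file path, and model choice are hypothetical and shown only to make the crawl-then-filter idea concrete.

```python
# Illustrative sketch only: scoring a crawled image for cultural relevance with
# an off-the-shelf CLIP model. NOT the SEA-VL pipeline; prompts, threshold,
# file path, and model choice are hypothetical.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def cultural_relevance_score(image_path: str, culture: str = "Indonesian") -> float:
    """Return the probability CLIP assigns to the culture-specific description."""
    image = Image.open(image_path)
    prompts = [
        f"a photo depicting {culture} culture, food, or traditions",
        "a generic stock photo with no specific cultural context",
    ]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    return probs[0, 0].item()

# Keep the image only if the culture-specific prompt wins by a clear margin
# (0.7 is an arbitrary threshold chosen for illustration).
if cultural_relevance_score("crawled/pasar_malam.jpg") > 0.7:
    print("keep")
```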
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Yue, Xiang, Song, Yueqi, Asai, Akari, Kim, Seungone, Nyandwi, Jean de Dieu, Khanuja, Simran, Kantharuban, Anjali, Sutawika, Lintang, Ramamoorthy, Sathyanarayanan, Neubig, Graham
Despite recent advances in multimodal large language models (MLLMs), their development has predominantly focused on English- and Western-centric datasets and tasks, leaving most of the world's languages and diverse cultural contexts underrepresented. This paper introduces Pangea, a multilingual multimodal LLM trained on PangeaIns, a diverse 6M-example instruction dataset spanning 39 languages. PangeaIns features: 1) high-quality English instructions, 2) carefully machine-translated instructions, and 3) culturally relevant multimodal tasks to ensure cross-cultural coverage. To rigorously assess models' capabilities, we introduce PangeaBench, a holistic evaluation suite encompassing 14 datasets covering 47 languages. Results show that Pangea significantly outperforms existing open-source models in multilingual settings and diverse cultural contexts. Ablation studies further reveal the impact of English data proportions, language popularity, and the number of multimodal training samples on overall performance. We fully open-source our data, code, and trained checkpoints to facilitate the development of inclusive and robust multilingual MLLMs, promoting equity and accessibility across a broader linguistic and cultural spectrum.
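To make the English-proportion ablation concrete, here is a minimal, hypothetical sketch of assembling a multilingual instruction mixture with a fixed English ratio. The sampling scheme, pool names, and proportions are illustrative and are not the recipe used to build PangeaIns.

```python
# Hypothetical sketch of mixing multilingual instruction data with a fixed
# English proportion; illustrative only, not the PangeaIns construction recipe.
import random

def mix_instructions(pools: dict[str, list[dict]], english_ratio: float,
                     total: int, seed: int = 0) -> list[dict]:
    """Sample `total` examples: `english_ratio` of them from English, the rest
    spread uniformly over the remaining language pools."""
    rng = random.Random(seed)
    n_en = int(total * english_ratio)
    mixture = rng.choices(pools["en"], k=n_en)
    other_langs = [lang for lang in pools if lang != "en"]
    per_lang = (total - n_en) // len(other_langs)
    for lang in other_langs:
        mixture.extend(rng.choices(pools[lang], k=per_lang))
    rng.shuffle(mixture)
    return mixture

# Toy usage with dummy pools for three languages.
pools = {lang: [{"lang": lang, "id": i} for i in range(1000)]
         for lang in ["en", "id", "th"]}
data = mix_instructions(pools, english_ratio=0.4, total=100)
```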
Beyond Browsing: API-Based Web Agents
Song, Yueqi, Xu, Frank, Zhou, Shuyan, Neubig, Graham
Web browsers are a portal to the internet, where much of human activity is undertaken. Thus, there has been significant research work in AI agents that interact with the internet through web browsing. However, there is also another interface designed specifically for machine interaction with online content: application programming interfaces (APIs). In this paper we ask - what if we were to take tasks traditionally tackled by browsing agents, and give AI agents access to APIs? To do so, we propose two varieties of agents: (1) an API-calling agent that attempts to perform online tasks through APIs only, similar to traditional coding agents, and (2) a Hybrid Agent that can interact with online data through both web browsing and APIs. In experiments on WebArena, a widely-used and realistic benchmark for web navigation tasks, we find that API-based agents outperform web browsing agents. These results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.
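The hybrid idea can be pictured as a simple dispatch: prefer a documented REST endpoint when one covers the task, and fall back to GUI browsing otherwise. The sketch below is a hypothetical illustration, not the agent implementation from the paper; the intent-to-endpoint table and the browser stub are invented, and GitHub's public API is used only so the example actually returns machine-readable JSON.

```python
# Hypothetical sketch of a hybrid web/API step: call a documented REST endpoint
# when one matches the intent, otherwise fall back to a (stubbed) browser agent.
# Not the paper's implementation; routes and stub are invented for illustration.
import requests

API_ROUTES = {
    # task intent -> (HTTP method, URL template); purely illustrative
    "get_issue": ("GET", "https://api.github.com/repos/{owner}/{repo}/issues/{number}"),
}

def browse_fallback(task: str) -> str:
    """Placeholder for a GUI browsing agent (click/type on the rendered page)."""
    return f"[browser] would navigate and act for task: {task}"

def hybrid_step(intent: str, task: str, **params) -> str:
    if intent in API_ROUTES:                       # machine-friendly path: call the API
        method, template = API_ROUTES[intent]
        resp = requests.request(method, template.format(**params), timeout=10)
        return resp.text                           # JSON, ready for an LLM to parse
    return browse_fallback(task)                   # no API coverage: browse instead

# Requires internet access; falls back to browsing for unknown intents.
print(hybrid_step("get_issue", "read an issue", owner="octocat", repo="Hello-World", number=1))
print(hybrid_step("post_comment", "comment on the issue"))
```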
An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance
Khanuja, Simran, Ramamoorthy, Sathyanarayanan, Song, Yueqi, Neubig, Graham
Given the rise of multimedia content, human translators increasingly focus on culturally adapting not only words but also other modalities such as images to convey the same meaning. While several applications stand to benefit from this, machine translation systems remain confined to dealing with language in speech and text. In this work, we take a first step towards translating images to make them culturally relevant. First, we build three pipelines comprising state-of-the-art generative models to do the task. Next, we build a two-part evaluation dataset: i) concept: comprising 600 images that are cross-culturally coherent, focusing on a single concept per image, and ii) application: comprising 100 images curated from real-world applications. We conduct a multi-faceted human evaluation of translated images to assess cultural relevance and meaning preservation. We find that, as of today, image-editing models fail at this task, but can be improved by leveraging LLMs and retrievers in the loop. The best pipelines can only translate 5% of images for some countries in the easier concept dataset, and no translation is successful for some countries in the application dataset, highlighting the challenging nature of the task. Our code and data are released at: https://github.com/simran-khanuja/image-transcreation.
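A schematic, hypothetical rendering of the "LLMs and retrievers in the loop" idea is sketched below: caption the source image, ask an LLM for a culturally equivalent concept in the target country, then retrieve (or generate) a matching image. The three stages are passed in as callables so the sketch stays self-contained; it does not reproduce the paper's exact pipelines, and the example inputs are made up.

```python
# Schematic sketch of an LLM-and-retriever-in-the-loop transcreation pipeline.
# Illustrative only; stages are stubbed and do not mirror the paper's pipelines.
from typing import Callable

def transcreate(image_path: str, target_country: str,
                caption: Callable[[str], str],
                ask_llm: Callable[[str], str],
                retrieve_image: Callable[[str], str]) -> str:
    src_caption = caption(image_path)                       # describe source image
    prompt = (f"The image shows: {src_caption}. "
              f"Name a culturally equivalent concept familiar in {target_country}.")
    target_concept = ask_llm(prompt)                        # culturally adapt the concept
    return retrieve_image(target_concept)                   # path/URL of adapted image

# Toy usage with stub components standing in for a captioner, LLM, and retriever.
result = transcreate(
    "breakfast_pancakes.jpg", "Japan",
    caption=lambda p: "a plate of pancakes with maple syrup",
    ask_llm=lambda q: "a bowl of rice with miso soup",
    retrieve_image=lambda c: f"retrieved/{c.replace(' ', '_')}.jpg",
)
print(result)
```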
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Song, Yueqi, Khanuja, Simran, Neubig, Graham
NLP models today strive to support multiple languages and modalities, improving accessibility for diverse users. In this paper, we evaluate their multilingual, multimodal capabilities by testing on a visual reasoning task. We observe that proprietary systems like GPT-4V currently obtain the best performance on this task, while open models lag behind. Surprisingly, GPT-4V exhibits similar performance between English and other languages, indicating the potential for equitable system development across languages. Our analysis of model failures reveals three key aspects that make this task challenging: multilinguality, complex reasoning, and multimodality. To address these challenges, we propose three targeted interventions: a translate-test approach to tackle multilinguality, a visual programming approach to break down complex reasoning, and a method that leverages image captioning to address multimodality. Our interventions achieve the best open performance on this task in a zero-shot setting, boosting the open model LLaVA by 13.4% while also modestly improving GPT-4V's performance.
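Two of these interventions compose naturally, as in the minimal sketch below: translate-test maps the non-English statement into English before reasoning, and captioning turns the image pair into text that a language-only model can reason over. The translator, captioner, and LLM are stubbed as callables, and the prompt format and example inputs are illustrative rather than the exact setup from the paper.

```python
# Minimal sketch combining translate-test with caption-then-reason for a
# two-image visual reasoning statement. Components are stubbed; prompts and
# inputs are illustrative, not the paper's exact configuration.
from typing import Callable

def translate_test_reason(statement: str, image_paths: list[str],
                          translate_to_en: Callable[[str], str],
                          caption: Callable[[str], str],
                          llm_yes_no: Callable[[str], bool]) -> bool:
    en_statement = translate_to_en(statement)               # translate-test step
    captions = [caption(p) for p in image_paths]            # multimodality -> text
    prompt = (f"Left image: {captions[0]}\nRight image: {captions[1]}\n"
              f"Is the following statement true? {en_statement}")
    return llm_yes_no(prompt)                               # text-only reasoning

# Toy usage with stubs standing in for a translator, captioner, and LLM.
pred = translate_test_reason(
    "Kedua gambar menunjukkan anjing.",                     # Indonesian: "Both images show dogs."
    ["left.jpg", "right.jpg"],
    translate_to_en=lambda s: "Both images show dogs.",
    caption=lambda p: "a dog running on grass",
    llm_yes_no=lambda q: True,
)
print(pred)
```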
GlobalBench: A Benchmark for Global Progress in Natural Language Processing
Song, Yueqi, Cui, Catherine, Khanuja, Simran, Liu, Pengfei, Faisal, Fahim, Ostapenko, Alissa, Winata, Genta Indra, Aji, Alham Fikri, Cahyawijaya, Samuel, Tsvetkov, Yulia, Anastasopoulos, Antonios, Neubig, Graham
Despite the major advances in NLP, significant disparities in NLP system performance across languages still exist. Arguably, these are due to uneven resource allocation and sub-optimal incentives to work on less-resourced languages. To track and further incentivize the global development of equitable language technology, we introduce GlobalBench. Prior multilingual benchmarks are static and have focused on a limited number of tasks and languages. In contrast, GlobalBench is an ever-expanding collection that aims to dynamically track progress on all NLP datasets in all languages. Rather than solely measuring accuracy, GlobalBench also tracks the estimated per-speaker utility and equity of technology across all languages, providing a multi-faceted view of how language technology is serving people of the world. Furthermore, GlobalBench is designed to identify the most under-served languages, and rewards research efforts directed towards those languages. At present, the most under-served languages are those with relatively large speaker populations that are nonetheless overlooked by composite multilingual benchmarks (e.g., Punjabi, Portuguese, and Wu Chinese). Currently, GlobalBench covers 966 datasets in 190 languages, and has 1,128 system submissions spanning 62 languages.
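One plausible formulation of a per-speaker (demographic-weighted) utility, in the spirit of the tracking described above, weights each language's normalized performance by its speaker population. The exact metric definitions used in GlobalBench may differ, and the numbers below are toy values for illustration only.

```python
# One plausible demographic-weighted ("per-speaker") utility; the exact
# GlobalBench metric may differ. `performance` maps language -> best system
# score normalized to [0, 1]; `speakers` maps language -> speaker population.
def demographic_weighted_utility(performance: dict[str, float],
                                 speakers: dict[str, float]) -> float:
    total_speakers = sum(speakers.values())
    # Languages with no evaluated system contribute zero utility.
    return sum(speakers[lang] * performance.get(lang, 0.0)
               for lang in speakers) / total_speakers

# Toy numbers (speaker counts in millions; scores are illustrative).
perf = {"eng": 0.92, "pan": 0.35, "wuu": 0.20}
pop = {"eng": 1450, "pan": 210, "wuu": 80, "por": 260}   # Portuguese unevaluated here
print(f"utility = {demographic_weighted_utility(perf, pop):.3f}")
```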