AITopics | Whitehouse, Chenxi

Collaborating Authors

Whitehouse, Chenxi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations

Liu, Dongqi, Whitehouse, Chenxi, Yu, Xi, Mahon, Louis, Saxena, Rohit, Zhao, Zheng, Qiu, Yifu, Lapata, Mirella, Demberg, Vera

arXiv.org Artificial IntelligenceFeb-26-2025

Transforming recorded videos into concise and accurate textual summaries is a growing challenge in multimodal learning. This paper introduces VISTA, a dataset specifically designed for video-to-text summarization in scientific domains. VISTA contains 18,599 recorded AI conference presentations paired with their corresponding paper abstracts. We benchmark the performance of state-of-the-art large models and apply a plan-based framework to better capture the structured nature of abstracts. Both human and automated evaluations confirm that explicit planning enhances summary quality and factual consistency. However, a considerable gap remains between models and human performance, highlighting the challenges of scientific video summarization.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.08279

Country:

Europe > Middle East > Malta (0.14)
North America > Mexico > Mexico City (0.14)
Asia > Middle East > UAE (0.14)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
(2 more...)

Add feedback

Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models

Zhu, Xiaochen, Karadzhov, Georgi, Whitehouse, Chenxi, Vlachos, Andreas

arXiv.org Artificial IntelligenceDec-15-2024

Diffusion models have shown promise in text generation but often struggle with generating long, coherent, and contextually accurate text. Token-level diffusion overlooks word-order dependencies and enforces short output windows, while passage-level diffusion struggles with learning robust representation for long-form text. To address these challenges, we propose Segment-Level Diffusion (SLD), a framework that enhances diffusion-based text generation through text segmentation, robust representation training with adversarial and contrastive learning, and improved latent-space guidance. By segmenting long-form outputs into separate latent representations and decoding them with an autoregressive decoder, SLD simplifies diffusion predictions and improves scalability. Experiments on XSum, ROCStories, DialogSum, and DeliData demonstrate that SLD achieves competitive or superior performance in fluency, coherence, and contextual compatibility across automatic and human evaluation metrics comparing with other diffusion and autoregressive baselines. Ablation studies further validate the effectiveness of our segmentation and representation learning strategies.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2412.11333

Country:

Europe (1.00)
Asia (0.93)
North America > United States > California (0.28)

Genre:

Research Report (0.82)
Overview (0.68)
Personal > Interview (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)

Add feedback

The Automated Verification of Textual Claims (AVeriTeC) Shared Task

Schlichtkrull, Michael, Chen, Yulong, Whitehouse, Chenxi, Deng, Zhenyun, Akhtar, Mubashara, Aly, Rami, Guo, Zhijiang, Christodoulopoulos, Christos, Cocarascu, Oana, Mittal, Arpit, Thorne, James, Vlachos, Andreas

arXiv.org Artificial IntelligenceOct-31-2024

The Automated Verification of Textual Claims (AVeriTeC) shared task asks participants to retrieve evidence and predict veracity for real-world claims checked by fact-checkers. Evidence can be found either via a search engine, or via a knowledge store provided by the organisers. Submissions are evaluated using AVeriTeC score, which considers a claim to be accurately verified if and only if both the verdict is correct and retrieved evidence is considered to meet a certain quality threshold. The shared task received 21 submissions, 18 of which surpassed our baseline. The winning team was TUDA_MAI with an AVeriTeC score of 63%. In this paper we describe the shared task, present the full results, and highlight key takeaways from the shared task.

computational linguistic, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2410.2385

Country:

Asia > China (0.93)
Asia > Middle East > UAE (0.14)
North America > United States > Louisiana (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry:

Media (0.68)
Government > Regional Government > Asia Government > China Government (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Lovenia, Holy, Mahendra, Rahmad, Akbar, Salsabil Maulana, Miranda, Lester James V., Santoso, Jennifer, Aco, Elyanah, Fadhilah, Akhdan, Mansurov, Jonibek, Imperial, Joseph Marvin, Kampman, Onno P., Moniz, Joel Ruben Antony, Habibi, Muhammad Ravi Shulthan, Hudi, Frederikus, Montalan, Railey, Ignatius, Ryan, Lopo, Joanito Agili, Nixon, William, Karlsson, Börje F., Jaya, James, Diandaru, Ryandito, Gao, Yuze, Amadeus, Patrick, Wang, Bin, Cruz, Jan Christian Blaise, Whitehouse, Chenxi, Parmonangan, Ivan Halim, Khelli, Maria, Zhang, Wenyu, Susanto, Lucky, Ryanda, Reynard Adha, Hermawan, Sonny Lazuardi, Velasco, Dan John, Kautsar, Muhammad Dehan Al, Hendria, Willy Fitra, Moslem, Yasmin, Flynn, Noah, Adilazuarda, Muhammad Farid, Li, Haochen, Lee, Johanes, Damanhuri, R., Sun, Shuo, Qorib, Muhammad Reza, Djanibekov, Amirbek, Leong, Wei Qi, Do, Quyet V., Muennighoff, Niklas, Pansuwan, Tanrada, Putra, Ilham Firdausi, Xu, Yan, Tai, Ngee Chia, Purwarianti, Ayu, Ruder, Sebastian, Tjhi, William, Limkonchotiwat, Peerat, Aji, Alham Fikri, Keh, Sedrick, Winata, Genta Indra, Zhang, Ruochen, Koto, Fajri, Yong, Zheng-Xin, Cahyawijaya, Samuel

arXiv.org Artificial IntelligenceJul-8-2024

Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, we introduce SEACrowd, a collaborative initiative that consolidates a comprehensive resource hub that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in SEA.

computational linguistic, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2406.10118

Country:

Asia > Indonesia > Sulawesi (0.45)
Asia > Philippines > Luzon > Cordillera Administrative Region (0.14)

Genre: Research Report (0.81)

Industry:

Education (0.68)
Information Technology (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(6 more...)

Add feedback

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Romero, David, Lyu, Chenyang, Wibowo, Haryo Akbarianto, Lynn, Teresa, Hamed, Injy, Kishore, Aditya Nanda, Mandal, Aishik, Dragonetti, Alina, Abzaliev, Artem, Tonja, Atnafu Lambebo, Balcha, Bontu Fufa, Whitehouse, Chenxi, Salamea, Christian, Velasco, Dan John, Adelani, David Ifeoluwa, Meur, David Le, Villa-Cueva, Emilio, Koto, Fajri, Farooqui, Fauzan, Belcavello, Frederico, Batnasan, Ganzorig, Vallejo, Gisela, Caulfield, Grainne, Ivetta, Guido, Song, Haiyue, Ademtew, Henok Biadglign, Maina, Hernán, Lovenia, Holy, Azime, Israel Abebe, Cruz, Jan Christian Blaise, Gala, Jay, Geng, Jiahui, Ortiz-Barajas, Jesus-German, Baek, Jinheon, Dunstan, Jocelyn, Alemany, Laura Alonso, Nagasinghe, Kumaranage Ravindu Yasas, Benotti, Luciana, D'Haro, Luis Fernando, Viridiano, Marcelo, Estecha-Garitagoitia, Marcos, Cabrera, Maria Camila Buitrago, Rodríguez-Cantelar, Mario, Jouitteau, Mélanie, Mihaylov, Mihail, Imam, Mohamed Fazli Mohamed, Adilazuarda, Muhammad Farid, Gochoo, Munkhjargal, Otgonbold, Munkh-Erdene, Etori, Naome, Niyomugisha, Olivier, Silva, Paula Mónica, Chitale, Pranjal, Dabre, Raj, Chevi, Rendi, Zhang, Ruochen, Diandaru, Ryandito, Cahyawijaya, Samuel, Góngora, Santiago, Jeong, Soyeong, Purkayastha, Sukannya, Kuribayashi, Tatsuki, Jayakumar, Thanmay, Torrent, Tiago Timponi, Ehsan, Toqeer, Araujo, Vladimir, Kementchedjhieva, Yova, Burzo, Zara, Lim, Zheng Wei, Yong, Zheng Xin, Ignat, Oana, Nwatu, Joan, Mihalcea, Rada, Solorio, Thamar, Aji, Alham Fikri

arXiv.org Artificial IntelligenceJun-9-2024

Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2406.05967

Country:

South America (1.00)
Europe (1.00)
North America (0.93)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.82)

Add feedback

SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

Wang, Yuxia, Mansurov, Jonibek, Ivanov, Petar, Su, Jinyan, Shelmanov, Artem, Tsvigun, Akim, Afzal, Osama Mohammed, Mahmoud, Tarek, Puccetti, Giovanni, Arnold, Thomas, Whitehouse, Chenxi, Aji, Alham Fikri, Habash, Nizar, Gurevych, Iryna, Nakov, Preslav

arXiv.org Artificial IntelligenceApr-22-2024

We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual track. Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM. Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine. The task attracted a large number of participants: subtask A monolingual (126), subtask A multilingual (59), subtask B (70), and subtask C (30). In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For all subtasks, the best systems used LLMs.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2404.14183

Country:

North America > Mexico > Mexico City (0.18)
Europe > Middle East > Malta (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Media > News (0.67)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)

Add feedback

PRobELM: Plausibility Ranking Evaluation for Language Models

Yuan, Zhangdie, Whitehouse, Chenxi, Chamoun, Eric, Aly, Rami, Vlachos, Andreas

arXiv.org Artificial IntelligenceApr-4-2024

This paper introduces PRobELM (Plausibility Ranking Evaluation for Language Models), a benchmark designed to assess language models' ability to discern more plausible from less plausible scenarios through their parametric knowledge. While benchmarks such as TruthfulQA emphasise factual accuracy or truthfulness, and others such as COPA explore plausible scenarios without explicitly incorporating world knowledge, PRobELM seeks to bridge this gap by evaluating models' capabilities to prioritise plausible scenarios that leverage world knowledge over less plausible alternatives. This design allows us to assess the potential of language models for downstream use cases such as literature-based discovery where the focus is on identifying information that is likely but not yet known. Our benchmark is constructed from a dataset curated from Wikidata edit histories, tailored to align the temporal bounds of the training data for the evaluated models. PRobELM facilitates the evaluation of language models across multiple prompting types, including statement, text completion, and question-answering. Experiments with 10 models of various sizes and architectures on the relationship between model scales, training recency, and plausibility performance, reveal that factual accuracy does not directly correlate with plausibility performance and that up-to-date training data enhances plausibility assessment across different model architectures.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2404.03818

Country:

Europe (1.00)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Industry:

Government (1.00)
Law (0.68)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Towards Knowledge-Grounded Natural Language Understanding and Generation

Whitehouse, Chenxi

arXiv.org Artificial IntelligenceMar-22-2024

This thesis investigates how natural language understanding and generation with transformer models can benefit from grounding the models with knowledge representations. Currently, the most prevailing paradigm for training language models is through pre-training on abundant raw text data and fine-tuning on downstream tasks. Although language models continue to advance, especially the recent trend of Large Language Models (LLMs) such as ChatGPT, there seem to be limits to what can be achieved with text data alone and it is desirable to study the impact of applying and integrating rich forms of knowledge representation to improve model performance. The most widely used form of knowledge for language modelling is structured knowledge in the form of triples consisting of entities and their relationships, often in English. This thesis explores beyond this conventional approach and aims to address several key questions: Can knowledge of entities extend its benefits beyond entity-centric tasks such as entity linking? How can we faithfully and effectively extract such structured knowledge from raw text, especially noisy web text? How do other types of knowledge, beyond structured knowledge, contribute to improving NLP tasks?

generative information extraction, large language model, machine learning, (26 more...)

arXiv.org Artificial Intelligence

2403.15364

Country:

Asia (1.00)
North America > United States > California (0.27)
North America > United States > Washington > King County > Seattle (0.14)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(4 more...)

Add feedback

Multitask Multilingual Model Adaptation with Featurized Low-Rank Mixtures

Lin, Chu-Cheng, Wang, Xinyi, Clark, Jonathan H., Lu, Han, Zhu, Yun, Whitehouse, Chenxi, Yu, Hongkun

arXiv.org Artificial IntelligenceFeb-27-2024

Adapting pretrained large language models (LLMs) to various downstream tasks in tens or hundreds of human languages is computationally expensive. Parameter-efficient fine-tuning (PEFT) significantly reduces the adaptation cost, by tuning only a small amount of parameters. However, directly applying PEFT methods such as LoRA (Hu et al., 2022) on diverse dataset mixtures could lead to suboptimal performance due to limited parameter capacity and negative interference among different datasets. In this work, we propose Featurized Low-rank Mixtures (FLix), a novel PEFT method designed for effective multitask multilingual tuning. FLix associates each unique dataset feature, such as the dataset's language or task, with its own low-rank weight update parameters. By composing feature-specific parameters for each dataset, FLix can accommodate diverse dataset mixtures and generalize better to unseen datasets. Our experiments show that FLix leads to significant improvements over a variety of tasks for both supervised learning and zero-shot settings using different training data mixtures.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2402.17934

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Parameter-Efficient Multilingual Summarisation: An Empirical Study

Whitehouse, Chenxi, Huot, Fantine, Bastings, Jasmijn, Dehghani, Mostafa, Lin, Chu-Cheng, Lapata, Mirella

arXiv.org Artificial IntelligenceNov-14-2023

With the increasing prevalence of Large Language Models, traditional full fine-tuning approaches face growing challenges, especially in memory-intensive tasks. This paper investigates the potential of Parameter-Efficient Fine-Tuning, focusing on Low-Rank Adaptation (LoRA), for complex and under-explored multilingual summarisation tasks. We conduct an extensive study across different data availability scenarios, including full-data, low-data, and cross-lingual transfer, leveraging models of different sizes. Our findings reveal that LoRA lags behind full fine-tuning when trained with full data, however, it excels in low-data scenarios and cross-lingual transfer. Interestingly, as models scale up, the performance gap between LoRA and full fine-tuning diminishes. Additionally, we investigate effective strategies for few-shot cross-lingual transfer, finding that continued LoRA tuning achieves the best performance compared to both full fine-tuning and dynamic composition of language-specific LoRA modules.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2311.08572

Country:

North America > Canada (0.14)
Asia > Middle East > UAE (0.14)
North America > United States (0.14)
(5 more...)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Continuing Education (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback