AITopics | Ng, Lynnette Hui Xian

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Cahyawijaya, Samuel, Lovenia, Holy, Moniz, Joel Ruben Antony, Wong, Tack Hwa, Farhansyah, Mohammad Rifqi, Maung, Thant Thiri, Hudi, Frederikus, Anugraha, David, Habibi, Muhammad Ravi Shulthan, Qorib, Muhammad Reza, Agarwal, Amit, Imperial, Joseph Marvin, Patel, Hitesh Laxmichand, Feliren, Vicky, Nasution, Bahrul Ilmi, Rufino, Manuel Antonio, Winata, Genta Indra, Rajagede, Rian Adam, Catalan, Carlos Rafael, Imam, Mohamed Fazli, Pattnayak, Priyaranjan, Pranida, Salsabila Zahirah, Pratama, Kevin, Bangera, Yeshil, Na-Thalang, Adisai, Monderin, Patricia Nicole, Song, Yueqi, Simon, Christian, Ng, Lynnette Hui Xian, Sapan, Richardy Lobo', Rafi, Taki Hasan, Wang, Bin, Supryadi, null, Veerakanjana, Kanyakorn, Ittichaiwong, Piyalitt, Roque, Matthew Theodore, Vincentio, Karissa, Kreangphet, Takdanai, Artkaew, Phakphum, Palgunadi, Kadek Hendrawan, Yu, Yanzhi, Hastuti, Rochana Prih, Nixon, William, Bangera, Mithil, Lim, Adrian Xuan Wei, Khine, Aye Hninn, Zhafran, Hanif Muhammad, Ferdinan, Teddy, Izzani, Audra Aurora, Singh, Ayushman, Evan, null, Krito, Jauza Akbar, Anugraha, Michael, Ilasariya, Fenal Ashokbhai, Li, Haochen, Daniswara, John Amadeo, Tjiaranata, Filbert Aurelian, Yulianrifat, Eryawan Presma, Udomcharoenchaikit, Can, Ansori, Fadil Risdian, Ihsani, Mahardika Krisna, Nguyen, Giang, Barik, Anab Maulana, Velasco, Dan John, Genadi, Rifo Ahmad, Saha, Saptarshi, Wei, Chengwei, Flores, Isaiah, Chen, Kenneth Ko Han, Santos, Anjela Gail, Lim, Wan Shen, Phyo, Kaung Si, Santos, Tim, Dwiastuti, Meisyarah, Luo, Jiayun, Cruz, Jan Christian Blaise, Hee, Ming Shan, Hanif, Ikhlasul Akmal, Hakim, M. Alif Al, Sya'ban, Muhammad Rizky, Kerdthaisong, Kun, Miranda, Lester James V., Koto, Fajri, Fatyanosa, Tirana Noor, Aji, Alham Fikri, Rosal, Jostin Jerico, Kevin, Jun, Wijaya, Robert, Kampman, Onno P., Zhang, Ruochen, Karlsson, Börje F., Limkonchotiwat, Peerat

arXiv.org Artificial Intelligence | Mar-18-2025

Southeast Asia (SEA) is a region of extraordinary linguistic and cultural diversity, yet it remains significantly underrepresented in vision-language (VL) research. This often results in artificial intelligence (AI) models that fail to capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing high-quality, culturally relevant data for SEA languages. By involving contributors from SEA countries, SEA-VL aims to ensure better cultural relevance and diversity, fostering greater inclusivity of underrepresented languages in VL research. Beyond crowdsourcing, our initiative goes one step further and explores the automatic collection of culturally relevant images through crawling and image generation. First, we find that image crawling achieves approximately 85% cultural relevance while being more cost- and time-efficient than crowdsourcing. Second, despite the substantial progress in generative vision models, synthetic images remain unreliable in accurately reflecting SEA cultures. The generated images often fail to reflect the nuanced traditions and cultural contexts of the region. Collectively, we gather 1.28M SEA culturally relevant images, more than 50 times larger than other existing datasets. Through SEA-VL, we aim to bridge the representation gap in SEA, fostering the development of more inclusive AI systems that authentically represent diverse cultures across SEA.

caption, computational linguistic, dataset, (14 more...)

arXiv.org Artificial Intelligence

2503.0792

Country:

Asia > Southeast Asia (0.61)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

What is a Social Media Bot? A Global Comparison of Bot and Human Characteristics

Ng, Lynnette Hui Xian, Carley, Kathleen M.

arXiv.org Artificial Intelligence | Jan-1-2025

Chatter on social media is 20% bots and 80% humans. Chatter by bots and humans is consistently different: bots tend to use linguistic cues that can be easily automated while humans use cues that require dialogue understanding. Bots use words that match the identities they choose to present, while humans may send messages that are not related to the identities they present. Bots and humans differ in their communication structure: sampled bots have a star interaction structure, while sampled humans have a hierarchical structure. These conclusions are based on a large-scale analysis of tweets from ~200 million users across 7 events. Social media bots took the world by storm when social-cybersecurity researchers realized that social media users not only consisted of humans but also of artificial agents called bots. These bots wreak havoc online by spreading disinformation and manipulating narratives. Most research on bots is based on special-purpose definitions, mostly predicated on the event studied. This article begins by asking, "What is a bot?", and we study the underlying principles of how bots are different from humans. We develop a first-principle definition of a social media bot. With this definition as a premise, we systematically compare characteristics between bots and humans across global events, and reflect on how the software-programmed bot is an Artificial Intelligence algorithm, and its potential for evolution as technology advances. Based on our results, we provide recommendations for the use and regulation of bots. Finally, we discuss open challenges and future directions: Detect, to systematically identify these automated and potentially evolving bots; Differentiate, to evaluate the goodness of the bot in terms of their content postings and relationship interactions; Disrupt, to moderate the impact of malicious bots.
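
The contrast between a star and a hierarchical interaction structure can be made concrete with a small sketch. The code below computes Freeman degree centralization, one common way to score how star-like an undirected network is (1.0 for a perfect star). This measure is an assumption chosen for illustration and is not stated in the abstract as the authors' method; the edge lists and node names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def degree_centralization(edges: Iterable[Tuple[str, str]]) -> float:
    """Freeman degree centralization of an undirected graph given as edge pairs.

    Returns 1.0 for a perfect star and lower values for flatter or
    tree-like structures. Only nodes that appear in an edge are counted.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 3:
        raise ValueError("need at least three connected nodes")
    degrees = [len(v) for v in neighbours.values()]
    max_degree = max(degrees)
    # Normalise by the value reached by a star on n nodes: (n - 1)(n - 2).
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical interaction networks (not data from the paper).
star = [("bot", "u1"), ("bot", "u2"), ("bot", "u3"), ("bot", "u4")]
tree = [("root", "a"), ("root", "b"), ("a", "a1"), ("a", "a2"),
        ("b", "b1"), ("b", "b2")]

star_score = degree_centralization(star)
tree_score = degree_centralization(tree)
```

In this toy example the star scores higher than the two-level tree, which matches the intuition that a single hub dominates a star while a hierarchy spreads connections across levels.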

artificial intelligence, machine learning, social media, (15 more...)

arXiv.org Artificial Intelligence

2501.00855

Country:

Asia (0.67)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Voting & Elections (1.00)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

What talking you?: Translating Code-Mixed Messaging Texts to English

Ng, Lynnette Hui Xian, Chan, Luo Qi

arXiv.org Artificial Intelligence | Nov-7-2024

Translation of code-mixed texts to formal English allows a wider audience to understand these code-mixed languages, and facilitates downstream analysis applications such as sentiment analysis. In this work, we look at translating Singlish, which is colloquial Singaporean English, to formal standard English. Singlish is formed through the code-mixing of multiple Asian languages and dialects. We analysed the presence of other Asian languages and variants which can facilitate translation. Our dataset consists of short message texts, written as informal communication between Singlish speakers. We use a multi-step prompting scheme on five Large Language Models (LLMs) for language detection and translation. Our analysis shows that LLMs do not perform well in this task, and we describe the challenges involved in translation of code-mixed languages. We also release our dataset at this link: https://github.com/luoqichan/singlish.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.05253

Country: Asia (1.00)

Genre: Research Report (1.00)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Limpeh ga li gong: Challenges in Singlish Annotations

Chan, Luo Qi, Ng, Lynnette Hui Xian

arXiv.org Artificial Intelligence | Nov-7-2024

Singlish, or Colloquial Singapore English, is a language formed from oral and social communication within multicultural Singapore. In this work, we address a fundamental Natural Language Processing (NLP) task: Parts-Of-Speech (POS) tagging of Singlish sentences. For our analysis, we build a parallel Singlish dataset containing direct English translations and POS tags, with translation and POS annotation done by native Singlish speakers. Our experiments show that automatic transition-based and transformer-based taggers perform with only ~80% accuracy when evaluated against human-annotated POS labels, suggesting that there is indeed room for improvement in computational analysis of the language. We provide an exposition of challenges in Singlish annotation: its inconsistencies in form and semantics, the highly context-dependent particles of the language, its structurally unique expressions, and the variation of the language across different mediums. Our task definition, resultant labels and results reflect the challenges in analysing colloquial languages formulated from a variety of dialects, and pave the way for future studies beyond POS tagging.
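
As a minimal sketch of what evaluating a tagger against human-annotated POS labels involves, the snippet below computes token-level accuracy. It assumes one predicted tag and one gold tag per token; the tag names and the example sentence tags are hypothetical and are not drawn from the paper's dataset or tag set.

```python
from typing import Sequence


def tagging_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the human-annotated tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of tags")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Hypothetical tags for a five-token sentence (illustrative only).
gold_tags = ["PRON", "VERB", "ADV", "PART", "PUNCT"]
pred_tags = ["PRON", "VERB", "ADV", "INTJ", "PUNCT"]

accuracy = tagging_accuracy(pred_tags, gold_tags)  # 4 of 5 tags match
```

Here a single mis-tagged particle is enough to pull accuracy down to 0.8 on a short sentence, which illustrates why context-dependent particles weigh heavily on the score.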

machine learning, natural language, singlish, (19 more...)

arXiv.org Artificial Intelligence

2410.16156

Country: Asia > Singapore (0.48)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification

Malik, Ananya, Sharma, Kartik, Ng, Lynnette Hui Xian, Bhatt, Shaily

arXiv.org Artificial Intelligence | Oct-27-2024

Large Language Models (LLMs) offer a lucrative promise for scalable content moderation, including hate speech detection. However, they are also known to be brittle and biased against marginalised communities and dialects. This requires their applications to high-stakes tasks like hate speech detection to be critically scrutinized. In this work, we investigate the robustness of hate speech classification using LLMs, particularly when explicit and implicit markers of the speaker's ethnicity are injected into the input. For the explicit markers, we inject a phrase that mentions the speaker's identity. For the implicit markers, we inject dialectal features. By analysing how frequently model outputs flip in the presence of these markers, we reveal varying degrees of brittleness across 4 popular LLMs and 5 ethnicities. We find that the presence of implicit dialect markers in inputs causes model outputs to flip more than the presence of explicit markers. Further, the percentage of flips varies across ethnicities. Finally, we find that larger models are more robust. Our findings indicate the need for exercising caution in deploying LLMs for high-stakes tasks like hate speech detection.
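
The flip analysis described above can be sketched as a comparison between a model's labels on the original inputs and its labels on the same inputs with a marker injected. The function below is an illustrative assumption about how such a percentage might be computed, not the authors' code; the label strings and example outputs are hypothetical.

```python
from typing import Sequence


def flip_rate(original: Sequence[str], perturbed: Sequence[str]) -> float:
    """Percentage of inputs whose label changes after a marker is injected."""
    if len(original) != len(perturbed):
        raise ValueError("both label lists must cover the same inputs")
    if not original:
        raise ValueError("cannot compute a flip rate over zero inputs")
    flips = sum(o != p for o, p in zip(original, perturbed))
    return 100.0 * flips / len(original)


# Hypothetical model outputs for four inputs (illustrative only).
baseline = ["hate", "not_hate", "hate", "not_hate"]
with_marker = ["hate", "hate", "hate", "not_hate"]

rate = flip_rate(baseline, with_marker)  # one of four labels changed
```

Computing this rate separately for each marker type and each ethnicity would give the kind of per-group comparison the abstract reports.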

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.2049

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Disentangling Singlish Discourse Particles with Task-Driven Representation

Foo, Linus Tze En, Ng, Lynnette Hui Xian

arXiv.org Artificial Intelligence | Oct-16-2024

Singlish, or formally Colloquial Singapore English, is an English-based creole language originating from the Southeast Asian country Singapore. The language contains influences from Sinitic languages such as Chinese dialects, as well as Malay, Tamil and so forth. A fundamental task in understanding Singlish is to first understand the pragmatic functions of its discourse particles, upon which Singlish relies heavily to convey meaning. This work offers a preliminary effort to disentangle the Singlish discourse particles (lah, meh and hor) with task-driven representation learning. After disentanglement, we cluster these discourse particles to differentiate their pragmatic functions, and perform Singlish-to-English machine translation. Our work provides a computational method for understanding Singlish discourse particles, and opens avenues towards a deeper comprehension of the language and its usage.

machine learning, natural language, particle, (13 more...)

arXiv.org Artificial Intelligence

2409.20366

Country:

Europe (0.93)
Asia > Singapore (0.47)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

DIVERSE: Deciphering Internet Views on the U.S. Military Through Video Comment Stance Analysis, A Novel Benchmark Dataset for Stance Classification

Cruickshank, Iain J., Ng, Lynnette Hui Xian

arXiv.org Artificial Intelligence | Mar-5-2024

Stance detection of social media text is a key component of downstream tasks involving the identification of groups of users with opposing opinions on contested topics such as vaccination and within arguments. In particular, stance provides an indication of an opinion towards an entity. This paper introduces DIVERSE, a dataset of over 173,000 YouTube video comments annotated for their stance towards videos of the U.S. military. The stance is annotated through a human-guided, machine-assisted labeling methodology that makes use of weak signals of tone within the sentence as supporting indicators, as opposed to using manual annotations by humans. These weak signals consist of the presence of hate speech and sarcasm, the presence of specific keywords, the sentiment of the text, and the stance inference from two Large Language Models. The weak signals are then consolidated using a data programming model before each comment is annotated with a final stance label. On average, the videos have 200 comments each, and the stance of the comments skews slightly towards the "against" characterization for both the U.S. Army and the videos posted on the channel.
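
To illustrate the idea of consolidating several weak signals into one stance label, the sketch below uses a plain majority vote with abstentions. This is a deliberately simpler stand-in for the data programming model mentioned in the abstract, which typically also weights signals by estimated reliability; the label names and signal values are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

ABSTAIN = None  # a weak signal may decline to vote


def majority_vote(votes: Sequence[Optional[str]]) -> Optional[str]:
    """Combine weak-signal votes into one label; abstain on ties or no votes."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # top two labels are tied
    return ranked[0][0]


# Hypothetical votes from five weak signals for one comment, e.g. keyword
# match, sentiment, sarcasm check, and two LLM stance inferences.
signals = ["against", "against", None, "for", "against"]
label = majority_vote(signals)
```

A majority vote treats every signal as equally trustworthy, which is the main limitation a learned aggregation model is meant to address.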

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.03334

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military > Army (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Use of Large Language Models for Stance Classification

Cruickshank, Iain J., Ng, Lynnette Hui Xian

arXiv.org Artificial Intelligence | Sep-24-2023

Stance detection, the task of predicting an author's viewpoint towards a subject of interest, has long been a focal point of research. Current stance detection methods predominantly rely on manual annotation of sentences, followed by training a supervised machine learning model. This manual annotation process, however, imposes limitations on the model's ability to fully comprehend the stances in the sentence and hampers its potential to generalize across different contexts. In this study, we investigate the use of Large Language Models (LLMs) for the task of stance classification, with an absolute minimum use of human labels. We scrutinize four distinct types of prompting schemes combined with LLMs, comparing their accuracies with manual stance determination. Our study reveals that while LLMs can match or sometimes even exceed the benchmark results in each dataset, their overall accuracy is not definitively better than what can be produced by supervised models. This suggests potential areas for improvement in the stance classification for LLMs. The application of LLMs, however, opens up promising avenues for unsupervised stance detection, thereby curtailing the need for manual collection and annotation of stances. This not only streamlines the process but also paves the way for expanding stance detection capabilities across languages. Through this paper, we shed light on the stance classification abilities of LLMs, thereby contributing valuable insights that can guide future advancements in this domain.

large language model, natural language, stance classification, (2 more...)

arXiv.org Artificial Intelligence

2309.13734

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Simulating the social influence in transport mode choices

Salazar-Serna, Kathleen, Ng, Lynnette Hui Xian, Cadavid, Lorena, Franco, Carlos J., Carley, Kathleen

arXiv.org Artificial Intelligence | Aug-1-2023

Agent-based simulations have been used in modeling transportation systems for traffic management and passenger flows. In this work, we hope to shed light on the complex factors that influence transportation mode decisions within developing countries, using Colombia as a case study. We model an ecosystem of human agents that decide at each time step on the mode of transportation they would take to work. Their decision is based on a combination of their personal satisfaction with the journey they had just taken, which is evaluated across a personal vector of needs, the information they crowdsource from their prevailing social network, and their personal uncertainty about the experience of trying a new transport solution. We simulate different network structures to analyze the social influence for different decision-makers. We find that in low/medium connected groups, inquisitive people actively change modes cyclically over the years, while imitators cluster rapidly and change less frequently.
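
One way to picture the agent decision described above is a score per transport mode that blends personal satisfaction, the share of social contacts using that mode, and a penalty for switching to something untried. The sketch below is an assumption made for illustration: the linear scoring rule, the parameter names `social_weight` and `uncertainty`, and all numbers are hypothetical and are not taken from the paper's model.

```python
from typing import Dict, List


def choose_mode(
    current_mode: str,
    expected_satisfaction: Dict[str, float],
    neighbour_modes: List[str],
    social_weight: float = 0.5,
    uncertainty: float = 0.2,
) -> str:
    """Pick the transport mode with the highest blended score for one time step."""
    scores = {}
    for mode, own_score in expected_satisfaction.items():
        # Share of the agent's social contacts currently using this mode.
        share = (
            neighbour_modes.count(mode) / len(neighbour_modes)
            if neighbour_modes
            else 0.0
        )
        score = (1 - social_weight) * own_score + social_weight * share
        if mode != current_mode:
            score -= uncertainty  # penalty for trying an unfamiliar mode
        scores[mode] = score
    return max(scores, key=scores.get)


# Hypothetical agent: rides the bus, most contacts ride motorcycles.
satisfaction = {"bus": 0.4, "motorcycle": 0.6, "car": 0.5}
contacts = ["motorcycle", "motorcycle", "bus", "motorcycle"]

choice = choose_mode("bus", satisfaction, contacts)
cautious_choice = choose_mode("bus", satisfaction, contacts, uncertainty=0.5)
```

With the default penalty the agent switches to the mode its contacts favour, while a larger uncertainty penalty keeps it on its current mode, which loosely mirrors the different behaviours of different decision-maker types.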

agent, artificial intelligence, transport mode, (17 more...)

arXiv.org Artificial Intelligence

2308.006

Country:

Asia (0.93)
South America (0.90)
Europe (0.68)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

How Hate Speech Varies by Target Identity: A Computational Analysis

Yoder, Michael Miller, Ng, Lynnette Hui Xian, Brown, David West, Carley, Kathleen M.

arXiv.org Artificial Intelligence | Dec-7-2022

This paper investigates how hate speech varies in systematic ways according to the identities it targets. Across multiple hate speech datasets annotated for targeted identities, we find that classifiers trained on hate speech targeting specific identity groups struggle to generalize to other targeted identities. This provides empirical evidence for differences in hate speech by target identity; we then investigate which patterns structure this variation. We find that the targeted demographic category (e.g. gender/sexuality or race/ethnicity) appears to have a greater effect on the language of hate speech than does the relative social power of the targeted identity group. We also find that words associated with hate speech targeting specific identities often relate to stereotypes, histories of oppression, current social movements, and other social contexts specific to identities. These experiments suggest the importance of considering targeted identity, as well as the social contexts associated with these identities, in automated hate speech classification.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2210.10839

Country:

Asia (0.93)
North America > United States > Pennsylvania (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Terrorism (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law > Criminal Law (0.93)
(4 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
