AITopics | Lee, Roy Ka-Wei

Collaborating Authors

Lee, Roy Ka-Wei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation

Wang, Han, Lee, Roy Ka-Wei

arXiv.org Artificial IntelligenceFeb-24-2024

Online memes have emerged as powerful digital cultural artifacts in the age of social media, offering not only humor but also platforms for political discourse, social critique, and information dissemination. Their extensive reach and influence in shaping online communities' sentiments make them invaluable tools for campaigning and promoting ideologies. Despite the development of several meme-generation tools, there remains a gap in their systematic evaluation and their ability to effectively communicate ideologies. Addressing this, we introduce MemeCraft, an innovative meme generator that leverages large language models (LLMs) and visual language models (VLMs) to produce memes advocating specific social movements. MemeCraft presents an end-to-end pipeline, transforming user prompts into compelling multimodal memes without manual intervention. Conscious of the misuse potential in creating divisive content, an intrinsic safety mechanism is embedded to curb hateful meme production.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3589334.3648151

2403.14652

Country:

Asia > Singapore (0.14)
North America > United States (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Government (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models

Hee, Ming Shan, Sharma, Shivam, Cao, Rui, Nandi, Palash, Chakraborty, Tanmoy, Lee, Roy Ka-Wei

arXiv.org Artificial IntelligenceFeb-1-2024

In the evolving landscape of online communication, moderating hate speech (HS) presents an intricate challenge, compounded by the multimodal nature of digital content. This comprehensive survey delves into the recent strides in HS moderation, spotlighting the burgeoning role of large language models (LLMs) and large multimodal models (LMMs). Our exploration begins with a thorough analysis of current literature, revealing the nuanced interplay between textual, visual, and auditory elements in propagating HS. We uncover a notable trend towards integrating these modalities, primarily due to the complexity and subtlety with which HS is disseminated. A significant emphasis is placed on the advances facilitated by LLMs and LMMs, which have begun to redefine the boundaries of detection and moderation capabilities. We identify existing gaps in research, particularly in the context of underrepresented languages and cultures, and the need for solutions to handle low-resource settings. The survey concludes with a forward-looking perspective, outlining potential avenues for future research, including the exploration of novel AI methodologies, the ethical governance of AI in moderation, and the development of more nuanced, context-aware systems. This comprehensive overview aims to catalyze further research and foster a collaborative effort towards more sophisticated, responsible, and human-centric approaches to HS moderation in the digital era. WARNING: This paper contains offensive examples.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2401.16727

Country: Asia > Middle East > Palestine (0.14)

Genre: Overview (1.00)

Industry:

Government > Regional Government (0.68)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Prompting Large Language Models for Topic Modeling

Wang, Han, Prakash, Nirmalendu, Hoang, Nguyen Khoi, Hee, Ming Shan, Naseem, Usman, Lee, Roy Ka-Wei

arXiv.org Artificial IntelligenceDec-15-2023

Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words. Moreover, these models often neglect sentence-level semantics, focusing primarily on token-level semantics. In this paper, we propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs) to address these challenges. It involves extracting topics at the sentence level from individual documents, then aggregating and condensing these topics into a predefined quantity, ultimately providing coherent topics for texts of varying lengths. This approach eliminates the need for manual parameter tuning and improves the quality of extracted topics. We benchmark PromptTopic against the state-of-the-art baselines on three vastly diverse datasets, establishing its proficiency in discovering meaningful topics. Furthermore, qualitative analysis showcases PromptTopic's ability to uncover relevant topics in multiple datasets.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.09693

Country:

Asia > Middle East > Yemen (0.15)
Africa > Middle East > Egypt (0.14)
North America > United States > Virginia (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

MATK: The Meme Analytical Tool Kit

Hee, Ming Shan, Kumaresan, Aditi, Hoang, Nguyen Khoi, Prakash, Nirmalendu, Cao, Rui, Lee, Roy Ka-Wei

arXiv.org Artificial IntelligenceDec-10-2023

The rise of social media platforms has brought about a new digital culture called memes. Memes, which combine visuals and text, can strongly influence public opinions on social and cultural issues. As a result, people have become interested in categorizing memes, leading to the development of various datasets and multimodal models that show promising results in this field. However, there is currently a lack of a single library that allows for the reproduction, evaluation, and comparison of these models using fair benchmarks and settings. To fill this gap, we introduce the Meme Analytical Tool Kit (MATK), an open-source toolkit specifically designed to support existing memes datasets and cutting-edge multimodal models. MATK aims to assist researchers and engineers in training and reproducing these multimodal models for meme classification tasks, while also providing analysis techniques to gain insights into their strengths and weaknesses. To access MATK, please visit \url{https://github.com/Social-AI-Studio/MATK}.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3581783.3613463

2312.06094

Country:

Asia > Singapore (0.20)
North America > United States > New York (0.14)

Genre: Research Report (0.65)

Industry: Information Technology (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models

Prakash, Nirmalendu, Wang, Han, Hoang, Nguyen Khoi, Hee, Ming Shan, Lee, Roy Ka-Wei

arXiv.org Artificial IntelligenceDec-10-2023

The proliferation of social media has given rise to a new form of communication: memes. Memes are multimodal and often contain a combination of text and visual elements that convey meaning, humor, and cultural significance. While meme analysis has been an active area of research, little work has been done on unsupervised multimodal topic modeling of memes, which is important for content moderation, social media analysis, and cultural studies. We propose \textsf{PromptMTopic}, a novel multimodal prompt-based model designed to learn topics from both text and visual modalities by leveraging the language modeling capabilities of large language models. Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities. We evaluate our proposed model through extensive experiments on three real-world meme datasets, which demonstrate its superiority over state-of-the-art topic modeling baselines in learning descriptive topics in memes. Additionally, our qualitative analysis shows that \textsf{PromptMTopic} can identify meaningful and culturally relevant topics from memes. Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.\\ \red{\textbf{Disclaimer: This paper contains sensitive content that may be disturbing to some readers.}}

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3581783.3613836

2312.06093

Country:

Asia (1.00)
Europe (0.93)
North America > Canada > Ontario (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SBTRec- A Transformer Framework for Personalized Tour Recommendation Problem with Sentiment Analysis

Ho, Ngai Lam, Lee, Roy Ka-Wei, Lim, Kwan Hui

arXiv.org Artificial IntelligenceNov-18-2023

When traveling to an unfamiliar city for holidays, tourists often rely on guidebooks, travel websites, or recommendation systems to plan their daily itineraries and explore popular points of interest (POIs). However, these approaches may lack optimization in terms of time feasibility, localities, and user preferences. In this paper, we propose the SBTRec algorithm: a BERT-based Trajectory Recommendation with sentiment analysis, for recommending personalized sequences of POIs as itineraries. The key contributions of this work include analyzing users' check-ins and uploaded photos to understand the relationship between POI visits and distance. We introduce SBTRec, which encompasses sentiment analysis to improve recommendation accuracy by understanding users' preferences and satisfaction levels from reviews and comments about different POIs. Our proposed algorithms are evaluated against other sequence prediction methods using datasets from 8 cities. The results demonstrate that SBTRec achieves an average F1 score of 61.45%, outperforming baseline algorithms. The paper further discusses the flexibility of the SBTRec algorithm, its ability to adapt to different scenarios and cities without modification, and its potential for extension by incorporating additional information for more reliable predictions. Overall, SBTRec provides personalized and relevant POI recommendations, enhancing tourists' overall trip experiences. Future work includes fine-tuning personalized embeddings for users, with evaluation of users' comments on POIs,~to further enhance prediction accuracy.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2311.11071

Country:

North America > United States (0.46)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.83)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.83)

Add feedback

Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts

Ghosal, Deepanway, Majumder, Navonil, Lee, Roy Ka-Wei, Mihalcea, Rada, Poria, Soujanya

arXiv.org Artificial IntelligenceOct-30-2023

Visual question answering (VQA) is the task of answering questions about an image. The task assumes an understanding of both the image and the question to provide a natural language answer. VQA has gained popularity in recent years due to its potential applications in a wide range of fields, including robotics, education, and healthcare. In this paper, we focus on knowledge-augmented VQA, where answering the question requires commonsense knowledge, world knowledge, and reasoning about ideas and concepts not present in the image. We propose a multimodal framework that uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc to answer questions more accurately. We benchmark our method on the multi-choice question-answering task of the A-OKVQA, Science-QA, VSR, and IconQA datasets using CLIP and BLIP models. We show that the use of language guidance is a simple but powerful and effective strategy for visual question answering. Our language guidance improves the performance of CLIP by 7.6% and BLIP-2 by 4.8% in the challenging A-OKVQA dataset. We also observe consistent improvement in performance on the Science-QA, VSR, and IconQA datasets when using the proposed language guidances. The implementation of LG-VQA is publicly available at https:// github.com/declare-lab/LG-VQA.

artificial intelligence, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2310.20159

Country:

North America > United States (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.40)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Add feedback

BTRec: BERT-Based Trajectory Recommendation for Personalized Tours

Ho, Ngai Lam, Lee, Roy Ka-Wei, Lim, Kwan Hui

arXiv.org Artificial IntelligenceOct-30-2023

An essential task for tourists having a pleasant holiday is to have a well-planned itinerary with relevant recommendations, especially when visiting unfamiliar cities. Many tour recommendation tools only take into account a limited number of factors, such as popular Points of Interest (POIs) and routing constraints. Consequently, the solutions they provide may not always align with the individual users of the system. We propose an iterative algorithm in this paper, namely: BTREC (BERT-based Trajectory Recommendation), that extends from the POIBERT embedding algorithm to recommend personalized itineraries on POIs using the BERT framework. Our BTREC algorithm incorporates users' demographic information alongside past POI visits into a modified BERT language model to recommend a personalized POI itinerary prediction given a pair of source and destination POIs. Our recommendation system can create a travel itinerary that maximizes POIs visited, while also taking into account user preferences for categories of POIs and time availability. Our recommendation algorithm is largely inspired by the problem of sentence completion in natural language processing (NLP). Using a dataset of eight cities of different sizes, our experimental results demonstrate that our proposed algorithm is stable and outperforms many other sequence prediction algorithms, measured by recall, precision, and F1-scores.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2310.19886

Country:

Asia (0.14)
Europe (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Consumer Products & Services > Travel (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Using Audio Data to Facilitate Depression Risk Assessment in Primary Health Care

Levinson, Adam Valen, Goyal, Abhay, Man, Roger Ho Chun, Lee, Roy Ka-Wei, Saha, Koustuv, Parekh, Nimay, Altice, Frederick L., Cheung, Lam Yin, De Choudhury, Munmun, Kumar, Navin

arXiv.org Artificial IntelligenceOct-16-2023

Telehealth is a valuable tool for primary health care (PHC), where depression is a common condition. PHC is the first point of contact for most people with depression, but about 25% of diagnoses made by PHC physicians are inaccurate. Many other barriers also hinder depression detection and treatment in PHC. Artificial intelligence (AI) may help reduce depression misdiagnosis in PHC and improve overall diagnosis and treatment outcomes. Telehealth consultations often have video issues, such as poor connectivity or dropped calls. Audio-only telehealth is often more practical for lower-income patients who may lack stable internet connections. Thus, our study focused on using audio data to predict depression risk. The objectives were to: 1) Collect audio data from 24 people (12 with depression and 12 without mental health or major health condition diagnoses); 2) Build a machine learning model to predict depression risk. TPOT, an autoML tool, was used to select the best machine learning algorithm, which was the K-nearest neighbors classifier. The selected model had high performance in classifying depression risk (Precision: 0.98, Recall: 0.93, F1-Score: 0.96). These findings may lead to a range of tools to help screen for and treat depression. By developing tools to detect depression risk, patients can be routed to AI-driven chatbots for initial screenings. Partnerships with a range of stakeholders are crucial to implementing these solutions. Moreover, ethical considerations, especially around data privacy and potential biases in AI models, need to be at the forefront of any AI-driven intervention in mental health care.

artificial intelligence, depression, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2310.10928

Country: North America > United States (0.69)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Health Care Technology > Telehealth (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification

Hung, Chia-Yu, Hu, Zhiqiang, Hu, Yujia, Lee, Roy Ka-Wei

arXiv.org Artificial IntelligenceOct-12-2023

Authorship verification (AV) is a fundamental task in natural language processing (NLP) and computational linguistics, with applications in forensic analysis, plagiarism detection, and identification of deceptive content. Existing AV techniques, including traditional stylometric and deep learning approaches, face limitations in terms of data requirements and lack of explainability. To address these limitations, this paper proposes PromptAV, a novel technique that leverages Large-Language Models (LLMs) for AV by providing step-by-step stylometric explanation prompts. PromptAV outperforms state-of-the-art baselines, operates effectively with limited training data, and enhances interpretability through intuitive explanations, showcasing its potential as an effective and interpretable solution for the AV task.

large language model, machine learning, promptav, (23 more...)

arXiv.org Artificial Intelligence

2310.08123

Country: Asia > India (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback