AITopics | Zhang, Yigeng

Collaborating Authors

Zhang, Yigeng

Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model

Baharlouei, Elaheh, Shafaei, Mahsa, Zhang, Yigeng, Escalante, Hugo Jair, Solorio, Thamar

arXiv.org Artificial Intelligence, Jun-11-2024

We address the challenge of detecting questionable content in online media, specifically the subcategory of comic mischief. This type of content combines elements such as violence, adult content, or sarcasm with humor, making it difficult to detect. Employing a multimodal approach is vital to capture the subtle details inherent in comic mischief content. To tackle this problem, we propose a novel end-to-end multimodal system for the task of comic mischief detection. As part of this contribution, we release a novel dataset for the targeted task consisting of three modalities: video, text (video captions and subtitles), and audio. We also design a HIerarchical Cross-attention model with CAPtions (HICCAP) to capture the intricate relationships among these modalities. The results show that the proposed approach makes a significant improvement over robust baselines and state-of-the-art models for comic mischief detection and its type classification. This emphasizes the potential of our system to empower users to make informed decisions about the online content they choose to see. In addition, we conduct experiments on the UCF101, HMDB51, and XD-Violence datasets, comparing our model against other state-of-the-art approaches and showcasing the outstanding performance of our proposed model in various scenarios.

data mining, machine learning, natural language, (18 more...)

arXiv: 2406.07841

Country:

North America > United States (0.28)
North America > Mexico > Puebla (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > Middle East > Israel (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)
Research Report > Promising Solution (0.86)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.95)
(2 more...)

Interpreting Themes from Educational Stories

Zhang, Yigeng, González, Fabio A., Solorio, Thamar

arXiv.org Artificial Intelligence, Apr-8-2024

Reading comprehension continues to be a crucial research focus in the NLP community. Recent advances in Machine Reading Comprehension (MRC) have mostly centered on literal comprehension, referring to the surface-level understanding of content. In this work, we focus on the next level: interpretive comprehension, with a particular emphasis on inferring the themes of a narrative text. We introduce the first dataset specifically designed for interpretive comprehension of educational narratives, providing corresponding well-edited theme texts. The dataset spans a variety of genres and cultural origins and includes human-annotated theme keywords with varying levels of granularity. We further formulate NLP tasks under different abstractions of interpretive comprehension toward the main idea of a story. After conducting extensive experiments with state-of-the-art methods, we found the task to be both challenging and significant for NLP research.

large language model, machine learning, natural language, (21 more...)

arXiv: 2404.0525

Country:

Europe (0.68)
North America > United States > Texas (0.28)
Asia > Middle East > Republic of Türkiye (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Positive and Risky Message Assessment for Music Products

Zhang, Yigeng, Shafaei, Mahsa, Gonzalez, Fabio, Solorio, Thamar

arXiv.org Artificial Intelligence, Sep-18-2023

People can use various tools, such as high-fidelity players and streaming apps, to enjoy music at any time. Listeners can simply go online, press the PLAY button, and find themselves invigorated after a bad day. However, this easy access also raises concerns that children and adolescents may have a higher chance of being exposed to risky content. In this work, we introduce a novel NLP task: assessing the positive and risky messages of a music item. We study the messages that a music item conveys from five significant dimensions regarding appropriateness: Positive Messages, Violence, Substance Consumption, Sex, and Consumerism.

lyric, machine learning, natural language, (18 more...)

arXiv: 2309.10182

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area (0.94)
Media > Music (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

Hou, Haowen, Yan, Xiaopeng, Zhang, Yigeng, Lian, Fengzong, Kang, Zhanhui

arXiv.org Artificial Intelligence, Dec-29-2022

In the field of cross-modal retrieval, single-encoder models tend to perform better than dual-encoder models, but they suffer from high latency and low throughput. In this paper, we present a dual-encoder model called BagFormer that uses a cross-modal interaction mechanism to improve recall performance without sacrificing latency and throughput. BagFormer achieves this through bag-wise interactions, which transform text to a more appropriate granularity and incorporate entity knowledge into the model. Our experiments demonstrate that BagFormer achieves results comparable to state-of-the-art single-encoder models in cross-modal retrieval tasks, while also offering efficient training and inference with 20.72 times lower latency and 25.74 times higher throughput.

bagformer, machine learning, natural language, (18 more...)

arXiv: 2212.14322

Country: Asia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
