AITopics | Hu, Yujia

Collaborating Authors

Hu, Yujia

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GPTKB: Comprehensively Materializing Factual LLM Knowledge

Hu, Yujia, Nguyen, Tuan-Phong, Ghosh, Shrestha, Razniewski, Simon

arXiv.org Artificial IntelligenceDec-16-2024

LLMs have majorly advanced NLP and AI, and next to their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since (Petroni et al., 2019), analyzing this knowledge has gained attention. However, most approaches investigate one question at a time via modest-sized pre-defined samples, introducing an availability bias (Tversky and Kahnemann, 1973) that prevents the discovery of knowledge (or beliefs) of LLMs beyond the experimenter's predisposition. To address this challenge, we propose a novel methodology to comprehensively materializing an LLM's factual knowledge through recursive querying and result consolidation. As a prototype, we employ GPT-4o-mini to construct GPTKB, a large-scale knowledge base (KB) comprising 105 million triples for over 2.9 million entities - achieved at 1% of the cost of previous KB projects. This work marks a milestone in two areas: For LLM research, for the first time, it provides constructive insights into the scope and structure of LLMs' knowledge (or beliefs). For KB construction, it pioneers new pathways for the long-standing challenge of general-domain KB construction. GPTKB is accessible at https://gptkb.org.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.0492

Country:

Europe > Germany (0.46)
North America > United States > Massachusetts (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Analyst Reports and Stock Performance: Evidence from the Chinese Market

Liu, Rui, Liang, Jiayou, Chen, Haolong, Hu, Yujia

arXiv.org Artificial IntelligenceNov-13-2024

This article applies natural language processing (NLP) to extract and quantify textual information to predict stock performance. Using an extensive dataset of Chinese analyst reports and employing a customized BERT deep learning model for Chinese text, this study categorizes the sentiment of the reports as positive, neutral, or negative. The findings underscore the predictive capacity of this sentiment indicator for stock volatility, excess returns, and trading volume. Specifically, analyst reports with strong positive sentiment will increase excess return and intraday volatility, and vice versa, reports with strong negative sentiment also increase volatility and trading volume, but decrease future excess return. The magnitude of this effect is greater for positive sentiment reports than for negative sentiment reports. This article contributes to the empirical literature on sentiment analysis and the response of the stock market to news in the Chinese stock market.

machine learning, natural language, sentiment, (20 more...)

arXiv.org Artificial Intelligence

2411.08726

Country: Asia > China > Guangdong Province (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification

Hu, Yujia, Hu, Zhiqiang, Seah, Chun-Wei, Lee, Roy Ka-Wei

arXiv.org Artificial IntelligenceJul-16-2024

Large Language Models (LLMs) have demonstrated remarkable proficiency in a wide range of NLP tasks. However, when it comes to authorship verification (AV) tasks, which involve determining whether two given texts share the same authorship, even advanced models like ChatGPT exhibit notable limitations. This paper introduces a novel approach, termed InstructAV, for authorship verification. This approach utilizes LLMs in conjunction with a parameter-efficient fine-tuning (PEFT) method to simultaneously improve accuracy and explainability. The distinctiveness of InstructAV lies in its ability to align classification decisions with transparent and understandable explanations, representing a significant progression in the field of authorship verification. Through comprehensive experiments conducted across various datasets, InstructAV demonstrates its state-of-the-art performance on the AV task, offering high classification accuracy coupled with enhanced explanation reliability.

explanation, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2407.12882

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.66)

Industry:

Leisure & Entertainment (0.68)
Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations

Xiao, Yunze, Hu, Yujia, Choo, Kenny Tsu Wei, Lee, Roy Ka-wei

arXiv.org Artificial IntelligenceJun-17-2024

Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce \textsf{ToxiCloakCN}, an enhanced dataset derived from ToxiCN, augmented with homophonic substitutions and emoji transformations, to test the robustness of LLMs against these cloaking perturbations. Our findings reveal that existing models significantly underperform in detecting offensive content when these perturbations are applied. We provide an in-depth analysis of how different types of offensive content are affected by these perturbations and explore the alignment between human and model explanations of offensiveness. Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.12223

Country:

Asia > Middle East > UAE (0.14)
North America > United States > Minnesota (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Law > Civil Rights & Constitutional Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification

Hung, Chia-Yu, Hu, Zhiqiang, Hu, Yujia, Lee, Roy Ka-Wei

arXiv.org Artificial IntelligenceOct-12-2023

Authorship verification (AV) is a fundamental task in natural language processing (NLP) and computational linguistics, with applications in forensic analysis, plagiarism detection, and identification of deceptive content. Existing AV techniques, including traditional stylometric and deep learning approaches, face limitations in terms of data requirements and lack of explainability. To address these limitations, this paper proposes PromptAV, a novel technique that leverages Large-Language Models (LLMs) for AV by providing step-by-step stylometric explanation prompts. PromptAV outperforms state-of-the-art baselines, operates effectively with limited training data, and enhances interpretability through intuitive explanations, showcasing its potential as an effective and interpretable solution for the AV task.

large language model, machine learning, promptav, (23 more...)

arXiv.org Artificial Intelligence

2310.08123

Country: Asia > India (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Yelp Reviews and Food Types: A Comparative Analysis of Ratings, Sentiments, and Topics

Liao, Wenyu, Shi, Yiqing, Hu, Yujia, Quan, Wei

arXiv.org Artificial IntelligenceJul-20-2023

This study examines the relationship between Yelp reviews and food types, investigating how ratings, sentiments, and topics vary across different types of food. Specifically, we analyze how ratings and sentiments of reviews vary across food types, cluster food types based on ratings and sentiments, infer review topics using machine learning models, and compare topic distributions among different food types. Our analyses reveal that some food types have similar ratings, sentiments, and topics distributions, while others have distinct patterns. We identify four clusters of food types based on ratings and sentiments and find that reviewers tend to focus on different topics when reviewing certain food types. These findings have important implications for understanding user behavior and cultural influence on digital media platforms and promoting cross-cultural understanding and appreciation.

artificial intelligence, food type, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2307.10826

Country: North America > United States (0.70)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Consumer Health (1.00)
Consumer Products & Services > Restaurants (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Add feedback