ugc
- North America > United States > Texas (0.04)
- North America > United States > Wisconsin (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (4 more...)
From Detection to Discovery: A Closed-Loop Approach for Simultaneous and Continuous Medical Knowledge Expansion and Depression Detection on Social Media
Geng, Shuang, Zhang, Wenli, Xie, Jiaheng, Wang, Rui, Ram, Sudha
Social media user-generated content (UGC) provides real-time, self-reported indicators of mental health conditions such as depression, offering a valuable source for predictive analytics. While prior studies integrate medical knowledge to improve prediction accuracy, they overlook the opportunity to simultaneously expand such knowledge through predictive processes. We develop a Closed-Loop Large Language Model (LLM)-Knowledge Graph framework that integrates prediction and knowledge expansion in an iterative learning cycle. In the knowledge-aware depression detection phase, the LLM jointly performs depression detection and entity extraction, while the knowledge graph represents and weights these entities to refine prediction performance. In the knowledge refinement and expansion phase, new entities, relationships, and entity types extracted by the LLM are incorporated into the knowledge graph under expert supervision, enabling continual knowledge evolution. Using large-scale UGC, the framework enhances both predictive accuracy and medical understanding. Expert evaluations confirmed the discovery of clinically meaningful symptoms, comorbidities, and social triggers complementary to existing literature. We conceptualize and operationalize prediction-through-learning and learning-through-prediction as mutually reinforcing processes, advancing both methodological and theoretical understanding in predictive analytics. The framework demonstrates the co-evolution of computational models and domain knowledge, offering a foundation for adaptive, data-driven knowledge systems applicable to other dynamic risk monitoring contexts.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Iowa > Story County > Ames (0.04)
- (4 more...)
- Overview (0.92)
- Research Report > New Finding (0.46)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Consumer Health (0.87)
- Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.63)
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Texas (0.04)
- North America > United States > Wisconsin (0.04)
- (9 more...)
Leveraging Multi-Source Textural UGC for Neighbourhood Housing Quality Assessment: A GPT-Enhanced Framework
Hong, Qiyuan, Zhao, Huimin, Long, Ying
This study leverages GPT-4o to assess neighbourhood housing quality using multi-source textural user-generated content (UGC) from Dianping, Weibo, and the Government Message Board. The analysis involves filtering relevant texts, extracting structured evaluation units, and conducting sentiment scoring. A refined housing quality assessment system with 46 indicators across 11 categories was developed, highlighting an objective-subjective method gap and platform-specific differences in focus. GPT-4o outperformed rule-based and BERT models, achieving 92.5% accuracy in fine-tuned settings. The findings underscore the value of integrating UGC and GPT-driven analysis for scalable, resident-centric urban assessments, offering practical insights for policymakers and urban planners.
- Education (1.00)
- Government (0.90)
- Health & Medicine > Therapeutic Area (0.51)
Aligning Large Language Models with Implicit Preferences from User-Generated Content
Tan, Zhaoxuan, Li, Zheng, Liu, Tianyi, Wang, Haodong, Yun, Hyokun, Zeng, Ming, Chen, Pei, Zhang, Zhihan, Gao, Yifan, Wang, Ruijie, Nigam, Priyanka, Yin, Bing, Jiang, Meng
Learning from preference feedback is essential for aligning large language models (LLMs) with human values and improving the quality of generated responses. However, existing preference learning methods rely heavily on curated data from humans or advanced LLMs, which is costly and difficult to scale. In this work, we present PUGC, a novel framework that leverages implicit human Preferences in unlabeled User-Generated Content (UGC) to generate preference data. Although UGC is not explicitly created to guide LLMs in generating human-preferred responses, it often reflects valuable insights and implicit preferences from its creators that has the potential to address readers' questions. PUGC transforms UGC into user queries and generates responses from the policy model. The UGC is then leveraged as a reference text for response scoring, aligning the model with these implicit preferences. This approach improves the quality of preference data while enabling scalable, domain-specific alignment. Experimental results on Alpaca Eval 2 show that models trained with DPO and PUGC achieve a 9.37% performance improvement over traditional methods, setting a 35.93% state-of-the-art length-controlled win rate using Mistral-7B-Instruct. Further studies highlight gains in reward quality, domain-specific alignment effectiveness, robustness against UGC quality, and theory of mind capabilities. Our code and dataset are available at https://zhaoxuan.info/PUGC.github.io/
- Media > News (1.00)
- Information Technology > Security & Privacy (0.68)
Can Large Language Models Understand Internet Buzzwords Through User-Generated Content
Huang, Chen, Luo, Junkai, Wang, Xinzuo, Lei, Wenqiang, Lv, Jiancheng
The massive user-generated content (UGC) available in Chinese social media is giving rise to the possibility of studying internet buzzwords. In this paper, we study if large language models (LLMs) can generate accurate definitions for these buzzwords based on UGC as examples. Our work serves a threefold contribution. First, we introduce CHEER, the first dataset of Chinese internet buzzwords, each annotated with a definition and relevant UGC. Second, we propose a novel method, called RESS, to effectively steer the comprehending process of LLMs to produce more accurate buzzword definitions, mirroring the skills of human language learning. Third, with CHEER, we benchmark the strengths and weaknesses of various off-the-shelf definition generation methods and our RESS. Our benchmark demonstrates the effectiveness of RESS while revealing crucial shared challenges: over-reliance on prior exposure, underdeveloped inferential abilities, and difficulty identifying high-quality UGC to facilitate comprehension. We believe our work lays the groundwork for future advancements in LLM-based definition generation. Our dataset and code are available at https://github.com/SCUNLP/Buzzword.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (9 more...)
Safeguarding Marketing Research: The Generation, Identification, and Mitigation of AI-Fabricated Disinformation
Generative AI has ushered in the ability to generate content that closely mimics human contributions, introducing an unprecedented threat: Deployed en masse, these models can be used to manipulate public opinion and distort perceptions, resulting in a decline in trust towards digital platforms. This study contributes to marketing literature and practice in three ways. First, it demonstrates the proficiency of AI in fabricating disinformative user-generated content (UGC) that mimics the form of authentic content. Second, it quantifies the disruptive impact of such UGC on marketing research, highlighting the susceptibility of analytics frameworks to even minimal levels of disinformation. Third, it proposes and evaluates advanced detection frameworks, revealing that standard techniques are insufficient for filtering out AI-generated disinformation. We advocate for a comprehensive approach to safeguarding marketing research that integrates advanced algorithmic solutions, enhanced human oversight, and a reevaluation of regulatory and ethical frameworks. Our study seeks to serve as a catalyst, providing a foundation for future research and policy-making aimed at navigating the intricate challenges at the nexus of technology, ethics, and marketing.
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.67)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Data Science > Data Mining (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.88)
Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping
Kunes, Russell Z., Yin, Mingzhang, Land, Max, Haviv, Doron, Pe'er, Dana, Tavaré, Simon
Gradient estimation is often necessary for fitting generative models with discrete latent variables, in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state of the art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator \textit{bitflip}-1 that has lower variance at the boundaries of the parameter space. As bitflip-1 has complementary properties to existing estimators, we introduce an aggregated estimator, \textit{unbiased gradient variance clipping} (UGC) that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We theoretically prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC achieves the optimal value of the optimization objectives in toy experiments, discrete VAE training, and in a best subset selection problem.
Machine Translation for User-Generated Content
A specific use case worth exploring in this regard is MT for User Generated Content (UGC). Because of the speed with which UGC (comments, feedback, reviews) is being created and the corresponding costs of its professional translation, many organizations turn to MT. Popular examples of such companies are Skype (in addition to text translation, Microsoft developed the Automatic Speech Recognition (ASR) for audio speech translation in Skype) and Facebook. The social network is aiming to solve the challenge of fine-tuning each system relating to a specific language pair, using neural machine translation (NMT) and benefiting from various contexts for translations. One solution that tackles this issue is the technology developed by Language I/O. It takes into account the client's glossaries and TMs, selects the best MT engine output and then improves on the results using cultural intelligence and/or human linguists who compare machine translations post-facto to ensure that their MT Optimizer engine learns over time.
Using machine learning to yield useful market insight - Market Business News
Gauging consumer needs is essential in marketing. Focus groups, interviews and surveys are currently the most common means of gathering this data. But the process can be time-consuming and expensive. The advent of machine learning technology and artificial intelligence (AI) has sparked interest in using the technology to yield valuable insights into consumer wants. Researchers at MIT devised a method of efficiently identifying customer needs from user-generated content (UCG) with machine learning, according to a study published in Marketing Science.
- Questionnaire & Opinion Survey (0.59)
- Research Report > New Finding (0.41)