sd score
Know When To Stop: A Study of Semantic Drift in Text Generation
Spataru, Ava, Hambro, Eric, Voita, Elena, Cancedda, Nicola
In this work, we explicitly show that modern LLMs tend to generate correct facts first, then "drift away" and generate incorrect facts later: this was occasionally observed but never properly measured. We develop a semantic drift score that measures the degree of separation between correct and incorrect facts in generated texts and confirm our hypothesis when generating Wikipedia-style biographies. This correct-then-incorrect generation pattern suggests that factual accuracy can be improved by knowing when to stop generation. Therefore, we explore the trade-off between information quantity and factual accuracy for several early stopping methods and manage to improve factuality by a large margin. We also show that reranking with semantic similarity can further improve these results, both compared to the baseline and when combined with early stopping. Finally, we try calling an external API to bring the model back to the right generation path, but do not get positive results. Overall, our methods generalize and can be applied to any long-form text generation to produce more reliable information, by balancing trade-offs between factual accuracy, information quantity and computational cost.
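The abstract does not give the formula for the semantic drift score, so the following is only an illustrative sketch of one way to measure how cleanly correct facts are separated from incorrect ones. It assumes each generated fact has already been labeled correct or incorrect, and it scores every split point by averaging the fraction of correct facts before the split with the fraction of incorrect facts after it. The function name `drift_score` and this exact scoring rule are assumptions, not the authors' definition.

```python
from typing import Sequence


def drift_score(labels: Sequence[bool]) -> float:
    """Hypothetical separation score for a sequence of fact labels.

    labels[i] is True when the i-th generated fact is correct.
    Returns a value in [0, 1]; 1.0 means some split point has only
    correct facts before it and only incorrect facts after it.
    """
    n = len(labels)
    if n < 2:
        return 0.0  # no split point exists for fewer than two facts

    total_correct = sum(labels)
    correct_before = 0
    best = 0.0
    for split in range(1, n):
        # Facts labels[0:split] are "before", labels[split:] are "after".
        correct_before += labels[split - 1]
        correct_after = total_correct - correct_before
        incorrect_after = (n - split) - correct_after
        score = 0.5 * (correct_before / split + incorrect_after / (n - split))
        best = max(best, score)
    return best


if __name__ == "__main__":
    print(drift_score([True, True, False, False]))
    print(drift_score([True, False, True, False]))
```

Under this assumed rule, a perfectly ordered sequence scores 1.0, while a sequence with no incorrect facts stays at 0.5, so a high score signals a clear point at which stopping generation would remove mostly incorrect content.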
- North America > United States > California (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > France > Corsica > Ajaccio (0.04)
- Media (0.68)
- Leisure & Entertainment > Sports > Rugby > Rugby League (0.46)
Predicting Depression and Anxiety: A Multi-Layer Perceptron for Analyzing the Mental Health Impact of COVID-19
Fong, David, Chu, Tianshu, Heflin, Matthew, Gu, Xiaosi, Seneviratne, Oshani
We introduce a multi-layer perceptron (MLP) called the COVID-19 Depression and Anxiety Predictor (CoDAP) to predict mental health trends, particularly anxiety and depression, during the COVID-19 pandemic. Our method utilizes a comprehensive dataset, which tracked mental health symptoms weekly over ten weeks during the initial COVID-19 wave (April to June 2020) in a diverse cohort of U.S. adults. This period, characterized by a surge in mental health symptoms and conditions, offers a critical context for our analysis. Our focus was to extract and analyze patterns of anxiety and depression through a unique lens of qualitative individual attributes using CoDAP. This model not only predicts patterns of anxiety and depression during the pandemic but also unveils key insights into the interplay of demographic factors, behavioral changes, and social determinants of mental health. These findings contribute to a more nuanced understanding of the complexity of mental health issues in times of global health crises, potentially guiding future early interventions.
- North America > United States (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
H_eval: A new hybrid evaluation metric for automatic speech recognition tasks
Sasindran, Zitha, Yelchuri, Harsha, Prabhakar, T. V., Rao, Supreeth
Many studies have examined the shortcomings of word error rate (WER) as an evaluation metric for automatic speech recognition (ASR) systems. Since WER considers only literal word-level correctness, new evaluation metrics based on semantic similarity such as semantic distance (SD) and BERTScore have been developed. However, we found that these metrics have their own limitations, such as a tendency to overly prioritise keywords. We propose H_eval, a new hybrid evaluation metric for ASR systems that considers both semantic correctness and error rate and performs well in scenarios where WER and SD perform poorly. Because it is computationally lighter than BERTScore, it offers a 49-fold reduction in metric computation time. Furthermore, we show that H_eval correlates strongly with downstream NLP tasks. Also, to reduce the metric calculation time, we built multiple fast and lightweight models using distillation techniques.
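For reference, the WER baseline that the abstract contrasts with semantic metrics can be sketched as word-level edit distance divided by reference length. This is the standard textbook definition, not the H_eval metric itself; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because every word error counts equally, replacing a keyword and dropping an article cost the same, which is the literal word-level behavior the abstract describes.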
- North America > Canada > Ontario (0.05)
- North America > United States > Virginia > Fairfax County > Herndon (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation
Kim, Younghyun, Mo, Sangwoo, Kim, Minkyu, Lee, Kyungmin, Lee, Jaeho, Shin, Jinwoo
Biases in models pose a critical issue when deploying machine learning systems, but diagnosing them in an explainable manner can be challenging. To address this, we introduce the bias-to-text (B2T) framework, which uses language interpretation to identify and mitigate biases in vision models, such as image classifiers and text-to-image generative models. Our language descriptions of visual biases provide explainable forms that enable the discovery of novel biases and effective model debiasing. To achieve this, we analyze common keywords in the captions of mispredicted or generated images. Here, we propose novel score functions to avoid biases in captions by comparing the similarities between bias keywords and those images. Additionally, we present strategies to debias zero-shot classifiers and text-to-image diffusion models using the bias keywords from the B2T framework. We demonstrate the effectiveness of our framework on various image classification and generation tasks. For classifiers, we discover a new spurious correlation between the keywords "(sports) player" and "female" in Kaggle Face and improve the worst-group accuracy on Waterbirds by 11% through debiasing, compared to the baseline. For generative models, we detect and effectively prevent unfair (e.g., gender-biased) and unsafe (e.g., "naked") image generation.
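The worst-group accuracy mentioned above is commonly computed as the minimum per-group accuracy over predefined subgroups. The sketch below shows that computation under the assumption that each example carries a group identifier; the function name `worst_group_accuracy` and the toy labels are hypothetical and are not taken from the B2T framework.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def worst_group_accuracy(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> float:
    """Return the lowest accuracy observed across all groups."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("all inputs must have the same length")
    if not groups:
        raise ValueError("inputs must not be empty")

    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return min(correct[g] / total[g] for g in total)


if __name__ == "__main__":
    # Toy data: group "b" has one of two predictions wrong.
    print(worst_group_accuracy([1, 1, 0, 0], [1, 1, 0, 1], ["a", "a", "b", "b"]))
```

Reporting the minimum rather than the mean exposes subgroups on which a classifier relies on a spurious correlation, even when overall accuracy looks high.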
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > Haiti (0.04)
- Europe > Sweden (0.04)
- Health & Medicine (0.68)
- Materials (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)