 anorexia


Demo: Statistically Significant Results On Biases and Errors of LLMs Do Not Guarantee Generalizable Results

Liu, Jonathan, Qiu, Haoling, Lasko, Jonathan, Karakos, Damianos, Yarmohammadi, Mahsa, Dredze, Mark

arXiv.org Artificial Intelligence

Recent research has shown that hallucinations, omissions, and biases are prevalent in everyday use-cases of LLMs. However, chatbots used in medical contexts must provide consistent advice in situations where non-medical factors are involved, such as when demographic information is present. In order to understand the conditions under which medical chatbots fail to perform as expected, we develop an infrastructure that 1) automatically generates queries to probe LLMs and 2) evaluates answers to these queries using multiple LLM-as-a-judge setups and prompts. For 1), our prompt creation pipeline samples the space of patient demographics, histories, disorders, and writing styles to create realistic questions that we subsequently use to prompt LLMs. In 2), our evaluation pipeline provides hallucination and omission detection using LLM-as-a-judge as well as agentic workflows, in addition to LLM-as-a-judge treatment category detectors. As a baseline study, we perform two case studies on inter-LLM agreement and the impact of varying the answering and evaluation LLMs. We find that LLM annotators exhibit low agreement scores (average Cohen's Kappa $κ=0.118$), and only specific (answering, evaluation) LLM pairs yield statistically significant differences across writing styles, genders, and races. We recommend that studies using LLM evaluation use multiple LLMs as evaluators in order to avoid arriving at statistically significant but non-generalizable results, particularly in the absence of ground-truth data. We also suggest publishing inter-LLM agreement metrics for transparency. Our code and dataset are available here: https://github.com/BBN-E/medic-neurips-2025-demo.
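The inter-LLM agreement figure quoted above is Cohen's kappa, which corrects raw agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of how a pairwise kappa and its average over several judges could be computed; it is not taken from the authors' repository, and the function names, judge names, and labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(annotations):
    """Average kappa over all pairs of judges (dict: judge name -> labels)."""
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        raise ValueError("need at least two judges")
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical hallucination verdicts from three LLM judges on four answers.
judges = {
    "judge_a": ["yes", "yes", "no", "no"],
    "judge_b": ["yes", "no", "no", "no"],
    "judge_c": ["no", "no", "yes", "yes"],
}
print(cohens_kappa(judges["judge_a"], judges["judge_b"]))
print(mean_pairwise_kappa(judges))
```

A value near 0 means the judges agree about as often as chance would predict, which is why a low average kappa undermines conclusions drawn from any single judge.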


Mental Disorder Classification via Temporal Representation of Text

Kumar, Raja, Maharaj, Kishan, Saxena, Ashita, Bhattacharyya, Pushpak

arXiv.org Artificial Intelligence

Mental disorders pose a global challenge, aggravated by the shortage of qualified mental health professionals. Mental disorder prediction from social media posts by current LLMs is challenging due to the complexities of sequential text data and the limited context length of language models. Current language model-based approaches split a single data instance into multiple chunks to compensate for limited context size. The predictive model is then applied to each chunk individually, and the most-voted output is selected as the final prediction. This results in the loss of inter-post dependencies and important time-variant information, leading to poor performance. We propose a novel framework which first compresses the large sequence of chronologically ordered social media posts into a series of numbers. We then use this time-variant representation for mental disorder classification. We demonstrate the generalization capabilities of our framework by outperforming the current SOTA in three different mental conditions: depression, self-harm, and anorexia, with an absolute improvement of 5% in the F1 score. We investigate the situation where current data instances fall within the context length of language models and present empirical results highlighting the importance of temporal properties of textual data. Furthermore, we utilize the proposed framework for a cross-domain study, exploring commonalities across disorders and the possibility of inter-domain data usage.
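The chunk-and-vote baseline that the abstract criticizes can be sketched as follows. This illustrates only the baseline, not the proposed framework; chunking by post count (rather than by tokens), the function names, and the keyword-based toy classifier are all assumptions made for the example.

```python
from collections import Counter


def chunk_posts(posts, max_posts_per_chunk):
    """Split a user's chronologically ordered posts into fixed-size chunks."""
    if max_posts_per_chunk < 1:
        raise ValueError("max_posts_per_chunk must be at least 1")
    return [
        list(posts[i:i + max_posts_per_chunk])
        for i in range(0, len(posts), max_posts_per_chunk)
    ]


def majority_vote_predict(posts, classify_chunk, max_posts_per_chunk):
    """Classify each chunk independently and return the most common label."""
    chunks = chunk_posts(posts, max_posts_per_chunk)
    if not chunks:
        raise ValueError("need at least one post")
    votes = Counter(classify_chunk(chunk) for chunk in chunks)
    # Ties are resolved in favour of the label that was seen first.
    return votes.most_common(1)[0][0]


def toy_classifier(chunk):
    """Hypothetical stand-in for a language model applied to one chunk."""
    return "at_risk" if any("skip" in post for post in chunk) else "control"


user_posts = [
    "ate well today",
    "feeling fine",
    "skipped lunch",
    "skipped dinner too",
    "skipping meals again",
]
print(majority_vote_predict(user_posts, toy_classifier, max_posts_per_chunk=2))
```

Because each chunk is scored in isolation and only label counts are kept, shuffling the chunks would not change the prediction, which is one way to see the loss of temporal information described above.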


Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts

Benítez-Andrades, José Alberto, García-Ordás, María Teresa, Russo, Mayra, Sakor, Ahmad, Rotger, Luis Daniel Fernandes, Vidal, Maria-Esther

arXiv.org Artificial Intelligence

Social networks are vital for information sharing, especially in the health sector for discussing diseases and treatments. These platforms, however, often feature posts as brief texts, posing challenges for Artificial Intelligence (AI) in understanding context. We introduce a novel hybrid approach combining community-maintained knowledge graphs (like Wikidata) with deep learning to enhance the categorization of social media posts. This method uses advanced entity recognizers and linkers (like Falcon 2.0) to connect short post entities to knowledge graphs. Knowledge graph embeddings (KGEs) and contextualized word embeddings (like BERT) are then employed to create rich, context-based representations of these posts. Our focus is on the health domain, particularly in identifying posts related to eating disorders (e.g., anorexia, bulimia) to aid healthcare providers in early diagnosis. We tested our approach on a dataset of 2,000 tweets about eating disorders, finding that merging word embeddings with knowledge graph information enhances the predictive models' reliability. This methodology aims to assist health experts in spotting patterns indicative of mental disorders, thereby improving early detection and accurate diagnosis for personalized medicine.


Animal study shows abnormal activity of brain circuit causes anorexia

#artificialintelligence

Researchers at Baylor College of Medicine, Louisiana State University and collaborating institutions have discovered that abnormal activity in a particular brain circuit underlies anorexia in an animal model of the condition. Genetically and pharmacologically restoring the normal activity of the brain circuit improved the condition, opening the possibility of developing a treatment strategy for affected individuals in the future. Anorexia has no approved treatment, and its underlying causes are unclear. The study was recently published in Nature Neuroscience.


Genetic variants influence ability to read emotions

Daily Mail - Science & tech

While some people are innately tuned in to the emotions of those around them, others struggle to tell what they are thinking simply by looking at them. A new study suggests that these differences could be in our DNA. Researchers pinpointed specific genetic variants that influence our ability to read a person's emotions by asking people to complete a test first devised 20 years ago. The findings also suggest that women are better at reading people's thoughts than men.


How AI can cause crime, wars, misery and enable abuse. -- PerErikGG.COM

#artificialintelligence

The AI algorithms that have taken over the internet in the last decade are specially constructed to keep people engaged. It is competition for customers, taken to its logical extreme. When you watch a video on YouTube, search for products on the internet or choose a TV series on Netflix, your data is used to show relevant content and products. This is big business, but what are the costs? By showing only content relevant to the topic at hand that other people liked, these systems artificially create an echo chamber.