A Creepy New Device Is Spreading Across School Campuses. Students Are Being Harassed. Teachers Are Sounding the Alarm.
Meta's A.I. Smart Glasses Are Wreaking Havoc in Schools Across the Country. It's Only Going to Get Worse. As the discreet wearable cameras become more popular, students say they feel constantly watched and harassed, and professors are reshaping their classrooms in response. Joziah was tabling on campus for his peer mentor job at the end of last semester at Florida State University when he noticed something strange happening across the quad: a trio of men wearing Meta AI glasses were stopping every young woman who passed by and asking her for her social media contacts. "I recognized them from TikTok, because they're kind of big, especially in Miami," the 19-year-old told me.
- Marketing (1.00)
- Education > Educational Setting (0.94)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.94)
- Information Technology > Services (0.64)
Ben & Jerry's row deepens as three board members removed
Ben & Jerry's row deepens as three board members removed Three members of Ben & Jerry's independent board will no longer be eligible to serve in their roles, after the ice cream company introduced a new set of governance practices. These include a nine-year limit on board members' terms. Chair Anuradha Mittal, who earlier said she had no plans to resign under pressure, is among those affected. The move was criticised by the company's co-founder Ben Cohen, who called it a blatant power grab designed to strip the board of legal authority and independence. His remarks are the latest in a long-running row between Ben & Jerry's and its owner over the Cherry Garcia maker's social activism and the continued independence of its board.
- North America > Central America (0.15)
- Oceania > Australia (0.06)
- North America > United States > Vermont (0.06)
- (16 more...)
- Law (0.69)
- Leisure & Entertainment > Sports (0.44)
- Media > Film (0.30)
- Government > Regional Government (0.30)
Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection
Piot, Paloma, Otero, David, Martín-Rodilla, Patricia, Parapar, Javier
Hate speech spreads widely online, harming individuals and communities, making automatic detection essential for large-scale moderation, yet detecting it remains difficult. Part of the challenge lies in subjectivity: what one person flags as hate speech, another may see as benign. Traditional annotation agreement metrics, such as Cohen's $\kappa$, oversimplify this disagreement, treating it as an error rather than meaningful diversity. Meanwhile, Large Language Models (LLMs) promise scalable annotation, but prior studies demonstrate that they cannot fully replace human judgement, especially in subjective tasks. In this work, we reexamine LLM reliability using a subjectivity-aware framework, cross-Rater Reliability (xRR), revealing that even under this fairer lens, LLMs still diverge from humans. Yet this limitation opens an opportunity: we find that LLM-generated annotations can reliably reflect performance trends across classification models, correlating with human evaluations. We test this by examining whether LLM-generated annotations preserve the relative ordering of model performance derived from human evaluation (i.e., whether models ranked as more reliable by human annotators preserve the same order when evaluated with LLM-generated labels). Our results show that, although LLMs differ from humans at the instance level, they reproduce similar ranking and classification patterns, suggesting their potential as proxy evaluators. While not a substitute for human annotators, they might serve as a scalable proxy for evaluation in subjective NLP tasks.
- Europe > Austria > Vienna (0.14)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (11 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
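Cohen's kappa, the agreement metric this abstract argues oversimplifies disagreement, compares observed agreement with the agreement expected by chance. A minimal sketch for two annotators over the same items (the labels below are invented for illustration, not the paper's data):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled the same
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two annotators labelled independently
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

human = ["hate", "ok", "ok", "hate", "ok", "ok"]
llm   = ["hate", "ok", "hate", "hate", "ok", "ok"]
print(round(cohens_kappa(human, llm), 3))
```

Note how a kappa of 1 requires perfect agreement, while 0 means no better than chance; the critique in the abstract is that anything below 1 is treated as error, even when the disagreement is legitimate.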
Ben & Jerry's brand could be destroyed, says co-founder
Ben & Jerry's brand could be destroyed, says co-founder Ben & Jerry's will be destroyed as a brand if it remains with parent company Magnum, the company's co-founder Ben Cohen has told the BBC. His remarks are the latest in a long-running spat between the ice cream brand and its parent company over its ability to express its social activism and the continued independence of its board. The comments came on the day that the Magnum Ice Cream Company (TMICC) started trading on the European stock market - spinning off from owner Unilever. A spokesperson for Magnum said the firm wanted to build and strengthen Ben & Jerry's powerful, non-partisan values-based position in the world. Ben & Jerry's was sold to Unilever in 2000 in a deal which allowed it to retain an independent board and the right to make decisions about its social mission.
- Asia > Middle East > Israel (0.16)
- North America > Central America (0.15)
- Asia > China (0.06)
- (18 more...)
- Consumer Products & Services (1.00)
- Media > Film (0.48)
- Government > Regional Government (0.48)
- Leisure & Entertainment > Sports (0.43)
Catching UX Flaws in Code: Leveraging LLMs to Identify Usability Flaws at the Development Stage
Platt, Nolan, Luchs, Ethan, Nizamani, Sehrish
Usability evaluations are essential for ensuring that modern interfaces meet user needs, yet traditional heuristic evaluations by human experts can be time-consuming and subjective, especially early in development. This paper investigates whether large language models (LLMs) can provide reliable and consistent heuristic assessments at the development stage. By applying Jakob Nielsen's ten usability heuristics to thirty open-source websites, we generated over 850 heuristic evaluations, with three independent evaluation runs per site, using a pipeline built on OpenAI's GPT-4o. For issue detection, the model demonstrated moderate consistency, with an average pairwise Cohen's Kappa of 0.50 and an exact agreement of 84%. Severity judgments showed more variability: weighted Cohen's Kappa averaged 0.63, but exact agreement was just 56%, and Krippendorff's Alpha was near zero. These results suggest that while GPT-4o can produce internally consistent evaluations, especially for identifying the presence of usability issues, its ability to judge severity varies and requires human oversight in practice. Our findings highlight the feasibility and limitations of using LLMs for early-stage, automated usability testing, and offer a foundation for improving consistency in automated User Experience (UX) evaluation. To the best of our knowledge, our work provides one of the first quantitative inter-rater reliability analyses of automated heuristic evaluation and highlights methods for improving model consistency.
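The near-zero Krippendorff's Alpha reported for severity judgments uses a metric that, unlike pairwise Cohen's Kappa, handles any number of raters per item. A minimal sketch of the nominal-data version via the standard coincidence-matrix formulation (the ratings below are invented):

```python
from collections import Counter, defaultdict

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.
    units: one list of ratings per item (items with < 2 ratings are skipped)."""
    coincidence = defaultdict(float)
    n = 0
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # not pairable
        n += m
        # Each ordered within-unit pair contributes 1/(m-1) to the coincidence matrix
        for i, a in enumerate(ratings):
            for j, b in enumerate(ratings):
                if i != j:
                    coincidence[(a, b)] += 1.0 / (m - 1)
    margins = Counter()
    for (a, _b), v in coincidence.items():
        margins[a] += v
    # Observed vs. expected disagreement (off-diagonal mass)
    do = sum(v for (a, b), v in coincidence.items() if a != b) / n
    de = sum(margins[a] * margins[b] for a in margins for b in margins if a != b) / (n * (n - 1))
    return 1 - do / de

# Three raters scoring severity of four usability issues
print(krippendorff_alpha_nominal([[1, 1, 1], [2, 2, 1], [3, 3, 3], [1, 2, 3]]))
```

A value near zero, as reported in the abstract, means the raters (here, repeated GPT-4o runs) disagree about as much as chance would predict.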
A robust generalizable device-agnostic deep learning model for sleep-wake determination from triaxial wrist accelerometry
Montazeri, Nasim, Yang, Stone, Luszczynski, Dominik, Zhang, John, Gurve, Dharmendra, Centen, Andrew, Goubran, Maged, Lim, Andrew
Study Objectives: Wrist accelerometry is widely used for inferring sleep-wake state. Previous work has shown poor wake detection and has lacked cross-device generalizability and validation across different age ranges and sleep disorders. We developed a robust deep learning model to detect sleep-wakefulness from triaxial accelerometry and evaluated its validity across three devices and in a large adult population spanning a wide range of ages, with and without sleep disorders. Methods: We collected wrist accelerometry simultaneously with polysomnography (PSG) in 453 adults undergoing clinical sleep testing at a tertiary care sleep laboratory, using three devices. We extracted features in 30-second epochs and trained a 3-class model to detect wake, sleep, and sleep with arousals, which was then collapsed into wake vs. sleep using a decision tree. To enhance wake detection, the model was specifically trained on randomly selected subjects with low sleep efficiency and/or high arousal index from one device's recordings and then tested on the remaining recordings. Results: The model showed high performance, with an F1 score of 0.86, sensitivity (sleep) of 0.87, and specificity (wakefulness) of 0.78, and significant, moderate correlations with PSG in predicting total sleep time (R=0.69) and sleep efficiency (R=0.63). Model performance was robust to the presence of sleep disorders, including sleep apnea and periodic limb movements in sleep, and was consistent across all three accelerometer models. Conclusions: We present a deep learning model to detect sleep-wakefulness from actigraphy in adults, with relative robustness to the presence of sleep disorders and generalizability across diverse, commonly used wrist accelerometers.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Israel (0.04)
- Oceania > Australia > Victoria (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.71)
- Health & Medicine > Therapeutic Area > Sleep (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.47)
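The F1 score, sensitivity (sleep), and specificity (wakefulness) reported above are standard epoch-level metrics computed against PSG-scored 30-second epochs. A minimal sketch, with sleep as the positive class (the toy labels are invented, not the study's data):

```python
def epoch_metrics(y_true, y_pred, positive="sleep"):
    """Sensitivity (recall on sleep), specificity (recall on wake), and F1
    over per-epoch sleep/wake labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)   # fraction of true sleep epochs detected
    specificity = tn / (tn + fp)   # fraction of true wake epochs detected
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

psg   = ["sleep", "sleep", "sleep", "wake", "wake"]
model = ["sleep", "sleep", "wake",  "wake", "sleep"]
sens, spec, f1 = epoch_metrics(psg, model)
```

Reporting specificity separately, as the paper does, matters because sleep heavily outnumbers wake in overnight recordings, so a sleep-biased model can score a high F1 while missing most wake epochs.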
Generalizable and Efficient Automated Scoring with a Knowledge-Distilled Multi-Task Mixture-of-Experts
Fang, Luyang, Wang, Tao, Ma, Ping, Zhai, Xiaoming
Automated scoring of written constructed responses typically relies on separate models per task, straining computational resources, storage, and maintenance in real-world education settings. We propose UniMoE-Guided, a knowledge-distilled multi-task Mixture-of-Experts (MoE) approach that transfers expertise from multiple task-specific large models (teachers) into a single compact, deployable model (student). The student combines (i) a shared encoder for cross-task representations, (ii) a gated MoE block that balances shared and task-specific processing, and (iii) lightweight task heads. Trained with both ground-truth labels and teacher guidance, the student matches strong task-specific models while being far more efficient to train, store, and deploy. Beyond efficiency, the MoE layer improves transfer and generalization: experts develop reusable skills that boost cross-task performance and enable rapid adaptation to new tasks with minimal additions and tuning. On nine NGSS-aligned science-reasoning tasks (seven for training/evaluation and two held out for adaptation), UniMoE-Guided attains performance comparable to per-task models while using $\sim$6$\times$ less storage than maintaining separate students, and $87\times$ less than the 20B-parameter teacher. The method offers a practical path toward scalable, reliable, and resource-efficient automated scoring for classroom and large-scale assessment systems.
- North America > United States (0.28)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Education > Assessment & Standards (1.00)
- Education > Educational Setting (0.93)
- Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.91)
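The student architecture described above (shared encoder, gated MoE block, lightweight task heads) can be sketched at the forward-pass level. All dimensions and weights below are illustrative stand-ins, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, n_tasks = 16, 4, 3

# Stand-in for the shared encoder's representation of one student response
h = rng.normal(size=d)

# Gated MoE block: a softmax gate mixes expert transformations of h
W_gate = rng.normal(size=(n_experts, d))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

gate = softmax(W_gate @ h)  # per-expert mixing weights, summing to 1
moe_out = sum(g * (E @ h) for g, E in zip(gate, experts))

# Lightweight per-task heads: one scoring output per task
heads = [rng.normal(size=d) for _ in range(n_tasks)]
scores = [w @ moe_out for w in heads]
```

The efficiency claim follows from this shape: all tasks share the encoder and expert parameters, and only the small heads are task-specific, so adapting to a new task mainly means adding one head.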
Testing Hypotheses from the Social Approval Theory of Online Hate: An Analysis of 110 Million Messages from Parler
Markowitz, David M., Taylor, Samuel Hardman
We examined how online hate is motivated by receiving social approval via Walther's (2024) social approval theory of online hate, which argues (H1a) more signals of social approval on hate messages predict more subsequent hate messages, and (H1b) as social approval increases, hate speech becomes more extreme. Using 110 million messages from Parler (2018-2021), we observed the number of upvotes received on a hate speech post was unassociated with hate speech in one's next post and during the next month, three months, and six months. The number of upvotes received on (extreme) hate speech comments, however, was positively associated with (extreme) hate speech during the next week, month, three months, and six months. Between-person effects revealed an average positive relationship between social approval and hate speech production at all time intervals. For comments, social approval linked more strongly to online hate than social disapproval. Social approval is a critical mechanism facilitating online hate propagation.
- Asia > China (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Michigan > Ingham County > Lansing (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (0.93)
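H1a is a lagged within-person association: approval received on a hate message at time t versus the hateful content of the message at t+1. A toy sketch of the lag-1 version with a plain Pearson correlation (the per-user timeline below is invented; the study's actual models are more elaborate):

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One user's posts over time: upvotes received and a hate score per post
upvotes = [0, 2, 5, 1, 8, 3]
hate = [0.1, 0.3, 0.6, 0.2, 0.7, 0.4]

# Lag-1: upvotes on post t paired with hate score of post t+1
r = pearson(upvotes[:-1], hate[1:])
```

Computing this within each user and then averaging separates the within-person effect (does approval on my last post change my next post?) from the between-person effect the abstract also reports (do more-approved users produce more hate overall?).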
ReviewGraph: A Knowledge Graph Embedding Based Framework for Review Rating Prediction with Sentiment Features
de Vink, A. J. W., Amat-Lefort, Natalia, Han, Lifeng
In the hospitality industry, understanding the factors that drive customer review ratings is critical for improving guest satisfaction and business performance. This work proposes ReviewGraph for Review Rating Prediction (RRP), a novel framework that transforms textual customer reviews into knowledge graphs by extracting (subject, predicate, object) triples and associating sentiment scores. Using graph embeddings (Node2Vec) and sentiment features, the framework predicts review rating scores through machine learning classifiers. We compare ReviewGraph's performance with traditional NLP baselines (such as Bag of Words, TF-IDF, and Word2Vec) and large language models (LLMs), evaluating them on the HotelRec dataset. Compared with the state-of-the-art literature, our proposed model performs similarly to the best-performing model but with lower computational cost (without ensembling). While ReviewGraph achieves comparable predictive performance to LLMs and outperforms baselines on agreement-based metrics such as Cohen's Kappa, it offers additional advantages in interpretability, visual exploration, and potential integration into Retrieval-Augmented Generation (RAG) systems. This work highlights the potential of graph-based representations for enhancing review analytics and lays the groundwork for future research integrating advanced graph neural networks and fine-tuned LLM-based extraction methods. We will open-source the ReviewGraph outputs and platform on our GitHub page https://github.com/aaronlifenghan/ReviewGraph
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Europe > Middle East > Cyprus > Limassol > Limassol (0.04)
- (2 more...)
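The triple-to-graph step described above can be sketched without any graph library. The framework itself uses Node2Vec embeddings over the resulting graph; this sketch only shows building a predicate-labelled graph from (subject, predicate, object) triples with attached sentiment, plus two hand-rolled features a classifier could consume (all triples and scores below are invented):

```python
from collections import defaultdict

# Toy triples extracted from a hotel review, each with a sentiment score
triples = [
    ("room", "was", "clean", 0.8),
    ("staff", "were", "rude", -0.7),
    ("breakfast", "included", "coffee", 0.2),
]

# Directed graph: subject -> list of (predicate, object, sentiment) edges
graph = defaultdict(list)
for subj, pred, obj, sent in triples:
    graph[subj].append((pred, obj, sent))

# Simple per-review features alongside any learned graph embeddings
features = {
    "n_edges": sum(len(edges) for edges in graph.values()),
    "mean_sentiment": sum(s for *_, s in triples) / len(triples),
}
```

The interpretability advantage the abstract claims comes from exactly this structure: each edge is a human-readable claim about the stay, which a bag-of-words or LLM score cannot expose.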
AttentiveGRUAE: An Attention-Based GRU Autoencoder for Temporal Clustering and Behavioral Characterization of Depression from Wearable Data
Soley, Nidhi, Patel, Vishal M, Taylor, Casey O
In this study, we present AttentiveGRUAE, a novel attention-based gated recurrent unit (GRU) autoencoder designed for temporal clustering and outcome prediction from longitudinal wearable data. Our model jointly optimizes three objectives: (1) learning a compact latent representation of daily behavioral features via sequence reconstruction, (2) predicting end-of-period depression rate through a binary classification head, and (3) identifying behavioral subtypes through Gaussian Mixture Model (GMM) based soft clustering of learned embeddings. We evaluate AttentiveGRUAE on longitudinal sleep data from 372 participants (GLOBEM 2018-2019), and it demonstrates superior performance over baseline clustering, domain-aligned self-supervised, and ablated models in both clustering quality (silhouette score = 0.70 vs 0.32-0.70) and depression classification (AUC = 0.74 vs 0.50-0.67). Additionally, external validation on cross-year cohorts from 332 participants (GLOBEM 2020-2021) confirms cluster reproducibility (silhouette score = 0.63, AUC = 0.61) and stability. We further perform subtype analysis and visualize temporal attention, which highlights sleep-related differences between clusters and identifies salient time windows that align with changes in sleep regularity, yielding clinically interpretable explanations of risk.
- North America > United States > Maryland > Baltimore (0.05)
- Asia > Nepal (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.94)
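The silhouette scores cited for clustering quality follow the standard definition: for each point, compare its mean distance to its own cluster (a) against its mean distance to the nearest other cluster (b). A minimal sketch over toy 2-D points rather than learned embeddings:

```python
def silhouette(points, labels):
    """Mean silhouette over all points; points are coordinate tuples,
    labels are cluster ids (each cluster assumed to have >= 2 points)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    scores = []
    for i, (p, lp) in enumerate(zip(points, labels)):
        # a: mean distance to other points in the same cluster
        same = [dist(p, q) for j, (q, lq) in enumerate(zip(points, labels))
                if lq == lp and j != i]
        a = sum(same) / len(same)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(dist(p, q) for q, lq in zip(points, labels) if lq == lc) / labels.count(lc)
            for lc in set(labels) if lc != lp
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two tight, well-separated clusters score close to 1
score = silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
```

Scores range from -1 to 1, so the reported 0.70 (versus 0.32-0.70 for baselines) indicates well-separated behavioral subtypes, and the cross-year 0.63 indicates the structure largely survives on new cohorts.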