A Bias Metrics
Nine different debiasing algorithms (and a baseline) have been evaluated on this dataset using the popular ResNet-18 network [36]. CelebA contains faces of celebrities with several binary task labels and two protected labels (gender and youth). Table 3 shows the prediction results from a biased binary classifier and its bias values under the seven metrics. Without loss of generality, we consider "Sport" the positive class in the binary classifier. Following the DP formula in Appendix A.2, for the "Sport" class, the PPR_female is 45.0% (90/200), and the PPR_male is 65.0%
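The PPR computation above can be sketched in a few lines. This is a minimal illustration of the demographic-parity comparison the passage describes, not the paper's implementation; the counts reuse the "Sport" example (90 positives out of 200 for the female group, 65% for the male group), and the absolute gap used as the DP score is an assumption about how the groups are compared.

```python
# Sketch of the demographic-parity (DP) check described above: PPR is the
# positive-prediction rate per protected group, and DP compares the groups.

def positive_prediction_rate(positives: int, total: int) -> float:
    """Fraction of a group's members predicted as the positive class."""
    return positives / total

ppr_female = positive_prediction_rate(90, 200)  # 0.45, as in the text
ppr_male = 0.65                                 # given directly in the text

# One common DP score is the absolute gap between group rates (0 = parity).
dp_gap = abs(ppr_female - ppr_male)

print(f"PPR female: {ppr_female:.1%}")
print(f"PPR male:   {ppr_male:.1%}")
print(f"DP gap:     {dp_gap:.1%}")
```

With the counts from the example, the gap comes out to 20 percentage points, which is what a DP-style metric would flag as bias toward the male group for the "Sport" class.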
A Creepy New Device Is Spreading Across School Campuses. Students Are Being Harassed. Teachers Are Sounding the Alarm.
Meta's A.I. Smart Glasses Are Wreaking Havoc in Schools Across the Country. It's Only Going to Get Worse. As the discreet wearable cameras become more popular, students are saying they feel constantly watched and harassed, and professors are reshaping their classrooms in response. Joziah was tabling on campus for his peer mentor job at the end of last semester at Florida State University when he noticed something strange happening across the quad: A trio of men, wearing Meta AI glasses, were stopping every young woman who passed by and asking them for their social media contacts. "I recognized them from TikTok, because they're kind of big, especially in Miami," the 19-year-old told me.
- Marketing (1.00)
- Education > Educational Setting (0.94)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.94)
- Information Technology > Services (0.64)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (7 more...)
Ben & Jerry's row deepens as three board members removed
Three members of Ben & Jerry's independent board will no longer be eligible to serve in their roles, after the ice cream company introduced a new set of governance practices. These include a nine-year limit on board members' terms. Chair Anuradha Mittal, who earlier said she had no plans to resign under pressure, is among those affected. The move was criticised by the company's co-founder Ben Cohen, who called it a blatant power grab designed to strip the board of legal authority and independence. His remarks are the latest in a long-running row between Ben & Jerry's and its owner over the Cherry Garcia maker's social activism and the continued independence of its board.
- North America > Central America (0.15)
- Oceania > Australia (0.06)
- North America > United States > Vermont (0.06)
- (16 more...)
- Leisure & Entertainment (0.75)
- Law (0.69)
- Energy (0.50)
- (3 more...)
Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection
Piot, Paloma, Otero, David, Martín-Rodilla, Patricia, Parapar, Javier
Hate speech spreads widely online, harming individuals and communities, making automatic detection essential for large-scale moderation, yet detecting it remains difficult. Part of the challenge lies in subjectivity: what one person flags as hate speech, another may see as benign. Traditional annotation agreement metrics, such as Cohen's $κ$, oversimplify this disagreement, treating it as an error rather than meaningful diversity. Meanwhile, Large Language Models (LLMs) promise scalable annotation, but prior studies demonstrate that they cannot fully replace human judgement, especially in subjective tasks. In this work, we reexamine LLM reliability using a subjectivity-aware framework, cross-Rater Reliability (xRR), revealing that even under a fairer lens, LLMs still diverge from humans. Yet this limitation opens an opportunity: we find that LLM-generated annotations can reliably reflect performance trends across classification models, correlating with human evaluations. We test this by examining whether LLM-generated annotations preserve the relative ordering of model performance derived from human evaluation (i.e., whether models ranked as more reliable by human annotators preserve the same order when evaluated with LLM-generated labels). Our results show that, although LLMs differ from humans at the instance level, they reproduce similar ranking and classification patterns, suggesting their potential as proxy evaluators. While not a substitute for human annotators, they might serve as a scalable proxy for evaluation in subjective NLP tasks.
- Europe > Austria > Vienna (0.14)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (11 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
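The agreement statistic the abstract critiques, Cohen's κ, corrects raw agreement for the agreement expected by chance. A minimal sketch for two raters over the same items (the "human" and "llm" label lists below are made-up illustrative data, not from the paper):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items (nominal labels)."""
    n = len(a)
    # Observed agreement: fraction of items where the raters assign the same label.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Two raters agreeing on 8 of 10 binary hate-speech labels:
human = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
llm   = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
print(cohens_kappa(human, llm))  # 0.6: 80% raw agreement, 50% expected by chance
```

The example illustrates the abstract's point: κ compresses all disagreement into a single chance-corrected number, with no way to distinguish noise from genuine subjective divergence — which is what motivates the xRR framework.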
Ben & Jerry's brand could be destroyed, says co-founder
Ben & Jerry's will be destroyed as a brand if it remains with parent company Magnum, the company's co-founder Ben Cohen has told the BBC. His remarks are the latest in a long-running spat between the ice cream brand and its parent company over its ability to express its social activism and the continued independence of its board. The comments came on the day that the Magnum Ice Cream Company (TMICC) started trading on the European stock market - spinning off from owner Unilever. A spokesperson for Magnum said the firm wanted to build and strengthen Ben & Jerry's powerful, non-partisan values-based position in the world. Ben & Jerry's was sold to Unilever in 2000 in a deal which allowed it to retain an independent board and the right to make decisions about its social mission.
- Asia > Middle East > Israel (0.16)
- North America > Central America (0.15)
- Asia > China (0.06)
- (18 more...)
- Consumer Products & Services (1.00)
- Leisure & Entertainment (0.98)
- Media > Film (0.48)
- Government > Regional Government (0.48)
Catching UX Flaws in Code: Leveraging LLMs to Identify Usability Flaws at the Development Stage
Platt, Nolan, Luchs, Ethan, Nizamani, Sehrish
Usability evaluations are essential for ensuring that modern interfaces meet user needs, yet traditional heuristic evaluations by human experts can be time-consuming and subjective, especially early in development. This paper investigates whether large language models (LLMs) can provide reliable and consistent heuristic assessments at the development stage. By applying Jakob Nielsen's ten usability heuristics to thirty open-source websites, with three independent evaluations per site, we generated over 850 heuristic evaluations using a pipeline built on OpenAI's GPT-4o. For issue detection, the model demonstrated moderate consistency, with an average pairwise Cohen's Kappa of 0.50 and an exact agreement of 84%. Severity judgments showed more variability: weighted Cohen's Kappa averaged 0.63, but exact agreement was just 56%, and Krippendorff's Alpha was near zero. These results suggest that while GPT-4o can produce internally consistent evaluations, especially for identifying the presence of usability issues, its ability to judge severity varies and requires human oversight in practice. Our findings highlight the feasibility and limitations of using LLMs for early-stage, automated usability testing, and offer a foundation for improving consistency in automated User Experience (UX) evaluation. To the best of our knowledge, our work provides one of the first quantitative inter-rater reliability analyses of automated heuristic evaluation and highlights methods for improving model consistency.
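The "average pairwise" consistency numbers quoted above come from comparing the three independent evaluation runs against each other in all pairs. A minimal sketch of that averaging scheme, shown here for exact agreement (the three severity-rating runs below are hypothetical data, and the paper's exact aggregation may differ):

```python
from itertools import combinations

def exact_agreement(a, b):
    """Fraction of items where two runs assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_pairwise(runs, metric):
    """Average a two-run metric over every pair of runs
    (with 3 runs per site, that is 3 pairs)."""
    pairs = list(combinations(runs, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)

# Three hypothetical severity ratings (0-4) for the same five usability issues:
runs = [
    [0, 2, 3, 1, 4],
    [0, 2, 2, 1, 4],
    [0, 3, 3, 1, 4],
]
print(mean_pairwise(runs, exact_agreement))  # (0.8 + 0.8 + 0.6) / 3 ≈ 0.733
```

The same `mean_pairwise` scaffold works for any two-run metric — substituting a Cohen's Kappa function for `exact_agreement` yields the average pairwise Kappa the abstract reports.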
A robust generalizable device-agnostic deep learning model for sleep-wake determination from triaxial wrist accelerometry
Montazeri, Nasim, Yang, Stone, Luszczynski, Dominik, Zhang, John, Gurve, Dharmendra, Centen, Andrew, Goubran, Maged, Lim, Andrew
Study Objectives: Wrist accelerometry is widely used for inferring sleep-wake state. Previous works demonstrated poor wake detection and lacked cross-device generalizability and validation across different age ranges and sleep disorders. We developed a robust deep learning model to detect sleep-wakefulness from triaxial accelerometry and evaluated its validity across three devices and in a large adult population spanning a wide range of ages with and without sleep disorders. Methods: We collected wrist accelerometry simultaneous to polysomnography (PSG) in 453 adults undergoing clinical sleep testing at a tertiary care sleep laboratory, using three devices. We extracted features in 30-second epochs and trained a 3-class model to detect wake, sleep, and sleep with arousals, which was then collapsed into wake vs. sleep using a decision tree. To enhance wake detection, the model was specifically trained on randomly selected subjects with low sleep efficiency and/or high arousal index from one device recording and then tested on the remaining recordings. Results: The model showed high performance, with an F1 score of 0.86, sensitivity (sleep) of 0.87, and specificity (wakefulness) of 0.78, and moderate, significant correlations with PSG in predicting total sleep time (R=0.69) and sleep efficiency (R=0.63). Model performance was robust to the presence of sleep disorders, including sleep apnea and periodic limb movements in sleep, and was consistent across all three accelerometer models. Conclusions: We present a deep learning model to detect sleep-wakefulness from actigraphy in adults with relative robustness to the presence of sleep disorders and generalizability across diverse commonly used wrist accelerometers.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Israel (0.04)
- Oceania > Australia > Victoria (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.71)
- Health & Medicine > Therapeutic Area > Sleep (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.47)
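The sensitivity/specificity/F1 figures in the sleep-wake abstract are per-epoch classification metrics with sleep as the positive class. A minimal sketch of how they are computed over 30-second epochs (the `truth`/`pred` sequences are made-up toy data, not from the study):

```python
def epoch_metrics(y_true, y_pred, positive="sleep"):
    """Sensitivity (sleep recall), specificity (wake recall), and F1
    computed per epoch, treating `positive` as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)          # how much true sleep was caught
    specificity = tn / (tn + fp)          # how much true wake was caught
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Ten toy 30-second epochs: one missed sleep epoch, one missed wake epoch.
truth = ["sleep"] * 6 + ["wake"] * 4
pred  = ["sleep"] * 5 + ["wake"] + ["wake"] * 3 + ["sleep"]
print(epoch_metrics(truth, pred))
```

Because wake epochs are the minority class in a full night of recording, specificity (wake detection) is the harder number to drive up — which is why the study's targeted training on low-sleep-efficiency subjects matters.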
Generalizable and Efficient Automated Scoring with a Knowledge-Distilled Multi-Task Mixture-of-Experts
Fang, Luyang, Wang, Tao, Ma, Ping, Zhai, Xiaoming
Automated scoring of written constructed responses typically relies on separate models per task, straining computational resources, storage, and maintenance in real-world education settings. We propose UniMoE-Guided, a knowledge-distilled multi-task Mixture-of-Experts (MoE) approach that transfers expertise from multiple task-specific large models (teachers) into a single compact, deployable model (student). The student combines (i) a shared encoder for cross-task representations, (ii) a gated MoE block that balances shared and task-specific processing, and (iii) lightweight task heads. Trained with both ground-truth labels and teacher guidance, the student matches strong task-specific models while being far more efficient to train, store, and deploy. Beyond efficiency, the MoE layer improves transfer and generalization: experts develop reusable skills that boost cross-task performance and enable rapid adaptation to new tasks with minimal additions and tuning. On nine NGSS-aligned science-reasoning tasks (seven for training/evaluation and two held out for adaptation), UniMoE-Guided attains performance comparable to per-task models while using $\sim$6$\times$ less storage than maintaining separate students, and $87\times$ less than the 20B-parameter teacher. The method offers a practical path toward scalable, reliable, and resource-efficient automated scoring for classroom and large-scale assessment systems.
- North America > United States (0.28)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Education > Assessment & Standards (1.00)
- Education > Educational Setting (0.93)
- Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.91)
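The abstract's "trained with both ground-truth labels and teacher guidance" describes the standard knowledge-distillation objective: a blend of cross-entropy on gold labels and a divergence between the student's and the teacher's temperature-softened output distributions. The paper's exact loss is not given, so this is a generic sketch of that recipe; the logits, temperature, and blending weight `alpha` are all illustrative assumptions.

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax: higher t spreads probability mass out."""
    exps = [math.exp(z / t) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, true_idx, t=2.0, alpha=0.5):
    """Blend of (a) cross-entropy on the gold label and (b) KL divergence to the
    teacher's softened distribution, scaled by t^2 as in standard distillation."""
    ce = -math.log(softmax(student_logits)[true_idx])   # gold-label term
    ps = softmax(student_logits, t)                     # softened student
    pt = softmax(teacher_logits, t)                     # softened teacher
    kl = sum(q * math.log(q / p) for q, p in zip(pt, ps))
    return alpha * ce + (1 - alpha) * (t * t) * kl

# Illustrative 3-class scoring example: the student is close to the teacher.
loss = distill_loss([2.0, 0.5, -1.0], [1.8, 0.9, -0.5], true_idx=0)
print(loss)
```

In the UniMoE-Guided setting, a loss of this shape would be applied per task head, with the MoE gate routing each response through shared and task-specific experts before the head produces `student_logits`.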