Media
Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing
Zhang, Alice, Bertley, Callihan, Liang, Dawei, Thomaz, Edison
Social interactions play a crucial role in shaping human behavior, relationships, and societies. They encompass various forms of communication, such as verbal conversation, non-verbal gestures, facial expressions, and body language. In this work, we develop a novel computational approach to detect a foundational aspect of human social interactions, in-person verbal conversations, by leveraging audio and inertial data captured with a commodity smartwatch in acoustically challenging scenarios. To evaluate our approach, we conducted a lab study with 11 participants and a semi-naturalistic study with 24 participants. We analyzed machine learning and deep learning models with 3 different fusion methods, showing the advantages of fusing audio and inertial data to consider not only verbal cues but also non-verbal gestures in conversations. Furthermore, we performed a comprehensive set of evaluations across activities and sampling rates to demonstrate the benefits of multimodal sensing in specific contexts. Overall, our framework achieved a macro F1-score of 82.0±3.0% when detecting conversations in the lab and 77.2±1.8% in the semi-naturalistic setting.
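The abstract does not name its three fusion methods. As a rough illustration only, the sketch below contrasts two generic options, early (feature-level) and late (decision-level) fusion of audio and inertial signals, and computes the macro F1-score used to report the results. The feature values, the weight `w_audio`, and all function names are hypothetical assumptions, not the authors' pipeline.

```python
from typing import List, Sequence


def early_fusion(audio_feats: Sequence[float], imu_feats: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate one window's audio and inertial
    feature vectors so a single classifier sees both modalities."""
    return list(audio_feats) + list(imu_feats)


def late_fusion(p_audio: float, p_imu: float, w_audio: float = 0.5) -> float:
    """Decision-level fusion: weighted average of P(conversation) from two
    separately trained per-modality classifiers (w_audio is a hypothetical weight)."""
    return w_audio * p_audio + (1.0 - w_audio) * p_imu


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores, so the 'conversation' and
    'no conversation' classes count equally regardless of class imbalance."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    print(early_fusion([0.2, 0.7], [1.5]))  # [0.2, 0.7, 1.5]
    # Audio alone is confident, motion alone is ambiguous; the fused score is about 0.65.
    fused = late_fusion(0.9, 0.4)
    print("conversation" if fused >= 0.5 else "no conversation")  # conversation
    # Labels: 1 = conversation, 0 = no conversation.
    print(round(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.733
```

Late fusion lets a weak modality (for example, audio swamped by background noise) be down-weighted without retraining the other classifier, whereas early fusion lets one model learn cross-modal interactions; which of these the paper's three methods resemble is not stated in the abstract.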
NLP Meets the World: Toward Improving Conversations With the Public About Natural Language Processing Research
Recent developments in large language models (LLMs) have been accompanied by rapidly growing public interest in natural language processing (NLP). This attention is reflected by major news venues, which sometimes invite NLP researchers to share their knowledge and views with a wide audience. Recognizing the opportunities this moment presents for both the research field and individual researchers, this paper shares recommendations for communicating with a general audience about the capabilities and limitations of NLP. These recommendations cover three themes: vague terminology as an obstacle to public understanding, unreasonable expectations as obstacles to sustainable growth, and ethical failures as obstacles to continued support. Published NLP research and popular news coverage are cited to illustrate these themes with examples. The recommendations promote effective, transparent communication with the general public about NLP, in order to strengthen public understanding and encourage support for research.
CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions
Huang, Yuchen, Fan, Zhiyuan, He, Zhitao, Polisetty, Sandeep, Li, Wenyan, Fung, Yi R.
Pretrained vision-language models (VLMs) such as CLIP excel in general multimodal comprehension but often struggle to capture nuanced, context-dependent visual cues. This makes it difficult to distinguish between similar-looking concepts with potentially different cultural meanings. Such deficiencies stem mainly from the limited availability of high-quality cultural data and contextual information, and from the lack of negative examples that highlight subtle differences. To mitigate this, we design a data curation pipeline leveraging open-source VLMs and text-to-image models to construct CulTwin, a synthetic cultural dataset. This dataset consists of paired concept-caption-image triplets, where concepts visually resemble each other but are culturally different. Then, we fine-tune CLIP on CulTwin to develop CultureCLIP, which aligns cultural concepts with contextually enhanced captions and synthetic images through tailored contrastive learning. Experiments on culture-specific benchmarks show that CultureCLIP outperforms the base CLIP, achieving up to a 5.49% improvement in fine-grained concept recognition on certain tasks while preserving CLIP's original generalization ability, validating the effectiveness of our data synthesis and VLM backbone training paradigm in capturing subtle cultural distinctions.
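The abstract says CultureCLIP uses "tailored" contrastive learning but does not give the loss. As a baseline for comparison, the sketch below implements the standard symmetric InfoNCE objective that CLIP itself is trained with, in plain Python. It shows why look-alike "twin" concepts placed in the same batch act as hard negatives: if their embeddings are nearly identical, the loss stays high until training separates them. The temperature of 0.07 and the toy 2-D embeddings are assumptions for illustration; this is not the paper's tailored loss.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize so that dot products are cosine similarities, as in CLIP."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits: Sequence[float], idx: int) -> float:
    """Numerically stable log(softmax(logits)[idx])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_sum_exp


def clip_contrastive_loss(image_embs: Sequence[Vector], text_embs: Sequence[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: image i should match caption i, and every other caption
    in the batch (including a look-alike 'twin' concept) acts as a negative."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    # Image-to-text: row i should pick column i.
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: column j should pick row j.
    loss_t2i = -sum(log_softmax_at([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    # Two culturally different concepts whose embeddings are still nearly
    # identical incur a higher loss, so training pushes them apart.
    twins = [[1.0, 0.1], [1.0, -0.1]]
    separated = [[1.0, 0.0], [0.0, 1.0]]
    twin_loss = clip_contrastive_loss(twins, twins)
    separated_loss = clip_contrastive_loss(separated, separated)
    print(twin_loss > separated_loss)  # True
```

In a real fine-tuning run the embeddings would come from CLIP's image and text encoders and the loss would be minimized by gradient descent; the point here is only that pairing each concept with its visually similar twin in one batch supplies the negative examples the abstract says are usually missing.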
Pink Floppy Disc and The Bitles: Embracing the future of AI music
Feedback is New Scientist's popular sideways look at the latest science and technology news. You can submit items you believe may amuse readers to Feedback by emailing feedback@newscientist.com. Feedback has been dimly aware for a while that there is a slew of AI-generated music swamping platforms like Spotify. Our awareness was limited, we confess, because we are so old that we still prefer to listen to CDs. Still, we weren't too surprised when New Scientist's Timothy Revell told us about an indie rock band called The Velvet Sundown that appears to be entirely AI-generated, from their songs, which sound like the beige love-children of Coldplay and the Eagles, to their uncanny-valley Instagram photos, which look like rejected concept art from Daisy Jones & the Six.
Tiny cyborg beetles are built to save lives in real emergencies
In a groundbreaking fusion of nature and technology, researchers at the University of Queensland have developed remote-controlled beetles equipped with tiny, removable backpacks that could drastically reduce the time it takes to locate survivors in disaster zones. Also known as cyborg beetles, these hybrid helpers are part of an ambitious project to improve emergency response in situations like building collapses, earthquakes or industrial explosions. By combining natural mobility with simple controls, researchers are developing a faster, more flexible way to reach people in hard-to-access areas.
Drone surveillance catches kids in dangerous high-speed stunt atop moving subway train in New York City
Four minors between the ages of 12 and 16, three teenagers and a 12-year-old boy, were apprehended by police after an NYPD drone captured them riding on top of a train in the Bronx on Thursday as it passed through multiple stations at high speed. NYPD drone footage obtained by Fox News Digital shows the four subway surfers climbing up the side of the moving northbound 6 express train as it passed beneath the Westchester Avenue Bridge. The minors can then be seen standing up and forming a line, some of them jumping up and down and spreading their arms.
PLEX: Perturbation-free Local Explanations for LLM-Based Text Classification
Rahulamathavan, Yogachandran, Farooq, Misbah, De Silva, Varuna
Large Language Models (LLMs) excel in text classification, but their complexity hinders interpretability, making it difficult to understand the reasoning behind their predictions. Explainable AI (XAI) methods like LIME and SHAP offer local explanations by identifying influential words, but they rely on computationally expensive perturbations. These methods typically generate thousands of perturbed sentences and perform inference on each, incurring a substantial computational burden, especially with LLMs. To address this, we propose Perturbation-free Local Explanation (PLEX), a novel method that leverages the contextual embeddings extracted from the LLM and a "Siamese network" style neural network trained to align with feature importance scores. This one-off training eliminates the need for subsequent perturbations, enabling efficient explanations for any new sentence. We demonstrate PLEX's effectiveness on four different classification tasks (sentiment, fake news, fake COVID-19 news and depression), showing more than 92% agreement with LIME and SHAP. Our evaluation using a "stress test" reveals that PLEX accurately identifies influential words, leading to a similar decline in classification accuracy as observed with LIME and SHAP when these words are removed. Notably, in some cases, PLEX demonstrates superior performance in capturing the impact of key features. PLEX dramatically accelerates explanation, reducing time and computational overhead by two and four orders of magnitude, respectively. This work offers a promising solution for explainable LLM-based text classification.
Large language models (LLMs) have significantly advanced text classification, achieving state-of-the-art results in tasks like emotion recognition, sentiment analysis, topic categorization, and spam detection [1]. Powered by transformer architectures with millions or billions of parameters, they effectively capture complex linguistic patterns.
However, the very complexity that enables their high performance also renders their internal workings opaque and difficult to interpret.
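The cost argument in the PLEX abstract can be illustrated with a toy sketch. Leave-one-word-out occlusion stands in here for perturbation methods (LIME and SHAP actually sample many more random perturbations and fit a surrogate), and a linear head trained once by gradient descent stands in for PLEX's Siamese-style network; both substitutions, and the toy `toy_predict` and `toy_embed` functions replacing an LLM classifier and its contextual embeddings, are hypothetical. The point is only the call count: the baseline needs one classifier call per token plus one, while the trained head explains a new sentence with one dot product per token.

```python
from typing import Callable, List, Sequence

Tokens = List[str]


def occlusion_importance(tokens: Tokens, predict: Callable[[Tokens], float]) -> List[float]:
    """Perturbation baseline: delete one token at a time and record how much the
    classifier score drops. Costs len(tokens) + 1 classifier calls per sentence."""
    base = predict(tokens)
    return [base - predict(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def train_importance_head(embs: Sequence[Sequence[float]], targets: Sequence[float],
                          lr: float = 0.1, epochs: int = 500) -> List[float]:
    """One-off training: fit weights w so that w . emb approximates each token's
    importance score (mean squared error, plain batch gradient descent)."""
    dim, n = len(embs[0]), len(embs)
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for e, y in zip(embs, targets):
            err = sum(wk * ek for wk, ek in zip(w, e)) - y
            for k in range(dim):
                grad[k] += 2.0 * err * e[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def explain(embs: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Perturbation-free explanation: one dot product per token embedding,
    with no further classifier calls."""
    return [sum(wk * ek for wk, ek in zip(w, e)) for e in embs]


# --- Hypothetical toy stand-ins for an LLM classifier and its embeddings ---
POSITIVE = {"love", "great"}
calls = {"n": 0}


def toy_predict(tokens: Tokens) -> float:
    calls["n"] += 1
    return float(sum(t in POSITIVE for t in tokens))


def toy_embed(tokens: Tokens) -> List[List[float]]:
    return [[1.0, 0.0] if t in POSITIVE else [0.0, 1.0] for t in tokens]


if __name__ == "__main__":
    train = "i love this great movie".split()
    targets = occlusion_importance(train, toy_predict)
    print(targets, calls["n"])  # [0.0, 1.0, 0.0, 1.0, 0.0] 6
    w = train_importance_head(toy_embed(train), targets)
    new = "love it".split()
    print([round(s, 3) for s in explain(toy_embed(new), w)])  # [1.0, 0.0]
```

The expensive perturbation step is paid only once, to produce training targets; afterwards, explaining a new sentence costs no extra classifier calls, which is the source of the speed-up the abstract reports.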
Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing
Ulmer, Dennis, Lorson, Alexandra, Titov, Ivan, Hardmeier, Christian
Human users increasingly rely on natural language interactions with large language models (LLMs) in order to receive help on a large variety of tasks and problems. However, the trustworthiness and perceived legitimacy of LLMs are undermined by the fact that their output is frequently stated in very confident terms, even when its accuracy is questionable. Therefore, there is a need to signal the confidence of the language model to a user in order to reap the benefits of human-machine collaboration and mitigate potential harms. Verbalized uncertainty is the expression of confidence with linguistic means, an approach that integrates perfectly into language-based interfaces. Nevertheless, most recent research in natural language processing (NLP) overlooks the nuances surrounding human uncertainty communication and the data biases that influence machine uncertainty communication. We argue for anthropomimetic uncertainty, meaning that intuitive and trustworthy uncertainty communication requires a degree of linguistic authenticity and personalization to the user, which could be achieved by emulating human communication. We present a thorough overview of the research in human uncertainty communication, survey ongoing research, and perform additional analyses to demonstrate so-far overlooked biases in verbalized uncertainty. We conclude by pointing out unique factors in human-machine communication of uncertainty and deconstructing anthropomimetic uncertainty into future research directions for NLP.
Alleviating User-Sensitive bias with Fair Generative Sequential Recommendation Model
Liu, Yang, Wu, Feng, Zhu, Xuefang
Recommendation fairness has recently attracted much attention. In the real world, recommendation systems are driven by user behavior, and since users who share a sensitive feature (e.g., gender or age) tend to exhibit similar behavior patterns, recommendation models can easily capture strong correlations between sensitive features and preferences, which leads to unfair recommendations. The diffusion model (DM), a new generative modeling paradigm, has achieved great success in recommendation systems. DM's ability to model uncertainty and represent diversity, together with its modeling mechanism, adapts well to the biased real-world recommendation process. Therefore, we use DM to model recommendation fairness effectively and to enhance diversity. This paper proposes FairGENRec, a Fair GENerative sequential Recommendation model based on DM. In the training phase, we inject random noise into the original distribution under the guidance of the sensitive-feature recognition model, and design a sequential denoising model to reconstruct items in the reverse process. Simultaneously, recommendation fairness is modeled by injecting into the generated results multi-interest representations from which the bias of sensitive user features has been removed. In the inference phase, the model obtains noised representations by adding noise to the historical interactions and then applies reverse iteration to reconstruct the target item representation. Finally, extensive experiments on three datasets demonstrate that FairGENRec improves both accuracy and fairness, while statistical case analysis visualizes the degree of improvement in recommendation fairness.
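The abstract describes adding noise to item representations and then reconstructing the target item by reverse iteration. The sketch below shows the standard DDPM-style closed-form forward (noising) step and the matching estimate of the clean vector from predicted noise, which is the generic mechanism diffusion recommenders build on. The linear noise schedule (1e-4 to 0.02 over 1000 steps) is a common DDPM default assumed here, the true noise is passed in place of a trained denoiser's prediction, and FairGENRec's sensitive-feature guidance and multi-interest fairness injection are not modeled.

```python
import math
import random
from typing import List, Sequence


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linearly increasing noise schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
             eps: Sequence[float]) -> List[float]:
    """Forward (noising) step in closed form: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


def estimate_x0(xt: Sequence[float], t: int, alpha_bars: Sequence[float],
                eps_hat: Sequence[float]) -> List[float]:
    """Invert the forward equation using predicted noise eps_hat; in a real model,
    eps_hat would come from a trained (here: hypothetical) sequential denoiser."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_hat)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars(1000)
    x0 = [0.5, -1.2, 0.3]                    # toy target-item embedding
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # Gaussian noise
    xt = q_sample(x0, 500, alpha_bars, eps)
    # With a perfect noise prediction, the reconstruction recovers x0.
    print([round(v, 6) for v in estimate_x0(xt, 500, alpha_bars, eps)])  # [0.5, -1.2, 0.3]
```

In practice the denoiser's noise prediction is imperfect, so reconstruction proceeds over many reverse steps rather than one inversion; the single-step inversion here only makes the relationship between the forward and reverse directions explicit.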
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Zhang, Xinnong, Lin, Jiayu, Mou, Xinyi, Yang, Shiyue, Liu, Xiawei, Sun, Libo, Lyu, Hanjia, Yang, Yihang, Qi, Weihong, Chen, Yue, Li, Guanying, Yan, Ling, Hu, Yao, Chen, Siming, Wang, Yu, Huang, Xuanjing, Luo, Jiebo, Tang, Shiping, Wu, Libo, Zhou, Baohua, Wei, Zhongyu
Social simulation is transforming traditional social science research by modeling human behavior through interactions between virtual individuals and their environments. With recent advances in large language models (LLMs), this approach has shown growing potential in capturing individual differences and predicting group behaviors. However, existing methods face alignment challenges related to the environment, target users, interaction mechanisms, and behavioral patterns. To this end, we introduce SocioVerse, an LLM-agent-driven world model for social simulation. Our framework features four powerful alignment components and a user pool of 10 million real individuals. To validate its effectiveness, we conducted large-scale simulation experiments across three distinct domains: politics, news, and economics. Results demonstrate that SocioVerse can reflect large-scale population dynamics while ensuring diversity, credibility, and representativeness through standardized procedures and minimal manual adjustments.