

China's OpenClaw Boom Is a Gold Rush for AI Companies

WIRED

Hype around the open source agent is driving people to rent cloud servers and buy AI subscriptions just to try it, creating a windfall for tech companies. George Zhang thought OpenClaw could make him rich, even though he didn't really understand how the viral AI agent software worked. But he saw a video of a Chinese social media influencer demonstrating how it could be deployed to manage stock portfolios and make investment decisions autonomously. Zhang, who works in cross-border ecommerce in the Chinese city of Xiamen, was intrigued enough to try installing OpenClaw in late February. He is one of the many people in China swept up in the recent OpenClaw craze.




The Utility of Explainable AI in Ad Hoc Human-Machine Teaming (Supplementary)

Neural Information Processing Systems

The participant is told that the cobot will place extra resources into the chest for the human and that the cobot will not help the human build. The participant is also informed about how to share tools with the cobot and about all possible cobot behaviors.



An introduction to science communication at #AAAI2026

AIHub

We're pleased to announce that we will be giving an introduction to science communication for AI researchers at AAAI this year. This will be held on Wednesday 21 January from 13:00 to 14:30. The session is part of the Undergraduate Consortium programme; however, if you are attending the conference and fancy finding out how you can communicate your research to a general audience in different formats, you are more than welcome to join us. The session will comprise a talk, a Q&A, and the chance to try some of the activities presented in the tutorial. You will also have the opportunity to receive advice on any science communication ideas or questions you have.


HealthcareNLP: where are we and what is next?

Han, Lifeng, Rayson, Paul, Verberne, Suzan, Moore, Andrew, Nenadic, Goran

arXiv.org Artificial Intelligence

This proposed tutorial focuses on healthcare-domain applications of NLP: what has been achieved in HealthcareNLP, and the challenges that lie ahead. Existing reviews in this domain either overlook important tasks, such as synthetic data generation for addressing privacy concerns and explainable clinical NLP for improved integration and implementation, or fail to mention important methodologies, including retrieval-augmented generation and the neural-symbolic integration of LLMs and KGs. In light of this, the goal of this tutorial is to provide an introductory overview of the most important sub-areas of a patient- and resource-oriented HealthcareNLP, organised in three layers: a data/resource layer (annotation guidelines, ethical approvals, governance, synthetic data); an NLP-Eval layer (NLP tasks such as NER, RE, sentiment analysis, and linking/coding, with categorised methods, leading to explainable HealthAI); and a patients layer (Patient Public Involvement and Engagement (PPIE), health literacy, translation, simplification, and summarisation, which are also NLP tasks, and shared decision-making support). A hands-on session will be included for the audience to use HealthcareNLP applications. The target audience includes NLP practitioners in the healthcare application domain, NLP researchers interested in domain applications, healthcare researchers, and students from NLP fields. The tutorial type is "Introductory to CL/NLP topics (HealthcareNLP)" and the audience does not need prior knowledge to attend. Tutorial materials: https://github.com/4dpicture/HealthNLP
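
To give a flavour of the kind of task the hands-on session covers, here is a minimal named entity recognition sketch over a clinical-style note. It uses spaCy's general-purpose en_core_web_sm model purely for illustration (an assumption, not the tutorial's own material, which lives at the GitHub link above); a real HealthcareNLP pipeline would swap in a clinically trained model.

# Minimal clinical NER sketch (illustrative only).
# Assumes spaCy with the general-purpose en_core_web_sm model:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

note = ("Patient was started on 5 mg lisinopril daily for hypertension "
        "and will follow up at Manchester Royal Infirmary in June.")

doc = nlp(note)
for ent in doc.ents:
    # Each entity carries its surface text, label, and character offsets,
    # which a downstream linking/coding step would map to an ontology
    # such as SNOMED CT or UMLS.
    print(f"{ent.text!r:40} {ent.label_:10} [{ent.start_char}:{ent.end_char}]")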


Spoken Conversational Agents with Large Language Models

Yang, Chao-Han Huck, Stolcke, Andreas, Heck, Larry

arXiv.org Artificial Intelligence

Building on this, we will examine joint text-speech pre-training methods (Chiu et al., 2022; Barrault et al., 2023; Chen et al., 2022) and provide a comprehensive look at state-of-the-art voice-interfaced LLMs (Reid et al., 2024; Chu et al.). Current trends: work on AI virtual assistants builds upon the voice-only systems of the last decade by leveraging LLMs to significantly improve the coverage and robustness of the spoken language understanding and dialogue state tracking components, alongside substantial advances in spoken language generation. The tutorial highlights recent advances in multi-turn dialogue systems, encompassing both LLM-based open-domain dialogue (ODD) and task-oriented dialogue (TOD) systems, as well as relevant datasets and evaluation metrics.
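
As a rough illustration of LLM-based dialogue state tracking in a TOD system, the sketch below prompts a stubbed LLM to fill slot values from the conversation history. The prompt format, slot names, and the llm_complete stub are assumptions for demonstration, not the tutorial authors' implementation.

# Illustrative sketch of LLM-based dialogue state tracking for a
# task-oriented dialogue (TOD) system. llm_complete is a stub so the
# sketch runs end to end; a real system would call a text or
# speech-capable LLM here.
import json

def llm_complete(prompt: str) -> str:
    # Canned response standing in for a real LLM call.
    return json.dumps({"cuisine": "italian", "party_size": "2", "time": "19:00"})

def track_state(history: list[str], slots: list[str]) -> dict:
    """Ask the LLM to fill slot values from the dialogue so far."""
    prompt = (
        "Extract the following slots as JSON from the conversation.\n"
        f"Slots: {slots}\nConversation:\n" + "\n".join(history)
    )
    return json.loads(llm_complete(prompt))

history = [
    "User: I'd like to book a table for two tonight.",
    "Agent: Sure, any cuisine preference?",
    "User: Italian, around 7 pm.",
]
print(track_state(history, ["cuisine", "party_size", "time"]))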


Watch and Learn: Learning to Use Computers from Online Videos

Song, Chan Hee, Song, Yiwen, Goyal, Palash, Su, Yu, Riva, Oriana, Palangi, Hamid, Pfister, Tomas

arXiv.org Artificial Intelligence

Computer-using agents (CUAs) must plan task workflows across diverse and evolving applications, yet progress is limited by the lack of large-scale, high-quality training data. Existing datasets are narrow, static, and costly to annotate, while synthetic data often yields oversimplified or misaligned behaviors. We present Watch & Learn (W&L), a framework that converts readily available Internet videos of human computer use into executable UI trajectories at scale. Instead of directly generating actions or relying on handcrafted heuristics, we cast trajectory annotation as an inverse dynamics problem that predicts user actions from consecutive screen states, which simplifies learning and generalizes across domains. Through a task-aware retrieval and labeling pipeline, W&L yields over 53K high-quality trajectories that enhance CUAs both as in-context exemplars and as supervised training data. On OSWorld, it consistently improves general-purpose and specialized CUAs, while on WindowsAgentArena it achieves state-of-the-art performance among 7B-scale models under the 15-step limit. These results show that web-scale human demonstration videos can serve as a practical and scalable foundation for advancing real-world CUAs.
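
To make the inverse dynamics framing concrete, the sketch below labels a screen recording by predicting the action between consecutive frames. The Action type and the pixel-diff heuristic are illustrative stand-ins; W&L trains a learned inverse dynamics model with a task-aware retrieval and labeling pipeline rather than using a hand-written rule.

# Sketch of the inverse-dynamics framing: label a video by predicting
# the user action between consecutive screen states.
from dataclasses import dataclass

import numpy as np

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll", "none"
    x: int = 0
    y: int = 0

def predict_action(prev_frame: np.ndarray, next_frame: np.ndarray) -> Action:
    """Infer the action that transformed prev_frame into next_frame."""
    diff = np.abs(next_frame.astype(int) - prev_frame.astype(int)).sum(axis=-1)
    if diff.max() == 0:
        return Action("none")
    # Toy heuristic: treat the centroid of changed pixels as a click
    # target; a learned model would also classify the action type.
    ys, xs = np.nonzero(diff)
    return Action("click", x=int(xs.mean()), y=int(ys.mean()))

def label_video(frames: list[np.ndarray]) -> list[Action]:
    """Convert a screen recording into an executable UI trajectory."""
    return [predict_action(a, b) for a, b in zip(frames, frames[1:])]

# Two tiny fake 4x4 RGB frames differing in one pixel, for demonstration.
f0 = np.zeros((4, 4, 3), dtype=np.uint8)
f1 = f0.copy()
f1[1, 2] = 255
print(label_video([f0, f1]))  # -> [Action(kind='click', x=2, y=1)]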


Spatio-Temporal Trajectory Foundation Model - Recent Advances and Future Directions

Yang, Sean Bin, Sun, Ying, Cheng, Yunyao, Lin, Yan, Torp, Kristian, Hu, Jilin

arXiv.org Artificial Intelligence

Foundation models (FMs) have emerged as a powerful paradigm, enabling a diverse range of data analytics and knowledge discovery tasks across scientific fields. Inspired by the success of FMs, particularly large language models, researchers have recently begun to explore spatio-temporal foundation models (STFMs) to improve adaptability and generalization across a wide spectrum of spatio-temporal (ST) tasks. Despite rapid progress, a systematic investigation of trajectory foundation models (TFMs), a crucial subclass of STFMs, is largely lacking. This tutorial addresses this gap by offering a comprehensive overview of recent advances in TFMs, including a taxonomy of existing methodologies and a critical analysis of their strengths and limitations. In addition, the tutorial highlights open challenges and outlines promising research directions to advance spatio-temporal general intelligence through the development of robust, responsible, and transferable TFMs.
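
As one concrete example of a preprocessing step such models commonly rely on, the sketch below discretises a raw GPS trajectory into grid-cell tokens that a sequence model could consume. The 0.01-degree cell size and the token scheme are assumptions for illustration; actual TFMs differ in how they tokenise trajectories.

# Minimal sketch: discretise a GPS trajectory into grid-cell tokens.
def to_tokens(trajectory, cell_deg=0.01):
    """Map (lat, lon) points to integer grid-cell ids."""
    n_cols = int(360.0 / cell_deg)            # number of grid columns
    tokens = []
    for lat, lon in trajectory:
        row = int((lat + 90.0) / cell_deg)    # grid row from latitude
        col = int((lon + 180.0) / cell_deg)   # grid column from longitude
        tokens.append(row * n_cols + col)     # unique id per cell
    return tokens

# A short trajectory through central Copenhagen (illustrative points).
traj = [(55.6761, 12.5683), (55.6765, 12.5691), (55.6772, 12.5702)]
print(to_tokens(traj))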