Communications: Instructional Materials
Introduction to AI Safety, Ethics, and Society
Artificial Intelligence is rapidly embedding itself within militaries, economies, and societies, reshaping their very foundations. Given the depth and breadth of its consequences, it has never been more pressing to understand how to ensure that AI systems are safe, ethical, and have a positive societal impact. This book aims to provide a comprehensive approach to understanding AI risk. Our primary goals include consolidating fragmented knowledge on AI risk, increasing the precision of core ideas, and reducing barriers to entry by making content simpler and more comprehensible. The book has been designed to be accessible to readers from diverse backgrounds. You do not need to have studied AI, philosophy, or other such topics. The content is skimmable and somewhat modular, so that you can choose which chapters to read. We introduce mathematical formulas in a few places to specify claims more precisely, but readers should be able to understand the main points without these.
RoboCrowd: Scaling Robot Data Collection through Crowdsourcing
Mirchandani, Suvir, Yuan, David D., Burns, Kaylee, Islam, Md Sazzad, Zhao, Tony Z., Finn, Chelsea, Sadigh, Dorsa
In recent years, imitation learning from large-scale human demonstrations has emerged as a promising paradigm for training robot policies. However, the burden of collecting large quantities of human demonstrations is significant in terms of collection time and the need for access to expert operators. We introduce a new data collection paradigm, RoboCrowd, which distributes the workload by utilizing crowdsourcing principles and incentive design. RoboCrowd helps enable scalable data collection and facilitates more efficient learning of robot policies. We build RoboCrowd on top of ALOHA (Zhao et al. 2023) -- a bimanual platform that supports data collection via puppeteering -- to explore the design space for crowdsourcing in-person demonstrations in a public environment. We propose three classes of incentive mechanisms to appeal to users' varying sources of motivation for interacting with the system: material rewards, intrinsic interest, and social comparison. We instantiate these incentives through tasks that include physical rewards, engaging or challenging manipulations, as well as gamification elements such as a leaderboard. We conduct a large-scale, two-week field experiment in which the platform is situated in a university cafe. We observe significant engagement with the system -- over 200 individuals independently volunteered to provide a total of over 800 interaction episodes. Our findings validate the proposed incentives as mechanisms for shaping users' data quantity and quality. Further, we demonstrate that the crowdsourced data can serve as useful pre-training data for policies fine-tuned on expert demonstrations -- boosting performance up to 20% compared to when this data is not available. These results suggest the potential for RoboCrowd to reduce the burden of robot data collection by carefully implementing crowdsourcing and incentive design principles.
AutoGLM: Autonomous Foundation Agents for GUIs
Liu, Xiao, Qin, Bo, Liang, Dongzhu, Dong, Guang, Lai, Hanyu, Zhang, Hanchen, Zhao, Hanlin, Iong, Iat Long, Sun, Jiadai, Wang, Jiaqi, Gao, Junjie, Shan, Junjun, Liu, Kangning, Zhang, Shudan, Yao, Shuntian, Cheng, Siyi, Yao, Wentao, Zhao, Wenyi, Liu, Xinghan, Liu, Xinyi, Chen, Xinying, Yang, Xinyue, Yang, Yang, Xu, Yifan, Yang, Yu, Wang, Yujia, Xu, Yulin, Qi, Zehan, Dong, Yuxiao, Tang, Jie
We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation agents for autonomous control of digital devices through Graphical User Interfaces (GUIs). While foundation models excel at acquiring human knowledge, they often struggle with decision-making in dynamic real-world environments, limiting their progress toward artificial general intelligence. This limitation underscores the importance of developing foundation agents capable of learning through autonomous environmental interactions by reinforcing existing models. Focusing on Web Browser and Phone as representative GUI scenarios, we have developed AutoGLM as a practical foundation agent system for real-world GUI interactions. Our approach integrates a comprehensive suite of techniques and infrastructures to create deployable agent systems suitable for user delivery. Through this development, we have derived two key insights: First, the design of an appropriate "intermediate interface" for GUI control is crucial, enabling the separation of planning and grounding behaviors, which require distinct optimization for flexibility and accuracy respectively. Second, we have developed a novel progressive training framework that enables self-evolving online curriculum reinforcement learning for AutoGLM. Our evaluations demonstrate AutoGLM's effectiveness across multiple domains. For web browsing, AutoGLM achieves a 55.2% success rate on VAB-WebArena-Lite (improving to 59.1% with a second attempt) and 96.2% on OpenTable evaluation tasks. In Android device control, AutoGLM attains a 36.2% success rate on AndroidLab (VAB-Mobile) and 89.7% on common tasks in popular Chinese APPs.
Expanding AI Awareness Through Everyday Interactions with AI: A Reflective Journal Study
As the application of AI continues to expand, students in technology programs are poised to be both producers and users of the technologies. They are also positioned to engage with AI applications within and outside the classroom. While focusing on the curriculum when examining students' AI knowledge is common, extending this connection to students' everyday interactions with AI provides a more complete picture of their learning. In this paper, we explore student's awareness and engagement with AI in the context of school and their daily lives. Over six weeks, 22 undergraduate students participated in a reflective journal study and submitted a weekly journal entry about their interactions with AI. The participants were recruited from a technology and society course that focuses on the implications of technology on people, communities, and processes. In their weekly journal entries, participants reflected on interactions with AI on campus (coursework, advertises campus events, or seminars) and beyond (social media, news, or conversations with friends and family). The journal prompts were designed to help them think through what they had read, watched, or been told and reflect on the development of their own perspectives, knowledge, and literacy on the topic. Overall, students described nine categories of interactions: coursework, news and current events, using software and applications, university events, social media related to their work, personal discussions with friends and family, interacting with content, and gaming. Students reported that completing the diaries allowed them time for reflection and made them more aware of the presence of AI in their daily lives and of its potential benefits and drawbacks. This research contributes to the ongoing work on AI awareness and literacy by bringing in perspectives from beyond a formal educational context.
Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning
Wang, Xinrui, Geng, Chuanxing, Wan, Wenhai, Li, Shao-yuan, Chen, Songcan
Online continual learning requires the models to learn from constant, endless streams of data. While significant efforts have been made in this field, most were focused on mitigating the catastrophic forgetting issue to achieve better classification ability, at the cost of a much heavier training workload. They overlooked that in real-world scenarios, e.g., in high-speed data stream environments, data do not pause to accommodate slow models. In this paper, we emphasize that model throughput -- defined as the maximum number of training samples that a model can process within a unit of time -- is equally important. It directly limits how much data a model can utilize and presents a challenging dilemma for current methods. With this understanding, we revisit key challenges in OCL from both empirical and theoretical perspectives, highlighting two critical issues beyond the well-documented catastrophic forgetting: Model's ignorance: the single-pass nature of OCL challenges models to learn effective features within constrained training time and storage capacity, leading to a trade-off between effective learning and model throughput; Model's myopia: the local learning nature of OCL on the current task leads the model to adopt overly simplified, task-specific features and excessively sparse classifier, resulting in the gap between the optimal solution for the current task and the global objective. To tackle these issues, we propose the Non-sparse Classifier Evolution framework (NsCE) to facilitate effective global discriminative feature learning with minimal time cost. NsCE integrates non-sparse maximum separation regularization and targeted experience replay techniques with the help of pre-trained models, enabling rapid acquisition of new globally discriminative features.
Engadget Podcast: Why the Windows 11 2024 update is all about Copilot AI
This week, Microsoft started rolling out the Windows 11 2024 update, but it quickly became clear that the company was far more eager to unveil new features for its Copilot AI and Copilot AI PCs. In this episode, Devindra and Cherlynn chat about Microsoft's current AI priorities, and what it means for people with older PCs. Also, we discuss the death of HoloLens and Microsoft giving up on AR as Meta, Apple and even Snap build for an augmented reality future. Listen below or subscribe on your podcast app of choice. If you've got suggestions or topics you'd like covered on the show, be sure to email us or drop a note in the comments! And be sure to check out our other podcast, Engadget News! Tech debt led to Sonos' disastrous app relaunch, will they be able to win users back? Google is making Gmail summaries more useful and adding a "happening soon" tab to your inbox โ 41:11 Harvard students hack together facial recognition for Meta's smart glasses that instantly doxes strangers โ 44:00 ...
COBE: Contextualized Object Embeddings from Narrated Instructional Video Facebook AI, 2
Many objects in the real world undergo dramatic variations in visual appearance. For example, a tomato may be red or green, sliced or chopped, fresh or fried, liquid or solid. Training a single detector to accurately recognize tomatoes in all these different states is challenging. On the other hand, contextual cues (e.g., the presence of a knife, a cutting board, a strainer or a pan) are often strongly indicative of how the object appears in the scene. Recognizing such contextual cues is useful not only to improve the accuracy of object detection or to determine the state of the object, but also to understand its functional properties and to infer ongoing or upcoming human-object interactions.
Google's NotebookLM can now transform YouTube videos into study guides - here's how
Google's NotebookLM, an artificial intelligence (AI)-powered notebook, is expanding its source base. Today, you can now add public YouTube URLs and audio files directly into your notebook. These additions let users summarize lectures, compare the contents of audio files, and transform video and .mp3/.wav audio into study guides. As Google said in the release, NotebookLM is "part note taker, part collaborator, part data collector, and part librarian," letting users craft a digital notebook composing all the research, documents, and other source material they collect. Also: Google's NotebookLM can discuss your notes with you now.