
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

Neural Information Processing Systems

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities. However, for many sequential decision domains such as robotics, video games, and computer use, publicly available data does not contain the labels required to train behavioral priors in the same way. We extend the internet-scale pretraining paradigm to sequential decision domains through semi-supervised imitation learning wherein agents learn to act by watching online unlabeled videos. Specifically, we show that with a small amount of labeled data we can train an inverse dynamics model accurate enough to label a huge unlabeled source of online data -- here, online videos of people playing Minecraft -- from which we can then train a general behavioral prior. Despite using the native human interface (mouse and keyboard at 20Hz), we show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish.
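The pipeline the abstract describes — train an inverse dynamics model (IDM) on a small labeled set, use it to pseudo-label a large corpus of unlabeled video, then fit a behavioral prior on the pseudo-labels — can be sketched as a toy in plain Python. Everything here (the lookup-table "models", the frame strings, the function names) is an illustrative assumption, not the paper's actual architecture or data:

```python
# Toy sketch of the VPT semi-supervised pipeline: the "models" are lookup
# tables standing in for neural networks, purely for illustration.

def train_inverse_dynamics_model(labeled_clips):
    """Fit a toy IDM on a small labeled set of (frame_before, frame_after,
    action) triples. The IDM sees frames on BOTH sides of an action (a
    non-causal task), which is why a small labeled set can suffice."""
    table = {}
    for clip in labeled_clips:
        for before, after, action in clip:
            table[(before, after)] = action
    return lambda before, after: table.get((before, after), "noop")

def pseudo_label(idm, unlabeled_clips):
    """Run the IDM over raw frame sequences to produce (frame, action) pairs."""
    pairs = []
    for frames in unlabeled_clips:
        for before, after in zip(frames, frames[1:]):
            pairs.append((before, idm(before, after)))
    return pairs

def train_behavioral_prior(pseudo_labeled):
    """Fit a toy causal policy frame -> action on the pseudo-labeled data."""
    policy = dict(pseudo_labeled)
    return lambda frame: policy.get(frame, "noop")

# Small labeled set (e.g., contractor gameplay with actions recorded):
labeled = [[("f0", "f1", "jump"), ("f1", "f2", "mine")]]
# Large unlabeled set (e.g., internet video, frames only):
unlabeled = [["f0", "f1", "f2"]]

idm = train_inverse_dynamics_model(labeled)
prior = train_behavioral_prior(pseudo_label(idm, unlabeled))
print(prior("f1"))  # -> "mine"
```

The key structural point the toy preserves is the asymmetry of the two tasks: the IDM conditions on future frames, while the behavioral prior must act causally from past frames alone.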


Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs

Neural Information Processing Systems

Most text-driven human motion generation methods employ sequential modeling approaches, e.g., transformers, to extract sentence-level text representations automatically and implicitly for human motion synthesis. However, these compact text representations may overemphasize the action names at the expense of other important properties and lack fine-grained details to guide the synthesis of subtly distinct motion. In this paper, we propose hierarchical semantic graphs for fine-grained control over motion generation.


Constructing coherent spatial memory in LLM agents through graph rectification

Zhang, Puzhen, Chen, Xuyang, Feng, Yu, Jiang, Yuhan, Meng, Liqiu

arXiv.org Artificial Intelligence

Given a map description through global traversal navigation instructions (e.g., visiting each room sequentially with action signals such as north, west, etc.), an LLM can often infer the implicit spatial layout of the environment and answer user queries by providing a shortest path from a start to a destination (for instance, navigating from the lobby to a meeting room via the hall and elevator). However, such context-dependent querying becomes infeasible as the environment grows larger, motivating the need for incremental map construction that builds a complete topological graph from stepwise observations. We propose a framework for LLM-driven map construction and repair, designed to detect, localize, and correct structural inconsistencies in incrementally constructed navigation graphs. Central to our method is the Version Control module, which records the full history of graph edits and their source observations, enabling fine-grained rollback, conflict tracing, and repair evaluation. We further introduce an Edge Impact Score to prioritize minimal-cost repairs based on structural reachability, path usage, and conflict propagation. To evaluate our approach properly, we create a refined version of the MANGO benchmark dataset by systematically removing non-topological actions and inherent structural conflicts, providing a cleaner testbed for LLM-driven construction and map repair. Our approach significantly improves map correctness and robustness, especially in scenarios with entangled or chained inconsistencies. Our results highlight the importance of introspective, history-aware repair mechanisms for maintaining coherent spatial memory in LLM agents.
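The core idea of a history-aware, rollback-capable navigation graph can be sketched in a few lines of plain Python. This is a minimal illustration of the general pattern (logging every edit with its source observation so conflicts can be traced and reverted), not the paper's actual Version Control or Edge Impact Score definitions; all names and the tiny example map are assumptions:

```python
# Toy versioned navigation graph: every edit is logged with the observation
# that produced it, so contradictions can be traced and rolled back.

class VersionedGraph:
    def __init__(self):
        self.edges = {}    # (room, direction) -> destination room
        self.history = []  # ((room, direction, destination), observation)

    def add_edge(self, room, direction, dest, observation):
        """Record the edit in the history log, then apply it."""
        self.history.append(((room, direction, dest), observation))
        self.edges[(room, direction)] = dest

    def conflicts(self):
        """A conflict: two observations assert different destinations for
        the same (room, direction) pair."""
        seen, found = {}, []
        for (room, direction, dest), obs in self.history:
            key = (room, direction)
            if key in seen and seen[key][0] != dest:
                found.append((key, seen[key], (dest, obs)))
            else:
                seen[key] = (dest, obs)
        return found

    def rollback(self, room, direction):
        """Revert (room, direction) to its earliest recorded destination."""
        for (r, d, dest), obs in self.history:
            if (r, d) == (room, direction):
                self.edges[(room, direction)] = dest
                return dest

g = VersionedGraph()
g.add_edge("lobby", "north", "hall", "obs-1: went north from lobby to hall")
g.add_edge("lobby", "north", "kitchen", "obs-7: contradicts obs-1")
print(len(g.conflicts()))            # -> 1
g.rollback("lobby", "north")
print(g.edges[("lobby", "north")])   # -> "hall"
```

A real system would additionally score each candidate repair (e.g., by how many stored paths an edge supports) before choosing which edit to revert; the toy above only shows the detect-trace-rollback loop that such scoring would plug into.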


The Illusion of Rights based AI Regulation

Mei, Yiyang, Sag, Matthew

arXiv.org Artificial Intelligence

Whether and how to regulate AI is one of the defining questions of our times - a question that is being debated locally, nationally, and internationally. We argue that much of this debate is proceeding on a false premise. Specifically, our article challenges the prevailing academic consensus that the European Union's AI regulatory framework is fundamentally rights-driven and the correlative presumption that other rights-regarding nations should therefore follow Europe's lead in AI regulation. Rather than taking rights language in EU rules and regulations at face value, we show how EU AI regulation is the logical outgrowth of a particular cultural, political, and historical context. We show that although instruments like the General Data Protection Regulation (GDPR) and the AI Act invoke the language of fundamental rights, these rights are instrumentalized - used as rhetorical cover for governance tools that address systemic risks and maintain institutional stability. As such, we reject claims that the EU's regulatory framework and the substance of its rules should be adopted as universal imperatives and transplanted to other liberal democracies. To add weight to our argument from historical context, we conduct a comparative analysis of AI regulation in five contested domains: data privacy, cybersecurity, healthcare, labor, and misinformation. This EU-US comparison shows that the EU's regulatory architecture is not meaningfully rights-based. Our article's key intervention in AI policy debates is not to suggest that the current American regulatory model is necessarily preferable but that the presumed legitimacy of the EU's AI regulatory approach must be abandoned.


Position: It's Time to Act on the Risk of Efficient Personalized Text Generation

Iofinova, Eugenia, Jovanovic, Andrej, Alistarh, Dan

arXiv.org Artificial Intelligence

The recent surge in high-quality open-sourced Generative AI text models (colloquially: LLMs), as well as efficient finetuning techniques, has opened the possibility of creating high-quality personalized models, i.e., models generating text attuned to a specific individual's needs and capable of credibly imitating their writing style by leveraging that person's own data to refine an open-source model. The technology to create such models is accessible to private individuals, and training and running such models can be done cheaply on consumer-grade hardware. These advancements are a huge gain for usability and privacy. This position paper argues, however, that these advancements also introduce new safety risks by making it practically feasible for malicious actors to impersonate specific individuals at scale, for instance for the purpose of phishing emails, based on small amounts of publicly available text. We further argue that these risks are complementary to -- and distinct from -- the much-discussed risks of other impersonation attacks such as image, voice, or video deepfakes, and are not adequately addressed by the larger research community, or the current generation of open- and closed-source models.


Part 2: Canada's evolving artificial intelligence and privacy regime

#artificialintelligence

The publication of this series was inspired by the release of ChatGPT, a generative artificial intelligence (AI) chatbot developed by OpenAI. ChatGPT uses machine learning and natural language processing to provide relatively sophisticated and human-like responses to almost any question. Unlike traditional AI systems, ChatGPT is a generative AI platform, which means that the content it creates is "new," rather than a reiteration of something that already exists. As ChatGPT demonstrates, content can be produced through generative AI in a matter of seconds and may be composed of images, videos, audio, text or even code. The reality is that generative AI is well on the way to becoming not just faster and cheaper, but in some cases better than what humans create by hand.


Can the world's de facto tech regulator really rein in AI? - Coda Story

#artificialintelligence

Artificial intelligence is creeping into every aspect of our lives. AI-powered software is triaging hospital patients to determine who gets which treatment, deciding whether an asylum seeker is lying or telling the truth in their application and even conjuring up weird conceits for sitcoms. Just lately, these kinds of tools have been helping killer robots select their targets in the war in Ukraine. AI systems have been proven to carry systemic biases again and again, but their increasing centrality to the way we live makes those debates even more urgent. In typical tech fashion, AI-driven tools are advancing much faster than the laws that could theoretically govern them.


2023 Will Be The Year Of AI Ethics Legislation Acceleration

#artificialintelligence

Ethical AI will need careful planting of many ecosystems. Ethical AI has been a concern of AI leaders and practitioners for many years, but finally, it seems, global jurisdictions are starting to move from policy formulation and stakeholder engagement to putting some teeth into drafting legal bills or acts. Expect many new laws to pass in 2023, tightening up citizen privacy and creating risk frameworks and audit requirements for data bias, privacy and security risks. At the same time, regulators will have to evolve an entire global ecosystem to ensure AI audits are conducted effectively, and many questions loom: who will validate certifications for AI audit practices, and will we overburden AI innovation, as we have in so many other regulated operating practices where the risks and costs of non-conformance inhibit innovation and capital funding? Finding a balance will be key.


Awesome ChatGPT Prompts

#artificialintelligence

A collection of the best ChatGPT prompts found on the internet. We have sourced hundreds of prompts, and there is also a place for you to submit your own to our community! You can easily find a prompt across different categories, copy it to your clipboard, and run it quickly in ChatGPT.


Act like a Machine Learning Pro in Simple Way (PyCaret + mlflow)

#artificialintelligence

Build your own ML lab and come across as an ML professional to your boss, the simple way. Machine learning (ML) has been well known for a while, since a massive number of companies want to merge AI and data science into their business. Within a data project, alongside the analysis, the most enjoyable part is building the machine learning model.