Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Neural Information Processing Systems

We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.


Without-Replacement Sampling for Stochastic Gradient Methods

Ohad Shamir, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel (ohad.shamir@weizmann.ac.il)

Neural Information Processing Systems

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data. Moreover, we describe a useful application of these results in the context of distributed optimization with randomly-partitioned data, yielding a nearly-optimal algorithm for regularized least squares (in terms of both communication complexity and runtime complexity) under broad parameter regimes. Our proof techniques combine ideas from stochastic optimization, adversarial online learning and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.
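The difference between the two sampling schemes discussed in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or analysis: the function name `sgd_least_squares`, the one-dimensional least-squares objective, and the fixed step size are all assumptions chosen for illustration. With replacement, each step draws an index independently; without replacement, each pass visits a fresh random permutation of the data.

```python
import random


def sgd_least_squares(xs, ys, passes, step, with_replacement, seed=0):
    """Toy 1-D SGD on the mean of (w * x - y)^2 / 2 (illustrative only).

    Returns the final weight and the index order used in each pass.
    """
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    orders = []
    for _ in range(passes):
        if with_replacement:
            # Independent uniform draws: an index may repeat or be missed.
            order = [rng.randrange(n) for _ in range(n)]
        else:
            # Random permutation: every index is visited exactly once per pass.
            order = list(range(n))
            rng.shuffle(order)
        for i in order:
            grad = (w * xs[i] - ys[i]) * xs[i]
            w -= step * grad
        orders.append(order)
    return w, orders


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0]
    ys = [2.0 * x for x in xs]  # noise-free data with true slope 2
    for flag in (True, False):
        w, _ = sgd_least_squares(xs, ys, passes=50, step=0.1, with_replacement=flag)
        print("with replacement" if flag else "without replacement", round(w, 4))
```

Shuffling once per pass is also the simpler loop to implement over a dataset stored on disk, which is consistent with the abstract's remark that without-replacement sampling is common in practice.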


A drone delivered her lethal dose of fentanyl in a church parking lot. Now her dealer is going to prison

Los Angeles Times

A drone delivered her lethal dose of fentanyl in a church parking lot. The Drug Enforcement Administration was among agencies involved in the investigation.


Meta to capture U.S. employee mouse movements and keystrokes to train AI

The Japan Times

NEW YORK - Meta is installing new tracking software on U.S.-based employees' computers to capture mouse movements, clicks and keystrokes for use in training its artificial intelligence models, part of a broad initiative to build AI agents that can perform work tasks autonomously, the company told staffers in internal memos. The tool, called Model Capability Initiative (MCI), will run on work-related apps and websites and will also take occasional snapshots of the content on employees' screens, according to one of the memos, posted by a staff AI research scientist on Tuesday in a channel for the company's model-building Meta SuperIntelligence Labs team. The purpose, according to the memo, was to improve the company's AI models in areas where they struggle to replicate how humans interact with computers, like choosing from dropdown menus and using keyboard shortcuts.



Clustering with Bregman Divergences: an Asymptotic Analysis

Neural Information Processing Systems

Clustering, in particular k-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of k-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of clusters k is large. We establish quantization rates and describe the limiting distribution of the centers as k → ∞, extending well-known results for k-means clustering.
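To make the generalization concrete, the following is a minimal sketch of Lloyd-style hard clustering with a pluggable Bregman divergence. It is an illustration under stated assumptions, not the asymptotic analysis of the paper: the function names, the choice of generalized KL divergence as the second example, and the fixed iteration count are all hypothetical. It relies on the standard fact that, for any Bregman divergence taken from a point to a center, the best center of a group is its arithmetic mean, so only the assignment step changes relative to k-means.

```python
import math


def squared_euclidean(x, center):
    """Bregman divergence generated by phi(x) = ||x||^2; recovers k-means."""
    return sum((a - b) ** 2 for a, b in zip(x, center))


def generalized_kl(x, center):
    """Bregman divergence generated by phi(x) = sum x_i log x_i (positive entries)."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, center))


def bregman_cluster(points, centers, divergence, iters=20):
    """Alternate nearest-center assignment and mean updates (illustrative)."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: divergence(p, centers[j]))
            groups[nearest].append(p)
        # The arithmetic mean minimises the summed divergence for every
        # Bregman divergence; an empty group keeps its previous center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


if __name__ == "__main__":
    pts = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
    start = [[1.0, 1.0], [5.0, 5.0]]
    print(bregman_cluster(pts, start, squared_euclidean))
    print(bregman_cluster(pts, start, generalized_kl))
```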


Safe and Efficient Off-Policy Reinforcement Learning

Neural Information Processing Systems

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of "off-policyness"; and (3) it is efficient as it makes the best use of samples collected from near on-policy behaviour policies. We analyze the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. We believe this is the first return-based off-policy control algorithm converging a.s. to Q* without the GLIE assumption (Greedy in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins' Q(λ), which was an open problem since 1989. We illustrate the benefits of Retrace(λ) on a standard suite of Atari 2600 games. One fundamental trade-off in reinforcement learning lies in the definition of the update target: should one estimate Monte Carlo returns or bootstrap from an existing Q-function?
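The abstract does not spell out the update itself. As a rough sketch, assuming the truncated importance-weight form of the trace coefficient, c_s = λ · min(1, π(a_s|x_s) / μ(a_s|x_s)), the correction applied to the first state-action pair of a trajectory could be computed as below. The function name, argument layout, and the use of precomputed expected next-state values are assumptions made for illustration.

```python
def retrace_correction(rewards, q_taken, expected_next_q, pi_probs, mu_probs, gamma, lam):
    """Sum of discounted, trace-weighted TD errors along one trajectory.

    rewards[t]          reward received at step t
    q_taken[t]          current estimate Q(x_t, a_t)
    expected_next_q[t]  expectation of Q(x_{t+1}, .) under the target policy pi
    pi_probs[t]         pi(a_t | x_t), target policy probability
    mu_probs[t]         mu(a_t | x_t), behaviour policy probability (must be > 0)
    """
    correction = 0.0
    trace = 1.0  # empty product for t = 0
    for t in range(len(rewards)):
        if t > 0:
            # Truncating the ratio at 1 keeps the variance bounded (assumed form).
            trace *= lam * min(1.0, pi_probs[t] / mu_probs[t])
        td_error = rewards[t] + gamma * expected_next_q[t] - q_taken[t]
        correction += (gamma ** t) * trace * td_error
    return correction


if __name__ == "__main__":
    zeros = [0.0, 0.0, 0.0]
    same = [0.5, 0.5, 0.5]
    # On-policy, lam = gamma = 1, Q = 0: the correction is the plain return.
    print(retrace_correction([1.0, 2.0, 3.0], zeros, zeros, same, same, 1.0, 1.0))
    # The target policy never takes the second action: the trace is cut there.
    print(retrace_correction([1.0, 2.0, 3.0], zeros, zeros, [0.5, 0.0, 0.5], same, 1.0, 1.0))
```

In this sketch, when the behaviour and target policies agree the ratio is 1 and the full λ-return is used, while a strongly off-policy action shrinks or cuts the trace. That is one way to read properties (2) and (3) in the abstract.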




SpaceX secures option to buy AI startup Cursor for $60bn or partner for $10bn

The Guardian

Elon Musk speaks at the SpaceX Hyperloop Pod Competition II in Hawthorne, California, in 2017. Cursor is a Silicon Valley startup using AI to automate coding as Elon Musk's firm seeks foothold in the AI market. SpaceX said it has secured an option to either acquire code-generation startup Cursor for $60bn later this year, or pay $10bn for their new partnership, as it pushes deeper into the lucrative market for AI developer tools. Along with OpenAI and Anthropic, Cursor is one of several Silicon Valley startups that has drawn waves of developers by using artificial intelligence to automate coding, a business where AI companies have found early commercial traction. The deal could give xAI, the Grok chatbot maker that SpaceX merged with in February, a stronger foothold in the AI coding market where it has so far lagged rivals.