
Collaborating Authors

 agrawal





Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum

Kumar, Navdeep, Dahan, Tehila, Cohen, Lior, Barua, Ananyabrata, Ramponi, Giorgia, Levy, Kfir Yehuda, Mannor, Shie

arXiv.org Machine Learning

We establish an optimal sample complexity of $O(ε^{-2})$ for obtaining an $ε$-optimal global policy using a single-timescale actor-critic (AC) algorithm in infinite-horizon discounted Markov decision processes (MDPs) with finite state-action spaces, improving upon the prior state of the art of $O(ε^{-3})$. Our approach applies STORM (STOchastic Recursive Momentum) to reduce variance in the critic updates. However, because samples are drawn from a nonstationary occupancy measure induced by the evolving policy, variance reduction via STORM alone is insufficient. To address this challenge, we maintain a buffer holding a small fraction of recent samples and sample uniformly from it for each critic update. Importantly, these mechanisms are compatible with existing deep learning architectures and require only minor modifications, without compromising practical applicability.
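The two mechanisms named in the abstract above, a STORM-style recursive momentum estimate and uniform sampling from a buffer of recent samples, can be illustrated on a toy problem. The sketch below is not the authors' actor-critic algorithm: it minimizes a scalar quadratic with noisy samples, and the function names, constant step size `lr`, constant momentum weight `a`, and `buffer_size` are all hypothetical choices made for illustration (the original STORM method uses adaptive schedules).

```python
import random
from collections import deque


def noisy_grad(x, sample):
    # Gradient of the toy loss 0.5 * (x - sample)**2 with respect to x.
    return x - sample


def storm_with_buffer(steps=2000, lr=0.05, a=0.1, buffer_size=50, seed=0):
    """Toy STORM-style update with uniform sampling from a recent-sample buffer.

    Assumptions (not from the paper): scalar parameter, quadratic loss,
    constant lr and a, samples drawn from a Gaussian with mean 1.0.
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_size)  # keeps only the most recent samples
    x_prev = x = 5.0
    d = None  # running gradient estimate
    for _ in range(steps):
        buffer.append(rng.gauss(1.0, 0.1))  # newly observed sample
        s = rng.choice(list(buffer))        # uniform draw from the buffer
        g_now = noisy_grad(x, s)
        if d is None:
            d = g_now
        else:
            # STORM recursion: evaluate the SAME sample at the current and
            # previous iterate, and correct the previous estimate with it.
            d = g_now + (1.0 - a) * (d - noisy_grad(x_prev, s))
        x_prev, x = x, x - lr * d
    return x
```

Calling `storm_with_buffer()` returns the final iterate, which should settle near the sample mean of 1.0 in this toy setting. The key design point is that the correction term reuses one sample at two consecutive iterates, so the estimate needs no large batches.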


Our Cars Can Talk: How IoT Brings AI to Vehicles

Agrawal, Amod Kant

arXiv.org Artificial Intelligence

Bringing AI to vehicles and enabling them as sensing platforms is key to transforming maintenance from reactive to proactive. Now is the time to integrate AI copilots that speak both languages: machine and driver. This article offers a conceptual and technical perspective intended to spark interdisciplinary dialogue and guide future research and development in intelligent vehicle systems, predictive maintenance, and AI-powered user interaction. Vehicle maintenance remains largely reactive to this day, often triggered by the dreaded check engine light, sometimes at the worst possible time: in the middle of a busy week, or right before a road trip. However, today's vehicles are equipped with a dense network of sensors that can monitor nearly every aspect of performance in real time.


Rise of the RoboMop! AI machines could be cleaning your floors within a decade - and the price will shock you

Daily Mail - Science & tech

At the moment they may exist only in our wildest dreams or in Hollywood science-fiction epics. But humanoid robots that wash dishes, vacuum the carpets, cook and pick up dirty laundry could be available within a decade – and all for the price of a family car. These machines – equipped with hands, arms and legs capable of doing basic household chores – are currently in development around the world. Pulkit Agrawal, associate professor in the department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT), said: 'Silicon Valley companies are promising this year you can buy a robot, but my guess would be more like five to ten years, at least. 'The technology is progressing, but it's good to be realistic that it will take time to deploy.'


Review for NeurIPS paper: Simple and Fast Algorithm for Binary Integer and Online Linear Programming

Neural Information Processing Systems

Weaknesses: - My primary concern is insufficient comparison with the existing literature on online LP, like the two works cited [Agrawal '14, Kesselheim et al. '14]: - The paper claims novelty in the sublinear competitive ratios obtained in those works of the form O(1 - \eps(m,n)), so that \eps(m,n) * OPT is the regret. From a glance at the works cited by [Agrawal '14], "Online stochastic packing applied to display ad allocation" [Feldman et al. '10] has a 1/OPT term in this competitive ratio, giving a sublinear regret bound. Some clarifying discussion is necessary here. Some discussion and a clearer comparison are necessary, since this line of work is so well-established. Some clarification about this would be appreciated; in any case, the manuscript should discuss this in greater depth.


Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains

Paniv, Yurii, Kiulian, Artur, Chaplynskyi, Dmytro, Khandoga, Mykola, Polishko, Anton, Bas, Tetiana, Gabrielli, Guillermo

arXiv.org Artificial Intelligence

While the evaluation of multimodal English-centric models is an active area of research with numerous benchmarks, there is a profound lack of benchmarks or evaluation suites for low- and mid-resource languages. We introduce ZNO-Vision, a comprehensive multimodal Ukrainian-centric benchmark derived from standardized university entrance examination (ZNO). The benchmark consists of over 4,300 expert-crafted questions spanning 12 academic disciplines, including mathematics, physics, chemistry, and humanities. We evaluated the performance of both open-source models and API providers, finding that only a handful of models performed above baseline. Alongside the new benchmark, we performed the first evaluation study of multimodal text generation for the Ukrainian language: we measured caption generation quality on the Multi30K-UK dataset, translated the VQA benchmark into Ukrainian, and measured performance degradation relative to original English versions. Lastly, we tested a few models from a cultural perspective on knowledge of national cuisine. We believe our work will advance multimodal generation capabilities for the Ukrainian language and our approach could be useful for other low-resource languages.


Watch a robot peel a squash with human-like dexterity

New Scientist

A robot that peels vegetables in the same way that people do demonstrates a level of dexterity that could help move delicate objects along a manufacturing line. Prototype robots are often tasked with peeling vegetables to test their ability to carefully handle awkward objects. But these challenges are usually simplified, such as the vegetable being fixed in place, or only testing single fruits or vegetables, like peeling a banana. Now, Pulkit Agrawal at the Massachusetts Institute of Technology and his colleagues have developed a robotic system that can rotate different types of fruit and vegetable using its fingers on one hand, while the other arm is made to peel. "These additional steps of doing rotation are something which is very straightforward to humans, we don't even think about it," says Agrawal. "But for a robot, this becomes challenging." First, the robot was taught in a simulated environment, receiving an algorithmic reward for a proper rotation and a punishment if it rotated the wrong way or not at all.


Automatic Environment Shaping is the Next Frontier in RL

Park, Younghyo, Margolis, Gabriel B., Agrawal, Pulkit

arXiv.org Artificial Intelligence

Many roboticists dream of presenting a robot with a task in the evening and returning the next morning to find the robot capable of solving the task. What is preventing us from achieving this? Sim-to-real reinforcement learning (RL) has achieved impressive performance on challenging robotics tasks, but requires substantial human effort to set up the task in a way that is amenable to RL. It's our position that algorithmic improvements in policy optimization and other ideas should be guided towards resolving the primary bottleneck of shaping the training environment, i.e., designing observations, actions, rewards and simulation dynamics. Most practitioners tune not the RL algorithm but other environment parameters to obtain a desirable controller. We posit that scaling RL to diverse robotic tasks will only be achieved if the community focuses on automating environment shaping procedures.