Goto

Collaborating Authors

 Rivkin, Dmitriy


CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots

arXiv.org Artificial Intelligence

Abstract-- This work explores the capacity of large language models (LLMs) to address problems at the intersection of spatial planning and natural language interfaces for navigation. We focus on following complex instructions that are more akin to natural conversation than the explicit procedural directives typically seen in robotics. Unlike most prior work, where navigation directives are provided as simple imperative commands (e.g., "go to the fridge"), we examine implicit directives obtained through conversational interactions. We leverage the 3D simulator AI2Thor to create household query scenarios at scale, and augment it by adding complex language queries for 40 object types. We demonstrate that a robot using our method, CARTIER (Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots), can parse descriptive language queries up to 42% more reliably than existing LLM-enabled methods by exploiting the ability of LLMs to interpret the user interaction in the context of the objects in the scenario. This paper explores the extent to which natural interaction is possible between human and robot in the context of a navigation task. We seek to answer the question: "Can a robot infer its task in a navigational context without receiving an explicit command?" Household robotic tasks are often formulated using imperative commands with a template structure that can be abstracted as "go-do" commands (go ...

[Figure 1: CARTIER prompts an LLM with knowledge about a robot's environment in order to parse user intent from implicit, conversational queries. It then informs the robot where to navigate in order to help the user.]
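
A minimal sketch of the core idea, assuming a generic text-completion callable (`llm`) rather than any particular LLM API: the robot's known scene objects are placed in the prompt alongside the user's conversational query, and the reply is mapped back to a navigation target. The function names and prompt wording here are illustrative stand-ins, not CARTIER's actual prompts.

```python
# Illustrative sketch: resolve an implicit, conversational request to a scene object.
from typing import Callable, List

def build_navigation_prompt(scene_objects: List[str], user_query: str) -> str:
    """Format the scene inventory and the user's conversational query for the LLM."""
    objects = ", ".join(scene_objects)
    return (
        "You control a household robot. The scene contains the following objects: "
        f"{objects}.\n"
        f'The user says: "{user_query}"\n'
        "Reply with the single object the robot should navigate to in order to help."
    )

def infer_navigation_target(llm: Callable[[str], str],
                            scene_objects: List[str],
                            user_query: str) -> str:
    """Query the LLM and normalize its answer to one of the known scene objects."""
    answer = llm(build_navigation_prompt(scene_objects, user_query)).strip().lower()
    for obj in scene_objects:
        if obj.lower() in answer:
            return obj
    return answer  # fall back to the raw reply if no known object is mentioned

if __name__ == "__main__":
    # Stand-in for a real LLM call, just to show the expected interface.
    fake_llm = lambda prompt: "Fridge"
    print(infer_navigation_target(fake_llm, ["Fridge", "Sofa", "Television"],
                                  "I'm thirsty and could use something cold."))  # -> Fridge
```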


PhotoBot: Reference-Guided Interactive Photography via Natural Language

arXiv.org Artificial Intelligence

We introduce PhotoBot, a framework for automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via a reference picture that is retrieved from a curated gallery. We exploit a visual language model (VLM) and an object detector to characterize reference pictures via textual descriptions and use a large language model (LLM) to retrieve relevant reference pictures based on a user's language query through text-based reasoning. To establish correspondences between the reference picture and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across significantly varying images. Using these features, we compute pose adjustments for an RGB-D camera by solving a Perspective-n-Point (PnP) problem. We demonstrate our approach on a real-world manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback.
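
The pose-adjustment step can be illustrated with a small sketch: assuming keypoint matches between the reference picture and the observed RGB-D frame are already available (e.g., from vision-transformer features), the observed keypoints are lifted to 3D using the depth image and OpenCV's `solvePnP` recovers the camera pose that reproduces the reference view. The helper names and matching pipeline here are assumptions, not the PhotoBot code.

```python
# Illustrative sketch: recover a target camera pose from matched keypoints via PnP.
import numpy as np
import cv2

def backproject(pts_2d: np.ndarray, depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift pixel coordinates in the observed frame to 3D camera coordinates."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = pts_2d[:, 0], pts_2d[:, 1]
    z = depth[v.astype(int), u.astype(int)]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def reference_camera_pose(obs_pts_2d, ref_pts_2d, depth, K):
    """Pose (rvec, tvec) the camera should reach to reproduce the reference view."""
    pts_3d = backproject(np.asarray(obs_pts_2d, dtype=np.float64),
                         depth, np.asarray(K, dtype=np.float64))
    ok, rvec, tvec = cv2.solvePnP(pts_3d,
                                  np.asarray(ref_pts_2d, dtype=np.float64),
                                  np.asarray(K, dtype=np.float64),
                                  distCoeffs=None,
                                  flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP failed; not enough reliable correspondences")
    return rvec, tvec
```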


SAGE: Smart home Agent with Grounded Execution

arXiv.org Artificial Intelligence

The common sense reasoning abilities and vast general knowledge of Large Language Models (LLMs) make them a natural fit for interpreting user requests in a Smart Home assistant context. However, LLMs' lack of specific knowledge about the user and their home limits their potential impact. SAGE (Smart Home Agent with Grounded Execution) overcomes these and other limitations by using a scheme in which a user request triggers an LLM-controlled sequence of discrete actions. These actions can be used to retrieve information, interact with the user, or manipulate device states. SAGE controls this process through a dynamically constructed tree of LLM prompts, which help it decide which action to take next, whether an action was successful, and when to terminate the process. The SAGE action set augments an LLM's capabilities to support some of the most critical requirements for a Smart Home assistant. These include: flexible and scalable user preference management ("is my team playing tonight?"), access to any smart device's full functionality without device-specific code via API reading ("turn down the screen brightness on my dryer"), persistent device state monitoring ("remind me to throw out the milk when I open the fridge"), natural device references using only a photo of the room ("turn on the light on the dresser"), and more. We introduce a benchmark of 50 new and challenging smart home tasks where SAGE achieves a 75% success rate, significantly outperforming existing LLM-enabled baselines (30% success rate).
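
A minimal sketch of an LLM-controlled loop over discrete actions in the spirit described above, with a generic `llm` callable and a hypothetical action dictionary standing in for SAGE's prompt tree and tool set; the names and prompt formats are illustrative only.

```python
# Illustrative sketch: the LLM repeatedly picks an action, sees its result, and stops itself.
from typing import Callable, Dict

def run_agent(llm: Callable[[str], str],
              actions: Dict[str, Callable[[str], str]],
              user_request: str,
              max_steps: int = 8) -> str:
    transcript = f"User request: {user_request}\n"
    for _ in range(max_steps):
        prompt = (
            transcript
            + "Available actions: " + ", ".join(actions) + ", done.\n"
            + "Reply as '<action>: <argument>' or 'done: <final answer>'."
        )
        name, _, arg = llm(prompt).partition(":")
        name, arg = name.strip().lower(), arg.strip()
        if name == "done":
            return arg
        result = actions.get(name, lambda a: f"unknown action '{name}'")(arg)
        transcript += f"{name}({arg}) -> {result}\n"
    return "stopped: step budget exhausted"

if __name__ == "__main__":
    # Scripted stand-in for an LLM, just to exercise the loop.
    scripted = iter(["get_state: dryer.brightness", "done: brightness lowered to 20%"])
    fake_llm = lambda prompt: next(scripted)
    print(run_agent(fake_llm, {"get_state": lambda arg: "brightness=80%"},
                    "turn down the screen brightness on my dryer"))
```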


ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence

arXiv.org Artificial Intelligence

Our work examines the way in which large language models can be used for robotic planning and sampling, specifically in the context of automated photographic documentation. We illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general-purpose language models (LMs) and vision-language models (VLMs). Given a high-level description of an event, we use an LM to generate a natural-language list of photo descriptions that one would expect a photographer to capture at the event. We then use a VLM to identify the best matches to these descriptions in the robot's video stream. The photo portfolios generated by our method are consistently rated as more appropriate to the event by human evaluators than those generated by existing methods.
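
A minimal sketch of the selection step, assuming the LM has already produced a list of shot descriptions and that a VLM similarity score is available as a callable (`vlm_score`); both interfaces are hypothetical stand-ins rather than the ANSEL implementation.

```python
# Illustrative sketch: pick the best-matching video frame for each LM-proposed shot description.
from typing import Callable, Dict, List, Sequence

def best_frames(descriptions: List[str],
                frames: Sequence,  # e.g. decoded frames from the robot's video stream
                vlm_score: Callable[[object, str], float]) -> Dict[str, int]:
    """Return, for each shot description, the index of the highest-scoring frame."""
    portfolio = {}
    for desc in descriptions:
        scores = [vlm_score(frame, desc) for frame in frames]
        portfolio[desc] = int(max(range(len(frames)), key=scores.__getitem__))
    return portfolio

if __name__ == "__main__":
    # Toy stand-ins: frames are strings, and the "VLM" scores word overlap.
    frames = ["speaker at podium", "audience clapping", "empty hallway"]
    score = lambda frame, text: len(set(frame.split()) & set(text.lower().split()))
    print(best_frames(["A keynote speaker at the podium"], frames, score))
```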


Self-Supervised Transformer Architecture for Change Detection in Radio Access Networks

arXiv.org Artificial Intelligence

Radio Access Networks (RANs) for telecommunications represent large agglomerations of interconnected hardware consisting of hundreds of thousands of transmitting devices (cells). Such networks undergo frequent and often heterogeneous changes caused by network operators, who are seeking to tune their system parameters for optimal performance. The effects of such changes are challenging to predict and will become even more so with the adoption of 5G/6G networks. Therefore, RAN monitoring is vital for network operators. We propose a self-supervised learning framework that leverages self-attention and self-distillation for this task. It works by detecting changes in Performance Measurement data, a collection of time-varying metrics which reflect a set of diverse measurements of network performance at the cell level. Experimental results show that our approach outperforms the state of the art by 4% on a dataset based on real-world data, consisting of about one hundred thousand time series. It also has the merits of being scalable and generalizable, which allows it to provide deep insight into the specifics of mode-of-operation changes while relying minimally on expert knowledge.
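
A minimal sketch of embedding-based change scoring on a single Performance Measurement series, with a toy hand-crafted encoder standing in for the paper's self-supervised transformer; windows before and after each time step are embedded and compared, and a large distance flags a likely configuration change.

```python
# Illustrative sketch: score change points by comparing pre- and post-window embeddings.
import numpy as np

def encode(window: np.ndarray) -> np.ndarray:
    """Placeholder encoder; the paper's framework learns this with self-attention and self-distillation."""
    return np.array([window.mean(), window.std(), np.abs(np.diff(window)).mean()])

def change_scores(series: np.ndarray, window: int = 24) -> np.ndarray:
    """Cosine distance between pre- and post-window embeddings at each time step."""
    scores = np.zeros(len(series))
    for t in range(window, len(series) - window):
        a, b = encode(series[t - window:t]), encode(series[t:t + window])
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-9
        scores[t] = 1.0 - float(a @ b) / denom
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])  # shift at t=200
    print(int(change_scores(ts).argmax()))  # peaks near the simulated change
```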