AITopics | Patel, Alkesh

MARRS: Multimodal Reference Resolution System

Ates, Halim Cagri, Bhargava, Shruti, Li, Site, Lu, Jiarui, Maddula, Siddhardha, Moniz, Joel Ruben Antony, Nalamalapu, Anil Kumar, Nguyen, Roman Hoang, Ozyildirim, Melis, Patel, Alkesh, Piraviperumal, Dhivya, Renkens, Vincent, Samal, Ankit, Tran, Thy, Tseng, Bo-Hsiang, Yu, Hong, Zhang, Yuan, Zou, Rong

arXiv.org Artificial Intelligence, Nov-2-2023

Successfully handling context is essential for any dialog understanding task. This context may be conversational (relying on previous user queries or system responses), visual (relying on what the user sees, for example, on their screen), or background (based on signals such as a ringing alarm or playing music). In this work, we present an overview of MARRS, or Multimodal Reference Resolution System, an on-device framework within a Natural Language Understanding system, responsible for handling conversational, visual and background context. In particular, we present different machine learning models to enable handling contextual queries; specifically, one to enable reference resolution, and one to handle context via query rewriting. We also describe how these models complement each other to form a unified, coherent, lightweight system that can understand context while preserving user privacy.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.18653/v1/2023.crac-main.7

2311.0165

Country: North America > United States (0.46)

Genre:

Overview (0.55)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.30)

Referring to Screen Texts with Voice Assistants

Bhargava, Shruti, Dhoot, Anand, Jonsson, Ing-Marie, Nguyen, Hoang Long, Patel, Alkesh, Yu, Hong, Renkens, Vincent

arXiv.org Artificial Intelligence, Jun-10-2023

Voice assistants help users make phone calls, send messages, create events, navigate, and do a lot more. However, assistants have limited capacity to understand their users' context. In this work, we aim to take a step in this direction. Our work dives into a new experience for users to refer to phone numbers, addresses, email addresses, URLs, and dates on their phone screens. Our focus lies in reference understanding, which becomes particularly interesting when multiple similar texts are present on screen, similar to visual grounding. We collect a dataset and propose a lightweight general-purpose model for this novel experience. Due to the high cost of consuming pixels directly, our system is designed to rely on the extracted text from the UI. Our model is modular, thus offering flexibility, improved interpretability, and efficient runtime memory utilization.

category, machine learning, natural language, (17 more...)

2306.07298

Country: North America > United States (0.47)

Genre: Research Report (0.40)

Industry:

Information Technology (0.46)
Media (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.73)

Generating Natural Questions from Images for Multimodal Assistants

Patel, Alkesh, Bindal, Akanksha, Kotek, Hadas, Klein, Christopher, Williams, Jason

arXiv.org Artificial Intelligence, Nov-17-2020

Generating natural, diverse, and meaningful questions from images is an essential task for multimodal assistants as it confirms whether they have understood the object and scene in the images properly. The research in visual question answering (VQA) and visual question generation (VQG) is a great step. However, this research does not capture questions that a visually-abled person would ask multimodal assistants. Recently published datasets such as KB-VQA, FVQA, and OK-VQA try to collect questions that look for external knowledge, which makes them appropriate for multimodal assistants. However, they still contain many obvious and common-sense questions that humans would not usually ask a digital assistant. In this paper, we provide a new benchmark dataset that contains questions generated by human annotators keeping in mind what they would ask multimodal digital assistants. Large-scale annotations for several hundred thousand images are expensive and time-consuming, so we also present an effective way of automatically generating questions from unseen images. In this paper, we present an approach for generating diverse and meaningful questions that consider image content and image metadata (e.g., location, associated keyword). We evaluate our approach using standard evaluation metrics such as BLEU, METEOR, ROUGE, and CIDEr to show the relevance of generated questions with human-provided questions. We also measure the diversity of generated questions using generative strength and inventiveness metrics. We report new state-of-the-art results on the public and our datasets.

artificial intelligence, dataset, neural network, (17 more...)

2012.03678

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Noise-robust Named Entity Understanding for Virtual Assistants

Muralidharan, Deepak, Moniz, Joel Ruben Antony, Gao, Sida, Yang, Xiao, Li, Lin, Kao, Justine, Pulman, Stephen, Kothari, Atish, Shen, Ray, Pan, Yinying, Kaul, Vivek, Ibrahim, Mubarak Seyed, Xiang, Gang, Dun, Nan, Zhou, Yidan, O, Andy, Zhang, Yuan, Wang, Pooja Chitkara Xuan, Patel, Alkesh, Tayal, Kushal, Zheng, Roger, Grasch, Peter, Williams, Jason

arXiv.org Artificial Intelligence, May-29-2020

Named Entity Understanding (NEU) plays an essential role in interactions between users and voice assistants, since successfully identifying entities and correctly linking them to their standard forms is crucial to understanding the user's intent. NEU is a challenging task in voice assistants due to the ambiguous nature of natural language and because noise introduced by speech transcription and user errors occur frequently in spoken natural language queries. In this paper, we propose an architecture with novel features that jointly solves the recognition of named entities (a.k.a. Named Entity Recognition, or NER) and the resolution to their canonical forms (a.k.a. Entity Linking, or EL). We show that by combining NER and EL information in a joint reranking module, our proposed framework improves accuracy in both tasks. This improved performance, and the features that enable it, also lead to better accuracy in downstream tasks, such as domain classification and semantic parsing.

computational linguistics, deep learning, speech recognition, (22 more...)

2005.14408

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Colorado (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
