Collaborating Authors: Dobhal, Daksh


$\forall$uto$\exists$$\lor\!\land$L: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks

arXiv.org Artificial Intelligence

This paper presents $\forall$uto$\exists$$\lor\!\land$L, a novel benchmark for scaling Large Language Model (LLM) assessment in formal tasks with clear notions of correctness, such as truth maintenance in translation and logical reasoning. $\forall$uto$\exists$$\lor\!\land$L is the first benchmarking paradigm that offers several key advantages necessary for scaling objective evaluation of LLMs without human labeling: (a) the ability to evaluate LLMs of increasing sophistication by auto-generating tasks at different levels of difficulty; (b) auto-generation of ground truth, eliminating dependence on expensive and time-consuming human annotation; (c) the use of automatically generated, randomized datasets that mitigate the ability of successive LLMs to overfit to the static datasets used in many contemporary benchmarks. Empirical analysis shows that an LLM's performance on $\forall$uto$\exists$$\lor\!\land$L is highly indicative of its performance on a diverse array of other benchmarks focusing on translation and reasoning tasks, making it a valuable autonomous evaluation paradigm in settings where hand-curated datasets can be hard to obtain and/or update.
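The auto-generation loop the abstract describes can be pictured with a minimal sketch: randomly build propositional formulas whose nesting depth sets the difficulty level, then check truth maintenance by comparing truth tables before and after a round trip through the model. All names below (`random_formula`, `equivalent`, `llm_round_trip`) are illustrative placeholders, not the paper's actual implementation.

```python
import itertools
import random

def random_formula(variables, depth, rng):
    """Recursively build a random propositional formula as a Python expression string.

    The depth parameter controls nesting and thus stands in for task difficulty."""
    if depth == 0:
        return rng.choice(variables)
    if rng.random() < 0.2:
        return f"(not {random_formula(variables, depth - 1, rng)})"
    op = rng.choice(["and", "or"])
    left = random_formula(variables, depth - 1, rng)
    right = random_formula(variables, depth - 1, rng)
    return f"({left} {op} {right})"

def equivalent(f1, f2, variables):
    """Auto-generated ground truth: exhaustively compare the two truth tables."""
    for values in itertools.product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if eval(f1, {}, env) != eval(f2, {}, env):
            return False
    return True

rng = random.Random(0)
vars_ = ["p", "q", "r"]
phi = random_formula(vars_, depth=3, rng=rng)
# llm_round_trip is a placeholder for: formal -> NL (via LLM) -> formal (via LLM).
llm_round_trip = lambda f: f  # identity stands in for a perfect model
print(phi, "truth maintained:", equivalent(phi, llm_round_trip(phi), vars_))
```

Because the generator is seeded randomly at evaluation time, there is no fixed dataset for a successive model generation to memorize, which is the contamination-resistance property the abstract highlights.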


Using Explainable AI and Hierarchical Planning for Outreach with Robots

arXiv.org Artificial Intelligence

Understanding how robots plan and execute tasks is increasingly important as robots become more prevalent in daily life. However, teaching non-experts the complexities of robot planning can be challenging. This work presents an open-source platform that simplifies the learning process with a visual interface that abstracts away the complex internals of the hierarchical planning robots use for task and motion planning. Drawing on principles developed in the field of explainable AI, this intuitive platform enables users to create plans for robots to complete tasks, and it provides helpful hints and natural language explanations for errors. The platform also includes a built-in simulator that demonstrates how robots execute submitted plans. Its efficacy was tested in a user study with university students who had little to no computer science background; the results show that the platform is highly effective in teaching novice users the intuitions of robot task planning.
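For readers unfamiliar with the hierarchical planning the platform abstracts away, a toy sketch of HTN-style task decomposition follows. The task names and method table are invented for illustration and are unrelated to the platform's internals.

```python
# Toy HTN-style decomposition: abstract tasks expand into sequences of subtasks
# until only primitive (directly executable) actions remain.
METHODS = {
    "serve_coffee": [["fetch_cup", "brew", "deliver"]],
    "brew": [["add_water", "add_grounds", "start_machine"]],
}

PRIMITIVES = {"fetch_cup", "deliver", "add_water", "add_grounds", "start_machine"}

def decompose(task, primitives):
    """Recursively expand an abstract task into a flat sequence of primitive actions.

    A real HTN planner also checks method preconditions and backtracks across
    alternative methods; this sketch simply takes the first method listed."""
    if task in primitives:
        return [task]
    plan = []
    for subtask in METHODS[task][0]:
        plan.extend(decompose(subtask, primitives))
    return plan

print(decompose("serve_coffee", PRIMITIVES))
# ['fetch_cup', 'add_water', 'add_grounds', 'start_machine', 'deliver']
```

The visual interface described in the abstract lets users manipulate exactly this kind of task hierarchy without seeing the underlying recursion or search.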


Can LLMs Converse Formally? Automatically Assessing LLMs in Translating and Interpreting Formal Specifications

arXiv.org Artificial Intelligence

Automatic system synthesis and verification often require specifications to be provided in a formal language such as propositional logic [Haubelt and Feldmann, 2003, Scholl and Becker, 2001]. Typically, human experts serve as middlemen who can (a) translate the natural language (NL) specifications of stakeholders into formal syntax, or (b) explain or interpret the system's functionality by translating the system manual into NL. Given the success of Large Language Models (LLMs) in translation tasks [Xue et al., 2021], using LLMs as these middlemen can help reduce overall system design costs. It is therefore vital to develop an evaluation methodology that can assess the capabilities of LLMs in such settings. However, developing such a methodology is quite difficult. Firstly, it is hard to obtain high-quality datasets, i.e., ones containing ground-truth data that LLMs have not been trained on; as LLMs evolve, the dataset would need to evolve as well, since it would likely become part of the next generation of LLMs' training data. Secondly, scaling up existing datasets is challenging because they require human annotators to encode NL text and its corresponding formal specifications. Finally, the assessment must consider both directions of translation: formal-to-natural and natural-to-formal.
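As one concrete way to ground such an assessment, the sketch below uses sympy's logic module to decide whether a candidate formula is logically equivalent to a reference specification: equivalence holds exactly when the negated biconditional is unsatisfiable. The NL prompt, the candidate formula, and the round-trip protocol in the comments are assumptions for illustration, not the methodology of this paper.

```python
# Minimal equivalence checker for grading NL <-> propositional logic translations.
from sympy import symbols
from sympy.logic.boolalg import Equivalent, Not
from sympy.logic.inference import satisfiable
from sympy.parsing.sympy_parser import parse_expr

p, q = symbols("p q")

def logically_equal(spec, candidate):
    """spec and candidate are equivalent iff Not(Equivalent(...)) has no model."""
    return satisfiable(Not(Equivalent(spec, candidate))) is False

# Direction (a): natural-to-formal. The candidate string is a stand-in for LLM output.
nl_spec = "the alarm sounds if sensor p or sensor q fires"
candidate = parse_expr("p | q")           # hypothetical LLM translation of nl_spec
print(logically_equal(p | q, candidate))  # True

# Direction (b): formal-to-natural has no syntactic ground truth, so it can be
# judged by translating the generated NL back into logic and reusing this check.
```

A semantic check like this is what makes autograding possible: the LLM need not reproduce the reference formula verbatim, only a formula with the same truth table.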