Karnik, Sathwik
Embodied Red Teaming for Auditing Robotic Foundation Models
Karnik, Sathwik, Hong, Zhang-Wei, Abhangi, Nishant, Lin, Yen-Chen, Wang, Tsun-Hsuan, Agrawal, Pulkit
Language-conditioned robot models (i.e., robotic foundation models) enable robots to perform a wide range of tasks based on natural language instructions. Despite strong performance on existing benchmarks, evaluating the safety and effectiveness of these models is challenging due to the complexity of testing all possible language variations. Current benchmarks have two key limitations: they rely on a limited set of human-generated instructions, missing many challenging cases, and they focus only on task performance without assessing safety, such as avoiding damage. To address these gaps, we introduce Embodied Red Teaming (ERT), a new evaluation method that generates diverse and challenging instructions to test these models. ERT uses automated red teaming techniques with Vision Language Models (VLMs) to create contextually grounded, difficult instructions. Experimental results show that state-of-the-art models frequently fail or behave unsafely on ERT tests, underscoring the shortcomings of current benchmarks in evaluating real-world performance and safety. Code and videos are available at: https://sites.google.com/view/embodiedredteam.
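As a rough illustration of the evaluation loop the abstract describes, the sketch below pairs a VLM-based instruction generator with policy rollouts and tallies failure and unsafe-behavior rates. The callables and the success/safety bookkeeping are placeholders assumed for illustration, not the authors' actual interface.

```python
# Minimal sketch of an ERT-style evaluation loop. `generate_instructions` and
# `rollout` are hypothetical stand-ins for a VLM instruction generator and a
# language-conditioned policy execution, not the authors' API.

from typing import Callable, Dict, List

def embodied_red_team(
    generate_instructions: Callable[[str, int], List[str]],  # VLM: seed -> challenging variants
    rollout: Callable[[str], Dict[str, bool]],                # executes the policy on one instruction
    seed_instruction: str,
    n_variants: int = 20,
):
    """Test a robotic foundation model on diverse, challenging instructions."""
    results = []
    for instr in generate_instructions(seed_instruction, n_variants):
        outcome = rollout(instr)  # e.g., {"success": False, "unsafe": True}
        results.append({"instruction": instr, **outcome})
    n = max(len(results), 1)
    failure_rate = sum(not r.get("success", False) for r in results) / n
    unsafe_rate = sum(r.get("unsafe", False) for r in results) / n  # e.g., collisions or damage
    return results, failure_rate, unsafe_rate
```

Grounding the generator in a scene image (rather than text alone) is what keeps the proposed instructions contextually plausible for the robot's workspace.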
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
Hong, Zhang-Wei, Kumar, Aviral, Karnik, Sathwik, Bhandwaldar, Abhishek, Srivastava, Akash, Pajarinen, Joni, Laroche, Romain, Gupta, Abhishek, Agrawal, Pulkit
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms that the policy should stay close to the trajectories in the dataset. If the dataset primarily consists of suboptimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that constrains the policy only to "good data" rather than to all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains across 72 imbalanced datasets, the D4RL benchmark, and three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
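As a simplified picture of what non-uniform sampling can look like in practice, the sketch below weights transitions by trajectory return before drawing minibatches. The exponential return weighting and helper names are illustrative assumptions, not the exact scheme implemented in the paper or the dw-offline-rl repository.

```python
# A minimal sketch of non-uniform transition sampling for offline RL,
# using a simple return-based weighting for illustration.

import numpy as np

def make_return_weighted_sampler(trajectory_returns, trajectory_lengths,
                                 temperature=1.0, rng=None):
    """Build a sampler that draws transitions in proportion to exp(return / T),
    so the policy constraint concentrates on 'good data' rather than all actions."""
    rng = rng if rng is not None else np.random.default_rng()
    returns = np.asarray(trajectory_returns, dtype=np.float64)
    lengths = np.asarray(trajectory_lengths, dtype=np.int64)
    # Weight each trajectory, then spread that weight over its stored transitions.
    traj_weights = np.exp((returns - returns.max()) / temperature)
    per_transition = np.repeat(traj_weights, lengths)
    probs = per_transition / per_transition.sum()

    def sample(batch_size):
        # Indices into the flattened transition buffer; swap this in for the
        # uniform minibatch draw of any standard offline RL algorithm.
        return rng.choice(len(probs), size=batch_size, p=probs)

    return sample
```

Because the change lives entirely in how minibatches are drawn, it composes with existing offline RL losses without modifying their objectives, which is what makes it plug-and-play.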
From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams
Drori, Iddo, Zhang, Sarah J., Shuttleworth, Reece, Zhang, Sarah, Tyser, Keith, Chin, Zad, Lantigua, Pedro, Surbehera, Saisamrit, Hunter, Gregory, Austin, Derek, Tang, Leonard, Hicke, Yann, Simhon, Sage, Karnik, Sathwik, Granberry, Darnell, Udell, Madeleine
A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We curate a dataset and benchmark of questions from machine learning final exams available online, together with code for answering these questions and generating new ones. We show how to generate new questions from other questions and course notes. For reproducibility and future research on this final exam benchmark, we use automatic checkers for multiple-choice questions, numeric answers, and questions with expression answers. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solving of large-scale assessments, reducing the workload from human days to mere machine seconds. Our results suggest that rather than banning large language models such as ChatGPT in class, instructors should teach students to harness them by asking meta-questions about the correctness, completeness, and originality of the generated responses, encouraging critical thinking in academic studies.
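To make the few-shot prompting and automatic checking pipeline concrete, here is a minimal sketch. The prompt format, the `llm` callable, and the grading rules (numeric tolerance, option matching) are assumptions for illustration, not the paper's exact benchmark code.

```python
# A minimal sketch of few-shot prompting plus automatic answer checking.
# The `llm` callable is a stand-in for any model (e.g., GPT-3, Codex, ChatGPT).

from typing import Callable, List, Tuple

def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Prepend worked (question, answer) pairs before the target question."""
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return f"{shots}\n\nQuestion: {question}\nAnswer:"

def check_numeric(predicted: str, reference: float, tol: float = 1e-3) -> bool:
    """Automatic checker for numeric answers: parse and compare within a tolerance."""
    try:
        return abs(float(predicted.strip()) - reference) <= tol
    except ValueError:
        return False

def check_multiple_choice(predicted: str, reference: str) -> bool:
    """Automatic checker for multiple-choice answers: compare the chosen option."""
    return predicted.strip().upper().startswith(reference.strip().upper())

def grade(llm: Callable[[str], str], examples, question, reference, numeric=True):
    """Query the model with a few-shot prompt and grade the answer automatically."""
    answer = llm(build_few_shot_prompt(examples, question))
    if numeric:
        return check_numeric(answer, reference)
    return check_multiple_choice(answer, reference)
```

Automatic checkers of this kind are what make large-scale ablations (zero-shot vs. few-shot vs. chain-of-thought, across several models) feasible without manual grading.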