AI hallucinates because it's trained to fake answers it doesn't know
Earlier today, OpenAI completed a controversial restructuring of its for-profit arm into a public benefit corporation: the latest gust in a whirlwind that has swept up hundreds of billions of dollars of global investment for artificial intelligence (AI) tools. But even as the AI company--founded as a nonprofit, now valued at $500 billion--wraps up its long-awaited restructuring, a nagging issue with its core offering remains unresolved: hallucinations. Large language models (LLMs) such as those that underpin OpenAI's popular ChatGPT platform are prone to confidently spouting factually incorrect statements. These blips are often attributed to bad input data, but in a preprint posted last month, a team from OpenAI and the Georgia Institute of Technology proves that even with flawless training data, LLMs can never be all-knowing--in part because some questions are just inherently unanswerable. However, that doesn't mean hallucinations are inevitable.
On the Role of Domain Experts in Creating Effective Tutoring Systems
Sreedharan, Sarath, Sikes, Kelsey, Blanchard, Nathaniel, Mason, Lisa, Krishnaswamy, Nikhil, Zarestky, Jill
The role that highly curated knowledge, provided by domain experts, could play in creating effective tutoring systems is often overlooked within the AI for education community. In this paper, we highlight this topic by discussing two ways such highly curated expert knowledge could help in creating novel educational systems. First, we will look at how one could use explainable AI (XAI) techniques to automatically create lessons. Most existing XAI methods are primarily aimed at debugging AI systems. However, we will discuss how one could use expert-specified rules about solving specific problems along with novel XAI techniques to automatically generate lessons that could be provided to learners. Second, we will see how an expert-specified curriculum for learning a target concept can help develop adaptive tutoring systems that can not only provide a better learning experience but also allow us to use more efficient algorithms to create these systems. Finally, we will highlight the importance of such methods using a case study of creating a tutoring system for pollinator identification, where such knowledge could easily be elicited from experts.
- Research Report (1.00)
- Instructional Material (0.66)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)
The AI Civil War Is Here
The story unfolds so rapidly that it can all seem, at a glance, preordained. After transferring to Columbia last fall, as Chungin "Roy" Lee tells it, he used AI to cheat his way through school, used AI to cheat his way through internship interviews at Amazon and Meta--he received offers from both--and in the winter broadcast his tool on social media. He was placed on probation, suspended, and, more keen on AI than education, dropped out this spring to found a start-up. That start-up, Cluely, markets the ability to "cheat on everything" using an AI assistant that runs in the background during meetings or sales calls. Last month, it finished a $15 million fundraising round led by Andreessen Horowitz, the storied venture-capital firm. Lee unapologetically believes that the arrival of omniscient AI is inevitable, that bots will soon automate every job.
- Information Technology (0.70)
- Government (0.47)
- Banking & Finance > Capital Markets (0.34)
- Information Technology > Communications > Social Media (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.53)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)
Is superintelligent AI just around the corner, or just a sci-fi dream?
Are machines about to become smarter than humans? If you take the leaders of artificial intelligence companies at their word, their products mean that the coming decade will be quite unlike any in human history: a golden era of "radical abundance", where high-energy physics is "solved" and we see the beginning of space colonisation. But researchers working with today's most powerful AI systems are finding a different reality, in which even the best models are failing to solve basic puzzles that most humans find trivial, while the promise of AI that can "reason" seems to be overblown. So, whom should you believe? Sam Altman and Demis Hassabis, the CEOs of OpenAI and Google DeepMind, respectively, have both made recent claims that powerful, world-altering AI systems are just around the corner.
Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AI
Tulli, Silvia, Vasileiou, Stylianos Loukas, Sreedharan, Sarath
"Human-aware" has become a popular keyword used to describe a particular class of AI systems that are designed to work and interact with humans. While there exists a surprising level of consistency among the works that use the label human-aware, the term itself mostly remains poorly understood. In this work, we retroactively try to provide an account of what constitutes a human-aware AI system. We see that human-aware AI is a design oriented paradigm, one that focuses on the need for modeling the humans it may interact with. Additionally, we see that this paradigm offers us intuitive dimensions to understand and categorize the kinds of interactions these systems might have with humans. We show the pedagogical value of these dimensions by using them as a tool to understand and review the current landscape of work related to human-AI systems that purport some form of human modeling. To fit the scope of a workshop paper, we specifically narrowed our review to papers that deal with sequential decision-making and were published in a major AI conference in the last three years. Our analysis helps identify the space of potential research problems that are currently being overlooked. We perform additional analysis on the degree to which these works make explicit reference to results from social science and whether they actually perform user-studies to validate their systems. We also provide an accounting of the various AI methods used by these works.
- Overview (1.00)
- Research Report (0.84)
TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners
de la Rosa, Tomas, Gopalakrishnan, Sriram, Pozanco, Alberto, Zeng, Zhen, Borrajo, Daniel
Travel planning is a complex task that involves generating a sequence of actions related to visiting places subject to constraints and maximizing some user satisfaction criteria. Traditional approaches rely on formulating the problem in a given formal language, extracting relevant travel information from web sources, and using an adequate problem solver to generate a valid solution. As an alternative, recent Large Language Model (LLM) based approaches directly output plans from user requests using language. Although LLMs possess extensive travel domain knowledge and provide high-level information like points of interest and potential routes, current state-of-the-art models often generate plans that lack coherence, fail to satisfy constraints fully, and do not guarantee the generation of high-quality solutions. We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners, where (i) LLMs retrieve and translate travel and user information into data structures that can be fed into planners; and (ii) automated planners generate travel plans that guarantee constraint satisfaction and optimize for users' utility. Our experiments across various travel scenarios show that TRIP-PAL outperforms an LLM when generating travel plans.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
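The two-step pipeline described in the TRIP-PAL abstract can be sketched in miniature. Everything below is a hypothetical stand-in, not the paper's implementation: step (i) is hard-coded where the real system would call an LLM to extract structured travel data, and step (ii) is a brute-force search over a toy set of points of interest, playing the role of an automated planner that guarantees the time constraint is met while maximizing utility.

```python
from itertools import permutations

# Step (i), stubbed: in a TRIP-PAL-style pipeline an LLM would translate the
# free-text request into this structure. Here the data is hard-coded.
def extract_travel_data(user_request: str) -> dict:
    return {
        "time_budget": 6,  # hours available
        "pois": {          # point of interest -> (visit_hours, utility)
            "museum":   (2, 5),
            "old_town": (3, 7),
            "harbour":  (1, 2),
            "castle":   (4, 8),
        },
    }

# Step (ii): an exhaustive planner that, by construction, only returns
# itineraries satisfying the time budget, and picks a utility-maximal one.
def plan_trip(data: dict):
    pois, budget = data["pois"], data["time_budget"]
    best, best_utility = (), 0
    for r in range(1, len(pois) + 1):
        for order in permutations(pois, r):
            hours = sum(pois[p][0] for p in order)
            utility = sum(pois[p][1] for p in order)
            if hours <= budget and utility > best_utility:
                best, best_utility = order, utility
    return list(best), best_utility

itinerary, utility = plan_trip(extract_travel_data("a relaxed day in town"))
```

The division of labor mirrors the abstract's claim: the language side supplies domain knowledge in a machine-usable form, while the combinatorial side supplies the guarantees that raw LLM output lacks.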
Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning
Gundawar, Atharva, Verma, Mudit, Guan, Lin, Valmeekam, Karthik, Bhambri, Siddhant, Kambhampati, Subbarao
As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such complex domains. The recent discourse introduced by the paper on LLM-Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We use the Travel Planning benchmark by the OSU NLP group, a benchmark for evaluating the performance of LLMs in producing valid itineraries based on user queries presented in natural language. While popular methods of enhancing the reasoning abilities of LLMs such as Chain of Thought, ReAct, and Reflexion achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo, respectively, our operationalization of the LLM-Modulo framework for the TravelPlanning domain provides a remarkable improvement, enhancing baseline performance by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo, from 0% to 5%. Furthermore, we highlight the other useful roles of LLMs in the planning pipeline, as suggested in LLM-Modulo, that can be reliably operationalized, such as the extraction of useful critics and the reformulation of critic feedback.
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Kambhampati, Subbarao, Valmeekam, Karthik, Guan, Lin, Stechly, Kaya, Verma, Mudit, Bhambri, Siddhant, Saldyt, Lucas, Murthy, Anil
There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all LLMs are good for in planning/reasoning tasks is to act as mere translators of the problem specification from one syntactic format to another and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is, after all, a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem, and preference specifications.
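The bi-directional generate-test loop at the heart of LLM-Modulo can be sketched as follows. This is a heavily simplified, hypothetical rendering, not the paper's code: the "LLM" is a stub that repairs its draft in response to named issues, and the critics are two hand-written checks standing in for sound model-based verifiers. The key structural point survives the simplification: correctness comes from the verifiers, not from the generator.

```python
# Stub generator: starts from a flawed draft plan and patches whatever the
# critics complained about last round. A real system would prompt an LLM.
def llm_propose(task, feedback):
    plan = list(task["draft"])
    for issue in feedback:
        if issue == "missing goal step":
            plan.append("reach_goal")
        if issue == "duplicate step":
            seen, deduped = set(), []
            for step in plan:
                if step not in seen:
                    seen.add(step)
                    deduped.append(step)
            plan = deduped
    return plan

# Stand-ins for external model-based critics: each returns named issues.
def run_critics(plan):
    issues = []
    if plan[-1] != "reach_goal":
        issues.append("missing goal step")
    if len(set(plan)) != len(plan):
        issues.append("duplicate step")
    return issues

# The LLM-Modulo loop: propose, verify, feed criticism back, repeat.
def llm_modulo(task, max_rounds=5):
    feedback = []
    for _ in range(max_rounds):
        plan = llm_propose(task, feedback)
        issues = run_critics(plan)
        if not issues:
            return plan  # any returned plan has passed every critic
        feedback = issues
    return None

plan = llm_modulo({"draft": ["pack", "pack", "drive"]})
```

Note that the loop differs from a one-way pipeline (LLM output handed to a solver once): critic feedback flows back into generation, which is the "tighter bi-directional interaction regime" the abstract emphasizes.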
Guided Demonstrations Using Automated Excuse Generation
Diehl, Maximilian, Chakraborti, Tathagata, Ramirez-Amaro, Karinne
Teaching task-level directives to robots via demonstration is a popular tool to expand the robot's capabilities to interact with its environment. While current learning from demonstration systems primarily focus on abstracting the task-level knowledge to the robot, these systems lack the ability to understand which part of the task can already be solved given the robot's prior knowledge. Therefore, instead of only requiring demonstrations of the missing pieces, these systems will require a demonstration of the complete task, which is cumbersome, repetitive, and can discourage people from helping the robot by performing the demonstrations. To address this, we propose to use the notion of "excuses" to identify the smallest change in the robot state that makes a task, currently not solvable by the robot, solvable -- as a means to solicit more targeted demonstrations from a human. These excuses are generated automatically using combinatorial search over possible changes that can be made to the robot's state and choosing the minimum changes that make it solvable. These excuses then serve as guidance for the demonstrator, who can use them to decide what to demonstrate to the robot in order to make this requested change possible, thereby making the original task solvable for the robot without having to demonstrate it in its entirety. By working with symbolic state descriptions, the excuses can be directly communicated and intuitively understood by a human demonstrator. We show empirically and in a user study that the use of excuses reduces the demonstration time by 54% and leads to a 74% reduction in demonstration size.
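The excuse-generation idea reduces to a search for a minimal state change, which can be illustrated in a few lines. This is a toy rendering under assumed names: the state is a set of symbolic facts, and the solvability check is a hard-coded stand-in for the planner call the real system would make.

```python
from itertools import combinations

# Hypothetical task: the robot can fetch the cup only if it both knows
# where the cup is and has a free gripper. A real system would invoke a
# planner here instead of checking a fixed condition.
def solvable(state):
    return {"cup_location_known", "gripper_free"} <= state

# Combinatorial search over additions of increasing size: because sizes are
# tried in order, the first success is a minimum change -- an "excuse".
def generate_excuse(state, candidate_facts):
    missing = [f for f in candidate_facts if f not in state]
    for size in range(1, len(missing) + 1):
        for extra in combinations(missing, size):
            if solvable(state | set(extra)):
                return set(extra)
    return None

state = {"gripper_free"}
excuse = generate_excuse(state, ["cup_location_known", "door_open", "lights_on"])
```

The returned excuse is exactly what gets communicated to the demonstrator: rather than demonstrating the whole task, the human only needs to show the robot the one missing piece.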
Towards More Likely Models for AI Planning
Caglar, Turgay, Belhaj, Sirine, Chakraborti, Tathagata, Katz, Michael, Sreedharan, Sarath
This is the first work to look at the application of large language models (LLMs) for the purpose of model space edits in automated planning tasks. To set the stage for this sangam, we explore two different flavors of model space problems that have been studied in the AI planning literature and explore the effect of an LLM on those tasks. We empirically demonstrate how the performance of an LLM contrasts with combinatorial search (CS) -- an approach that has been traditionally used to solve model space tasks in planning -- with the LLM both in the role of a standalone model space reasoner and in the role of a statistical signal in concert with the CS approach as part of a two-stage process. Our experiments show promising results, suggesting further forays of LLMs into the exciting world of model space reasoning for planning tasks in the future.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
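The two-stage arrangement in that last abstract, an LLM as a statistical signal guiding combinatorial search, can be sketched as a rank-then-verify loop. All names and scores below are illustrative stand-ins: the plausibility table replaces an actual LLM query, and the validity check replaces the search's model-consistency test.

```python
# Stage 1 stand-in: an LLM would be asked how plausible each candidate model
# edit is; here that signal is a fixed table of made-up scores.
def llm_score(edit):
    plausibility = {"add_precondition": 0.9, "drop_action": 0.4, "add_effect": 0.7}
    return plausibility.get(edit, 0.0)

# Stage 2 stand-in: combinatorial search would check whether the edited
# planning model is actually consistent; here one edit is deemed valid.
def valid_model(edit):
    return edit == "add_effect"

# Two-stage repair: try edits in order of LLM plausibility, keep the first
# edit that the sound verifier accepts.
def repair_model(candidate_edits):
    for edit in sorted(candidate_edits, key=llm_score, reverse=True):
        if valid_model(edit):
            return edit
    return None

edit = repair_model(["drop_action", "add_effect", "add_precondition"])
```

As in the LLM-Modulo papers above, the LLM here only prioritizes the search; soundness still rests entirely on the verification stage, so a misleading score costs time but never correctness.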