
Collaborating Authors

 jerry




Language Models Do Not Follow Occam's Razor: A Benchmark for Inductive and Abductive Reasoning

Sun, Yunxin, Saparov, Abulhair

arXiv.org Artificial Intelligence

Reasoning is a core capability in artificial intelligence systems, for which large language models (LLMs) have recently shown remarkable progress. However, most work focuses exclusively on deductive reasoning, which is problematic since other types of reasoning are also essential in solving real-world problems, and they are less explored. This work focuses on evaluating LLMs' inductive and abductive reasoning capabilities. We introduce a programmable and synthetic dataset, InAbHyD (pronounced in-a-bid), where each reasoning example consists of an incomplete world model and a set of observations. The task for the intelligent agent is to produce hypotheses to explain observations under the incomplete world model to solve each reasoning example. We propose a new metric to evaluate the quality of hypotheses based on Occam's Razor. We evaluate and analyze some state-of-the-art LLMs. Our analysis shows that LLMs can perform inductive and abductive reasoning in simple scenarios, but struggle with complex world models and producing high-quality hypotheses, even with popular reasoning-enhancing techniques such as in-context learning and RLVR.


Interview with Jerry Tan: Service robot development for education

Robohub

At the International Joint Conference on Artificial Intelligence (IJCAI) 2023, I had the opportunity to interview Jerry Tan from Lattel Robotics, a company dedicated to promoting AI-focused robotics education and training. They work closely with the RoboCup@Home Education initiative, supporting schools and institutions in introducing AI and service robot development to students. Their goal is to equip learners with practical AI application skills in computer vision, autonomous navigation, object manipulation and speech interaction. Through their AI robotics and AI applications workshops, Lattel Robotics offers an introduction to Robot Operating System (ROS)-based AI application development in service robotics. As a hardware partner for the RoboCup@Home Education initiative, they assist schools and institutions in competing in AI robotic challenges by developing applications that address real-world problems.


TOM: A Development Platform For Wearable Intelligent Assistants

Janaka, Nuwan, Zhao, Shengdong, Hsu, David, Wen, Sherisse Tan Jing, Keat, Koh Chun

arXiv.org Artificial Intelligence

Advanced digital assistants can significantly enhance task performance, reduce user burden, and provide personalized guidance to improve users' abilities. However, the development of such intelligent digital assistants presents a formidable challenge. To address this, we introduce TOM, a conceptual architecture and software platform (https://github.com/TOM-Platform) designed to support the development of intelligent wearable assistants that are contextually aware of both the user and the environment. This system was developed collaboratively with AR/MR researchers, HCI researchers, AI/Robotic researchers, and software developers, and it continues to evolve to meet the diverse requirements of these stakeholders. TOM facilitates the creation of intelligent assistive AR applications for daily activities and supports the recording and analysis of user interactions, integration of new devices, and the provision of assistance for various activities. Additionally, we showcase several proof-of-concept assistive services and discuss the challenges involved in developing such services.


LOGIC-LM++: Multi-Step Refinement for Symbolic Formulations

Kirtania, Shashank, Gupta, Priyanshu, Radhakirshna, Arjun

arXiv.org Artificial Intelligence

In this paper, we examine the limitations of Large Language Models (LLMs) for complex reasoning tasks. Although recent works have started to employ formal languages as an intermediate representation for reasoning tasks, they often face challenges in accurately generating and refining these formal specifications to ensure correctness. To address these issues, this paper proposes Logic-LM++, an improvement on Logic-LM. It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM. The paper demonstrates that Logic-LM++ outperforms Logic-LM and other contemporary techniques across natural language reasoning tasks on three datasets, FOLIO, ProofWriter and AR-LSAT, with an average improvement of 18.5% over standard prompting, 12.3% over chain-of-thought prompting and 5% over Logic-LM.


CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes

Parmar, Paritosh, Peh, Eric, Chen, Ruirui, Lam, Ting En, Chen, Yuhan, Tan, Elston, Fernando, Basura

arXiv.org Artificial Intelligence

Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons use the principles of animation that allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors require models to resolve more challenging yet well-defined causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially on open-ended answers. We identify more advanced/explicit causal relationship modeling and joint modeling of vision and language as the immediate areas for future efforts to focus upon. Along with the other complementary datasets, our new challenging dataset will pave the way for these developments in the field.


Mastering Asymmetrical Multiplayer Game with Multi-Agent Asymmetric-Evolution Reinforcement Learning

Sun, Chenglu, Zhang, Yichi, Zhang, Yu, Lu, Ziling, Liu, Jingbin, Xu, Sijia, Zhang, Weidong

arXiv.org Artificial Intelligence

The asymmetrical multiplayer (AMP) game is a popular game genre that involves multiple types of agents competing or collaborating with each other. It is difficult to train powerful agents that can defeat top human players in AMP games with typical self-play training methods because of the unbalanced characteristics of their asymmetrical environments. We propose asymmetric-evolution training (AET), a novel multi-agent reinforcement learning framework that can train multiple kinds of agents simultaneously in AMP games. We designed adaptive data adjustment (ADA) and environment randomization (ER) to optimize the AET process. We tested our method in a complex AMP game named Tom & Jerry, and our AIs, trained without using any human data, achieved a win rate of 98.5% against top human players over 65 matches. The ablation experiments indicated that the proposed modules are beneficial to the framework.


The Hacking of ChatGPT Is Just Getting Started

WIRED

It took Alex Polyakov just a couple of hours to break GPT-4. When OpenAI released the latest version of its text-generating chatbot in March, Polyakov sat down in front of his keyboard and started entering prompts designed to bypass OpenAI's safety systems. Soon, the CEO of security firm Adversa AI had GPT-4 spouting homophobic statements, creating phishing emails, and supporting violence. Polyakov is one of a small number of security researchers, technologists, and computer scientists developing jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems. The process of jailbreaking aims to design prompts that make the chatbots bypass rules around producing hateful content or writing about illegal acts, while closely related prompt injection attacks can quietly insert malicious data or instructions into AI models.


ChatGPT vs. Bing vs. Bard: Which AI is best?

PCWorld

ChatGPT, Bing Chat, and Bard promise to transform your life using the power of artificial intelligence, through AI conversations that can inform, amuse, and educate you--just like a human being. But how good are these new AI chatbots, really? We tested them to find out. We asked all three AIs a variety of different questions: some that expanded upon general search topics, some that demanded an opinion, logic puzzles, even code--and then asked them to be more creative, such as by writing an alternate, better ending to Game of Thrones and a Seinfeld scene with a special guest. We've included all of their answers, or as many of them as we could provide, and we'll let you decide for yourself.