 Large Language Model


Visual Language Maps for Robot Navigation

arXiv.org Artificial Intelligence

Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions). While this is useful for matching images to natural language descriptions of object goals, it remains disjoint from the process of mapping the environment, so that it lacks the spatial precision of classic geometric maps. To address this problem, we propose VLMaps, a spatial map representation that directly fuses pretrained visual-language features with a 3D reconstruction of the physical world. VLMaps can be autonomously built from video feed on robots using standard exploration approaches and enables natural language indexing of the map without additional labeled data. Specifically, when combined with large language models (LLMs), VLMaps can be (i) used to translate natural language commands into a sequence of open-vocabulary navigation goals (which, beyond prior work, can be spatial by construction, e.g., "in between the sofa and TV" or "three meters to the right of the chair") directly localized in the map, and (ii) shared among multiple robots with different embodiments to generate new obstacle maps on-the-fly (by using a list of obstacle categories). Extensive experiments carried out in simulated and real-world environments show that VLMaps enable navigation according to more complex language instructions than existing methods. Videos are available at https://vlmaps.github.io.
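
As a rough illustration of what "natural language indexing of the map" could mean, the sketch below stores one feature vector per map cell and picks the cell most similar to a text embedding. It is a toy under stated assumptions, not the VLMaps implementation: the abstract does not specify the matching rule, so cosine similarity, the 2D grid, the hand-written three-dimensional vectors, and the names `cosine`, `localize`, `grid`, and `text_embeddings` are all hypothetical stand-ins for real visual-language features.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col) index of a map cell

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def localize(query_vec: List[float], grid: Dict[Cell, List[float]]) -> Cell:
    """Return the map cell whose stored feature best matches the query."""
    return max(grid, key=lambda cell: cosine(query_vec, grid[cell]))

# Made-up per-cell features; a real system would fuse model features here.
grid = {
    (0, 0): [1.0, 0.0, 0.0],
    (0, 1): [0.0, 1.0, 0.0],
    (1, 1): [0.0, 0.2, 0.9],
}
# Made-up text embeddings for two object names.
text_embeddings = {"sofa": [0.9, 0.1, 0.0], "tv": [0.0, 0.1, 1.0]}

sofa_cell = localize(text_embeddings["sofa"], grid)
tv_cell = localize(text_embeddings["tv"], grid)

# A spatial goal such as "in between the sofa and TV" becomes a midpoint.
between = ((sofa_cell[0] + tv_cell[0]) / 2, (sofa_cell[1] + tv_cell[1]) / 2)
print(sofa_cell, tv_cell, between)
```

Because the goals are coordinates in the map rather than image matches, spatial phrases reduce to simple geometry on the located cells.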


ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

arXiv.org Artificial Intelligence

ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end. In this paper, we examine whether ChatGPT can be used for zero-shot text classification, more specifically, automatic genre identification. We compare ChatGPT with a multilingual XLM-RoBERTa language model that was fine-tuned on datasets manually annotated with genres. The models are compared on test sets in two languages: English and Slovenian. Results show that ChatGPT outperforms the fine-tuned model when applied to a dataset that neither model had seen before. Even when applied to Slovenian, an under-resourced language, ChatGPT's performance is no worse than when applied to English. However, if the model is fully prompted in Slovenian, the performance drops significantly, showing the current limitations of ChatGPT usage on smaller languages. The presented results lead us to question whether this is the beginning of an end of laborious manual annotation campaigns even for smaller languages, such as Slovenian.


Grounding Language with Visual Affordances over Unstructured Data

arXiv.org Artificial Intelligence

Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills. However, in practice, learning multi-task, language-conditioned robotic skills typically requires large-scale data collection and frequent human intervention to reset the environment or help correct the current policies. In this work, we propose a novel approach to efficiently learn general-purpose language-conditioned robot skills from unstructured, offline and reset-free data in the real world by exploiting a self-supervised visuo-lingual affordance model, which requires annotating as little as 1% of the total data with language. We evaluate our method in extensive experiments both in simulated and real-world robotic tasks, achieving state-of-the-art performance on the challenging CALVIN benchmark and learning over 25 distinct visuomotor manipulation tasks with a single policy in the real world. We find that when paired with LLMs to break down abstract natural language instructions into subgoals via few-shot prompting, our method is capable of completing long-horizon, multi-tier tasks in the real world, while requiring an order of magnitude less data than previous approaches. Code and videos are available at http://hulc2.cs.uni-freiburg.de


Magnushammer: A Transformer-based Approach to Premise Selection

arXiv.org Artificial Intelligence

Premise selection is a fundamental problem of automated theorem proving. Previous works often use intricate symbolic methods, rely on domain knowledge, and require significant engineering effort to solve this task. In this work, we show that Magnushammer, a neural transformer-based approach, can outperform traditional symbolic systems by a large margin. Tested on the PISA benchmark, Magnushammer achieves a 59.5% proof rate compared to a 38.3% proof rate of Sledgehammer, the most mature and popular symbolic-based solver. Furthermore, by combining Magnushammer with a neural formal prover based on a language model, we significantly improve the previous state-of-the-art proof rate from 57.0% to 71.0%.


Lila: A Unified Benchmark for Mathematical Reasoning

arXiv.org Artificial Intelligence

Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities, e.g., arithmetic, calculus; (ii) language format, e.g., question-answering, fill-in-the-blanks; (iii) language diversity, e.g., no language, simple language; (iv) external knowledge, e.g., commonsense, physics. We construct our benchmark by extending 20 benchmark datasets, collecting task instructions and solutions in the form of Python programs, thereby obtaining explainable solutions in addition to the correct answer. We additionally introduce two evaluation datasets to measure out-of-distribution performance and robustness to language perturbation. Finally, we introduce BHASKARA, a general-purpose mathematical reasoning model trained on LILA. Importantly, we find that multi-tasking leads to significant improvements (average relative improvement of 21.83% F1 score vs. single-task models), while the best performing model only obtains 60.40%, indicating room for improvement in general mathematical reasoning and understanding.
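
To make "solutions in the form of Python programs" concrete, here is a hypothetical example of the general idea. The word problem, the function name `solution`, and the variable names are invented for illustration and are not taken from the LILA benchmark; the point is only that a program exposes each intermediate step, whereas a bare numeric answer does not.

```python
# Hypothetical word problem (not from the LILA benchmark):
# "A shopper buys 3 apples at $0.50 each and 2 loaves of bread at $2.25 each.
#  How much does the shopper spend in total?"

def solution() -> float:
    """Program-style answer: every intermediate quantity is explicit."""
    apple_cost = 3 * 0.50   # total cost of the apples
    bread_cost = 2 * 2.25   # total cost of the bread
    return apple_cost + bread_cost

answer = solution()
print(answer)
```

Running the program yields the final answer, and reading it shows how the answer was reached, which is the sense in which such solutions are explainable.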


Automatically Auditing Large Language Models via Discrete Optimization

arXiv.org Artificial Intelligence

Auditing large language models for unexpected behaviors is critical to preempt catastrophic deployments, yet remains challenging. In this work, we cast auditing as an optimization problem, where we automatically search for input-output pairs that match a desired target behavior. For example, we might aim to find a non-toxic input that starts with "Barack Obama" that a model maps to a toxic output. This optimization problem is difficult to solve as the set of feasible points is sparse, the space is discrete, and the language models we audit are non-linear and high-dimensional. To combat these challenges, we introduce a discrete optimization algorithm, ARCA, that jointly and efficiently optimizes over inputs and outputs. Our approach automatically uncovers derogatory completions about celebrities (e.g. "Barack Obama is a legalized unborn" -> "child murderer"), produces French inputs that complete to English outputs, and finds inputs that generate a specific name. Our work offers a promising new tool to uncover models' failure-modes before deployment.
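
The abstract does not spell out how ARCA works, so the following is not ARCA. It is a generic greedy coordinate-ascent search over token sequences, offered only as a minimal sketch of what searching a discrete space for a sequence that maximizes an objective can look like. The function `coordinate_ascent`, the tiny vocabulary, and the toy objective `toy_score` (which stands in for a model-based score) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

def coordinate_ascent(
    vocab: Sequence[str],
    length: int,
    score: Callable[[List[str]], float],
    sweeps: int = 3,
) -> List[str]:
    """Greedy discrete search: revisit one position at a time and keep
    whichever vocabulary token gives the highest score at that position."""
    tokens = [vocab[0]] * length  # arbitrary starting sequence
    for _ in range(sweeps):
        for pos in range(length):
            tokens[pos] = max(
                vocab,
                key=lambda tok: score(tokens[:pos] + [tok] + tokens[pos + 1:]),
            )
    return tokens

# Toy objective (assumption): count positions that match a hidden target.
# A real audit would instead score sequences using the language model.
TARGET = ["find", "rare", "inputs"]

def toy_score(tokens: List[str]) -> float:
    return float(sum(t == g for t, g in zip(tokens, TARGET)))

VOCAB = ["find", "rare", "inputs", "the", "a"]
result = coordinate_ascent(VOCAB, length=3, score=toy_score)
print(result)
```

With this separable toy objective a single sweep already recovers the target; objectives derived from a real model are not separable in this way, which is part of why the problem described above is hard.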


10 Ways Businesses Are Using ChatGPT Right Now

#artificialintelligence

Like it or loathe it, chatbot tools like ChatGPT are irreversibly changing the way we work. Since the language processing app was released last November, surveys reveal it's been picked up by almost half of US companies, and 93% of these firms are looking to expand its use further in upcoming months. But its rise to prominence is hardly surprising. Ethical and philosophical debates aside, the artificial intelligence-backed tool offers boundless possibilities to businesses looking to get ahead. And due to the chatbot's ability to respond to any human prompt, the limit really is your imagination.


Soci raises $120 million to boost AI for digital marketing

#artificialintelligence

Global and national brands have been upended by changes brought on in omnichannel marketing as customers access search engines and social media sites that provide highly localized results. "Brands must ensure consistent localized marketing efforts while still appealing to the unique local audience, and marketers must find ways to consolidate workflows while optimizing local channels," Afif Khoury, founder and CEO of Soci, told VentureBeat. To bolster this, the digital marketing software provider announced today that it has raised $120 million in its latest financing round. The funds will serve to advance use of AI and machine learning (ML), including ChatGPT natural language models along with Soci's marketing platform for multi-location brands. Khoury said the Soci platform aims to streamline localized marketing efforts across digital channels while adhering to brand guidelines, optimizing local search and integrating data.


ChatGPT: Thinking Outside The Content Marketing Box

#artificialintelligence

ChatGPT's instantaneous rise in popularity has proven a theory that many B2B SaaS executives have based their businesses on: that human beings can have vast amounts of data at their disposal, but only want to see the information that actually matters to them. The impact of ChatGPT technology is not just about creating instant content. In the early business conversations around generative AI, most of the focus has been on content marketing, which is the obvious use case. And yes, writing content can be expensive and time-consuming, but it's nothing compared to the $2T in revenue being wasted by marketing and sales teams that don't have the right insights into their performance metrics, and therefore are not making data-driven, strategic decisions. This point is only exacerbated by the current economic climate.


The best way to start an AI project? Don't think about the models

#artificialintelligence

Did you know that 85% of all AI projects fail to reach the production or operation stage? Why is this the case? It's very common for businesses to come up with creative ideas to use AI to improve customer experience or simplify workflows. The barrier to success for these projects often resides in the time and resources it takes to get them into development and then into production. But, as we've seen with OpenAI's new ChatGPT, AI can be as entertaining as it can be problematic.