VoyagerVision: Investigating the Role of Multi-modal Information for Open-ended Learning Systems
Smyth, Ethan, Suglia, Alessandro
Open-endedness is an active field of research in the pursuit of capable Artificial General Intelligence (AGI), allowing models to pursue tasks of their own choosing. Simultaneously, recent advancements in Large Language Models (LLMs) such as GPT-4o [9] have made such models capable of interpreting image inputs. Implementations such as OMNI-EPIC [4] have made use of this capability, providing an LLM with pixel data from an agent's point of view so that it can parse the environment and solve tasks. This paper proposes that providing these visual inputs gives a model a greater ability to interpret spatial environments, and as such can increase the number of tasks it successfully performs, extending its open-ended potential. To this end, this paper proposes VoyagerVision -- a multi-modal model capable of creating structures within Minecraft using screenshots as a form of visual feedback, building on the foundation of Voyager. VoyagerVision created an average of 2.75 unique structures within fifty iterations of the system; as Voyager was incapable of this, VoyagerVision represents an extension in an entirely new direction. Additionally, in a set of building unit tests, VoyagerVision succeeded in half of all attempts in flat worlds, with most failures arising in more complex structures. The project website is available at https://esmyth-dev.github.io/VoyagerVision.github.io/
Playing games with Large language models: Randomness and strategy
Games have a long history of describing intricate interactions in simplified forms. In this paper we explore whether large language models (LLMs) can play games, investigating their capabilities for randomisation and strategic adaptation through both simultaneous and sequential game interactions. We focus on GPT-4o-Mini-2024-08-17 and test two games between LLMs: Rock Paper Scissors (RPS) and a game of strategy, the Prisoner's Dilemma (PD). LLMs are often described as stochastic parrots, and while they may indeed be parrots, our results suggest that they are not very stochastic, in the sense that their outputs - when prompted to be random - are often very biased. Our research reveals that LLMs appear to develop loss-aversion strategies in repeated games, with RPS converging to stalemate conditions while PD shows systematic shifts between cooperative and competitive outcomes based on prompt design. We detail programmatic tools for independent agent interactions and the Agentic AI challenges faced in implementation. We show that LLMs can indeed play games, just not very well. These results have implications for the use of LLMs in multi-agent LLM systems and showcase limitations in current approaches to model output for strategic decision-making.
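The bias claim above can be made concrete with a small sketch. This is not the paper's code; it is a minimal, assumed illustration of the kind of analysis described: collect moves from a player that is supposed to be uniformly random, then compare the observed move counts against a uniform null with a chi-square statistic.

```python
# Minimal sketch: quantify how far a supposedly random RPS player is from
# the uniform distribution using a hand-rolled chi-square statistic.
from collections import Counter

MOVES = ("rock", "paper", "scissors")

def chi_square_uniform(moves):
    """Chi-square statistic of observed move counts vs. a uniform null."""
    counts = Counter(moves)
    expected = len(moves) / len(MOVES)  # equal counts under uniformity
    return sum((counts.get(m, 0) - expected) ** 2 / expected for m in MOVES)

# A heavily biased player, of the kind the paper reports LLMs often are:
biased = ["rock"] * 70 + ["paper"] * 20 + ["scissors"] * 10
print(chi_square_uniform(biased))  # large statistic -> far from uniform
```

In the paper's setting the move list would come from repeatedly prompting the model; here a hard-coded biased sequence stands in for those outputs.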
Right vs. Right: Can LLMs Make Tough Choices?
Yuan, Jiaqing, Murukannaiah, Pradeep K., Singh, Munindar P.
An ethical dilemma describes a choice between two "right" options involving conflicting moral values. We present a comprehensive evaluation of how LLMs navigate ethical dilemmas. Specifically, we investigate LLMs on their (1) sensitivity in comprehending ethical dilemmas, (2) consistency in moral value choice, (3) consideration of consequences, and (4) ability to align their responses to a moral value preference explicitly or implicitly specified in a prompt. Drawing inspiration from a leading ethical framework, we construct a dataset comprising 1,730 ethical dilemmas involving four pairs of conflicting values. We evaluate 20 well-known LLMs from six families. Our experiments reveal that: (1) LLMs exhibit pronounced preferences between major value pairs, and prioritize truth over loyalty, community over individual, and long-term over short-term considerations. (2) The larger LLMs tend to support a deontological perspective, maintaining their choices of actions even when negative consequences are specified. (3) Explicit guidelines are more effective in guiding LLMs' moral choice than in-context examples. Lastly, our experiments highlight the limitation of LLMs in comprehending different formulations of ethical dilemmas.
Development of an AI Anti-Bullying System Using Large Language Model Key Topic Detection
Tassava, Matthew, Kolodjski, Cameron, Milbrath, Jordan, Bishop, Adorah, Flanders, Nathan, Fetsch, Robbie, Hanson, Danielle, Straub, Jeremy
Cyberbullying has become a pronounced problem due to the increasing ubiquity of online platforms that provide a means to conduct it. A significant amount of this cyberbullying is conducted by and targets teenagers. It is difficult for teenage students to shut themselves off from the digital world in which the cyberbullying is taking place. Given how entrenched the use of digital apps is by today's youth, and the pronounced consequences of cyberbullying - including victim self-harm, in some cases - it is at least as much of a threat as physical bullying. Additionally, because of the obfuscation caused by the online environment, authorities (such as parents, teachers and law enforcement) may have difficulty determining what has occurred and who the participating actors are.
WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models
Benchekroun, Youssef, Dervishi, Megi, Ibrahim, Mark, Gaya, Jean-Baptiste, Martinet, Xavier, Mialon, Grégoire, Scialom, Thomas, Dupoux, Emmanuel, Hupkes, Dieuwke, Vincent, Pascal
We propose WorldSense, a benchmark designed to assess the extent to which LLMs are consistently able to sustain tacit world models, by testing how they draw simple inferences from descriptions of simple arrangements of entities. WorldSense is a synthetic benchmark with three problem types, each with its own trivial control, which explicitly avoids bias by decorrelating the abstract structure of problems from the vocabulary and expressions, and by decorrelating all problem subparts from the correct response. We run our benchmark on three state-of-the-art chat-LLMs (GPT-3.5, GPT-4 and Llama2-chat) and show that these models make errors even with as few as three objects. Furthermore, they have quite heavy response biases, preferring certain responses irrespective of the question. Errors persist even with chain-of-thought prompting and in-context learning. Lastly, we show that while finetuning on similar problems does result in substantial improvements -- within- and out-of-distribution -- the finetuned models do not generalise beyond a constrained problem space.
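To illustrate the style of problem the abstract describes, here is a minimal sketch (not the WorldSense generator itself, and the entity names are invented): a hidden linear arrangement of entities, a textual description of the adjacent relations, and a query whose answer follows by transitivity. Shuffling which names land in which positions is one simple way to decorrelate vocabulary from the abstract problem structure.

```python
# Minimal sketch of a synthetic grounded-reasoning item: a random
# left-to-right arrangement, facts about adjacent pairs, and a query
# answerable by chaining those facts.
import random

def make_item(names, rng):
    order = list(names)
    rng.shuffle(order)  # the hidden world model: a left-to-right arrangement
    facts = [f"{a} is left of {b}." for a, b in zip(order, order[1:])]
    a, b = rng.sample(order, 2)
    question = f"Is {a} left of {b}?"
    answer = order.index(a) < order.index(b)  # ground truth by position
    return " ".join(facts), question, answer

rng = random.Random(0)
facts, question, answer = make_item(["mug", "lamp", "book"], rng)
print(facts)
print(question, answer)
```

A benchmark built this way can score a model's answers against the ground-truth `answer` flag while varying the surface vocabulary freely.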
Why Your Business Needs True Conversational AI - IPsoft
The market is filled with automated UI solutions that claim to be enabling conversational Artificial Intelligence (AI) -- that is, AI that can interact with users through an interactive interface, one that "speaks" and reacts to human conversation in its many forms, and that can be used in a variety of business scenarios. Given the hype around the increased use of conversational assistants among consumers such as Alexa and Siri (which are not, it should be pointed out, analogous to conversational AI for business), it's understandable that enterprise decision makers might be confused, if not a bit overwhelmed, by what conversational AI could mean for their businesses, and which solution is the best for their purposes. If your company is seeking to automate human-like engagements at scale in order to make operations more efficient while maintaining and elevating customer experiences, we believe we can cut through the confusion easily. Allow us to explain why Amelia is the clear choice to enable conversational AI within your enterprise. Many digital solutions can claim to be "conversational." Indeed, humans have had the ability to converse with digital systems using regular language as far back as the 1960s.
The Pivotal Differences between Artificial Intelligence and Machine Learning - TFOT
Technology and machines are evolving at a blistering pace. Whether it be multimedia devices, driverless cars, or medical advances, the world continues to evolve and change at a speed never before seen in the history of technological advances. At the nexus of these amazing leaps in understanding are the concepts of Artificial Intelligence and Machine Learning. Though they seem similar on the surface, there are some distinct differences that must be pointed out. It is the intention of this work to do just that.
If You Had Your Own J.A.R.V.I.S.: What Artificial Intelligence In Business Might Be Like
At OneReach, some of us think the coolest part of the Iron Man movies is the artificial intelligence that helps power Tony Stark's armor and business operations. Virtual assistant services like Magic and GoButler already exist, but requests are all managed by real people on the other end. And while there are services like My Second that incorporate artificial intelligence into their service offering, there's nothing on the level of J.A.R.V.I.S. Siri is probably the most ubiquitous example of an artificially intelligent personal assistant, but she's more of a recommendation engine than true AI. Similarly, Echo, Amazon's sparkling new home automation assistant, can only respond to simple voice commands. However, Echo's functionality is beefed up once you add in the fact that Echo can connect to other apps to access their capabilities.
Your TA is a robot: Georgia Tech students find out 'Jill Watson' wasn't human
Imagine discovering someone you thought was human is, in fact, a robot. It sounds like the stuff of science fiction. But that's what happened to a class full of Georgia Tech students recently, when they learned that "Jill," their teaching assistant, was actually a piece of software. CBC Radio technology columnist Dan Misener explains what happened. The story starts with a computer science professor named Ashok Goel, who teaches at the Georgia Institute of Technology.