How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis

arXiv.org Artificial Intelligence

Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources. With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate. In this paper, we study how well LLMs can negotiate with each other. We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents. We implemented three types of scenarios in NegotiationArena to assess LLMs' behaviors in allocating shared resources (ultimatum games), aggregating resources (trading games), and buying or selling goods (price negotiations). Each scenario allows for multiple turns of flexible dialogue between LLM agents to allow for more complex negotiations. Interestingly, LLM agents can significantly boost their negotiation outcomes by employing certain behavioral tactics. For example, by pretending to be desolate and desperate, LLMs can improve their payoffs by 20% when negotiating against the standard GPT-4. We also quantify irrational negotiation behaviors exhibited by the LLM agents, many of which also appear in humans. Together, NegotiationArena offers a new environment to investigate LLM interactions, enabling new insights into LLMs' theory of mind, irrationality, and reasoning abilities.


Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations

arXiv.org Artificial Intelligence

Long-form generations from large language models (LLMs) contain a mix of factual and non-factual claims, making evaluating factuality difficult. To evaluate the factual precision of long-form generations in a more fine-grained way, prior works propose to decompose long-form generations into multiple verifiable facts and verify those facts independently. The factuality of the generation is the proportion of verifiable facts among all the facts. Such methods assume that combining factual claims forms a factual paragraph. This paper shows that the assumption can be violated due to entity ambiguity. We show that LLMs can generate paragraphs that contain verifiable facts, but the facts are combined to form a non-factual paragraph due to entity ambiguity. We further reveal that existing factual precision metrics, including FActScore and citation recall, cannot properly evaluate the factuality of these non-factual paragraphs. To address this, we introduce an enhanced metric, D-FActScore, specifically designed for content with ambiguous entities. We evaluate the D-FActScores of biographies of people generated with retrieval-augmented generation (RAG). We show that D-FActScore can better assess the factuality of paragraphs with entity ambiguity than FActScore. We also find that four widely used open-source LLMs tend to mix information about distinct entities to form non-factual paragraphs.
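To make the abstract's point concrete, the sketch below computes the proportion-based score it describes and shows how that score can reach 1.0 even when the facts belong to different people with the same name. The `Fact` record, the `entity_id` field, and `entity_aware_precision` are hypothetical names introduced for illustration only; the entity-aware variant is a simple majority-entity heuristic and is not the paper's D-FActScore definition.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    supported: bool  # True if the fact was verified against some source
    entity_id: str   # hypothetical: which real-world entity the source describes


def naive_precision(facts):
    """Proportion of supported facts among all facts (0.0 for an empty list)."""
    if not facts:
        return 0.0
    return sum(f.supported for f in facts) / len(facts)


def entity_aware_precision(facts):
    """Illustrative heuristic: credit only supported facts about the majority entity."""
    supported = [f for f in facts if f.supported]
    if not supported:
        return 0.0
    majority_entity, _ = Counter(f.entity_id for f in supported).most_common(1)[0]
    return sum(f.entity_id == majority_entity for f in supported) / len(facts)


# A made-up paragraph about "John Smith" that mixes two different people.
facts = [
    Fact("John Smith was born in 1950.", True, "smith_politician"),
    Fact("John Smith served as a senator.", True, "smith_politician"),
    Fact("John Smith studied law.", True, "smith_politician"),
    Fact("John Smith played as a goalkeeper.", True, "smith_footballer"),
]

naive = naive_precision(facts)
aware = entity_aware_precision(facts)
print(naive, aware)
```

Every fact here is individually verifiable, so the naive proportion gives the paragraph full credit, while the entity-aware heuristic penalizes the one fact that belongs to a different person.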


Quadratic Time-Frequency Analysis of Vibration Signals for Diagnosing Bearing Faults

arXiv.org Artificial Intelligence

Diagnosis of bearing faults is paramount to reducing maintenance costs and operational breakdowns. Bearing faults are primary contributors to machine vibrations, and analyzing their signal morphology offers insights into their health status. Unfortunately, existing approaches are optimized for controlled environments, neglecting realistic conditions such as time-varying rotational speeds and the vibration's non-stationary nature. This paper presents a fusion of time-frequency analysis and deep learning techniques to diagnose bearing faults under time-varying speeds and varying noise levels. First, we formulate the bearing fault-induced vibrations and discuss the link between their non-stationarity and the bearing's inherent and operational parameters. We also elucidate quadratic time-frequency distributions and validate their effectiveness in resolving distinctive dynamic patterns associated with different bearing faults. Based on this, we design a time-frequency convolutional neural network (TF-CNN) to diagnose various faults in rolling-element bearings. Our experimental findings demonstrate the superior performance of TF-CNN in comparison to recently developed techniques. They also confirm its versatility in capturing fault-relevant non-stationary features that couple with speed changes and show its resilience to noise, consistently surpassing competing methods across various signal-to-noise ratios and performance metrics. Altogether, the TF-CNN achieves accuracy improvements of up to 15% in severe noise conditions.


Confessions of an AI Clickbait Kingpin

WIRED

"I'm not a fan of AI," Nebojša Vujinović Vujo says. The admission surprises me: He has built a bustling business by snapping up abandoned news outlets and other websites and stuffing them full of algorithmically generated articles. Although he accepts that his model rankles writers and readers alike, he says he's simply embracing an unstoppable new tool--large language models--in the same way people rationally swapped horse-drawn buggies for gas-powered vehicles. "They're making my planet bad," he says. I connected with Vujo after digging into the strange afterlife of indie women's blog The Hairpin, which shut down in 2018. In place of the voicey, funny blog posts it was known for, the site began churning out AI-generated, search-engine-optimized pablum about dream interpretations and painfully generic relationship advice like "effective communication is vital." When I emailed an address listed on the zombie site's About Us page, Vujo responded, claiming that it was just one of more than 2,000 sites he operates, in an AI-content-fueled fiefdom built by acquiring once-popular domains fallen on hard times. He's the CEO of the digital marketing firm Shantel, which monetizes its AI-populated sites through programmatic ads, sponsored content, and selling the placement of "backlinks" to website owners trying to boost their credibility with search engines. He often targets distressed media sites because they have built-in audiences and a history of ranking highly in search results. The foundation of that business is a long-established practice known as domain squatting--buying up web domains that once belonged to established brands and profiting off their reputations with Google and other search engines. Lily Ray, senior director of SEO at the marketing agency Ampsive, calls it "the underbelly of the SEO industry." But Vujo is part of a wave of entrepreneurs giving this old trade a new twist by using generative AI.
It's dusk where I live in Chicago when I talk via Zoom with Nebojša Vujinović Vujo. It's midnight in Belgrade, Serbia, where he lives with his girlfriend and their toddler, but he's wide awake and chatty. Vujo attributes his erratic sleep schedule to years of late nights working as a DJ and still makes music--he likes to mix pop with Balkan folk and is working on a new song called "Fat Lady." But right now he's eager to talk, human-to-human, about his AI-fueled hustle. He gets why writers are unhappy that their work has been erased and replaced by clickbait. But he defends his choices, pointing out that his life has been tougher than that of the average American blogger. Although ethnically Serbian, Vujo was born in what is now known as Bosnia and Herzegovina, and his family fled during the breakup of Yugoslavia. "I had two wars I escaped.


On the Standardization of Behavioral Use Clauses and Their Adoption for Responsible Licensing of AI

arXiv.org Artificial Intelligence

Growing concerns over negligent or malicious uses of AI have increased the appetite for tools that help manage the risks of the technology. In 2018, licenses with behavioral-use clauses (commonly referred to as Responsible AI Licenses) were proposed to give developers a framework for releasing AI assets while specifying how they may be used, to mitigate negative applications. As of the end of 2023, on the order of 40,000 software and model repositories have adopted responsible AI licenses. Notable models licensed with behavioral-use clauses include BLOOM (language), LLaMA2 (language), Stable Diffusion (image), and GRID (robotics). This paper explores why and how these licenses have been adopted, and why and how they have been adapted to fit particular use cases. We use a mixed-methods methodology of qualitative interviews, clustering of license clauses, and quantitative analysis of license adoption. Based on this evidence we take the position that responsible AI licenses need standardization to avoid confusing users or diluting their impact. At the same time, customization of behavioral restrictions is also appropriate in some contexts (e.g., medical domains). We advocate for "standardized customization" that can meet users' needs and can be supported via tooling.


ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12

arXiv.org Artificial Intelligence

As Computational Thinking (CT) continues to permeate younger age groups in K-12 education, established CT platforms such as Scratch face challenges in catering to these younger learners, particularly those in elementary school (ages 6-12). Through formative investigation with Scratch experts, we uncover three key obstacles to children's autonomous Scratch learning: artist's block in project planning, bounded creativity in asset creation, and inadequate coding guidance during implementation. To address these barriers, we introduce ChatScratch, an AI-augmented system to facilitate autonomous programming learning for young children. ChatScratch employs structured interactive storyboards and visual cues to overcome artist's block, integrates digital drawing and advanced image generation technologies to elevate creativity, and leverages Scratch-specialized Large Language Models (LLMs) for professional coding guidance. Our study shows that, compared to Scratch, ChatScratch efficiently fosters autonomous programming learning and contributes to the creation of high-quality, personally meaningful Scratch projects for children.


A list of resources, articles, and opinion pieces relating to generative AI models – February 2024 update

AIHub

We've collected some of the articles, opinion pieces, videos and resources relating to generative AI models. We periodically update this list to add further resources of interest. This article represents the fourth in the series.


The World of Generative AI: Deepfakes and Large Language Models

arXiv.org Artificial Intelligence

The latest development in artificial intelligence (AI), chatbots, the product of generative AI, has captivated the public in the last two years. But it similarly poses an unprecedented challenge and can have potentially unwanted effects on our lives. OpenAI released the chatbot ChatGPT on November 30, 2022. The overwhelming response of the public towards ChatGPT usage pushed Google to release Bard, ChatGPT's rival, and Microsoft to release AI-powered Bing. But the recent GPT-4 topped the list as it has more capabilities than any other existing chatbot. Being LLM-based, these chatbots create synthetic media with the intention of creating better content, enhanced quality, or professional voices. The capabilities of such chatbots raise questions on the ethical use of AI. In the meantime, deepfakes, which are high-quality AI-generated fake videos, have been circulating online. Synthetically generated deepfake videos have exceeded acceptable limits in terms of reality distortion.


Inside OpenAI's Plan to Make AI More 'Democratic'

TIME - Tech

He was surrounded by seven staff from the world's leading artificial intelligence lab, which had launched ChatGPT a few months earlier. One of them was Wojciech Zaremba, an OpenAI co-founder. For over a decade, Megill had been toiling in relative obscurity as the co-founder of Polis, a nonprofit open-source tech platform for carrying out public deliberations. Democracy, in Megill's view, had barely evolved in hundreds of years even as the world around it had transformed unrecognizably. Each voter has a multitude of beliefs they must distill down into a single signal: one vote, every few years. The heterogeneity of every individual gets lost and distorted, with the result that democratic systems often barely reflect the will of the people and tend toward polarization.


Transductive Reward Inference on Graph

arXiv.org Artificial Intelligence

In this study, we present a transductive inference approach on a reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios, such as healthcare and robotics, where direct environmental interactions are either too costly or unethical and the reward functions are rarely accessible. Our research focuses on developing a reward inference method based on the contextual properties of information propagation on graphs that capitalizes on a constrained number of human reward annotations to infer rewards for unlabelled data. We leverage both the available data and limited reward annotations to construct a reward propagation graph, wherein the edge weights incorporate various influential factors pertaining to the rewards. Subsequently, we employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data. Furthermore, we establish the existence of a fixed point over the iterations of the transductive inference process and demonstrate that it converges at least to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks validate the effectiveness of our approach. The application of our inferred rewards improves the performance in offline reinforcement learning tasks.
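The general idea of propagating a few annotated rewards over a weighted graph until a fixed point is reached can be sketched as follows. This is classic clamped label propagation, swapped in as an illustration and not the paper's algorithm; the function name `propagate_rewards`, the dense weight matrix, and the toy chain graph are all assumptions made for the example.

```python
def propagate_rewards(weights, known, max_iters=1000, tol=1e-9):
    """Estimate rewards for unlabelled nodes by repeated weighted averaging.

    weights: symmetric n x n list of lists of non-negative edge weights.
    known:   dict mapping node index -> annotated reward (held fixed).
    Returns a list of n rewards once updates change by less than tol.
    """
    n = len(weights)
    rewards = [known.get(i, 0.0) for i in range(n)]
    for _ in range(max_iters):
        max_change = 0.0
        for i in range(n):
            if i in known:
                continue  # annotated rewards stay clamped
            total = sum(weights[i])
            if total == 0:
                continue  # isolated node: nothing to propagate from
            new_value = sum(w * r for w, r in zip(weights[i], rewards)) / total
            max_change = max(max_change, abs(new_value - rewards[i]))
            rewards[i] = new_value
        if max_change < tol:
            break  # reached (numerically) a fixed point
    return rewards


# Toy chain 0 - 1 - 2 - 3 with unit weights; only the end nodes are annotated.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
estimated = propagate_rewards(chain, known={0: 0.0, 3: 3.0})
print(estimated)
```

On this chain the unlabelled nodes settle at values interpolated between the two annotated ends, which is the fixed point of the averaging update.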