LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

arXiv.org Artificial Intelligence

How to efficiently transform large language models (LLMs) into instruction followers has recently become a popular research direction, while training LLMs for multi-modal reasoning remains less explored. Although the recent LLaMA-Adapter demonstrates the potential to handle visual inputs with LLMs, it still cannot generalize well to open-ended visual instructions and lags behind GPT-4. In this paper, we present LLaMA-Adapter V2, a parameter-efficient visual instruction model. Specifically, we first augment LLaMA-Adapter by unlocking more learnable parameters (e.g., norm, bias and scale), which distributes the instruction-following ability across the entire LLaMA model rather than confining it to the adapters. Secondly, we propose an early fusion strategy to feed visual tokens only into the early LLM layers, contributing to better visual knowledge incorporation. Thirdly, a joint training paradigm of image-text pairs and instruction-following data is introduced by optimizing disjoint groups of learnable parameters. This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset. During inference, we incorporate additional expert models (e.g., captioning/OCR systems) into LLaMA-Adapter to further enhance its image understanding capability without incurring training costs. Compared to the original LLaMA-Adapter, our LLaMA-Adapter V2 can perform open-ended multi-modal instructions by merely introducing 14M parameters over LLaMA. The newly designed framework also exhibits stronger language-only instruction-following capabilities and even excels in chat interactions. Our code and models are available at https://github.com/ZrrSkywalker/LLaMA-Adapter.
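The "bias and scale" idea in this abstract can be illustrated with a small sketch. It assumes the unlocked parameters wrap a frozen linear layer as y = scale * (W x + bias), with the bias starting at zero and the scale at one so the wrapped layer initially behaves like the original. The class name `ScaledBiasLinear` and its methods are hypothetical and are not taken from the authors' code.

```python
class ScaledBiasLinear:
    """Frozen weight matrix plus a small learnable bias and scale (illustrative only)."""

    def __init__(self, weight):
        self.weight = weight                  # frozen, shape [out_dim][in_dim]
        out_dim = len(weight)
        self.bias = [0.0] * out_dim           # learnable, zero-initialised (assumption)
        self.scale = [1.0] * out_dim          # learnable, one-initialised (assumption)

    def forward(self, x):
        # y_j = scale_j * (sum_i W[j][i] * x[i] + bias_j)
        return [
            s * (sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b, s in zip(self.weight, self.bias, self.scale)
        ]

    def trainable_count(self):
        # Only bias and scale would receive gradient updates.
        return len(self.bias) + len(self.scale)

    def frozen_count(self):
        return sum(len(row) for row in self.weight)


layer = ScaledBiasLinear([[1.0, 2.0], [3.0, 4.0]])
initial = layer.forward([1.0, 1.0])           # identical to the frozen layer at init
layer.bias, layer.scale = [1.0, 0.0], [2.0, 1.0]
adapted = layer.forward([1.0, 1.0])
```

For a layer with a d_out x d_in weight, this adds only 2 * d_out trainable values, which is one way a model could gain tunable parameters throughout the network while staying parameter-efficient.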


"Computing and Technology Ethics: Engaging Through Science Fiction" – an interview with the authors

AIHub

Emanuelle Burton, Judy Goldsmith, Nicholas Mattei, Cory Siler and Sara-Jo Swiatek are the authors of a new book entitled: Computing and Technology Ethics: Engaging Through Science Fiction. We caught up with them to find out more about the book, what it covers, and what inspired them to use science fiction as a tool to teach about ethics. In addition to the content chapters there is a science fiction anthology at the end of the book containing 12 stories from contemporary authors including Ken Liu, T.C. Boyle, Elizabeth Bear, Paolo Bacigalupi, and Rebecca Roanhorse. The book also provides Story Frames for each story that includes an introduction and reflection questions that tie the story, the characters, and their choices to the ethical frameworks. Each of these stories is anchored in multiple places in the content chapters through what we call Story Points where that story picks up on themes and/or ideas from the chapter.


Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System

arXiv.org Artificial Intelligence

Large-scale Language Models (LLMs) are constrained by their inability to process lengthy inputs. To address this limitation, we propose the Self-Controlled Memory (SCM) system to unleash infinite-length input capacity for large-scale language models. Our SCM system is composed of three key modules: the language model agent, the memory stream, and the memory controller. The language model agent iteratively processes ultra-long inputs and stores all historical information in the memory stream. The memory controller provides the agent with both long-term memory (archived memory) and short-term memory (flash memory) to generate precise and coherent responses. The controller determines which memories from archived memory should be activated and how to incorporate them into the model input. Our SCM system can be integrated with any LLM to enable it to process ultra-long texts without any modification or fine-tuning. Experimental results show that our SCM system enables LLMs, which are not optimized for multi-turn dialogue, to achieve multi-turn dialogue capabilities that are comparable to ChatGPT, and to outperform ChatGPT in scenarios involving ultra-long document summarization or long-term conversations. Additionally, we will supply a test set, which covers common long-text input scenarios, for evaluating the abilities of LLMs in processing long documents. (Work in progress. Code: https://github.com/wbbeyourself/SCM4LLMs)
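The division of labour between memory stream and memory controller can be sketched roughly as follows. The abstract does not say how archived memories are scored, so this sketch assumes a simple word-overlap (Jaccard) relevance score; the names `MemoryStream`, `relevance`, and `build_model_input` are hypothetical and do not come from the SCM code base.

```python
import re


class MemoryStream:
    """Stores every turn; the most recent ones act as short-term (flash) memory."""

    def __init__(self, flash_size=2):
        self.turns = []
        self.flash_size = flash_size

    def add(self, text):
        self.turns.append(text)

    def _cut(self):
        # Index separating archived (older) turns from flash (recent) turns.
        return max(0, len(self.turns) - self.flash_size)

    def archived(self):
        return self.turns[: self._cut()]

    def flash(self):
        return self.turns[self._cut():]


def relevance(query, memory):
    """Jaccard overlap of lowercase word sets (an assumed stand-in scoring rule)."""
    q = set(re.findall(r"\w+", query.lower()))
    m = set(re.findall(r"\w+", memory.lower()))
    return len(q & m) / len(q | m) if q | m else 0.0


def build_model_input(stream, query, top_k=1):
    """Controller step: activate relevant archived memories, then append flash memory."""
    scored = [(relevance(query, mem), mem) for mem in stream.archived()]
    activated = [mem for score, mem in sorted(scored, key=lambda p: p[0], reverse=True)
                 if score > 0][:top_k]
    return "\n".join(activated + stream.flash() + [query])


stream = MemoryStream(flash_size=2)
for turn in ["My dog is named Biscuit", "I like hiking",
             "The weather is nice", "Let us plan a trip"]:
    stream.add(turn)
prompt = build_model_input(stream, "What is my dog named?")
```

Because only the activated memories and the recent turns are placed in the prompt, the input length stays bounded no matter how long the stored history grows.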


The face of PlayStation: Shuhei Yoshida on the joy and future of video games

The Guardian

In early 1993, Shuhei Yoshida joined Sony's nascent PlayStation division as a business development guy – the first member of the team who didn't have an engineering background. When he was working with Ken Kutaragi and the other architects of the original PlayStation, and later producing games from Crash Bandicoot and Gran Turismo alongside game development legends Mark Cerny and Kazunori Yamauchi, he freely admits that he could scarcely believe his luck. When I speak to him, on the eve of receiving Bafta's prestigious fellowship award for his contribution to video games, he still seems endearingly surprised by his own success. "The people who have received [this award] before are all creators! I don't know how I fit in," he says.


Can YOU tell the difference between a real person and an AI bot?

Daily Mail - Science & tech

Popular AI chatbots like ChatGPT and Bard have been designed to replicate human speech as closely as possible. And as deep learning technology gets more and more sophisticated, it's becoming difficult to discern these computer models from real people. Now, a free online game gives you two minutes to have a conversation with someone (or something) and guess whether they're a fellow human or an AI. 'Human or not?' was inspired by the Turing Test, devised by legendary British computer scientist Alan Turing in 1950. A computer passes the so-called test when someone cannot correctly tell the difference between a response from a human and a response from an AI.


Reducing Opinion Echo-Chambers by Intelligent Placement of Moderate-Minded Agents

arXiv.org Artificial Intelligence

In the era of social media, people frequently share their own opinions online on various issues and, in the process, are exposed to others' opinions. Whether because of the selective exposure produced by news feed recommendation algorithms or our own inclination to listen to opinions that support ours, the result is that we are increasingly exposed to opinions close to our own. Further, any population is inherently heterogeneous, i.e., people hold a varied range of opinions on a topic and show varying degrees of openness to being influenced by others. In this paper, we demonstrate the different behavior of open- and close-minded agents towards an issue when they are allowed to freely intermix and communicate. We show that intermixing among people leads to the formation of opinion echo chambers, i.e., small closed networks of people who hold similar opinions and are not affected by the opinions of people outside the network. Echo chambers are evidently harmful to a society because they inhibit free, healthy communication among all and thus prevent the exchange of opinions, spread misinformation, and increase extremist beliefs. This calls for a reduction in echo chambers, since a total consensus of opinion is neither possible nor welcome. We show that the number of echo chambers depends on the number of close-minded agents and cannot be lessened by increasing the number of open-minded agents. We identify certain 'moderate'-minded agents, who possess the capability of manipulating and reducing the number of echo chambers. The paper proposes an algorithm for intelligent placement of moderate-minded agents in the opinion-time spectrum by which the opinion echo chambers can be maximally reduced. With various experimental setups, we demonstrate that the proposed algorithm fares well when compared to placement of other agents (open- or close-minded) and random placement of 'moderate'-minded agents.
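How close-minded agents can produce echo chambers may be easier to see in a toy simulation. The abstract does not state the paper's update rule, so the sketch below swaps in a Hegselmann-Krause-style bounded-confidence model as an assumption: each agent moves to the mean of the opinions that lie within its own tolerance. The function names and the tolerance values are hypothetical, and the paper's placement algorithm for moderate-minded agents is not reproduced here.

```python
def step(opinions, tolerances):
    """One synchronous update: each agent averages the opinions it is willing to hear."""
    updated = []
    for x, tol in zip(opinions, tolerances):
        heard = [y for y in opinions if abs(y - x) <= tol]  # always includes x itself
        updated.append(sum(heard) / len(heard))
    return updated


def simulate(opinions, tolerances, rounds=20):
    for _ in range(rounds):
        opinions = step(opinions, tolerances)
    return opinions


def count_chambers(opinions, gap=0.05):
    """Count opinion clusters: a new cluster starts wherever neighbours differ by more than gap."""
    ordered = sorted(opinions)
    if not ordered:
        return 0
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > gap)


start = [0.0, 0.1, 0.9, 1.0]
closed = simulate(start, [0.2] * 4)   # close-minded: small tolerance (assumed value)
opened = simulate(start, [1.0] * 4)   # open-minded: hears everyone (assumed value)
closed_chambers = count_chambers(closed)
open_chambers = count_chambers(opened)
```

In this toy setting the close-minded population settles into two separate clusters while the fully open-minded one reaches a single shared opinion, which matches the abstract's qualitative claim that echo chambers are driven by close-minded agents.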


The End of Recommendation Letters

The Atlantic - Technology

I was lunching with a group of fellow professors, and, as happens these days when we assemble, generative artificial intelligence was discussed. Are your students using it? What are you doing to prevent cheating? Heads were shaken in chagrin as iced teas were sipped for comfort. But then, one of my colleagues wondered: Could he use AI to generate a reference letter for a student?


A list of resources, articles, and opinion pieces relating to large language models & robotics

Robohub

Figuring out how humans and robots can collaborate to effectively carry out tasks together is a rapidly growing area of interest. For successful collaboration between humans and robots, communication is key.


Broad Recommender System: An Efficient Nonlinear Collaborative Filtering Approach

arXiv.org Artificial Intelligence

Recently, Deep Neural Networks (DNNs) have been widely introduced into Collaborative Filtering (CF) to produce more accurate recommendation results due to their capability of capturing the complex nonlinear relationships between items and users. However, DNN-based models usually suffer from high computational complexity, i.e., they require very long training times and store a huge number of trainable parameters. To address these problems, we propose a new broad recommender system called Broad Collaborative Filtering (BroadCF), which is an efficient nonlinear collaborative filtering approach. Instead of DNNs, the Broad Learning System (BLS) is used as a mapping function to learn the complex nonlinear relationships between users and items, which can avoid the above issues while achieving very satisfactory recommendation performance. However, it is not feasible to directly feed the original rating data into BLS. To this end, we propose a user-item rating collaborative vector preprocessing procedure to generate low-dimensional user-item input data, which is able to harness quality judgments of the most similar users/items. Extensive experiments conducted on seven benchmark datasets have confirmed the effectiveness of the proposed BroadCF algorithm.
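As a rough illustration of the preprocessing idea, the sketch below builds a short input vector for one (user, item) pair from the ratings of the most similar users and items. The exact construction used by BroadCF is not given in the abstract; the choice of cosine similarity over co-rated entries, the convention that 0 means "unrated", and the function names are all assumptions made for this example.

```python
import math


def cosine_corated(a, b):
    """Cosine similarity restricted to positions rated in both vectors (0 = unrated)."""
    pairs = [(x, y) for x, y in zip(a, b) if x and y]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b)


def collaborative_vector(ratings, user, item, k=2):
    """Ratings of the k most similar users on `item`, then of `user` on the k most similar items."""
    n_users, n_items = len(ratings), len(ratings[0])
    columns = [[row[j] for row in ratings] for j in range(n_items)]
    similar_users = sorted(
        (u for u in range(n_users) if u != user),
        key=lambda u: cosine_corated(ratings[user], ratings[u]),
        reverse=True,
    )[:k]
    similar_items = sorted(
        (j for j in range(n_items) if j != item),
        key=lambda j: cosine_corated(columns[item], columns[j]),
        reverse=True,
    )[:k]
    return ([ratings[u][item] for u in similar_users]
            + [ratings[user][j] for j in similar_items])


# Rows are users, columns are items; user 0 has not rated item 2.
ratings = [
    [5, 4, 0],
    [5, 4, 5],
    [1, 2, 1],
    [4, 5, 4],
]
vector = collaborative_vector(ratings, user=0, item=2, k=2)
```

The resulting vector has length 2k regardless of how many users or items exist, which is the sense in which such an input is low-dimensional compared with a full rating row or column.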


Award-winning photograph revealed to be AI-generated image, photographer turns down prize

FOX News

Fox News correspondent Grady Trimble has the latest on fears the technology will spiral out of control on 'Special Report.' A German artist who won a major prize for photography has turned down the award after revealing his work was created with help from artificial intelligence (AI). Photographer Boris Eldagsen won the creative category of the open competition for the Sony World Photography Awards 2023 with his "photograph," titled "Pseudomnesia: The Electrician." The image, which depicted an older woman holding a younger in black and white, was "the first AI generated image to win in a prestigious international Photography competition," Eldagsen said in a statement posted on his website. "How many of you knew or suspected that it was AI generated? Something about this doesn't feel right, does it?" "AI images and photography should not compete with each other in an award like this."