Collaborating Authors

 Fernando, Chrisantha


Gemini: A Family of Highly Capable Multimodal Models

arXiv.org Artificial Intelligence

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of the 32 benchmarks, notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases, and we discuss our approach toward deploying them responsibly to users.


Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

arXiv.org Artificial Intelligence

Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
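The evolutionary loop the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `llm` and `fitness` functions below are deterministic stubs standing in for real LLM calls and benchmark scoring, and the population, selection scheme, and hyper-mutation rate are assumptions chosen to keep the sketch runnable.

```python
import random

# Hypothetical stand-ins: a real system would call an LLM both to mutate
# prompts and to score them on a training set. Here the "LLM" is a stub
# that applies an instruction by tagging the text.
def llm(instruction, text):
    return f"{text} [{instruction}]"

def fitness(task_prompt, train_set):
    # Stub fitness: reward prompts that mention keywords from the tasks.
    return sum(kw in task_prompt for kw in train_set)

def promptbreeder(train_set, generations=5, pop_size=4, seed=0):
    rng = random.Random(seed)
    # Each unit of evolution pairs a task-prompt with the mutation-prompt
    # that governs how it is rewritten.
    population = [("Solve the problem step by step", "Make it more precise")
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = []
        for task_p, mut_p in population:
            # Mutate the task-prompt, guided by the mutation-prompt.
            child = llm(mut_p, task_p)
            # Self-referential step: sometimes the mutation-prompt itself
            # is rewritten by the LLM ("hyper-mutation").
            if rng.random() < 0.5:
                mut_p = llm("Improve this mutation-prompt", mut_p)
            scored.append((fitness(child, train_set), child, mut_p))
        # Truncation selection: keep the fitter half, duplicated.
        scored.sort(key=lambda t: t[0], reverse=True)
        survivors = [(c, m) for _, c, m in scored[:pop_size // 2]]
        population = survivors + survivors
    return max(population, key=lambda u: fitness(u[0], train_set))

best_task_prompt, best_mutation_prompt = promptbreeder(["step"])
```

The key structural point survives even in this toy form: mutation-prompts are part of the genotype, so selection pressure on task-prompts indirectly improves the operators that produce them.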


Generative Art Using Neural Visual Grammars and Dual Encoders

arXiv.org Artificial Intelligence

Whilst there are perhaps only a few scientific methods, there seem to be almost as many artistic methods as there are artists. Artistic processes appear to inhabit the highest order of open-endedness. To begin to understand some of the processes of art making, it is helpful to try to automate them, even partially. In this paper, a novel algorithm for producing generative art is described which allows a user to input a text string and, in creative response to this string, outputs an image that interprets it. It does so by evolving images using a hierarchical neural Lindenmayer system, and evaluating these images along the way using an image-text dual encoder trained on billions of images and their associated text from the internet. In doing so we have access to and control over an instance of an artistic process, allowing analysis of which aspects of the artistic process become the task of the algorithm, and which elements remain the responsibility of the artist.
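The Lindenmayer (L-system) core of such a pipeline is easy to show in isolation. The sketch below is only the classical string-rewriting step, assuming standard turtle-graphics symbols (`F` draw forward, `+`/`-` turn); the paper's system is hierarchical and neural, and its dual-encoder scoring of rendered images against the input text is not reproduced here.

```python
def expand(axiom, rules, steps):
    # Classic Lindenmayer rewriting: every symbol in the string is
    # replaced in parallel according to the production rules; symbols
    # without a rule are copied unchanged.
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Koch-curve-style rules as a familiar, concrete example.
drawing_program = expand("F", {"F": "F+F-F-F+F"}, 2)
```

In an evolutionary setting like the one the abstract describes, the rules themselves (here, a neural generator rather than a fixed table) would be mutated, each expanded string rendered to an image, and the image scored by text-image similarity to the user's input string.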


From Language Games to Drawing Games

arXiv.org Artificial Intelligence

Sadly, no other animal represents the world with language or drawing. Early examples of drawing date back to 60,000 years ago [19], though red pigments for mark making are already found 200,000 years ago in the Middle Stone Age [18]. "The first man to make a mammoth appear on the wall of a cave was, I am confident, amazed by what he had done," writes Gibson, because they had discovered that by means of lines they could delineate something [10], p. 263. What allowed humans to learn to create (visual) abstractions, e.g., the Western child's human stick figure, the Australian Aboriginal top-down projections of people seated around a fireplace, the Egyptian formalism for representing things in orthographic projection with multiple station-points and with social dominance relations transformed into size differences, or the 16th-century Japanese affine projections with a bird's-eye viewpoint? Making our own abstraction-creating (drawing) machines is one way to find the answer. A wonderful start was made by Harold Cohen's abstract drawing program "Aaron" [6]. Aaron and Harold produced beautiful and interesting abstracted drawings that looked as if they had been made by a human alone. But Aaron had no learning, was not conditioned on looking at the world, and was an entirely hand-designed production system (a complex set of hierarchical rules for drawing).


Hierarchical Representations for Efficient Architecture Search

arXiv.org Machine Learning

We explore efficient neural architecture search methods and show that a simple yet powerful evolutionary algorithm can discover new architectures with excellent performance. Our approach combines a novel hierarchical genetic representation scheme that imitates the modularized design pattern commonly adopted by human experts, and an expressive search space that supports complex topologies. Our algorithm efficiently discovers architectures that outperform a large number of manually designed models for image classification, obtaining top-1 error of 3.6% on CIFAR-10 and 20.3% when transferred to ImageNet, which is competitive with the best existing neural architecture search approaches. We also present results using random search, achieving 0.3% less top-1 accuracy on CIFAR-10 and 0.1% less on ImageNet whilst reducing the search time from 36 hours down to 1 hour.
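The search procedure the abstract summarises can be caricatured as an evolutionary loop over graph-structured genotypes. The sketch below is an assumption-laden toy, not the paper's scheme: it shows one flat motif (a small graph with operation-labelled edges) under a (1+1)-style hill climb, the `fitness` proxy is invented to keep the example runnable, and the paper's actual method composes motifs hierarchically and trains each candidate to measure accuracy.

```python
import random

# Level-0 primitives; in the paper these are operations such as
# convolutions and pooling, composed into higher-level motifs.
PRIMITIVES = ["identity", "conv3x3", "maxpool"]

def random_motif(rng, nodes=4):
    # A motif as a DAG: every edge (i, j) with i < j carries a primitive.
    return {(i, j): rng.choice(PRIMITIVES)
            for i in range(nodes) for j in range(i + 1, nodes)}

def mutate(motif, rng):
    # Point mutation: relabel a single edge with a random primitive.
    child = dict(motif)
    edge = rng.choice(list(child))
    child[edge] = rng.choice(PRIMITIVES)
    return child

def fitness(motif):
    # Hypothetical proxy: the paper trains each architecture and uses
    # validation accuracy; here we just reward non-identity edges.
    return sum(op != "identity" for op in motif.values())

def evolve(generations=20, seed=0):
    rng = random.Random(seed)
    best = random_motif(rng)
    for _ in range(generations):
        child = mutate(best, rng)
        if fitness(child) >= fitness(best):  # accept ties: simple (1+1)-EA
            best = child
    return best

best_motif = evolve()
```

The contrast the abstract draws with random search corresponds to replacing the accept/reject step with independent sampling: the representation (the search space of labelled graphs) stays the same, only the search strategy changes.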