Goto

Collaborating Authors

 chollet


Hey Pentti, We Did It Again!: Differentiable vector-symbolic types that prove polynomial termination

Tomkins-Flanagan, Eilene, Hanley, Connor, Kelly, Mary A.

arXiv.org Artificial Intelligence

We present a typed computer language, Doug, in which all typed programs may be proved to halt in polynomial time, encoded in a vector-symbolic architecture (VSA). Doug is just an encoding of the light linear functional programming language (LLFPL) described by (Schimanski2009, ch. 7). The types of Doug are encoded using a slot-value encoding scheme based on holographic declarative memory (HDM; Kelly, 2020). The terms of Doug are encoded using a variant of the Lisp VSA defined by (Flanagan, 2024). Doug allows for some points on the embedding space of a neural network to be interpreted as types, where the types of nearby points are similar both in structure and content. Types in Doug are therefore learnable by a neural network. Following (Chollet, 2019), (Card, 1983), and (Newell, 1981), we view skill as the application of a procedure, or program of action, that causes a goal to be satisfied. Skill acquisition may therefore be expressed as program synthesis. Using Doug, we hope to describe a form of learning of skilled behaviour that follows a human-like pace of skill acquisition (i.e., substantially faster than brute force; Heathcote, 2000), exceeding the efficiency of all currently existing approaches (Kaplan, 2020; Jones, 2021; Chollet, 2024). Our approach brings us one step closer to modeling human mental representations, as they must actually exist in the brain, and those representations' acquisition, as they are actually learned.


ARC-NCA: Towards Developmental Solutions to the Abstraction and Reasoning Corpus

Guichard, Etienne, Reimers, Felix, Kvalsund, Mia, Lepperød, Mikkel, Nichele, Stefano

arXiv.org Artificial Intelligence

The Abstraction and Reasoning Corpus (ARC), later renamed ARC-AGI, poses a fundamental challenge in artificial general intelligence (AGI), requiring solutions that exhibit robust abstraction and reasoning capabilities across diverse tasks, while only few (with median count of three) correct examples are presented. While ARC-AGI remains very challenging for artificial intelligence systems, it is rather easy for humans. This paper introduces ARC-NCA, a developmental approach leveraging standard Neural Cellular Automata (NCA) and NCA enhanced with hidden memories (EngramNCA) to tackle the ARC-AGI benchmark. NCAs are employed for their inherent ability to simulate complex dynamics and emergent patterns, mimicking developmental processes observed in biological systems. Developmental solutions may offer a promising avenue for enhancing AI's problem-solving capabilities beyond mere training data extrapolation. ARC-NCA demonstrates how integrating developmental principles into computational models can foster adaptive reasoning and abstraction. We show that our ARC-NCA proof-of-concept results may be comparable to, and sometimes surpass, that of ChatGPT 4.5, at a fraction of the cost.


The Man Out to Prove How Dumb AI Still Is

The Atlantic - Technology

They want to build AI models that achieve "artificial general intelligence," or AGI--matching or exceeding the capabilities of the human mind. The difference between these two men is that Altman has suggested that his company, OpenAI, has practically built the technology already. Chollet, a French computer scientist and one of the industry's sharpest skeptics, has said that notion is "absolutely clown shoes." When I spoke with him earlier this year, Chollet told me that AI companies have long been "intellectually lazy" in suggesting that their machines are on the path to a kind of supreme knowledge. At this point, those claims are based largely on the programs' ability to pass specific tests (such as the LSAT, Advanced Placement Biology, and even an introductory sommelier exam).


The role of positional encodings in the ARC benchmark

Costa, Guilherme H. Bandeira, Freire, Miguel, Oliveira, Arlindo L.

arXiv.org Artificial Intelligence

The Abstraction and Reasoning Corpus challenges AI systems to perform abstract reasoning with minimal training data, a task intuitive for humans but demanding for machine learning models. Using CodeT5+ as a case study, we demonstrate how limitations in positional encoding hinder reasoning and impact performance. This work further examines the role of positional encoding across transformer architectures, highlighting its critical influence on models of varying sizes and configurations. Comparing several strategies, we find that while 2D positional encoding and Rotary Position Embedding offer competitive performance, 2D encoding excels in data-constrained scenarios, emphasizing its effectiveness for ARC tasks


Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI

Pfister, Rolf, Jud, Hansueli

arXiv.org Artificial Intelligence

OpenAI's o3 achieves a high score of 87.5 % on ARC-AGI, a benchmark proposed to measure intelligence. This raises the question whether systems based on Large Language Models (LLMs), particularly o3, demonstrate intelligence and progress towards artificial general intelligence (AGI). Building on the distinction between skills and intelligence made by Fran\c{c}ois Chollet, the creator of ARC-AGI, a new understanding of intelligence is introduced: an agent is the more intelligent, the more efficiently it can achieve the more diverse goals in the more diverse worlds with the less knowledge. An analysis of the ARC-AGI benchmark shows that its tasks represent a very specific type of problem that can be solved by massive trialling of combinations of predefined operations. This method is also applied by o3, achieving its high score through the extensive use of computing power. However, for most problems in the physical world and in the human domain, solutions cannot be tested in advance and predefined operations are not available. Consequently, massive trialling of predefined operations, as o3 does, cannot be a basis for AGI - instead, new approaches are required that can reliably solve a wide variety of problems without existing skills. To support this development, a new benchmark for intelligence is outlined that covers a much higher diversity of unknown tasks to be solved, thus enabling a comprehensive assessment of intelligence and of progress towards AGI.


OpenAI's o3 model aced a test of AI reasoning – but it's still not AGI

New Scientist

OpenAI's new o3 artificial intelligence model has achieved a breakthrough high score on a prestigious AI reasoning test called the ARC Challenge, inspiring some AI fans to speculate that o3 has achieved artificial general intelligence (AGI). But even as ARC Challenge organisers described o3's achievement as a major milestone, they also cautioned that it has not won the competition's grand prize – and it is only one step on the path towards AGI, a term for hypothetical future AI with human-like intelligence. The o3 model is the latest in a line of AI releases that follow on from the large language models powering ChatGPT. "This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models," said François Chollet, an engineer at Google and the main creator of the ARC Challenge, in a blog post. How does ChatGPT work and do AI-powered chatbots "think" like us? Chollet designed the Abstraction and Reasoning Corpus (ARC) Challenge in 2019 to test how well AIs can find correct patterns linking pairs of coloured grids. Such visual puzzles are intended to make AIs demonstrate a form of general intelligence with basic reasoning capabilities.


On a measure of intelligence

Gurevich, Yuri

arXiv.org Artificial Intelligence

The measure of intelligence is the ability to change. Abstract The Fall 2024 Logic in Computer Science column of the Bulletin of EATCS is a little discussion on intelligence, measuring intelligence, and related issues, provoked by a fascinating must-read article "On the measure of intelligence" by François Chollet. The discussion includes a modicum of critique of the article. Q: Is it about psychology? Chollet is a prominent figure in AI. Q: We spoke about AI last spring. But you didn't seem to be interested in AI before that. A: This is largely correct, though I read Norbert Wiener's "Cybernetics" [18], when it was translated to Russian in 1968, and was taken with it. For a while I tried to follow cybernetics developments, at least in the USSR.


'If journalism is going up in smoke, I might as well get high off the fumes': confessions of a chatbot helper

The Guardian

For several hours a week, I write for a technology company worth billions of dollars. Alongside me are published novelists, rising academics and several other freelance journalists. The workload is flexible, the pay better than we are used to, and the assignments never run out. But what we write will never be read by anyone outside the company. We are writing for an AI.


Ted Chiang Is Wrong About AI Art

The Atlantic - Technology

Artists and writers all over the world have spent the past two years engaged in an existential battle. Generative-AI programs such as ChatGPT and DALL-E are built on work stolen from humans, and machines threaten to replace the artists and writers who made the material in the first place. Their outrage is well warranted--but their arguments don't always make sense or substantively help defend humanity. Over the weekend, the legendary science-fiction writer Ted Chiang stepped into the fray, publishing an essay in The New Yorker arguing, as the headline says, that AI "isn't going to make art." Chiang writes not simply that AI's outputs can be or are frequently lacking value but that AI cannot be used to make art, really ever, leaving no room for the many different ways someone might use the technology.


Intelligence Analysis of Language Models

Galanti, Liane, Baron, Ethan

arXiv.org Artificial Intelligence

In this project, we test the effectiveness of Large Language Models (LLMs) on the Abstraction and Reasoning Corpus (ARC) dataset. This dataset serves as a representative benchmark for testing abstract reasoning abilities, requiring a fundamental understanding of key concepts such as object identification, basic counting, and elementary geometric principles. Tasks from this dataset are converted into a prompt-based format for evaluation. Initially, we assess the models' potential through a Zero-shot approach. Subsequently, we investigate the application of the Chain-of-Thought (CoT) technique, aiming to determine its role in improving model performance. Our results suggest that, despite the high expectations placed on contemporary LLMs, these models still struggle in non-linguistic domains, even when dealing with simpler subsets of the ARC dataset. Our study is the first to concentrate on the capabilities of open-source models in this context. The code, dataset, and prompts supporting this project's findings can be found in our GitHub repository, accessible at: https://github.com/Lianga2000/LLMsOnARC.