
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning

Akgül, Ömer Faruk, Kalaycı, Yusuf Hakan, Kannan, Rajgopal, Neiswanger, Willie, Prasanna, Viktor

arXiv.org Artificial Intelligence

Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often "overthink": continuing to reason long after they have enough information to answer correctly. This wastes inference-time compute and can hurt accuracy. Existing attempts to stop early either manipulate decoding with extra sampling and heuristics, rely on auxiliary verifier models, or operate only as post-hoc analysis pipelines without formal guarantees. We introduce LYNX, an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions. LYNX attaches exit decisions to naturally occurring reasoning cues (e.g., "hmm", "wait") during generation, trains a lightweight probe on hidden states at those cue tokens using supervision from forced exits, and wraps the resulting scores in split conformal prediction to obtain distribution-free control over premature exits. Crucially, we train and calibrate this probe once on a generic mathematical corpus and reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks. Across three model families spanning 1.5B to 32B parameters, a single mathematically trained probe per base model yields strong accuracy–efficiency tradeoffs. On GSM8K, LYNX matches or improves baseline accuracy while reducing tokens by 40–65%; on MATH-500 it improves accuracy by up to 12 points with roughly 35–60% fewer tokens; on AIME 2024 it recovers baseline accuracy with more than 50% token savings; and on CommonsenseQA, a non-math benchmark, it transfers zero-shot with modest accuracy gains and up to 70% fewer tokens. Compared to state-of-the-art early-exit methods, LYNX offers competitive or superior Pareto frontiers while remaining fully online, requiring no proxy models at inference, and providing explicit, user-tunable confidence guarantees.
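The abstract does not spell out LYNX's calibration details, but the split conformal prediction step it names has a standard recipe: compute a nonconformity score on a held-out calibration set, take a finite-sample-corrected quantile as a threshold, and exit only when a new score clears it. The sketch below is a generic illustration of that recipe, not LYNX's implementation; the calibration values, the choice of `1 - probe_score` as the nonconformity score, and the `should_exit` rule are all illustrative assumptions.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    # Split conformal prediction: take the (1 - alpha) empirical quantile
    # of calibration nonconformity scores, with the (n + 1)/n
    # finite-sample correction that gives the distribution-free guarantee.
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def should_exit(probe_score, tau):
    # Exit only when the nonconformity (here, 1 - probe score) is within
    # the calibrated threshold, so premature exits stay below rate alpha.
    return (1.0 - probe_score) <= tau

# Hypothetical calibration scores, e.g. 1 - probe confidence at cue
# tokens where a forced exit was checked against the final answer.
cal = np.array([0.2, 0.35, 0.1, 0.5, 0.05, 0.3, 0.25, 0.4])
tau = conformal_threshold(cal, alpha=0.2)
```

Under exchangeability of calibration and test scores, this construction bounds the premature-exit rate by `alpha`, which is the kind of explicit, user-tunable guarantee the abstract describes.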


Video models are zero-shot learners and reasoners

Wiedemer, Thaddäus, Li, Yuxuan, Vicol, Paul, Gu, Shixiang Shane, Matarese, Nick, Swersky, Kevin, Kim, Been, Jaini, Priyank, Geirhos, Robert

arXiv.org Artificial Intelligence

The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.


FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE

Le, Khiem, Tran, Tuan, Hua, Ting, Chawla, Nitesh V.

arXiv.org Artificial Intelligence

Existing resource-adaptive LoRA federated fine-tuning methods enable clients to fine-tune models using compressed versions of global LoRA matrices in order to accommodate varying compute resources across clients. This compression leads to suboptimal performance due to information loss. To address this, we propose FLAME, a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture. Unlike prior approaches, FLAME retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client. However, incorporating SMoE into federated learning introduces unique challenges: a mismatch in output magnitude caused by partial expert activation, and an imbalance in expert training quality across clients. FLAME tackles these challenges through a lightweight rescaling mechanism and an activation-aware aggregation scheme. Empirical results across diverse computational settings demonstrate that FLAME consistently outperforms existing methods, providing a robust and effective solution for resource-adaptive federated learning.
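The magnitude-mismatch problem from partial expert activation can be made concrete with a toy SMoE forward pass: if a client activates fewer experts, the selected gate weights no longer sum to the mass the model was trained with, so renormalizing them keeps output magnitude comparable across clients. The snippet below is a minimal illustration of that idea; it is a stand-in for, not a reproduction of, FLAME's rescaling mechanism, and the expert/gate setup is invented.

```python
import numpy as np

def smoe_output(x, experts, gate_logits, k_active):
    # Softmax over gate logits, then route to the top-k_active experts.
    gates = np.exp(gate_logits - gate_logits.max())
    gates /= gates.sum()
    top = np.argsort(gates)[::-1][:k_active]
    # Activating fewer experts shrinks the total selected gate mass;
    # renormalizing the selected gates rescales the output so clients
    # with different k_active produce comparable magnitudes.
    w = gates[top] / gates[top].sum()
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```

Because the renormalized weights always sum to one, a client activating 2 experts and a client activating 4 produce outputs on the same scale, which is the invariant the rescaling is meant to preserve.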


Action is the primary key: a categorical framework for episode description and logical reasoning

Fukada, Yoshiki

arXiv.org Artificial Intelligence

This research presents a computational framework for describing and recognizing episodes and for logical reasoning. This framework, named cognitive-logs, consists of a set of relational and graph databases. Cognitive-logs record knowledge, particularly in episodes that consist of "actions" represented by verbs in natural languages and "participants" who perform the actions. These objects are connected by arrows (morphisms) that link each action to its participant and link cause to effect. Operations based on category theory enable comparisons between episodes and deductive inferences, including abstractions of stories. One of the goals of this study is to develop a database-driven artificial intelligence. This artificial intelligence thinks like a human but possesses the accuracy and rigour of a machine. The vast capacities of databases (up to petabyte scales in current technologies) enable the artificial intelligence to store a greater volume of knowledge than neural-network-based artificial intelligences. Cognitive-logs serve as a model of human cognition and are designed with reference to cognitive linguistics. Cognitive-logs also have the potential to model various human mind activities.
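The episode structure described above (actions as verb nodes, participants as nodes, morphisms as arrows linking action to participant and cause to effect) can be sketched as a tiny labeled graph. The names, labels, and query below are illustrative assumptions for exposition, not the paper's actual database schema.

```python
# A minimal episode record in the spirit of cognitive-logs: nodes for
# actions (verbs) and participants, with labeled arrows (morphisms).
episode = {
    "nodes": {
        "a1": {"kind": "action", "verb": "open"},
        "a2": {"kind": "action", "verb": "enter"},
        "p1": {"kind": "participant", "name": "Alice"},
    },
    "arrows": [
        ("a1", "p1", "agent"),  # link each action to its participant
        ("a2", "p1", "agent"),
        ("a1", "a2", "cause"),  # link cause to effect
    ],
}

def effects_of(ep, action):
    # Follow outgoing 'cause' arrows from an action node.
    return [t for s, t, lbl in ep["arrows"] if s == action and lbl == "cause"]
```

Comparing two such graphs (e.g., by matching arrow patterns) is the graph-level analogue of the episode comparisons and abstractions the abstract attributes to category-theoretic operations.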


Battle of the Large Language Models: Dolly vs LLaMA vs Vicuna vs Guanaco vs Bard vs ChatGPT -- A Text-to-SQL Parsing Comparison

Sun, Shuo, Zhang, Yuchen, Yan, Jiahuan, Gao, Yuze, Ong, Donovan, Chen, Bin, Su, Jian

arXiv.org Artificial Intelligence

The success of ChatGPT has ignited an AI race, with researchers striving to develop new large language models (LLMs) that can match or surpass the language understanding and generation abilities of commercial ones. In recent times, a number of models have emerged, claiming performance near that of GPT-3.5 or GPT-4 through various instruction-tuning methods. As practitioners of Text-to-SQL parsing, we are grateful for their valuable contributions to open-source research. However, it is important to approach these claims with a sense of scrutiny and ascertain the actual effectiveness of these models. Therefore, we pit six popular large language models against each other, systematically evaluating their Text-to-SQL parsing capability on nine benchmark datasets with five different prompting strategies, covering both zero-shot and few-shot scenarios. Regrettably, the open-sourced models fell significantly short of the performance achieved by closed-source models like GPT-3.5, highlighting the need for further work to bridge the performance gap between these models.
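The paper's five prompting strategies are not enumerated in the abstract, but the zero-shot vs. few-shot distinction it mentions is easy to make concrete: a zero-shot prompt supplies only the schema and question, while a few-shot prompt prepends solved (question, SQL) pairs. The builder below is a hypothetical illustration; its format and example names are assumptions, not the paper's templates.

```python
def build_prompt(schema, question, examples=None):
    # Zero-shot: schema + question only. Few-shot: also prepend solved
    # (question, SQL) demonstration pairs before the target question.
    parts = [f"-- Database schema:\n{schema}"]
    if examples:
        for q, sql in examples:
            parts.append(f"-- Question: {q}\n{sql}")
    parts.append(f"-- Question: {question}\nSELECT")
    return "\n\n".join(parts)
```

Ending the prompt with `SELECT` is a common trick that nudges the model to continue with a SQL completion rather than free-form prose.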


Hello Dolly: Democratizing the magic of ChatGPT with open models

#artificialintelligence

Update Apr 12, 2023: We have released Dolly 2.0, licensed for both research and commercial use. See the new blog post here. We show that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following ability by training it in 30 minutes on one machine, using high-quality training data. Surprisingly, instruction-following does not seem to require the latest or largest models: our model is only 6 billion parameters, compared to 175 billion for GPT-3. We open source the code for our model (Dolly) and show how it can be re-created on Databricks.


Databricks open-sources its Dolly large language AI model

#artificialintelligence

In an attempt to open up its technology to a wider audience, enterprise software company Databricks has released Dolly, a large language model and its associated training code under an open-source licence. Despite being based on a much smaller underlying model, the company says it has ChatGPT-like functionality and can be run "in-house". The move was inspired by the success of OpenAI's natural language platform ChatGPT, which became one of the fastest-growing consumer apps within a couple of months of its release in November last year. It has since caused some of the world's largest companies including Microsoft and Google to pivot and release generative and natural language AI tools. "We show that anyone can take a dated off-the-shelf open source LLM and give it magical ChatGPT-like instruction-following ability by training it in 30 minutes on one machine, using high-quality training data," Databricks wrote in a blog post explaining the decision.


This AI newsletter is all you need #40

#artificialintelligence

With the surging demand for generative AI, this week saw preparatory developments for the next wave of AI. Companies are fast-tracking the development of AI products, and generative AI tools are closer to becoming consumer products than ever before. They are already becoming powerful assistants for writers and programmers and rapidly taking on more challenges. The open-source community is also making significant progress in running local LLMs. For instance, Facebook's LLaMA model has continued to be a focal point for building in the academic and open-source community following the leak of its weights on 4chan.


'Killer robots' will be nothing like the movies show – here's where the real threats lie

#artificialintelligence

You might suppose Hollywood is good at predicting the future. Indeed, Robert Wallace, head of the CIA's Office of Technical Service and the US equivalent of MI6's fictional Q, has recounted how Russian spies would watch the latest Bond movie to see what technologies might be coming their way. Hollywood's continuing obsession with killer robots might therefore be of significant concern. The newest such movie is Apple TV's forthcoming sex robot courtroom drama Dolly. I never thought I'd write the phrase "sex robot courtroom drama", but there you go.

