Collaborating Authors

 Rush, Alexander M


Multi-Turn Code Generation Through Single-Step Rewards

arXiv.org Artificial Intelligence

We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple yet scalable approach, $\mu$Code, that solves multi-turn code generation using only single-step rewards. Our key insight is that code generation is a one-step recoverable MDP, where the correct code can be recovered from any intermediate code state in a single turn. $\mu$Code iteratively trains both a generator, which produces code solutions conditioned on multi-turn execution feedback, and a verifier, which scores the newly generated code. Experimental evaluations show that our approach achieves significant improvements over state-of-the-art baselines. We analyze the design choices for the reward models and policy, and show the efficacy of $\mu$Code at utilizing execution feedback. Our code is available at https://github.com/portal-cornell/muCode.
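To make the generate-execute-verify loop described above concrete, here is a toy, self-contained Python sketch of multi-turn code generation with single-step reranking. The generator, verifier, and executor below are trivial stand-ins (in $\mu$Code both the generator and the verifier are learned models); all function names are illustrative assumptions, not the authors' API — see the linked repository for the actual implementation.

```python
# Toy sketch of a multi-turn generate-execute-verify loop: a generator proposes
# candidates conditioned on prior execution feedback, a verifier reranks them
# with a single-step score, and failing candidates feed their error logs back
# into the next turn. Stand-ins only; not the authors' API.
import random
from dataclasses import dataclass


@dataclass
class ExecResult:
    passed: bool
    error_log: str


def run_tests(code: str) -> ExecResult:
    # Stand-in executor: check a candidate `add` function against unit tests.
    env: dict = {}
    try:
        exec(code, env)
        assert env["add"](2, 3) == 5
        assert env["add"](-1, 1) == 0
        return ExecResult(True, "")
    except Exception as exc:
        return ExecResult(False, repr(exc))


def generate_candidates(feedback, n=4):
    # Stand-in generator: in practice, an LLM conditioned on the problem
    # statement and all (code, error) pairs seen so far.
    pool = [
        "def add(a, b):\n    return a - b\n",
        "def add(a, b):\n    return a + b\n",
    ]
    return [random.choice(pool) for _ in range(n)]


def verifier_score(code: str) -> float:
    # Stand-in verifier: in practice, a learned scorer over (problem, code).
    return 1.0 if "+" in code else 0.0


def solve(max_turns: int = 4) -> str:
    feedback, best = [], ""
    for _ in range(max_turns):
        candidates = generate_candidates(feedback)
        best = max(candidates, key=verifier_score)  # single-step reranking
        result = run_tests(best)
        if result.passed:
            return best
        feedback.append((best, result.error_log))  # execution feedback
    return best


print(solve())
```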


Commit0: Library Generation from Scratch

arXiv.org Artificial Intelligence

We introduce Commit0, a benchmark that challenges agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests, with the goal of producing an implementation of this API accordingly. The implementation is validated by running these unit tests. Our experiments demonstrate that while current agents can pass some unit tests, none can yet reproduce an entire library. Results also show that interactive feedback is quite useful for models to generate code that passes more unit tests, validating the benchmark's design choice of facilitating its use.

AI agents have been rapidly increasing in ability, particularly in domains such as problem-solving, math, and coding. Tasks related to software development have been a particularly promising area due to both their clarity of evaluation and their economic value. This has motivated the release of several coding benchmarks in recent years (Hendrycks et al., 2021a; Chen et al., 2021; Zhuo et al., 2024). A notable example is SWE-bench (Jimenez et al., 2024), which assesses the ability of agents to generate patches that resolve real-world GitHub issues. While these tasks are critical, they generally remain within the skill set of an experienced software engineer. If LLM systems continue to improve at current rates, these tasks will be completely solvable. We are interested in benchmarks that lie further beyond both the frontier of expert human ability and that of current models.
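As a rough illustration of the evaluation loop implied above (specification in, unit tests as the validator, test output as interactive feedback), here is a minimal Python sketch. The agent callable, repository layout, and retry budget are assumptions for exposition; they are not the benchmark's actual harness.

```python
# Minimal sketch of a spec-plus-unit-test evaluation loop with interactive
# feedback. The agent callable and paths are hypothetical; the real Commit0
# harness and repositories differ.
import subprocess
from typing import Callable


def run_unit_tests(repo_dir: str) -> tuple[bool, str]:
    # Run the library's test suite; return (all_passed, raw pytest log).
    proc = subprocess.run(
        ["pytest", "-q", "--tb=short"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr


def evaluate(
    write_implementation: Callable[[str, str, str], None],  # (spec, repo_dir, feedback)
    spec: str,
    repo_dir: str,
    max_rounds: int = 3,
) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        # The agent edits the repository given the spec and the latest test log.
        write_implementation(spec, repo_dir, feedback)
        passed, feedback = run_unit_tests(repo_dir)
        if passed:
            return True
    return False
```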


ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

arXiv.org Artificial Intelligence

The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs because permanently removing attention heads or neurons can significantly degrade accuracy. Prior work has attempted to model contextual sparsity using neural networks trained to predict activation magnitudes, which are then used to dynamically prune structures with low predicted activation magnitude. In this paper, we look beyond magnitude-based pruning criteria to assess attention head and neuron importance in LLMs. We develop a novel predictor called ShadowLLM, which can shadow the LLM behavior and enforce better sparsity patterns, resulting in over 15% improvement in end-to-end accuracy without increasing latency compared to previous methods. ShadowLLM achieves up to a 20% speed-up over the state-of-the-art DejaVu framework. These enhancements are validated on models with up to 30 billion parameters. Our code is available at https://github.com/abdelfattah-lab/shadow_llm/.
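The core idea of predictor-based contextual sparsity can be sketched in a few lines of PyTorch: a small predictor scores attention heads from the current hidden state, and only the top-scoring heads are kept for that input. The module below is an illustrative assumption for exposition only; ShadowLLM's actual predictor design and pruning criteria are described in the paper and repository.

```python
# Sketch of input-dependent head pruning: a tiny "shadow" predictor produces
# per-head importance scores for the current context, and low-scoring heads
# are masked out for this forward pass only.
import torch
import torch.nn as nn


class HeadPredictor(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.score = nn.Linear(d_model, n_heads)  # small learned scorer

    def forward(self, hidden: torch.Tensor, keep: int) -> torch.Tensor:
        # hidden: (batch, d_model) summary of the current context
        scores = self.score(hidden)              # (batch, n_heads)
        topk = scores.topk(keep, dim=-1).indices
        mask = torch.zeros_like(scores)
        mask.scatter_(-1, topk, 1.0)             # 1 = keep head, 0 = prune
        return mask


# Usage: apply the mask to per-head attention outputs before the output projection.
pred = HeadPredictor(d_model=512, n_heads=8)
hidden = torch.randn(2, 512)
head_mask = pred(hidden, keep=4)                 # keep the 4 highest-scoring heads
print(head_mask)
```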


MambaByte: Token-free Selective State Space Model

arXiv.org Artificial Intelligence

Token-free language models learn directly from raw bytes and remove the bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences, and standard autoregressive Transformers scale poorly in such settings. We experiment with MambaByte, a token-free adaptation of the Mamba state space model, trained autoregressively on byte sequences. Our experiments show that MambaByte is more computationally efficient than other byte-level models. We also find MambaByte to be competitive with, and even to outperform, state-of-the-art subword Transformers. Furthermore, owing to its linear scaling in sequence length, MambaByte enjoys faster inference than Transformers. Our findings establish the viability of MambaByte for token-free language modeling.
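As a small illustration of the input side of token-free modeling, the snippet below maps text to its raw UTF-8 bytes (a fixed 256-symbol vocabulary) and back, showing the lossless round trip and the longer sequences that result. The Mamba-style selective state space model itself is omitted; this only demonstrates the byte-level representation the abstract refers to.

```python
# Text -> raw UTF-8 byte IDs (vocabulary of 256 symbols) -> text.
# No subword tokenizer is involved, but the sequence gets longer.
text = "Token-free models read raw bytes."
byte_ids = list(text.encode("utf-8"))    # integers in [0, 255]

print(len(text), len(byte_ids))          # byte sequences are at least as long
print(byte_ids[:10])                     # first few byte IDs fed to the model
print(bytes(byte_ids).decode("utf-8"))   # lossless round trip back to text
```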