Azerbayev, Zhangir
Llemma: An Open Language Model For Mathematics
Azerbayev, Zhangir, Schoelkopf, Hailey, Paster, Keiran, Santos, Marco Dos, McAleer, Stephen, Jiang, Albert Q., Deng, Jia, Biderman, Stella, Welleck, Sean
We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
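Since the 7 billion and 34 billion parameter Llemma checkpoints are openly released, a quick way to try them is through Hugging Face Transformers. The sketch below is not from the paper; the repository id "EleutherAI/llemma_7b" and the generation settings are assumptions, shown only to illustrate prompting the base model on a MATH-style problem.

```python
# Minimal sketch (not from the paper): load the released 7B checkpoint and
# sample a solution to a MATH-style problem. The repo id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/llemma_7b"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "Problem: If $3x + 7 = 22$, what is the value of $x$?\n"
    "Solution:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```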
OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text
Paster, Keiran, Santos, Marco Dos, Azerbayev, Zhangir, Ba, Jimmy
There is growing evidence that pretraining on high-quality, carefully thought-out tokens such as code or mathematics plays an important role in improving the reasoning abilities of large language models. For example, Minerva, a PaLM model finetuned on billions of tokens of mathematical documents from arXiv and the web, reported dramatically improved performance on problems that require quantitative reasoning. However, because all known open source web datasets employ preprocessing that does not faithfully preserve mathematical notation, the benefits of large-scale training on quantitative web documents are unavailable to the research community. We introduce OpenWebMath, an open dataset inspired by these works containing 14.7B tokens of mathematical webpages from Common Crawl. We describe in detail our method for extracting text and LaTeX content and removing boilerplate from HTML documents, as well as our methods for quality filtering and deduplication. Additionally, we run small-scale experiments by training 1.4B parameter language models on OpenWebMath, showing that models trained on 14.7B tokens of our dataset surpass the performance of models trained on over 20x the amount of general language data. We hope that our dataset, openly released on the Hugging Face Hub, will help spur advances in the reasoning abilities of large language models.
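Because OpenWebMath is released on the Hugging Face Hub, the documents can be inspected directly with the `datasets` library. The following is a minimal sketch under the assumption that the dataset id is `open-web-math/open-web-math` and that records expose `text` and `url` fields; adjust these names if the published schema differs.

```python
# Minimal sketch (not from the paper): stream a few OpenWebMath documents.
# The dataset id and field names are assumptions about the released schema.
from datasets import load_dataset

ds = load_dataset("open-web-math/open-web-math", split="train", streaming=True)
for i, doc in enumerate(ds):
    # Each record is assumed to carry the extracted text with LaTeX preserved,
    # plus metadata such as the source URL.
    print(doc["url"], doc["text"][:200].replace("\n", " "))
    if i == 2:
        break
```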
Explicit Knowledge Transfer for Weakly-Supervised Code Generation
Azerbayev, Zhangir, Ni, Ansong, Schoelkopf, Hailey, Radev, Dragomir
Large language models (LLMs) can acquire strong code-generation capabilities through few-shot learning. In contrast, supervised fine-tuning is still needed for smaller models to achieve good performance. Such fine-tuning demands a large number of task-specific NL-code pairs, which are expensive to obtain. In this paper, we attempt to transfer the code generation ability of an LLM to a smaller model with the aid of weakly-supervised data. More specifically, we propose explicit knowledge transfer (EKT), which uses the few-shot capabilities of a teacher LLM to create NL-code pairs that we then filter for correctness and fine-tune the student on. We evaluate EKT on the task of generating code solutions to math word problems from the GSM8k dataset. We find that EKT not only yields better performance than training with expert iteration, but also outperforms knowledge distillation, another form of knowledge transfer. A GPT-Neo 1.3B model trained using EKT with a GPT-J teacher achieves a 12.4% pass@100 on GSM8k, while the same student and teacher trained with knowledge distillation yield only a 3.7% pass@100. We also show that it is possible for a student model to outperform the teacher using EKT.
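To make the EKT pipeline concrete, the sketch below illustrates its verification step in simplified form: the teacher proposes candidate programs few-shot, each candidate is executed, and only programs whose output matches the gold GSM8k answer become fine-tuning data for the student. The helper `sample_from_teacher` and all other names here are hypothetical, not the paper's code.

```python
# Illustrative sketch of EKT-style filtering (simplified; names are hypothetical):
# keep only teacher-generated programs whose executed result matches the gold answer.

def execute(program: str) -> str:
    """Run a candidate solution and return whatever it assigns to `answer`."""
    scope: dict = {}
    try:
        exec(program, scope)  # trusted, sandbox-free toy example
        return str(scope.get("answer"))
    except Exception:
        return ""

def build_student_data(problems, sample_from_teacher, num_samples=16):
    """Collect (question, program) pairs verified against the gold answer."""
    kept = []
    for ex in problems:  # ex: {"question": ..., "gold_answer": ...}
        for program in sample_from_teacher(ex["question"], n=num_samples):
            if execute(program) == str(ex["gold_answer"]):
                kept.append({"question": ex["question"], "program": program})
                break  # one verified program per problem in this toy version
    return kept
```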
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics
Azerbayev, Zhangir, Piotrowski, Bartosz, Schoelkopf, Hailey, Ayers, Edward W., Radev, Dragomir, Avigad, Jeremy
We introduce ProofNet, a benchmark for autoformalization and formal proving of undergraduate-level mathematics. The ProofNet benchmark consists of 371 examples, each comprising a formal theorem statement in Lean 3, a natural language theorem statement, and a natural language proof. The problems are primarily drawn from popular undergraduate pure mathematics textbooks and cover topics such as real and complex analysis, linear algebra, abstract algebra, and topology. We intend for ProofNet to be a challenging benchmark that will drive progress in autoformalization and automatic theorem proving. We report baseline results on statement autoformalization via in-context learning. Moreover, we introduce two novel statement autoformalization methods: prompt retrieval and distilled backtranslation.
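For concreteness, here is an illustrative pairing in the style of a ProofNet entry (it is not an actual benchmark item): a natural language statement together with a Lean 3 formalization using standard mathlib names. In the benchmark itself, the formal statement is paired with a natural language proof rather than a formal one.

```lean
-- Illustrative example in the style of a ProofNet entry (not a benchmark item).
-- Natural language statement:
--   "Let G be a group and x ∈ G. If x * x = x, then x = 1."
-- A Lean 3 / mathlib formalization of the statement, with a short proof:
import algebra.group.basic

theorem idempotent_eq_one {G : Type*} [group G] (x : G)
  (h : x * x = x) : x = 1 :=
mul_left_cancel (h.trans (mul_one x).symm)
```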
Learning a Metacognition for Object Detection
Berke, Marlene, Belledonne, Mario, Azerbayev, Zhangir, Jara-Ettinger, Julian
In contrast to object recognition models, humans do not blindly trust their perception when building representations of the world, instead recruiting metacognition to detect percepts that are unreliable or false, such as when we realize that we mistook one object for another. We propose METAGEN, an unsupervised model that enhances object recognition models through a metacognition. Given noisy output from an object-detection model, METAGEN learns a meta-representation of how its perceptual system works and uses it to infer the objects in the world responsible for the detections. METAGEN achieves this by conditioning its inference on basic principles of objects that even human infants understand (known as Spelke principles: object permanence, cohesion, and spatiotemporal continuity). We test METAGEN on a variety of state-of-the-art object detection neural networks. We find that METAGEN quickly learns an accurate metacognitive representation of the neural network, and that this improves detection accuracy by filling in objects that the detection model missed and removing hallucinated objects. This approach enables generalization to out-of-sample data and outperforms comparison models that lack a metacognition.
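As a rough intuition for how a metacognition can correct a noisy detector, the toy sketch below applies an object-permanence heuristic: labels reported in enough frames are treated as real, which drops one-off hallucinations and fills in sporadic misses. This is only an illustration of the principle; METAGEN itself learns a meta-representation and performs unsupervised probabilistic inference rather than simple thresholding.

```python
# Toy illustration only (not the METAGEN model): treat a label as really present
# in the scene if the detector reports it in enough frames, so one-off
# hallucinations are dropped and sporadic misses are filled in.
from collections import Counter

def smooth_detections(frames, min_fraction=0.5):
    """frames: list of per-frame label lists from an object detector."""
    counts = Counter(label for frame in frames for label in set(frame))
    persistent = {l for l, c in counts.items() if c / len(frames) >= min_fraction}
    # Report the persistent labels for every frame, overriding noisy output.
    return [sorted(persistent) for _ in frames]

frames = [["cup", "phone"], ["cup"], ["cup", "cat"], ["cup", "phone"]]
print(smooth_detections(frames))  # 'cat' is dropped; 'phone' is filled in
```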