sabharwal
The Transformer Cookbook
Yang, Andy, Watson, Christopher, Xue, Anton, Bhattamishra, Satwik, Llarena, Jose, Merrill, William, Ferreira, Emile Dos Santos, Svete, Anej, Chiang, David
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a problem exacerbated by a fragmented literature where key results are scattered across numerous papers. In particular, we synthesize this disparate body of findings into a curated set of recipes that demonstrate how to implement everything from basic arithmetic in feed-forward layers to complex data routing via self-attention. Our mise en place of formulations is for both newcomers seeking an accessible entry point and experts in need of a systematic reference. This unified presentation of transformer constructions provides a foundation for future work spanning theoretical research in computational complexity to empirical investigations in architecture design and interpretability.
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Pennsylvania (0.04)
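As a rough illustration of what "arithmetic in feed-forward layers" can look like, the sketch below hand-sets the weights of a one-hidden-layer ReLU network so that it computes max(a, b) exactly, using the identity max(a, b) = ReLU(a - b) + ReLU(b) - ReLU(-b). This is a standard textbook construction chosen for illustration; the abstract does not say whether the cookbook uses this exact recipe, and the names `W1`, `W2`, and `ffn_max` are hypothetical.

```python
from typing import List

# Hand-set weights (hypothetical names): 2 inputs -> 3 hidden ReLU units -> 1 output.
W1: List[List[float]] = [
    [1.0, -1.0],   # hidden unit 0 sees a - b
    [0.0, 1.0],    # hidden unit 1 sees b
    [0.0, -1.0],   # hidden unit 2 sees -b
]
W2: List[float] = [1.0, 1.0, -1.0]  # output = h0 + h1 - h2


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def ffn_max(a: float, b: float) -> float:
    """Feed-forward layer with fixed weights that returns max(a, b)."""
    x = [a, b]
    # Hidden layer: linear map followed by ReLU.
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    # Output layer: linear map, no bias.
    return sum(w * h for w, h in zip(W2, hidden))


if __name__ == "__main__":
    print(ffn_max(3.0, 5.0))
    print(ffn_max(2.0, -7.0))
```

The point of the design is that no training is involved: the algorithm is encoded directly in the parameters, which is the style of construction the abstract describes.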
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
Merrill, William, Sabharwal, Ashish
Recent theoretical results show transformers cannot express sequential reasoning problems over long input lengths, intuitively because their computational depth is bounded. However, prior work treats the depth as a constant, leaving it unclear to what degree bounded depth may suffice for solving problems over short inputs, or how increasing the transformer's depth affects its expressive power. We address these questions by analyzing the expressive power of transformers whose depth can grow minimally with context length $n$. We show even highly uniform transformers with depth $\Theta(\log n)$ can express two important problems: recognizing regular languages, which captures state tracking abilities, and graph connectivity, which underlies multi-step reasoning. Notably, both of these problems cannot be expressed by fixed-depth transformers under standard complexity conjectures, demonstrating the expressivity benefit of growing depth. Moreover, our theory quantitatively predicts how depth must grow with input length to express these problems, showing that depth scaling is more efficient than scaling width or chain-of-thought steps. Empirically, we find our theoretical depth requirements for regular language recognition match the practical depth requirements of transformers remarkably well. Thus, our results clarify precisely how depth affects transformers' reasoning capabilities, providing potential practical insights for designing models that are better at sequential reasoning.
- North America > United States (0.14)
- North America > Canada > Alberta (0.14)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
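The intuition for why about log n sequential rounds can be enough for state tracking can be sketched outside of any transformer. Each input symbol induces a map from states to states, and adjacent maps can be composed pairwise in a balanced tree, halving the sequence each round. The code below is only that divide-and-conquer idea, not the paper's transformer construction; the DFA (even number of 'a's) and all names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical DFA over {a, b}: accepts words with an even number of 'a's.
STATES: Tuple[int, ...] = (0, 1)
DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 1, (1, "a"): 0,
    (0, "b"): 0, (1, "b"): 1,
}
START = 0
ACCEPT = {0}

StateMap = Tuple[int, ...]  # StateMap[q] is the state reached from q


def symbol_map(ch: str) -> StateMap:
    """Transition function of a single symbol, as a tuple indexed by state."""
    return tuple(DELTA[(q, ch)] for q in STATES)


def compose(f: StateMap, g: StateMap) -> StateMap:
    """Map for reading f's segment first, then g's segment."""
    return tuple(g[f[q]] for q in STATES)


def recognize(word: str) -> Tuple[bool, int]:
    """Return (accepted, number of pairwise-composition rounds used)."""
    maps: List[StateMap] = [symbol_map(ch) for ch in word]
    rounds = 0
    while len(maps) > 1:
        # One round: compose adjacent pairs, preserving left-to-right order.
        merged = [compose(maps[i], maps[i + 1]) for i in range(0, len(maps) - 1, 2)]
        if len(maps) % 2 == 1:
            merged.append(maps[-1])  # odd element carries over unchanged
        maps = merged
        rounds += 1
    final_state = maps[0][START] if maps else START
    return final_state in ACCEPT, rounds


if __name__ == "__main__":
    print(recognize("abab"))
    print(recognize("a" * 8))
```

Each round does work that is independent across pairs, so the number of rounds, ceil(log2 n) for a word of length n > 1, plays the role that depth plays in the abstract's claim.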
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Amiri, Alireza, Huang, Xinting, Rofin, Mark, Hahn, Michael
Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers' expressivity from $TC^0$ to $PTIME$, their required length remains poorly understood. Empirical evidence suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds for the number of CoT steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems, and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to an emerging understanding of the power and limitations of chain-of-thought reasoning.
- Europe > Germany > Saarland (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)
Solving Satisfiability Modulo Counting for Symbolic and Statistical AI Integration With Provable Guarantees
Li, Jinzhao, Jiang, Nan, Xue, Yexiang
Satisfiability Modulo Counting (SMC) encompasses problems that require both symbolic decision-making and statistical reasoning. Its general formulation captures many real-world problems at the intersection of symbolic and statistical Artificial Intelligence. SMC searches for policy interventions to control probabilistic outcomes. Solving SMC is challenging because of its highly intractable nature ($\text{NP}^{\text{PP}}$-complete), incorporating statistical inference and symbolic reasoning. Previous research on SMC solving lacks provable guarantees and/or suffers from sub-optimal empirical performance, especially when combinatorial constraints are present. We propose XOR-SMC, a polynomial algorithm with access to NP-oracles, to solve highly intractable SMC problems with constant approximation guarantees. XOR-SMC transforms the highly intractable SMC into satisfiability problems by replacing the model counting in SMC with SAT formulae subject to randomized XOR constraints. Experiments on solving important SMC problems in AI for social good demonstrate that XOR-SMC finds solutions close to the true optimum, outperforming several baselines which struggle to find good approximations for the intractable model counting in SMC.
- North America > United States > Hawaii (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Social Sector (0.54)
- Energy (0.46)
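The counting-by-XOR idea that the XOR-SMC abstract mentions can be sketched in a few lines. Each random parity (XOR) constraint cuts the solution set roughly in half, so if a formula is still satisfiable after k such constraints in most trials, it probably has on the order of 2^k solutions or more. The sketch below shows only this hashing step, not XOR-SMC itself: brute-force enumeration stands in for the NP oracle, and every name (`random_xor`, `survives`, `probably_at_least`) is hypothetical.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

Assignment = Sequence[bool]
Xor = Tuple[List[int], bool]  # (variable indices, required parity)


def random_xor(n: int, rng: random.Random) -> Xor:
    """Pick each variable with probability 1/2 and a random target parity."""
    subset = [i for i in range(n) if rng.random() < 0.5]
    parity = rng.random() < 0.5
    return subset, parity


def satisfies_xor(x: Assignment, xor: Xor) -> bool:
    subset, parity = xor
    return (sum(x[i] for i in subset) % 2 == 1) == parity


def survives(formula: Callable[[Assignment], bool], n: int, k: int,
             rng: random.Random) -> bool:
    """Is the formula satisfiable together with k fresh random XOR constraints?

    Brute force over all 2^n assignments stands in for a SAT/NP oracle call.
    """
    xors = [random_xor(n, rng) for _ in range(k)]
    return any(
        formula(x) and all(satisfies_xor(x, c) for c in xors)
        for x in itertools.product((False, True), repeat=n)
    )


def probably_at_least(formula: Callable[[Assignment], bool], n: int, k: int,
                      trials: int = 15, seed: int = 0) -> bool:
    """Majority vote over independent trials: a heuristic sign that the
    formula has roughly 2^k solutions or more."""
    rng = random.Random(seed)
    hits = sum(survives(formula, n, k, rng) for _ in range(trials))
    return hits > trials // 2


if __name__ == "__main__":
    at_least_two_true = lambda x: sum(x) >= 2  # 57 of 64 assignments over 6 vars
    for k in range(0, 8):
        print(k, probably_at_least(at_least_two_true, n=6, k=k))
```

The design choice worth noting is that the counting question is never answered directly: it is replaced by a handful of satisfiability queries, which is what lets an algorithm with NP-oracle access approximate a counting problem.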
Tighter Bounds on the Expressivity of Transformer Encoders
Chiang, David, Cholak, Peter, Pillay, Anand
Characterizing neural networks in terms of better-understood formal systems has the potential to yield new insights into the power and limitations of these networks. Doing so for transformers remains an active area of research. Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings us much closer than before to an exact characterization of the languages that transformer encoders recognize.
General-Purpose Question-Answering with Macaw
Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available. In response, we present Macaw, a versatile, generative question-answering (QA) system that we are making available to the community. Macaw is built on UnifiedQA, itself built on T5, and exhibits strong performance, zero-shot, on a wide variety of topics, including outperforming GPT-3 by over 10% (absolute) on Challenge300, a suite of 300 challenge questions, despite being an order of magnitude smaller (11 billion vs. 175 billion parameters). In addition, Macaw allows different permutations ("angles") of its inputs and outputs to be used, for example Macaw can take a question and produce an answer; or take an answer and produce a question; or take an answer and question, and produce multiple-choice options. We describe the system, and illustrate a variety of question types where it produces surprisingly good answers, well outside the training setup. We also identify question classes where it still appears to struggle, offering insights into the limitations of pretrained language models. Macaw is freely available, and we hope that it proves useful to the community. Macaw is available at https://github.com/allenai/macaw
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge
Bhakthavatsalam, Sumithra, Khashabi, Daniel, Khot, Tushar, Mishra, Bhavana Dalvi, Richardson, Kyle, Sabharwal, Ashish, Schoenick, Carissa, Tafjord, Oyvind, Clark, Peter
We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting questions to direct-answer format using a combination of crowdsourcing and expert review. The resulting dataset contains 2985 questions with a total of 8436 valid answers (questions typically have more than one valid answer). ARC-DA is one of the first DA datasets of natural questions that often require reasoning, and where appropriate question decompositions are not evident from the questions themselves. We describe the conversion approach taken, appropriate evaluation metrics, and several strong models. Although high, the best scores (81% GENIE, 61.4% F1, 63.2% ROUGE-L) still leave considerable room for improvement. In addition, the dataset provides a natural setting for new research on explanation, as many questions require reasoning to construct answers. We hope the dataset spurs further advances in complex question-answering by the community. ARC-DA is available at https://allenai.org/data/arc-da
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Texas > Sterling County (0.04)
- North America > United States > New York (0.04)
Role of AI soars in tackling Covid-19 pandemic
For the first time in a pandemic, Artificial Intelligence (AI) is playing a major role in tackling the Covid-19 outbreak, in areas ranging from diagnosing risk to doubt-clearing, and from delivery of services to drug discovery. BlueDot, a Canadian health monitoring firm that crunches flight data and news reports using AI, is credited by international reports with being the first to warn its clients of an impending outbreak, on December 31, beating countries and international developmental agencies. The Indian tech space, too, is buzzing with coronavirus-cracking activities. CoRover, a start-up in the AI space that earlier developed chatbots for a railway ticketing platform, has now created a "video-bot" by collaborating with a doctor from Fortis Healthcare. On this platform, a real doctor from Fortis Healthcare -- not a cartoon or an invisible knowledge bank -- will take questions from people about Covid-19. Apollo Hospitals has come up with a risk assessment scanner for Covid-19, which is available in six languages and guides people about the potential risk of having the virus. The Jaipur-based Sawai Man Singh Hospital is trying out a robot, made by robot maker Club First, to serve food and medicines to patients to lower the exposure of health workers to coronavirus patients.
SLCM makes its agri warehousing call centre paperless & digital – AgriculturePost
Sohan Lal Commodity Management (SLCM), India's leading agri services solutions provider with operations across India and Myanmar, has made its agri warehousing call centre paperless. SLCM had set up its first dedicated 24x7 call centre in early 2010 to cater to its agri-warehousing operations in both countries. The call centre has now been digitally transformed into a paperless entity integrating Artificial Intelligence, to make it more efficient and seamless. The call centre is part of SLCM's endeavour to provide real-time technologies for managing agriculture operations, in line with its multiple-award-winning Agri Reach technology. It will have a dedicated team of customer support executives who will provide support to the field staff.