AITopics | sabharwal

Collaborating Authors

sabharwal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Transformer Cookbook

Yang, Andy, Watson, Christopher, Xue, Anton, Bhattamishra, Satwik, Llarena, Jose, Merrill, William, Ferreira, Emile Dos Santos, Svete, Anej, Chiang, David

arXiv.org Artificial Intelligence | Oct-2-2025

We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a problem exacerbated by a fragmented literature where key results are scattered across numerous papers. In particular, we synthesize this disparate body of findings into a curated set of recipes that demonstrate how to implement everything from basic arithmetic in feed-forward layers to complex data routing via self-attention. Our mise en place of formulations is for both newcomers seeking an accessible entry point and experts in need of a systematic reference. This unified presentation of transformer constructions provides a foundation for future work spanning theoretical research in computational complexity to empirical investigations in architecture design and interpretability.
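As a small illustration of the kind of recipe the abstract mentions (arithmetic in feed-forward layers), the sketch below hand-sets the weights of a one-hidden-layer ReLU network so that it outputs max(a, b). It relies on the standard identity max(a, b) = ReLU(a - b) + ReLU(b) - ReLU(-b). The names `W1`, `W2`, and `ffn_max` are illustrative assumptions and are not taken from the paper.

```python
def relu(x):
    """Rectified linear unit on a single number."""
    return x if x > 0 else 0.0

# Hidden layer (assumed layout): three ReLU units over the input (a, b).
W1 = [
    [1.0, -1.0],  # computes a - b
    [0.0, 1.0],   # computes b
    [0.0, -1.0],  # computes -b
]

# Output layer: relu(a - b) + relu(b) - relu(-b) equals max(a, b).
W2 = [1.0, 1.0, -1.0]

def ffn_max(a, b):
    """Evaluate the hand-built feed-forward network on the pair (a, b)."""
    hidden = [relu(row[0] * a + row[1] * b) for row in W1]
    return sum(w * h for w, h in zip(W2, hidden))
```

The second and third hidden units together reproduce `b` for either sign, since ReLU(b) - ReLU(-b) = b; the first unit then adds the positive part of `a - b`.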

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.00368

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)


A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Merrill, William, Sabharwal, Ashish

arXiv.org Artificial Intelligence | Mar-5-2025

Recent theoretical results show transformers cannot express sequential reasoning problems over long input lengths, intuitively because their computational depth is bounded. However, prior work treats the depth as a constant, leaving it unclear to what degree bounded depth may suffice for solving problems over short inputs, or how increasing the transformer's depth affects its expressive power. We address these questions by analyzing the expressive power of transformers whose depth can grow minimally with context length $n$. We show even highly uniform transformers with depth $\Theta(\log n)$ can express two important problems: recognizing regular languages, which captures state tracking abilities, and graph connectivity, which underlies multi-step reasoning. Notably, both of these problems cannot be expressed by fixed-depth transformers under standard complexity conjectures, demonstrating the expressivity benefit of growing depth. Moreover, our theory quantitatively predicts how depth must grow with input length to express these problems, showing that depth scaling is more efficient than scaling width or chain-of-thought steps. Empirically, we find our theoretical depth requirements for regular language recognition match the practical depth requirements of transformers remarkably well. Thus, our results clarify precisely how depth affects transformers' reasoning capabilities, providing potential practical insights for designing models that are better at sequential reasoning.
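One way to build intuition for why logarithmic depth can be enough for regular language recognition: the transition functions of a finite automaton compose associatively, so the per-symbol maps of a length-n input can be merged pairwise in about log2(n) rounds. The sketch below illustrates only that general idea in plain Python; it is not the paper's transformer construction, and `compose`, `run_log_depth`, and the example automata are hypothetical names chosen for illustration.

```python
def compose(f, g):
    """Return the state map 'apply f, then g'; maps are tuples indexed by state."""
    return tuple(g[s] for s in f)

def run_log_depth(delta, word, start):
    """Return (final_state, rounds) by pairwise-composing per-symbol transition maps."""
    maps = [delta[c] for c in word]
    rounds = 0
    while len(maps) > 1:
        # Merge neighbours left to right so the original symbol order is preserved.
        merged = [compose(maps[i], maps[i + 1]) for i in range(0, len(maps) - 1, 2)]
        if len(maps) % 2:
            merged.append(maps[-1])  # odd element carries over to the next round
        maps = merged
        rounds += 1
    final = maps[0][start] if maps else start
    return final, rounds

# Parity automaton: state 0 = even number of 1s seen, state 1 = odd.
PARITY = {"0": (0, 1), "1": (1, 0)}

# Order-sensitive automaton: the state records the last symbol read (a -> 0, b -> 1).
LAST_SYMBOL = {"a": (0, 0), "b": (1, 1)}
```

For example, `run_log_depth(PARITY, "10110100", 0)` merges eight maps in three rounds, whereas a left-to-right scan would take eight sequential steps.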

expressive power, residual stream, transformer, (16 more...)

arXiv.org Artificial Intelligence

2503.03961

Country:

North America > Canada > Alberta (0.14)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers

Amiri, Alireza, Huang, Xinting, Rofin, Mark, Hahn, Michael

arXiv.org Artificial Intelligence | Feb-4-2025

Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers' expressivity from $TC^0$ to $PTIME$, their required length remains poorly understood. Empirical evidence suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds for the number of CoT steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems, and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to an emerging understanding of the power and limitations of chain-of-thought reasoning.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.02393

Country:

Europe > Germany > Saarland (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)


Solving Satisfiability Modulo Counting for Symbolic and Statistical AI Integration With Provable Guarantees

Li, Jinzhao, Jiang, Nan, Xue, Yexiang

arXiv.org Artificial Intelligence | Dec-30-2023

Satisfiability Modulo Counting (SMC) encompasses problems that require both symbolic decision-making and statistical reasoning. Its general formulation captures many real-world problems at the intersection of symbolic and statistical Artificial Intelligence. SMC searches for policy interventions to control probabilistic outcomes. Solving SMC is challenging because of its highly intractable nature ($\text{NP}^{\text{PP}}$-complete), incorporating statistical inference and symbolic reasoning. Previous research on SMC solving lacks provable guarantees and/or suffers from sub-optimal empirical performance, especially when combinatorial constraints are present. We propose XOR-SMC, a polynomial algorithm with access to NP-oracles, to solve highly intractable SMC problems with constant approximation guarantees. XOR-SMC transforms the highly intractable SMC into satisfiability problems, by replacing the model counting in SMC with SAT formulae subject to randomized XOR constraints. Experiments on solving important SMC problems in AI for social good demonstrate that XOR-SMC finds solutions close to the true optimum, outperforming several baselines which struggle to find good approximations for the intractable model counting in SMC.
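The idea of replacing model counting with satisfiability under randomized XOR constraints can be sketched on a toy scale. Any fixed assignment satisfies a uniformly random parity constraint with probability 1/2, so a formula that stays satisfiable after k random XORs are added likely has on the order of 2^k or more solutions. The code below is a minimal illustration of that hashing idea only, not the XOR-SMC algorithm; a brute-force search stands in for the NP oracle, and every function name is a hypothetical choice.

```python
import itertools
import random

def satisfies_xors(assignment, xors):
    """Each XOR constraint is a pair (variable_indices, parity_bit)."""
    return all(sum(assignment[i] for i in subset) % 2 == bit for subset, bit in xors)

def random_xors(n_vars, k, rng):
    """Draw k parity constraints; each includes every variable with probability 1/2."""
    return [
        ([i for i in range(n_vars) if rng.random() < 0.5], rng.randrange(2))
        for _ in range(k)
    ]

def sat_with_xors(formula, n_vars, xors):
    """Brute-force stand-in for an NP oracle: is 'formula AND all XORs' satisfiable?"""
    return any(
        formula(a) and satisfies_xors(a, xors)
        for a in itertools.product((0, 1), repeat=n_vars)
    )

def survival_rate(formula, n_vars, k, trials=20, seed=0):
    """Fraction of trials in which the formula stays satisfiable under k random XORs."""
    rng = random.Random(seed)
    hits = sum(
        sat_with_xors(formula, n_vars, random_xors(n_vars, k, rng))
        for _ in range(trials)
    )
    return hits / trials
```

Here `formula` is any Python predicate over a tuple of 0/1 values. Comparing `survival_rate` for increasing `k` gives a rough sense of the solution count without enumerating it directly, which is the role the randomized XOR constraints play in place of exact counting.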

constraint, probability, supplier, (16 more...)

arXiv.org Artificial Intelligence

2309.08883

Country:

North America > United States > Hawaii (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.82)

Industry:

Social Sector (0.54)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Tighter Bounds on the Expressivity of Transformer Encoders

Chiang, David, Cholak, Peter, Pillay, Anand

arXiv.org Artificial Intelligence | Nov-13-2023

Characterizing neural networks in terms of better-understood formal systems has the potential to yield new insights into the power and limitations of these networks. Doing so for transformers remains an active area of research. Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings us much closer than before to an exact characterization of the languages that transformer encoders recognize.

artificial intelligence, machine learning, transformer encoder, (13 more...)

arXiv.org Artificial Intelligence

2301.10743

Country: North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)


General-Purpose Question-Answering with Macaw

Tafjord, Oyvind, Clark, Peter

arXiv.org Artificial Intelligence | Sep-6-2021

Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available. In response, we present Macaw, a versatile, generative question-answering (QA) system that we are making available to the community. Macaw is built on UnifiedQA, itself built on T5, and exhibits strong performance, zero-shot, on a wide variety of topics, including outperforming GPT-3 by over 10% (absolute) on Challenge300, a suite of 300 challenge questions, despite being an order of magnitude smaller (11 billion vs. 175 billion parameters). In addition, Macaw allows different permutations ("angles") of its inputs and outputs to be used, for example Macaw can take a question and produce an answer; or take an answer and produce a question; or take an answer and question, and produce multiple-choice options. We describe the system, and illustrate a variety of question types where it produces surprisingly good answers, well outside the training setup. We also identify question classes where it still appears to struggle, offering insights into the limitations of pretrained language models. Macaw is freely available, and we hope that it proves useful to the community. Macaw is available at https://github.com/allenai/macaw

macaw, dataset, explanation, (17 more...)

arXiv.org Artificial Intelligence

2109.02593

Country: North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.50)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)


Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

Bhakthavatsalam, Sumithra, Khashabi, Daniel, Khot, Tushar, Mishra, Bhavana Dalvi, Richardson, Kyle, Sabharwal, Ashish, Schoenick, Carissa, Tafjord, Oyvind, Clark, Peter

arXiv.org Artificial Intelligence | Feb-5-2021

We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting questions to direct-answer format using a combination of crowdsourcing and expert review. The resulting dataset contains 2985 questions with a total of 8436 valid answers (questions typically have more than one valid answer). ARC-DA is one of the first DA datasets of natural questions that often require reasoning, and where appropriate question decompositions are not evident from the questions themselves. We describe the conversion approach taken, appropriate evaluation metrics, and several strong models. Although high, the best scores (81% GENIE, 61.4% F1, 63.2% ROUGE-L) still leave considerable room for improvement. In addition, the dataset provides a natural setting for new research on explanation, as many questions require reasoning to construct answers. We hope the dataset spurs further advances in complex question-answering by the community. ARC-DA is available at https://allenai.org/data/arc-da

arc-da, dataset, reasoning, (13 more...)

arXiv.org Artificial Intelligence

2102.03315

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)
Information Technology > Communications > Social Media > Crowdsourcing (0.49)


Role of AI soars in tackling Covid-19 pandemic

#artificialintelligence | Mar-30-2020, 02:08:15 GMT

For the first time in a pandemic, Artificial Intelligence (AI) is playing a major role in tackling the Covid-19 outbreak, in areas ranging from diagnosing risk to doubt-clearing and from delivery of services to drug discovery. While BlueDot, a Canadian health monitoring firm that crunches flight data and news reports using AI, is credited by international reports with being the first to warn its clients of an impending outbreak on December 31, ahead of countries and international development agencies, the Indian tech space too is buzzing with efforts to crack the coronavirus. CoRover, a start-up in the AI space that earlier developed chatbots for a railway ticketing platform, has now created a "video-bot" in collaboration with a doctor from Fortis Healthcare. On this platform, a real doctor from Fortis Healthcare, not a cartoon or an invisible knowledge bank, will take questions from people about Covid-19. Apollo Hospitals has come up with a risk assessment scanner for Covid-19, which is available in six languages and advises people on their potential risk of having the virus. The Jaipur-based Sawai Man Singh Hospital is trying out a robot, made by robot maker Club First, to serve food and medicines to patients, lowering the exposure of health workers to coronavirus patients.

covid-19 pandemic, fortis healthcare, platform, (11 more...)

#artificialintelligence

Country:

Asia > India > Rajasthan > Jaipur (0.26)
Europe > Germany (0.06)
Asia > Taiwan (0.06)
Asia > China (0.06)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.96)



SLCM makes its agri warehousing call centre paperless & digital – AgriculturePost

#artificialintelligence | Jan-23-2020, 11:22:16 GMT

Sohan Lal Commodity Management (SLCM), India's leading agri services solutions provider with operations across India & Myanmar, has made its agri warehousing call centre paperless. SLCM set up its first dedicated 24x7 call centre in early 2010 to cater to its agri-warehousing operations in both countries. The call centre has now been digitally transformed into a paperless entity that integrates Artificial Intelligence to make it more efficient and seamless. The call centre is part of SLCM's endeavour to provide real-time technologies for managing agriculture operations, in line with its multiple-award-winning Agri Reach technology. It will have a dedicated team of customer support executives who will provide support to the field staff.

agri warehousing call centre, agriculturepost, call centre, (7 more...)

#artificialintelligence

Country:

Asia > India (0.49)
Asia > Myanmar (0.27)

Technology:

Information Technology > Architecture > Real Time Systems (0.80)
Information Technology > Artificial Intelligence (0.61)
