
Collaborating Authors: Guha, Etash


BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

arXiv.org Artificial Intelligence

Table 1: Comparison of open-source synthetic image-text datasets. We compare datasets in terms of scale (number of samples), density (average number of words per sample), whether they are knowledge-augmented (i.e., the caption incorporates information found in the image's web-scraped alt-text), and the size of the captioning model used to generate the descriptions. For KALE, we create an initial pool of 100M captions from a 17B-parameter model and use it to distill a 2B-parameter model that matches the performance of the larger model.

We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that advances the state of knowledge-augmented image captioning. KALE builds on recent work in this area, particularly CapsFusion [28], which pioneered the use of large language models to fuse synthetically generated captions with alt-text to incorporate real-world knowledge.
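
The fusion step described above can be pictured as a simple prompting routine. The sketch below is hypothetical (KALE's actual prompts, models, and pipeline differ); it only illustrates asking an instruction-tuned LLM to merge a dense synthetic caption with the knowledge-rich alt-text:

```python
# Minimal sketch of the caption-fusion idea (hypothetical; not KALE's
# actual prompt or pipeline). An LLM is asked to merge a dense synthetic
# caption with the noisy but knowledge-rich web alt-text.

FUSION_PROMPT = """\
You are given a detailed caption and the web alt-text for the same image.
Rewrite the caption so it keeps its visual detail but also incorporates
any real-world facts (names, places, dates) found in the alt-text.

Caption: {caption}
Alt-text: {alt_text}

Fused caption:"""

def build_fusion_prompt(caption: str, alt_text: str) -> str:
    """Fill the fusion template for one image-text pair."""
    return FUSION_PROMPT.format(caption=caption, alt_text=alt_text)

if __name__ == "__main__":
    prompt = build_fusion_prompt(
        caption="A man in a suit waves to a crowd from a stage.",
        alt_text="President Barack Obama at the 2012 DNC in Charlotte.",
    )
    print(prompt)  # send to any instruction-tuned LLM to get the fused caption
```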


Archon: An Architecture Search Framework for Inference-Time Techniques

arXiv.org Artificial Intelligence

Inference-time techniques are emerging as highly effective tools to enhance large language model (LLM) capabilities. However, best practices for developing systems that combine these techniques remain underdeveloped due to our limited understanding of the utility of individual inference-time techniques and the interactions between them. Additionally, efficiently and automatically searching the space of model choices, inference-time techniques, and their compositions is challenging due to the large design space. To address these challenges, we introduce Archon, a modular framework for selecting, combining, and stacking layers of inference-time techniques to construct optimized LLM systems for target benchmarks. Rather than relying on a single LLM called once, we leverage a diverse set of LLMs and inference-time techniques, creating LLM systems greater than the sum of their parts. Archon defines an extensible design space, encompassing techniques such as generation ensembling, repeated sampling, ranking, fusion, critiquing, verification, and unit testing. It transforms the problem of building LLM systems into a hyperparameter optimization objective. Given the available LLMs, inference-time techniques, and compute budget, Archon utilizes hyperparameter search techniques to discover optimized architectures for target benchmark(s). We evaluate Archon architectures across a range of instruction-following, reasoning, and coding benchmarks, including MT-Bench, Arena-Hard-Auto, AlpacaEval 2.0, MixEval, MixEval Hard, MATH, and CodeContests. Archon architectures outperform frontier models, such as GPT-4o and Claude 3.5 Sonnet, on these benchmarks, achieving an average accuracy increase of 15.1 percentage points by using all available LLMs. We make our code and datasets available publicly on GitHub: https://github.com/ScalingIntelligence/Archon.
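
To make the "hyperparameter optimization" framing concrete, here is a toy sketch (not the released Archon framework): a stack of inference-time layers is treated as a configuration, and random search picks the stack that scores best on a development benchmark. The layer names and the scoring function are illustrative stand-ins:

```python
# Toy sketch of the architecture-search framing (hypothetical; Archon's
# real design space, evaluators, and search methods are richer).
import random

# Hypothetical layer names standing in for inference-time techniques.
LAYER_CHOICES = ["ensemble", "rank", "fuse", "critique", "verify"]

def evaluate(stack: tuple[str, ...]) -> float:
    """Stand-in for running the composed LLM system on a dev benchmark.
    Here: a fake score that mildly rewards rank-then-fuse stacks."""
    score = random.random()
    if "rank" in stack and "fuse" in stack:
        score += 0.2
    return score

def random_search(budget: int = 50, max_depth: int = 4) -> tuple[tuple[str, ...], float]:
    """Sample `budget` candidate stacks and keep the best-scoring one."""
    best_stack, best_score = (), float("-inf")
    for _ in range(budget):
        depth = random.randint(1, max_depth)
        stack = tuple(random.choice(LAYER_CHOICES) for _ in range(depth))
        score = evaluate(stack)
        if score > best_score:
            best_stack, best_score = stack, score
    return best_stack, best_score

if __name__ == "__main__":
    stack, score = random_search()
    print(f"best stack: {stack} (dev score {score:.3f})")
```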


On the Diminishing Returns of Width for Continual Learning

arXiv.org Artificial Intelligence

While deep neural networks have demonstrated groundbreaking performance in various settings, these models often suffer from catastrophic forgetting when trained on new tasks in sequence. Several works have empirically shown that increasing the width of a neural network reduces catastrophic forgetting, but the exact relationship between width and continual learning has yet to be characterized. We design one of the first frameworks for analyzing the theory of continual learning and prove that width is directly related to forgetting in feed-forward networks (FFNs). Specifically, we demonstrate that increasing network width to reduce forgetting yields diminishing returns. We empirically verify our claims at widths unexplored in prior studies, where the diminishing returns are clearly observed, as predicted by our theory.
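
The empirical claim is easy to probe at small scale. The sketch below (assumes PyTorch; the tasks, widths, and learning rates are illustrative, not the paper's setup) trains an FFN on task A, then on task B with no replay, and reports how much the task-A loss degrades as width grows:

```python
# Small forgetting-vs-width experiment sketch (illustrative setup only).
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(n=512, d=32):
    """A random linear-regression task with light label noise."""
    X = torch.randn(n, d)
    w = torch.randn(d, 1)
    return X, X @ w + 0.1 * torch.randn(n, 1)

def train(model, X, y, steps=300, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

Xa, ya = make_task()
Xb, yb = make_task()
for width in [16, 64, 256, 1024, 4096]:
    model = nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 1))
    train(model, Xa, ya)                                  # learn task A
    with torch.no_grad():
        loss_a_before = nn.functional.mse_loss(model(Xa), ya).item()
    train(model, Xb, yb)                                  # then task B, no replay
    with torch.no_grad():
        loss_a_after = nn.functional.mse_loss(model(Xa), ya).item()
    print(f"width {width:5d}: forgetting = {loss_a_after - loss_a_before:+.4f}")
```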


Conformal Prediction via Regression-as-Classification

arXiv.org Machine Learning

Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of these issues can be addressed by estimating a distribution over the output, but in practice such approaches can be sensitive to estimation error and yield unstable intervals. Here, we circumvent these challenges by converting regression into a classification problem and then using CP for classification to obtain CP sets for regression. To preserve the ordering of the continuous output space, we design a new loss function and make the necessary modifications to CP classification techniques. Empirical results on many benchmarks show that this simple approach gives surprisingly good results on many practical problems.
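
The basic recipe can be sketched in a few lines. The toy version below (assumes scikit-learn and NumPy; it omits the paper's order-aware loss and uses a plain logistic-regression classifier) bins the targets into classes, calibrates the classifier's probabilities with split conformal prediction, and reports the interval spanned by the bins in the prediction set:

```python
# Minimal regression-as-classification CP sketch (illustrative; the paper's
# method additionally uses an order-preserving loss, which this toy omits).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = X[:, 0] + 0.5 * rng.normal(size=2000)          # toy regression data

K = 20                                             # number of output bins
edges = np.quantile(y, np.linspace(0, 1, K + 1))
labels = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, K - 1)

# Split: train / calibration / test. Assumes all K bins appear in training,
# so predict_proba columns line up with labels 0..K-1.
tr, cal, te = slice(0, 1000), slice(1000, 1500), slice(1500, 2000)
clf = LogisticRegression(max_iter=1000).fit(X[tr], labels[tr])

# Conformal score: 1 - probability assigned to the true bin.
p_cal = clf.predict_proba(X[cal])
scores = 1.0 - p_cal[np.arange(p_cal.shape[0]), labels[cal]]
alpha = 0.1
qhat = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

# Prediction set for one test point -> interval over the selected bins.
p = clf.predict_proba(X[te])[0]
keep = np.where(p >= 1.0 - qhat)[0]
if keep.size == 0:                     # guard: fall back to the most likely bin
    keep = np.array([p.argmax()])
print("interval:", edges[keep.min()], "to", edges[keep.max() + 1])
```

Taking the min-to-max span of the selected bins is what makes the order-preserving loss matter in the full method: it encourages the prediction set to be a contiguous run of bins, so the interval is not needlessly inflated.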