AITopics | Dutta, Abhratanu

Collaborating Authors

Dutta, Abhratanu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Satyrn: A Platform for Analytics Augmented Generation

Sterbentz, Marko, Barrie, Cameron, Shahi, Shubham, Dutta, Abhratanu, Hooshmand, Donna, Pack, Harper, Hammond, Kristian J.

arXiv.org Artificial IntelligenceJun-17-2024

Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that are used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach supports the ability to utilize standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present a neurosymbolic platform, Satyrn that leverages AAG to produce accurate, fluent, and coherent reports grounded in large scale databases. In our experiments, we find that Satyrn generates reports in which over 86% accurate claims while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, as compared to GPT-4 Code Interpreter in which just 57% of claims are accurate.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.12069

Country:

North America > United States > Illinois > Lake County (0.15)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Lightweight Knowledge Representations for Automating Data Analysis

Sterbentz, Marko, Barrie, Cameron, Hooshmand, Donna, Shahi, Shubham, Dutta, Abhratanu, Pack, Harper, Zhao, Andong Li, Paley, Andrew, Einarsson, Alexander, Hammond, Kristian

arXiv.org Artificial IntelligenceOct-15-2023

The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.

artificial intelligence, data mining, natural language, (19 more...)

arXiv.org Artificial Intelligence

2311.12848

Country:

North America > United States > Illinois > Cook County (0.14)
North America > United States > Illinois > DuPage County (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Clustering Stable Instances of Euclidean k-means.

Vijayaraghavan, Aravindan, Dutta, Abhratanu, Wang, Alex

Neural Information Processing SystemsDec-31-2017

The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst-case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have unique optimal k-means solutions that does not change even when each point is perturbed a little (in Euclidean distance). This captures the property that k-means optimal solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we can design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We also complement these results by studying the amount of stability in real datasets, and demonstrating that our algorithm performs well on these benchmark datasets.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Add feedback