Dhulipala, Laxman
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering
Yu, Shangdi, Shi, Jessica, Meindl, Jamison, Eisenstat, David, Ju, Xiaoen, Tavakkol, Sasan, Dhulipala, Laxman, Łącki, Jakub, Mirrokni, Vahab, Shun, Julian
We introduce the ParClusterers Benchmark Suite (PCBS) -- a collection of highly scalable parallel graph clustering algorithms and benchmarking tools that streamline comparing different graph clustering algorithms and implementations. The benchmark includes clustering algorithms that target a wide range of modern clustering use cases, including community detection, classification, and dense subgraph mining. The benchmark toolkit makes it easy to run and evaluate multiple instances of different clustering algorithms, which can be useful for fine-tuning the performance of clustering on a given task, and for comparing different clustering algorithms based on different metrics of interest, including clustering quality and running time. Using PCBS, we evaluate a broad collection of real-world graph clustering datasets. Somewhat surprisingly, we find that the best quality results are obtained by algorithms that are not included in many popular graph clustering toolkits. The PCBS provides a standardized way to evaluate and judge the quality-performance tradeoffs of the active research area of scalable graph clustering algorithms. We believe it will help enable fair, accurate, and nuanced evaluation of graph clustering algorithms in the future.
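To make concrete the kind of comparison the toolkit streamlines, below is a minimal Python sketch -- not the PCBS API -- that uses NetworkX community-detection routines as stand-ins, times two clustering algorithms on a small placeholder graph, and scores each with modularity as one possible quality metric.

    # Minimal sketch of a quality-vs-runtime comparison; NetworkX algorithms and the
    # karate club graph are stand-ins, not part of PCBS itself.
    import time
    import networkx as nx
    from networkx.algorithms import community

    G = nx.karate_club_graph()  # placeholder dataset; PCBS targets billion-edge graphs

    algorithms = {
        "louvain": lambda g: community.louvain_communities(g, seed=1),
        "label_propagation": lambda g: list(community.label_propagation_communities(g)),
    }

    for name, run in algorithms.items():
        start = time.perf_counter()
        clusters = run(G)
        elapsed = time.perf_counter() - start
        quality = community.modularity(G, clusters)  # one of many possible quality metrics
        print(f"{name}: {len(clusters)} clusters, modularity={quality:.3f}, time={elapsed:.4f}s")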
Results of the Big ANN: NeurIPS'23 competition
Simhadri, Harsha Vardhan, Aumüller, Martin, Ingber, Amir, Douze, Matthijs, Williams, George, Manohar, Magdalen Dobson, Baranchuk, Dmitry, Liberty, Edo, Liu, Frank, Landrum, Ben, Karjikar, Mazin, Dhulipala, Laxman, Chen, Meng, Chen, Yue, Ma, Rui, Zhang, Kai, Cai, Yuzheng, Shi, Jiayang, Chen, Yizhuo, Zheng, Weiguo, Wan, Zihao, Yin, Jie, Huang, Ben
The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state-of-the-art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search (Simhadri et al., NeurIPS 2021), this competition addressed filtered search, out-of-distribution data, and sparse and streaming variants of ANNS. Participants developed and submitted innovative solutions that were evaluated on new standard datasets with constrained computational resources. The results showcased significant improvements in search accuracy and efficiency over industry-standard baselines, with notable contributions from both academic and industrial teams. This paper summarizes the competition tracks, datasets, evaluation metrics, and the innovative approaches of the top-performing submissions, providing insights into the current advancements and future directions in the field of approximate nearest neighbor search.
Approximate Nearest Neighbor Search with Window Filters
Engels, Joshua, Landrum, Benjamin, Yu, Shangdi, Dhulipala, Laxman, Shun, Julian
The nearest neighbor search problem has been widely studied for more than 30 years (Arya & Mount, 1993). Given a dataset D, the problem requires the construction of an index that can efficiently answer queries of the form "what is the closest vector to x in D?" Solving this problem exactly degrades to a brute force linear search in high dimensions (Rubinstein, 2018), so instead both theoreticians and practitioners focus on the relaxed c-approximate nearest neighbor search problem (ANNS), which asks "what is a vector in D whose distance to x is within a factor of c of the distance from x to its exact nearest neighbor?"
Although this problem has many motivating examples, there is a dearth of papers examining it in the literature. Some vector databases analyze window search-like problem instances as an additional feature of their system, but this analysis is typically secondary to their main approach and too slow for large-scale real-world systems; as far as we are aware, we are the first to propose, analyze, and experiment with a non-trivial solution to the window search problem.
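As a point of reference for the problem just described, here is a minimal sketch -- not from the paper -- of the exact brute-force baseline for window search: nearest neighbor search restricted to points whose numeric label falls in a query-supplied range. The dataset size, dimensionality, and label distribution are illustrative placeholders.

    # Exact brute-force window search: linear scan over the label-filtered subset.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.standard_normal((10_000, 64))   # dataset D of vectors
    labels = rng.uniform(0, 1, size=10_000)    # one numeric label per vector (e.g., a timestamp)

    def window_nn(query, lo, hi):
        """Return the index of the nearest vector whose label lies in [lo, hi]."""
        mask = (labels >= lo) & (labels <= hi)
        if not mask.any():
            return None
        candidates = np.flatnonzero(mask)
        dists = np.linalg.norm(data[candidates] - query, axis=1)
        return int(candidates[np.argmin(dists)])

    print(window_nn(rng.standard_normal(64), 0.25, 0.75))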
Scaling Graph-Based ANNS Algorithms to Billion-Size Datasets: A Comparative Analysis
Dobson, Magdalen, Shen, Zheqi, Blelloch, Guy E., Dhulipala, Laxman, Gu, Yan, Simhadri, Harsha Vardhan, Sun, Yihan
Algorithms for approximate nearest-neighbor search (ANNS) have been the topic of significant recent interest in the research community. However, evaluations of such algorithms are usually restricted to a small number of datasets with millions or tens of millions of points, whereas real-world applications require algorithms that work on the scale of billions of points. Furthermore, existing evaluations of ANNS algorithms are typically heavily focused on measuring and optimizing for queries per second (QPS) at a given accuracy, which can be hardware-dependent and ignores important metrics such as build time. In this paper, we propose a set of principled measures for evaluating ANNS algorithms which refocuses on their scalability to billion-size datasets. These measures include the ability to be efficiently parallelized, build times, and scaling relationships as dataset size increases. We also expand on the QPS measure with machine-agnostic measures such as the number of distance computations per query, and we evaluate ANNS data structures on their accuracy in more demanding settings required in modern applications, such as evaluating range queries and running on out-of-distribution data. We optimize four graph-based algorithms for the billion-scale setting, and in the process provide a general framework for making many incremental ANNS graph algorithms lock-free. We use our framework to evaluate the aforementioned graph-based ANNS algorithms as well as two alternative approaches.
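To illustrate the machine-agnostic measure of distance computations per query mentioned above, here is a minimal sketch -- not one of the benchmarked systems -- of greedy best-first search over a proximity graph, instrumented to count every distance evaluation; the random toy graph stands in for a real index such as a k-NN, Vamana, or HNSW graph.

    import numpy as np

    def greedy_search(graph, data, query, start):
        """Hill-climbing search on graph (node id -> neighbor ids); counts distance evaluations."""
        dist_count = 0

        def dist(i):
            nonlocal dist_count
            dist_count += 1
            return float(np.linalg.norm(data[i] - query))

        visited = {start}
        current, current_dist = start, dist(start)
        improved = True
        while improved:
            improved = False
            for nbr in graph[current]:
                if nbr in visited:
                    continue
                visited.add(nbr)
                d = dist(nbr)
                if d < current_dist:
                    current, current_dist, improved = nbr, d, True
        return current, dist_count  # approximate neighbor and the work it cost

    rng = np.random.default_rng(0)
    pts = rng.standard_normal((1000, 32))
    # Toy graph: 8 random out-neighbors per node; a real index would be built far more carefully.
    graph = {i: [int(j) for j in rng.choice(1000, size=8, replace=False)] for i in range(1000)}
    print(greedy_search(graph, pts, rng.standard_normal(32), start=0))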
Scalable Community Detection via Parallel Correlation Clustering
Shi, Jessica, Dhulipala, Laxman, Eisenstat, David, Łącki, Jakub, Mirrokni, Vahab
Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LambdaCC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly-optimized implementations that scale to large data sets of billions of edges and that obtain high-quality clusters compared to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection. For example, on a 30-core machine with two-way hyper-threading, our implementations achieve orders of magnitude speedups over other correlation clustering baselines, and up to 28.44x speedups over our own sequential baselines while maintaining or improving quality.
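For reference, one common statement of the LambdaCC objective that the framework targets (following Veldt et al.; the notation here is ours) is the following disagreement-minimization problem over clusterings $\mathcal{C}$ of the vertex set:

$\min_{\mathcal{C}} \; \sum_{(u,v) \in E} (1-\lambda)\,\mathbf{1}[\mathcal{C}(u) \neq \mathcal{C}(v)] \;+\; \sum_{(u,v) \notin E} \lambda\,\mathbf{1}[\mathcal{C}(u) = \mathcal{C}(v)],$

where $\lambda \in (0,1)$ is a resolution parameter: cutting an edge costs $1-\lambda$, placing a non-adjacent pair in the same cluster costs $\lambda$, and larger $\lambda$ therefore yields finer clusters. As the abstract notes, this family encompasses both correlation clustering and (via degree-weighted penalties) modularity.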
Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time
Dhulipala, Laxman, Eisenstat, David, Łącki, Jakub, Mirrokni, Vahab, Shi, Jessica
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient $\tilde{O}(m)$ time exact algorithms for classic linkage measures, such as complete- and WPGMA-linkage, as well as other measures. Furthermore, for average-linkage, arguably the most popular variant of HAC, we provide an algorithm that runs in $\tilde{O}(n\sqrt{m})$ time. For this variant, this is the first exact algorithm that runs in subquadratic time, as long as $m=n^{2-\epsilon}$ for some constant $\epsilon > 0$. We complement this result with a simple $\epsilon$-close approximation algorithm for average-linkage in our framework that runs in $\tilde{O}(m)$ time. As an application of our algorithms, we consider clustering points in a metric space by first using $k$-NN to generate a graph from the point set, and then running our algorithms on the resulting weighted graph. We validate the performance of our algorithms on publicly available datasets, and show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
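As an illustration of the point-clustering pipeline described above (build a k-NN graph over the points, then run average-linkage HAC on it), here is a minimal scikit-learn sketch; it relies on off-the-shelf components rather than the paper's nearly-linear-time graph HAC algorithms, and the dataset and parameters are placeholders.

    # Point clustering via a k-NN graph plus average-linkage HAC, using standard scikit-learn pieces.
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.neighbors import kneighbors_graph
    from sklearn.cluster import AgglomerativeClustering

    X, _ = make_blobs(n_samples=5000, centers=10, random_state=0)

    # Sparse k-NN connectivity graph over the points (k is a tunable parameter).
    knn = kneighbors_graph(X, n_neighbors=10, include_self=False)

    # Average-linkage HAC restricted to the k-NN graph's connectivity structure.
    hac = AgglomerativeClustering(n_clusters=10, linkage="average", connectivity=knn)
    clusters = hac.fit_predict(X)
    print(np.bincount(clusters))  # cluster sizes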