
Collaborating Authors: Jindal, Alekh


Sibyl: Forecasting Time-Evolving Query Workloads

arXiv.org Artificial Intelligence

Database systems often rely on historical query traces to perform workload-based performance tuning. However, real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose Sibyl, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries, with the entire query statements, in various prediction windows. Drawing insights from real workloads, we propose template-based featurization techniques and develop a stacked-LSTM with an encoder-decoder architecture for accurate forecasting of query workloads. We also develop techniques to improve forecasting accuracy over large prediction windows.

For workload-based optimization, the input workload plays a crucial role and needs to be a good representation of the expected workload. Traditionally, historical query traces have been used as input workloads with the assumption that workloads are mostly static. However, as we discuss in Section 2, many real workloads exhibit highly recurring query structures with changing patterns in both their arrival intervals and data accesses. For instance, query templates are often shared across users, teams, and applications, but may be customized with different parameter values to access varying data at different points in time. Consider a log analysis query that reports errors for different devices and error types: "SELECT * FROM T WHERE deviceType = ? AND errorType = ? AND eventDate BETWEEN ? AND ?".
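The stacked-LSTM encoder-decoder described above can be sketched in a few lines. The following is a minimal illustration, not Sibyl's actual implementation: the layer sizes, the per-template feature layout, and the autoregressive decoding loop are assumptions made for the sketch.

```python
# A minimal sketch (not Sibyl's code) of a stacked-LSTM encoder-decoder
# that forecasts the next window of per-template feature vectors
# (e.g., arrival counts and parameter features). Sizes are illustrative.
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    def __init__(self, n_features=16, hidden=64, layers=2):
        super().__init__()
        # Stacked LSTM encoder compresses the observed history.
        self.encoder = nn.LSTM(n_features, hidden, num_layers=layers,
                               batch_first=True)
        # Decoder unrolls one step per future time slot.
        self.decoder = nn.LSTM(n_features, hidden, num_layers=layers,
                               batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, history, horizon):
        # history: (batch, past_steps, n_features)
        _, state = self.encoder(history)
        step = history[:, -1:, :]          # seed with last observation
        outputs = []
        for _ in range(horizon):           # autoregressive decoding
            out, state = self.decoder(step, state)
            step = self.head(out)          # predicted feature vector
            outputs.append(step)
        return torch.cat(outputs, dim=1)   # (batch, horizon, n_features)

model = Seq2SeqForecaster()
past = torch.randn(8, 24, 16)              # 8 templates, 24 observed slots
print(model(past, horizon=6).shape)        # torch.Size([8, 6, 16])
```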


GEqO: ML-Accelerated Semantic Equivalence Detection

arXiv.org Artificial Intelligence

Large scale analytics engines have become a core dependency for modern data-driven enterprises to derive business insights and drive actions. These engines support a large number of analytic jobs processing huge volumes of data on a daily basis, and workloads are often inundated with overlapping computations across multiple jobs. Reusing common computation is crucial for efficient cluster resource utilization and reducing job execution time. Detecting common computation is the first and key step for reducing this computational redundancy. However, detecting equivalence on large-scale analytics engines requires efficient and scalable solutions that are fully automated. In addition, to maximize computation reuse, equivalence needs to be detected at the semantic level instead of just the syntactic level (i.e., the ability to detect semantic equivalence of seemingly different-looking queries). Unfortunately, existing solutions fall short of satisfying these requirements. In this paper, we take a major step towards filling this gap by proposing GEqO, a portable and lightweight machine-learning-based framework for efficiently identifying semantically equivalent computations at scale. GEqO introduces two machine-learning-based filters that quickly prune out nonequivalent subexpressions and employs a semi-supervised learning feedback loop to iteratively improve its model with an intelligent sampling mechanism. Further, with its novel database-agnostic featurization method, GEqO can transfer the learning from one workload and database to another. Our extensive empirical evaluation shows that, on TPC-DS-like queries, GEqO yields significant performance gains (up to 200x faster than automated verifiers) and finds up to 2x more equivalences than optimizer- and signature-based equivalence detection approaches.
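The filter-then-verify cascade is the core idea: cheap learned filters discard clearly nonequivalent subexpression pairs so that the expensive automated verifier runs only on promising candidates. Below is a minimal sketch of that control flow; `VectorFilter`, `featurize`, and `verify` are hypothetical stand-ins, not GEqO's API.

```python
# Illustrative cascade of cheap filters in front of a slow exact verifier.
class VectorFilter:
    """Toy filter: scores pairs by cosine similarity of database-agnostic
    feature vectors (a stand-in for GEqO's learned filters)."""
    def __init__(self, featurize):
        self.featurize = featurize

    def prob_equivalent(self, a, b):
        va, vb = self.featurize(a), self.featurize(b)
        dot = sum(x * y for x, y in zip(va, vb))
        na = sum(x * x for x in va) ** 0.5
        nb = sum(x * x for x in vb) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

def equivalent_pairs(pairs, filters, verify, threshold=0.5):
    """Cascade: run cheap filters first, the exact verifier last."""
    matches = []
    for a, b in pairs:
        # Any confident "nonequivalent" verdict skips the slow verifier.
        if any(f.prob_equivalent(a, b) < threshold for f in filters):
            continue
        if verify(a, b):  # expensive automated equivalence check
            matches.append((a, b))
    return matches

# Toy usage: featurize SQL strings by keyword counts (illustrative only).
featurize = lambda q: [q.upper().count(k) for k in ("SELECT", "JOIN", "WHERE")]
pairs = [("SELECT a FROM t WHERE b = 1", "SELECT a FROM t WHERE 1 = b")]
print(equivalent_pairs(pairs, [VectorFilter(featurize)],
                       verify=lambda a, b: True))
```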


Deploying a Steered Query Optimizer in Production at Microsoft

arXiv.org Artificial Intelligence

Modern analytical workloads are highly heterogeneous and massively complex, making generic query optimizers untenable for many customers and scenarios. As a result, it is important to specialize these optimizers to instances of the workloads. In this paper, we continue a recent line of work in steering a query optimizer towards better plans for a given workload, and make major strides in pushing previous research ideas to production deployment. Along the way we solve several operational challenges, including making steering actions more manageable, keeping the costs of steering within budget, and avoiding unexpected performance regressions in production. Our resulting system, QQ-advisor, essentially externalizes the query planner to a massive offline pipeline for better exploration and specialization. We discuss various aspects of our design and show detailed results over production SCOPE workloads at Microsoft, where the system is currently enabled by default.
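At its core, steering amounts to recompiling a recurring job under alternative optimizer configurations offline and pinning the configuration that yields a cheaper, validated plan. A minimal sketch of that loop follows; `compile_with_flags`, `estimate_cost`, and `validate` are hypothetical engine hooks rather than QQ-advisor's actual interfaces, and the validation check reflects the paper's emphasis on avoiding production regressions.

```python
# Illustrative sketch of the offline steering loop (not QQ-advisor's code):
# try alternative optimizer rule configurations for a recurring query,
# keep the cheapest plan that passes validation, and pin that
# configuration as the steering action for future runs.
def pick_steering_action(query, candidate_flag_sets, compile_with_flags,
                         estimate_cost, validate, budget=10):
    best_flags, best_cost = None, float("inf")
    for flags in candidate_flag_sets[:budget]:   # cap exploration cost
        plan = compile_with_flags(query, flags)  # re-plan under these flags
        cost = estimate_cost(plan)
        # Guard against regressions: accept only validated improvements.
        if cost < best_cost and validate(query, plan):
            best_flags, best_cost = flags, cost
    return best_flags   # None means: keep the default optimizer behavior
```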


Phoebe: A Learning-based Checkpoint Optimizer

arXiv.org Artificial Intelligence

Easy-to-use programming interfaces paired with cloud-scale processing engines have enabled big data system users to author arbitrarily complex analytical jobs over massive volumes of data. However, as the complexity and scale of analytical jobs increase, they encounter a number of unforeseen problems: hotspots with large intermediate data on temporary storage, longer job recovery time after failures, and worse query optimizer estimates are examples of issues that we face at Microsoft. To address these issues, we propose Phoebe, an efficient learning-based checkpoint optimizer. Given a set of constraints and an objective function at compile-time, Phoebe is able to determine the decomposition of job plans and the optimal set of checkpoints to preserve their outputs to durable global storage. Phoebe consists of three machine learning predictors and one optimization module. For each stage of a job, Phoebe makes accurate predictions for: (1) the execution time, (2) the output size, and (3) the start/end time, taking into account the inter-stage dependencies. Using these predictions, we formulate checkpoint optimization as an integer programming problem and propose a scalable heuristic algorithm that meets the latency requirement of the production environment. We demonstrate the effectiveness of Phoebe on production workloads, and show that we can free the temporary storage on hotspots by more than 70% and restart failed jobs 68% faster on average with minimal performance impact. Phoebe also illustrates that adding multiple sets of checkpoints is not cost-efficient, which dramatically reduces the complexity of the optimization.
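Phoebe formulates checkpoint selection as an integer program solved with a scalable heuristic. As a rough intuition only (not the paper's algorithm), each candidate checkpoint trades durable-storage bytes for saved recomputation time; the greedy sketch below ranks stage outputs by that ratio under a storage budget. The `Stage` fields and the greedy rule are assumptions made for illustration.

```python
# Back-of-envelope sketch of the checkpoint-selection trade-off
# (illustrative; Phoebe uses ML predictions feeding an integer program
# with a custom heuristic). Greedily persist the stage outputs that save
# the most recomputation time per byte, within a storage budget.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    upstream_time: float   # predicted seconds to recompute up to here
    output_bytes: float    # predicted output size if checkpointed

def pick_checkpoints(stages, storage_budget_bytes):
    chosen, used = [], 0.0
    # Rank by recomputation time saved per byte persisted.
    for s in sorted(stages, key=lambda s: s.upstream_time / s.output_bytes,
                    reverse=True):
        if used + s.output_bytes <= storage_budget_bytes:
            chosen.append(s.name)
            used += s.output_bytes
    return chosen

stages = [Stage("extract", 1200, 5e9), Stage("join", 3400, 2e10),
          Stage("aggregate", 5200, 8e8)]
print(pick_checkpoints(stages, storage_budget_bytes=1e10))
# ['aggregate', 'extract'] -- 'join' would exceed the remaining budget
```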