AITopics | Jiang, Huaijun

Jiang, Huaijun

Towards General and Efficient Online Tuning for Spark

Li, Yang, Jiang, Huaijun, Shen, Yu, Fang, Yide, Yang, Xiaofeng, Huang, Danqing, Zhang, Xinyi, Zhang, Wentao, Zhang, Ce, Chen, Peng, Cui, Bin

arXiv.org Artificial Intelligence, Sep-4-2023

Spark, a distributed data analytics system, is a common choice for processing massive volumes of heterogeneous data, but tuning its parameters to achieve high performance is challenging. Recent studies employ auto-tuning techniques to solve this problem but suffer from three issues: limited functionality, high overhead, and inefficient search. In this paper, we present a general and efficient Spark tuning framework that addresses all three issues simultaneously. First, we introduce a generalized tuning formulation that conveniently supports multiple tuning goals and constraints, together with a Bayesian optimization (BO) based solution to this generalized optimization problem. Second, to avoid the high overhead of the additional offline evaluations used by existing methods, we propose to tune parameters along with the actual periodic executions of each job (i.e., online evaluations). To ensure safety during online job executions, we design a safe configuration acquisition method that models the safe region. Finally, three techniques further accelerate the search process: adaptive sub-space generation, approximate gradient descent, and meta-learning. We have implemented this framework as an independent cloud service and applied it to the data platform at Tencent. The empirical results on both public benchmarks and large-scale production tasks demonstrate its superiority in terms of practicality, generality, and efficiency. Notably, this service saves an average of 57.00% of memory cost and 34.93% of CPU cost on 25K in-production tasks within 20 iterations.
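
The idea of tuning toward a goal under constraints can be illustrated with a minimal sketch. This is not the paper's method: plain random search stands in for Bayesian optimization, the constraint is checked directly instead of through a learned model of the safe region, and every name here (`runtime_seconds`, `resource_cost`, `runtime_over_limit`, `sample_config`, `tune`) and the toy cost model are assumptions made up for illustration.

```python
import random

# Hypothetical toy model; it does not reflect real Spark behavior.
def runtime_seconds(config):
    return 120 / config["cores"] + 40 / config["memory_gb"]

def resource_cost(config):
    # Objective to minimize: a made-up cost that grows with memory and cores.
    return config["memory_gb"] * config["cores"]

def runtime_over_limit(config, limit=60):
    # Constraint written as c(config) <= 0: runtime must stay within the limit.
    return runtime_seconds(config) - limit

def sample_config(rng):
    # Draw one candidate configuration from a small, assumed search space.
    return {"memory_gb": rng.randint(1, 16), "cores": rng.randint(1, 8)}

def tune(objective, constraints, sample, n_iter=20, seed=0):
    """Return the best feasible (config, value) seen, or None if none was feasible."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        config = sample(rng)
        # Skip any configuration that violates a constraint.
        if any(c(config) > 0 for c in constraints):
            continue
        value = objective(config)
        if best is None or value < best[1]:
            best = (config, value)
    return best

print(tune(resource_cost, [runtime_over_limit], sample_config))
```

In an online setting the constraint values are only observed after a job has run, which is why the abstract describes modeling the safe region rather than testing it directly as this sketch does.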

artificial intelligence, general and efficient online tuning, machine learning

doi: 10.14778/3611540.3611548

arXiv: 2309.01901

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Rover: An online Spark SQL tuning service via generalized transfer learning

Shen, Yu, Ren, Xinyuyang, Lu, Yupeng, Jiang, Huaijun, Xu, Huanyong, Peng, Di, Li, Yang, Zhang, Wentao, Cui, Bin

arXiv.org Artificial Intelligence, May-29-2023

Distributed data analytics engines like Spark are common choices for processing massive data in industry. However, the performance of Spark SQL depends heavily on the choice of configurations, and the optimal ones vary with the executed workloads. Among the alternatives for Spark SQL tuning, Bayesian optimization (BO) is a popular framework that finds near-optimal configurations given sufficient budget, but it suffers from the re-optimization issue and is not practical in real production. When applying transfer learning to accelerate the tuning process, we notice two domain-specific challenges: 1) most previous work focuses on transferring tuning history, while expert knowledge from Spark engineers has great potential to improve tuning performance but has not been well studied so far; 2) history tasks should be used carefully, since using dissimilar ones leads to deteriorated performance in production. In this paper, we present Rover, a deployed online Spark SQL tuning service for efficient and safe search on industrial workloads. To address these challenges, we propose generalized transfer learning to boost tuning performance based on external knowledge, including expert-assisted Bayesian optimization and controlled history transfer. Experiments on public benchmarks and real-world tasks show the superiority of Rover over competitive baselines. Notably, Rover saves an average of 50.1% of the memory cost on 12k real-world Spark SQL tasks in 20 iterations, among which 76.2% of the tasks achieve a significant memory reduction of over 60%.
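
The point about using history tasks carefully can be sketched as a similarity filter: only past tasks that look enough like the new one contribute starting configurations. This is an illustration under assumptions, not Rover's algorithm; the abstract does not say how similarity is measured, so the cosine similarity over task feature vectors, the `min_similarity` threshold, and the names `cosine` and `warm_start_configs` are all hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def warm_start_configs(target_features, history, min_similarity=0.9):
    """Return best configs of sufficiently similar history tasks, most similar first."""
    scored = [
        (cosine(target_features, task["features"]), task["best_config"])
        for task in history
    ]
    # Drop dissimilar tasks so they cannot mislead the search.
    kept = [(score, config) for score, config in scored if score >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [config for _, config in kept]

# Hypothetical history: feature vectors and configuration keys are made up.
history = [
    {"features": [1.0, 0.0], "best_config": {"executor_memory_gb": 4}},
    {"features": [0.0, 1.0], "best_config": {"executor_memory_gb": 16}},
    {"features": [2.0, 0.1], "best_config": {"executor_memory_gb": 6}},
]
print(warm_start_configs([1.0, 0.0], history))
```

Returning an empty list when nothing is similar enough lets the caller fall back to tuning from scratch instead of transferring from an unrelated task.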

artificial intelligence, data mining, machine learning

arXiv: 2302.04046

Country:

North America > United States (0.48)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.50)

Industry:

Energy > Oil & Gas (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.82)
Information Technology > Data Science > Data Mining > Big Data (0.68)

OpenBox: A Python Toolkit for Generalized Black-box Optimization

Jiang, Huaijun, Shen, Yu, Li, Yang, Zhang, Wentao, Zhang, Ce, Cui, Bin

arXiv.org Artificial Intelligence, Apr-26-2023

Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, experimental design, and database knob tuning. However, users still face challenges in applicability, performance, and efficiency when applying BBO methods to their problems with existing software packages. This paper presents OpenBox, an open-source BBO toolkit with improved usability. It implements user-friendly interfaces and visualization for users to define and manage their tasks. The modular design behind OpenBox facilitates its flexible deployment in existing systems. Experimental results demonstrate the effectiveness and efficiency of OpenBox over existing systems. The source code of OpenBox is available at https://github.com/PKU-DAIR/open-box.
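
For readers new to black-box optimization, the general loop can be sketched as follows: an advisor suggests a configuration, the objective is evaluated without any knowledge of its internals, and the result is reported back. This sketch does not use or reproduce the OpenBox API; `RandomAdvisor`, `suggest`, `observe`, `best`, and `optimize` are hypothetical names, and random search stands in for a real BBO algorithm.

```python
import random

class RandomAdvisor:
    """Hypothetical suggest/observe advisor backed by plain random search."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds            # {name: (low, high)}
        self.rng = random.Random(seed)
        self.history = []               # list of (config, objective value)

    def suggest(self):
        # Propose a configuration sampled uniformly within the bounds.
        return {name: self.rng.uniform(low, high)
                for name, (low, high) in self.bounds.items()}

    def observe(self, config, value):
        # Record the evaluated result so later steps can use it.
        self.history.append((config, value))

    def best(self):
        # Lowest objective value observed so far.
        return min(self.history, key=lambda item: item[1])

def optimize(objective, advisor, max_runs=50):
    # The objective is treated as a black box: only inputs and outputs are seen.
    for _ in range(max_runs):
        config = advisor.suggest()
        advisor.observe(config, objective(config))
    return advisor.best()

def sphere(config):
    # Simple test objective with its minimum at x = 0, y = 0.
    return config["x"] ** 2 + config["y"] ** 2

advisor = RandomAdvisor({"x": (-5.0, 5.0), "y": (-5.0, 5.0)})
print(optimize(sphere, advisor))
```

Keeping suggestion and observation separate from the evaluation loop is one way to get a modular design, since the advisor can be swapped for a smarter algorithm without touching the objective or the loop.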

evolutionary algorithm, machine learning, optimization

arXiv: 2304.13339

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Industry: Transportation > Air (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.69)

OpenBox: A Generalized Black-box Optimization Service

Li, Yang, Shen, Yu, Zhang, Wentao, Chen, Yuanwei, Jiang, Huaijun, Liu, Mingchao, Jiang, Jiawei, Gao, Jinyang, Wu, Wentao, Yang, Zhi, Zhang, Ce, Cui, Bin

arXiv.org Artificial Intelligence, Jun-6-2021

Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, engineering, physics, and experimental design. However, it remains a challenge for users to apply BBO methods to their problems at hand with existing software packages, in terms of applicability, performance, and efficiency. In this paper, we build OpenBox, an open-source and general-purpose BBO service with improved usability. The modular design behind OpenBox also facilitates flexible abstraction and optimization of basic BBO components that are common in other existing systems. OpenBox is distributed, fault-tolerant, and scalable. To improve efficiency, OpenBox further utilizes "algorithm agnostic" parallelization and transfer learning. Our experimental results demonstrate the effectiveness and efficiency of OpenBox compared to existing systems.

air transportation, optimization, optimization problem

doi: 10.1145/3447548.3467061

arXiv: 2106.00421

Country:

Asia > China (0.46)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology (1.00)
Transportation > Air (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
