AITopics | Cao, Jian

Cao, Jian

Real-Time Decision-Making for Digital Twin in Additive Manufacturing with Model Predictive Control using Time-Series Deep Neural Networks

Chen, Yi-Ping, Karkaria, Vispi, Tsai, Ying-Kuan, Rolark, Faith, Quispe, Daniel, Gao, Robert X., Cao, Jian, Chen, Wei

arXiv.org Artificial Intelligence, Jan-10-2025

Digital Twin (a virtual replica of a physical system enabling real-time monitoring, model updating, prediction, and decision-making), combined with recent advances in machine learning (ML), offers new opportunities for proactive control strategies in autonomous manufacturing. However, achieving real-time decision-making with Digital Twins requires efficient optimization driven by accurate predictions of highly nonlinear manufacturing systems. This paper presents a simultaneous multi-step Model Predictive Control (MPC) framework for real-time decision-making, using a multi-variate deep neural network (DNN), named Time-Series Dense Encoder (TiDE), as the surrogate model. Unlike the models in conventional MPC, which provide only one-step-ahead predictions, TiDE is capable of predicting future states within the prediction horizon in one shot (multi-step), significantly accelerating MPC. Using Directed Energy Deposition additive manufacturing as a case study, we demonstrate the effectiveness of the proposed MPC in achieving melt pool temperature tracking to ensure part quality, while reducing porosity defects by regulating laser power to maintain melt pool depth constraints. In this work, we first show that TiDE is capable of accurately predicting melt pool temperature and depth. Second, we demonstrate that the proposed MPC achieves precise temperature tracking while satisfying melt pool depth constraints within a targeted dilution range (10%-30%), reducing potential porosity defects. Compared to the PID controller, MPC results in smoother and less fluctuating laser power profiles with competitive or superior melt pool temperature control performance. This demonstrates MPC's proactive control capabilities, leveraging time-series prediction and real-time optimization, positioning it as a powerful tool for future Digital Twin applications and real-time process optimization in manufacturing.
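The multi-step idea in the abstract above can be illustrated with a minimal sketch, under loud assumptions: the surrogate here is a toy first-order thermal model standing in for TiDE, the optimizer is plain random shooting rather than whatever the paper uses, and the names `predict_horizon` and `mpc_step` and every constant are hypothetical. The melt pool depth constraint is left out for brevity. The point is only the interface: one surrogate call returns the whole prediction horizon, and the controller scores entire laser-power sequences against it.

```python
import random


def predict_horizon(temp0, powers, a=0.9, b=0.5):
    """Toy surrogate: one call returns every temperature in the horizon.

    Assumed dynamics (hypothetical, not TiDE): T[k+1] = a*T[k] + b*P[k].
    """
    temps, t = [], temp0
    for p in powers:
        t = a * t + b * p
        temps.append(t)
    return temps


def mpc_step(temp0, target, horizon=5, n_candidates=2000,
             p_min=0.0, p_max=10.0, smooth_weight=0.1, seed=0):
    """Pick a laser-power sequence by random shooting over the horizon."""
    rng = random.Random(seed)
    best_cost, best_seq = float("inf"), None
    for _ in range(n_candidates):
        seq = [rng.uniform(p_min, p_max) for _ in range(horizon)]
        temps = predict_horizon(temp0, seq)
        # Tracking error over the whole horizon.
        tracking = sum((t - target) ** 2 for t in temps)
        # Penalize jumps between consecutive power commands.
        smooth = sum((q - p) ** 2 for p, q in zip(seq, seq[1:]))
        cost = tracking + smooth_weight * smooth
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    # Receding horizon: only the first command would be applied.
    return best_seq[0], best_seq
```

In a receding-horizon loop, only the first returned command would be applied before re-planning from the newly measured temperature.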

machine learning, melt pool temperature, real time system

arXiv:2501.07601

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Large Language Models for Constructing and Optimizing Machine Learning Workflows: A Survey

Gu, Yang, You, Hengyu, Cao, Jian, Yu, Muran, Fan, Haoran, Qian, Shiyou

arXiv.org Artificial Intelligence, Dec-25-2024

In the era of big data, machine learning (ML) workflows have become essential across various sectors for processing and analyzing large-scale data (Xin et al. [2021], Nikitin et al. [2022]). To support the development and sharing of ML workflows, numerous repositories have been established, showcasing diverse paradigms for data analysis. For instance, KNIME offers a repository with over 25,000 workflows and 2,200 components (Ordenes and Silipo [2021]), providing a comprehensive collection of rigorously tested, practical models complete with detailed specifications. However, despite the availability of these resources, manually constructing and optimizing workflows to meet complex task requirements remains a knowledge-intensive and time-consuming challenge for most people. The advent of Large Language Models (LLMs) has recently revolutionized artificial intelligence (AI) and ML, delivering advanced capabilities in natural language understanding and generation (Hollmann et al. [2024], Wang et al. [2024a]). Models such as OpenAI's GPT-4 (Achiam et al. [2023]) and Meta AI's LLaMA-3 (Touvron et al. [2023]) have demonstrated exceptional performance across a wide range of natural language processing (NLP) tasks, thanks to their extensive training on large-scale text datasets. Additionally, multimodal LLMs (Hu et al. [2024], Tai et al. [2024], Luo et al. [2024]), which incorporate various data types like audio and images, allow for richer interactions by processing and generating non-textual information. Their impressive capabilities have led to widespread adoption across multiple domains (Gu et al. [2023], Klievtsova et al. [2023], Zhang et al. [2023a]).

large language model, machine learning, natural language

arXiv:2411.10478

Country:

Asia (0.46)
North America > United States (0.46)

Genre: Workflow (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LogLLM: Log-based Anomaly Detection Using Large Language Models

Guan, Wei, Cao, Jian, Qian, Shiyou, Gao, Jianqi

arXiv.org Artificial Intelligence, Nov-13-2024

Software systems often record important runtime information in logs to help with troubleshooting. Log-based anomaly detection has become a key research area that aims to identify system issues through log data, ultimately enhancing the reliability of software systems. Traditional deep learning methods often struggle to capture the semantic information embedded in log data, which is typically organized in natural language. In this paper, we propose LogLLM, a log-based anomaly detection framework that leverages large language models (LLMs). LogLLM employs BERT for extracting semantic vectors from log messages, while utilizing Llama, a transformer decoder-based model, for classifying log sequences. Additionally, we introduce a projector to align the vector representation spaces of BERT and Llama, ensuring a cohesive understanding of log semantics. Unlike conventional methods that require log parsers to extract templates, LogLLM preprocesses log messages with regular expressions, streamlining the entire process. Our framework is trained through a novel three-stage procedure designed to enhance performance and adaptability. Experimental results across four public datasets demonstrate that LogLLM outperforms state-of-the-art methods. Even when handling unstable logs, it effectively captures the semantic meaning of log messages and detects anomalies accurately.
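The regular-expression preprocessing mentioned above can be sketched roughly as follows. The patterns, placeholder tokens, and the function name `preprocess` are all assumptions made for illustration; the abstract does not state which expressions LogLLM actually uses. The sketch only shows the general idea of masking variable fields so that messages from the same logging statement look alike, without running a log parser.

```python
import re

# Hypothetical patterns and tokens; not taken from the LogLLM paper.
PATTERNS = [
    # IPv4 address with an optional port.
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    # Hexadecimal literal such as 0x1f.
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    # Standalone integer not glued to letters, digits, or dots.
    (re.compile(r"(?<![\w.])\d+(?![\w.])"), "<NUM>"),
]


def preprocess(message):
    """Replace variable fields in a log message with placeholder tokens."""
    for pattern, token in PATTERNS:
        message = pattern.sub(token, message)
    return message


print(preprocess("Connection from 10.0.0.5:8080 failed after 3 retries"))
```

Pattern order matters in this sketch: addresses are masked before bare integers so that the octets of an IP are not replaced one by one.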

data mining, large language model, machine learning

arXiv:2411.08561

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report (1.00)

Industry: Energy (0.54)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

FedL2G: Learning to Guide Local Training in Heterogeneous Federated Learning

Zhang, Jianqing, Liu, Yang, Hua, Yang, Cao, Jian, Yang, Qiang

arXiv.org Artificial Intelligence, Oct-8-2024

Data and model heterogeneity are two core issues in Heterogeneous Federated Learning (HtFL). In scenarios with heterogeneous model architectures, aggregating model parameters becomes infeasible, leading to the use of prototypes (i.e., class representative feature vectors) for aggregation and guidance. However, they still experience a mismatch between the extra guiding objective and the client's original local objective when aligned with global prototypes. With theoretical guarantees, FedL2G efficiently implements the learning-to-guide process using only first-order derivatives w.r.t. We conduct extensive experiments on two data heterogeneity and six model heterogeneity settings using 14 heterogeneous model architectures (e.g., CNNs and ViTs) to demonstrate FedL2G's superior performance compared to six counterparts. With the rapid development of AI techniques (Touvron et al., 2023; Achiam et al., 2023), public data has been consumed gradually, raising the need to access local data inside devices or institutions (Ye et al., 2024). However, directly using local data often raises privacy concerns (Nguyen et al., 2021). Federated Learning (FL) is a promising privacy-preserving approach that enables collaborative model training across multiple clients (devices or institutions) in a distributed manner without the need to move the actual data outside clients (Kairouz et al., 2019; Li et al., 2020).

artificial intelligence, data mining, machine learning

arXiv:2410.06490

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.50)

Industry:

Education (0.83)
Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

Guan, Wei, Cao, Jian, Gao, Jianqi, Zhao, Haiyan, Qian, Shiyou

arXiv.org Artificial Intelligence, Jun-22-2024

Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anomaly detection methods treat a trace (i.e., process instance) as multiple event pairs, disrupting long-distance dependencies. In this paper, we introduce DABL, a novel approach for detecting semantic anomalies in business processes using large language models (LLMs). We collect 143,137 real-world process models from various domains. By generating normal traces through the playout of these process models and simulating both ordering and exclusion anomalies, we fine-tune Llama 2 using the resulting log. Through extensive experiments, we demonstrate that DABL surpasses existing state-of-the-art semantic anomaly detection methods in terms of both generalization ability and learning of given processes. Users can directly apply DABL to detect semantic anomalies in their own datasets without the need for additional training. Furthermore, DABL offers the capability to interpret the causes of anomalies in natural language, providing valuable insights into the detected anomalies.
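As a rough illustration of the anomaly simulation step described above, the sketch below injects an ordering anomaly into a normal trace by swapping two events. This is an assumption about what such a simulation could look like, not DABL's actual procedure; the function name `inject_ordering_anomaly` and the example activities are hypothetical, and exclusion anomalies are not covered. In a process with concurrent branches a swap may still be valid, so a real pipeline would need to check the result against the process model.

```python
import random


def inject_ordering_anomaly(trace, rng):
    """Return a copy of the trace with two differing events swapped.

    If every event is identical, no swap can change the order, so the
    trace is returned unchanged.
    """
    pairs = [
        (i, j)
        for i in range(len(trace))
        for j in range(i + 1, len(trace))
        if trace[i] != trace[j]
    ]
    if not pairs:
        return list(trace)
    i, j = rng.choice(pairs)
    out = list(trace)
    out[i], out[j] = out[j], out[i]
    return out


normal = ["receive order", "check stock", "ship goods", "send invoice"]
print(inject_ordering_anomaly(normal, random.Random(0)))
```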

anomaly, large language model, machine learning

arXiv:2406.15781

Country: North America > United States (0.68)

Genre: Research Report > Promising Solution (0.66)

Industry: Banking & Finance (0.32)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Engineering software 2.0 by interpolating neural networks: unifying training, solving, and calibration

Park, Chanwook, Saha, Sourav, Guo, Jiachen, Xie, Xiaoyu, Mojumder, Satyajit, Bessa, Miguel A., Qian, Dong, Chen, Wei, Wagner, Gregory J., Cao, Jian, Liu, Wing Kam

arXiv.org Artificial Intelligence, Apr-22-2024

The evolution of artificial intelligence (AI) and neural network theories has revolutionized the way software is programmed, shifting from a hard-coded series of codes to a vast neural network. However, this transition in engineering software has faced challenges such as data scarcity, multi-modality of data, low model accuracy, and slow inference. Here, we propose a new network based on interpolation theories and tensor decomposition, the interpolating neural network (INN). Instead of interpolating training data, a common notion in computer science, INN interpolates interpolation points in the physical space whose coordinates and values are trainable. It can also extrapolate if the interpolation points reside outside of the range of training data and the interpolation functions have a larger support domain. INN features orders of magnitude fewer trainable parameters, faster training, a smaller memory footprint, and higher model accuracy compared to feed-forward neural networks (FFNN) or physics-informed neural networks (PINN). INN is poised to usher in Engineering Software 2.0, a unified neural network that spans various domains of space, time, parameters, and initial/boundary conditions. This has previously been computationally prohibitive due to the exponentially growing number of trainable parameters, easily exceeding the parameter size of ChatGPT, which is over 1 trillion. INN addresses this challenge by leveraging tensor decomposition and tensor product, with adaptable network architecture.

artificial intelligence, machine learning, neural network

arXiv:2404.10296

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning

Zhang, Jianqing, Liu, Yang, Hua, Yang, Cao, Jian

arXiv.org Artificial Intelligence, Mar-23-2024

Heterogeneous Federated Learning (HtFL) enables collaborative learning on multiple clients with different model architectures while preserving privacy. Despite recent research progress, knowledge sharing in HtFL is still difficult due to data and model heterogeneity. To tackle this issue, we leverage the knowledge stored in pre-trained generators and propose a new upload-efficient knowledge transfer scheme called Federated Knowledge-Transfer Loop (FedKTL). Our FedKTL can produce client-task-related prototypical image-vector pairs via the generator's inference on the server. With these pairs, each client can transfer pre-existing knowledge from the generator to its local model through an additional supervised local task. We conduct extensive experiments on four datasets under two types of data heterogeneity with 14 kinds of models including CNNs and ViTs. Results show that our upload-efficient FedKTL surpasses seven state-of-the-art methods by up to 7.31% in accuracy. Moreover, our knowledge transfer scheme is applicable in scenarios with only one edge client. Code: https://github.com/TsingZ0/FedKTL

artificial intelligence, fedktl, machine learning

arXiv:2403.15760

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.68)
Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization

Karkaria, Vispi, Goeckner, Anthony, Zha, Rujing, Chen, Jie, Zhang, Jianjing, Zhu, Qi, Cao, Jian, Gao, Robert X., Chen, Wei

arXiv.org Artificial Intelligence, Feb-27-2024

Laser-directed-energy deposition (DED) offers advantages in additive manufacturing (AM) for creating intricate geometries and material grading. Yet, challenges like material inconsistency and part variability remain, mainly due to its layer-wise fabrication. A key issue is heat accumulation during DED, which affects the material microstructure and properties. While closed-loop control methods for heat management are common in DED research, few integrate real-time monitoring, physics-based modeling, and control in a unified framework. Our work presents a digital twin (DT) framework for real-time predictive control of DED process parameters to meet specific design objectives. We develop a surrogate model using Long Short-Term Memory (LSTM)-based machine learning with Bayesian Inference to predict temperatures in DED parts. This model predicts future temperature states in real time. We also introduce Bayesian Optimization (BO) for Time Series Process Optimization (BOTSPO), based on traditional BO but featuring a unique time series process profile generator with reduced dimensions. BOTSPO dynamically optimizes processes, identifying optimal laser power profiles to attain desired mechanical properties. The established process trajectory guides online optimizations, aiming to enhance performance. This paper outlines the digital twin framework's components, promoting its integration into a comprehensive system for AM.

artificial intelligence, machine learning, real time system

arXiv:2402.17718

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for Data and Model Heterogeneity in Federated Learning

Zhang, Jianqing, Liu, Yang, Hua, Yang, Cao, Jian

arXiv.org Artificial Intelligence, Jan-6-2024

Recently, Heterogeneous Federated Learning (HtFL) has attracted attention due to its ability to support heterogeneous models and data. To reduce the high communication cost of transmitting model parameters, a major challenge in HtFL, prototype-based HtFL methods have been proposed that share only class representatives, a.k.a. prototypes, among heterogeneous clients while maintaining the privacy of clients' models. However, these prototypes are naively aggregated into global prototypes on the server using weighted averaging, resulting in suboptimal global knowledge, which negatively impacts the performance of clients. To overcome this challenge, we introduce a novel HtFL approach called FedTGP, which leverages our Adaptive-margin-enhanced Contrastive Learning (ACL) to learn Trainable Global Prototypes (TGP) on the server. By incorporating ACL, our approach enhances prototype separability while preserving semantic meaning. Extensive experiments with twelve heterogeneous models demonstrate that our FedTGP surpasses state-of-the-art methods by up to 9.08% in accuracy while maintaining the communication and privacy advantages of prototype-based HtFL. Our code is available at https://github.com/TsingZ0/FedTGP.
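For readers unfamiliar with prototype aggregation, here is a minimal sketch of the weighted-averaging baseline that the abstract above argues against; it is not FedTGP itself. Weighting by per-class sample count is an assumption, and the function names `local_prototypes` and `aggregate` are hypothetical.

```python
from collections import defaultdict


def local_prototypes(features, labels):
    """Per-class mean feature vector on one client, with sample counts."""
    sums, counts = {}, defaultdict(int)
    for vec, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(vec)
        sums[y] = [s + v for s, v in zip(sums[y], vec)]
        counts[y] += 1
    return {y: ([s / counts[y] for s in sums[y]], counts[y]) for y in sums}


def aggregate(client_protos):
    """Server side: count-weighted average of client prototypes per class."""
    totals, weights = {}, defaultdict(int)
    for protos in client_protos:
        for y, (vec, n) in protos.items():
            if y not in totals:
                totals[y] = [0.0] * len(vec)
            totals[y] = [t + n * v for t, v in zip(totals[y], vec)]
            weights[y] += n
    return {y: [t / weights[y] for t in totals[y]] for y in totals}


client_a = local_prototypes([[0.0, 0.0], [2.0, 2.0]], [0, 0])
client_b = local_prototypes([[4.0, 4.0]], [0])
print(aggregate([client_a, client_b]))
```

Only the per-class vectors and counts leave each client in this sketch, which is why the clients can run different model architectures as long as their feature dimensions match.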

artificial intelligence, machine learning, prototype

arXiv:2401.03230

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

PFLlib: Personalized Federated Learning Algorithm Library

Zhang, Jianqing, Liu, Yang, Hua, Yang, Wang, Hao, Song, Tao, Xue, Zhengui, Ma, Ruhui, Cao, Jian

arXiv.org Artificial Intelligence, Dec-8-2023

Amid the ongoing advancements in Federated Learning (FL), a machine learning paradigm that allows collaborative learning with data privacy protection, personalized FL (pFL) has gained significant prominence as a research direction within the FL domain. Whereas traditional FL (tFL) focuses on jointly learning a global model, pFL aims to achieve a balance between the global and personalized objectives of each client in FL settings. To foster the pFL research community, we propose PFLlib, a comprehensive pFL algorithm library with an integrated evaluation platform. In PFLlib, we implement 34 state-of-the-art FL algorithms (including 7 classic tFL algorithms and 27 pFL algorithms) and provide various evaluation environments with three statistically heterogeneous scenarios and 14 datasets. At present, PFLlib has already gained 850 stars and 199 forks on GitHub.

artificial intelligence, conference, machine learning

arXiv:2312.04992

Country:

Asia > China (0.29)
North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
