Shen, Bo
DeepOFormer: Deep Operator Learning with Domain-informed Features for Fatigue Life Prediction
Li, Chenyang, Kapure, Tanmay Sunil, Roy, Prokash Chandra, Gan, Zhengtao, Shen, Bo
Fatigue life characterizes the duration a material can function before failure under specific environmental conditions, and is traditionally assessed using stress-life (S-N) curves. While machine learning and deep learning offer promising results for fatigue life prediction, they face an overfitting challenge because fatigue experimental datasets for specific materials are small. To address this challenge, we propose DeepOFormer, which formulates S-N curve prediction as an operator learning problem. DeepOFormer enhances the deep operator learning framework with a transformer-based encoder and a mean L2 relative error loss function. We also incorporate Stussi, Weibull, and Pascual and Meeker (PM) features as domain-informed features, motivated by empirical fatigue models. To evaluate the performance of DeepOFormer, we compare it with different deep learning models and XGBoost on a dataset of 54 S-N curves of aluminum alloys. With seven aluminum alloys selected for testing, DeepOFormer achieves an R2 of 0.9515, a mean absolute error of 0.2080, and a mean relative error of 0.5077, significantly outperforming state-of-the-art deep/machine learning methods including DeepONet, TabTransformer, and XGBoost. The results highlight that DeepOFormer, integrated with domain-informed features, substantially improves prediction accuracy and generalization capabilities for fatigue life prediction in aluminum alloys.
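The mean L2 relative error loss mentioned above can be illustrated with a short sketch. Below is a minimal PyTorch implementation of the common definition of that loss; the paper's exact formulation is not given here, so treat the details (batch flattening, the eps safeguard) as assumptions.

```python
import torch

def mean_l2_relative_error(pred: torch.Tensor, target: torch.Tensor,
                           eps: float = 1e-8) -> torch.Tensor:
    """Mean L2 relative error: average of ||pred - target|| / ||target||
    over the batch. NOTE: assumed form; the paper's loss may differ."""
    # Flatten everything after the batch dimension before taking norms.
    diff = (pred - target).flatten(start_dim=1).norm(dim=1)
    denom = target.flatten(start_dim=1).norm(dim=1).clamp_min(eps)
    return (diff / denom).mean()

# Hypothetical usage: a batch of 8 predicted S-N curves, each
# discretized at 32 stress levels.
pred = torch.randn(8, 32)
target = torch.randn(8, 32)
loss = mean_l2_relative_error(pred, target)
```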
CodeV: Issue Resolving with Visual Data
Zhang, Linhao, Zan, Daoguang, Yang, Quanshun, Huang, Zhirong, Chen, Dong, Shen, Bo, Liu, Tianyu, Gong, Yongshun, Huang, Pengjie, Lu, Xudong, Liang, Guangtai, Cui, Lizhen, Wang, Qianxiang
Large Language Models (LLMs) have advanced rapidly in recent years, with their applications in software engineering expanding to more complex repository-level tasks. GitHub issue resolving is a key challenge among these tasks. While recent approaches have made progress on this task, they focus on textual data within issues and neglect visual data. However, this visual data is crucial for resolving issues, as it conveys additional knowledge that text alone cannot. We propose CodeV, the first approach to leverage visual data to enhance the issue-resolving capabilities of LLMs. CodeV resolves each issue through a two-phase process: data processing and patch generation. To evaluate CodeV, we construct a benchmark for visual issue resolving, namely Visual SWE-bench. Through extensive experiments, we demonstrate the effectiveness of CodeV and provide valuable insights into leveraging visual data to resolve GitHub issues.
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Zhao, Haiyan, Zhao, Heng, Shen, Bo, Payani, Ali, Yang, Fan, Du, Mengnan
Probing learned concepts in large language models (LLMs) is crucial for understanding how semantic knowledge is encoded internally. Training linear classifiers on probing tasks is a principal approach for identifying the vector that denotes a certain concept in the representation space. However, the single vector identified for a concept varies with both data and training, making it less robust and weakening its effectiveness in real-world applications. To address this challenge, we propose an approach to approximate the subspace representing a specific concept. Building on linear probing classifiers, we extend concept vectors into a Gaussian Concept Subspace (GCS). We demonstrate GCS's effectiveness by measuring its faithfulness and plausibility across multiple LLMs of different sizes and architectures. Additionally, we use representation intervention tasks to showcase its efficacy in real-world applications such as emotion steering. Experimental results indicate that GCS concept vectors can balance steering performance and fluency in natural language generation tasks.
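As a rough sketch of the GCS idea under stated assumptions: train several linear probes on resampled data, treat their normalized weight vectors as draws of the concept direction, fit a Gaussian to them, and sample new concept vectors from that distribution. The toy data, bootstrap scheme, and diagonal covariance below are illustrative simplifications, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for hidden states: 500 samples, 64-dim, binary concept label.
X = rng.normal(size=(500, 64))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

# Train several linear probes on bootstrap resamples; each weight
# vector is one estimate of the concept direction.
directions = []
for _ in range(20):
    idx = rng.integers(0, len(X), size=len(X))
    probe = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    w = probe.coef_[0]
    directions.append(w / np.linalg.norm(w))
directions = np.stack(directions)

# Fit a Gaussian to the probe directions (diagonal covariance for
# simplicity) and sample new vectors from the concept subspace.
mu, sigma = directions.mean(axis=0), directions.std(axis=0)
sampled_concept_vectors = rng.normal(mu, sigma, size=(5, 64))
```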
Neural Operator for Accelerating Coronal Magnetic Field Model
Du, Yutao, Li, Qin, Gnanasambandam, Raghav, Du, Mengnan, Wang, Haimin, Shen, Bo
Studying the Sun's outer atmosphere is challenging due to its complex magnetic fields, which drive solar activity. Magnetohydrodynamics (MHD) simulations help model these interactions but are extremely time-consuming (usually on a scale of days). Our research applies the Fourier Neural Operator (FNO) to accelerate coronal magnetic field modeling, specifically the Bifrost MHD model. We apply a Tensorized FNO (TFNO) to efficiently generate solutions of partial differential equations (PDEs) over a 3D domain. TFNO's performance is compared with that of other deep learning methods, highlighting its accuracy and scalability. Physics-based analysis confirms that TFNO is reliable and capable of accelerating MHD simulations with high precision. This advancement improves efficiency in data handling, enhances predictive capabilities, and provides a better understanding of magnetic topologies.
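To illustrate the mechanism behind FNO-style operator learning, here is a minimal 1D spectral convolution layer in PyTorch. The paper's TFNO operates on a 3D domain and factorizes (tensorizes) the spectral weight tensor, which this sketch omits; everything below is a generic illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Minimal Fourier layer: FFT -> keep low modes -> learned complex
    weights -> inverse FFT. Illustrates the core FNO mechanism only."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, grid)
        x_ft = torch.fft.rfft(x)
        out_ft = torch.zeros_like(x_ft)
        # Multiply the retained low-frequency modes by learned weights.
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1))

# Hypothetical usage on a batch of 1D fields.
layer = SpectralConv1d(channels=8, modes=12)
y = layer(torch.randn(4, 8, 64))
```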
CodeS: Natural Language to Code Repository via Multi-Layer Sketch
Zan, Daoguang, Yu, Ailun, Liu, Wei, Chen, Dong, Shen, Bo, Li, Wei, Yao, Yafen, Gong, Yongshun, Chen, Xiaolin, Guan, Bei, Yang, Zhiguang, Wang, Yongji, Wang, Qianxiang, Cui, Lizhen
The impressive performance of large language models (LLMs) on code-related tasks has shown the potential of fully automated software development. In light of this, we introduce a new software engineering task, namely Natural Language to code Repository (NL2Repo). This task aims to generate an entire code repository from its natural language requirements. To address this task, we propose a simple yet effective framework, CodeS, which decomposes NL2Repo into multiple sub-tasks via a multi-layer sketch. Specifically, CodeS comprises three modules: RepoSketcher, FileSketcher, and SketchFiller. RepoSketcher first generates a repository's directory structure from the given requirements; FileSketcher then generates a file sketch for each file in the generated structure; SketchFiller finally fills in the details of each function in the generated file sketch. To rigorously assess CodeS on the NL2Repo task, we carry out evaluations through both automated benchmarking and manual feedback analysis. For the benchmark-based evaluation, we craft a repository-oriented benchmark, SketchEval, and design an evaluation metric, SketchBLEU. For the feedback-based evaluation, we develop a VSCode plugin for CodeS and engage 30 participants in empirical studies. Extensive experiments demonstrate the effectiveness and practicality of CodeS on the NL2Repo task.
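A schematic of the three-module decomposition might look like the following sketch, where `generate` stands in for an LLM call; the prompts, line-based parsing, and return types are purely illustrative assumptions, not the paper's implementation.

```python
from typing import Callable

def codes_pipeline(requirements: str,
                   generate: Callable[[str], str]) -> dict:
    """Schematic of CodeS's multi-layer sketch. `generate` is a
    stand-in for an LLM call; all prompts are illustrative only."""
    # RepoSketcher: requirements -> repository directory structure.
    repo_sketch = generate(
        f"Draft a repository directory structure for:\n{requirements}")

    repo = {}
    for path in repo_sketch.splitlines():
        # FileSketcher: one sketch (signatures, docstrings) per file.
        file_sketch = generate(
            f"Draft a file sketch for {path} given:\n{repo_sketch}")
        # SketchFiller: fill in each function body in the file sketch.
        repo[path] = generate(
            f"Fill in every function in this sketch:\n{file_sketch}")
    return repo
```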
On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee
Li, Chenyang, Chung, Jihoon, Cai, Biao, Wang, Haimin, Zhou, Xianlian, Shen, Bo
Model compression is a crucial part of deploying neural networks (NNs), especially when the memory and storage of computing devices are limited, as in many applications. This paper focuses on two widely used model compression techniques: low-rank approximation and weight pruning. However, training NNs with low-rank approximation and weight pruning often suffers from significant accuracy loss and convergence issues. In this paper, a holistic framework for model compression is proposed from a novel nonconvex optimization perspective, by designing an appropriate objective function. We then introduce NN-BCD, a block coordinate descent (BCD) algorithm to solve this nonconvex optimization problem. One advantage of our algorithm is that an efficient, gradient-free iteration scheme with closed-form updates can be derived. Therefore, our algorithm does not suffer from vanishing/exploding gradient problems. Furthermore, using the Kurdyka-{\L}ojasiewicz (K{\L}) property of our objective function, we show that our algorithm converges globally to a critical point at a rate of O(1/k), where k denotes the number of iterations. Lastly, extensive experiments with tensor train decomposition and weight pruning demonstrate the efficiency and superior performance of the proposed framework. Our code implementation is available at https://github.com/ChenyangLi-97/NN-BCD
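The flavor of closed-form, gradient-free updates can be illustrated with two classic projection steps a BCD scheme of this kind can use: a truncated SVD for a low-rank block (Eckart-Young) and hard thresholding for a pruning block. This is a generic sketch, not the exact NN-BCD update; see the linked repository for the actual algorithm.

```python
import numpy as np

def project_low_rank(W: np.ndarray, rank: int) -> np.ndarray:
    """Closed-form projection onto rank-<=r matrices (Eckart-Young):
    keep the top-r singular triplets."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def project_sparse(W: np.ndarray, k: int) -> np.ndarray:
    """Closed-form projection for weight pruning: keep the k
    largest-magnitude entries, zero out the rest."""
    out = np.zeros_like(W)
    idx = np.unravel_index(
        np.argsort(np.abs(W), axis=None)[-k:], W.shape)
    out[idx] = W[idx]
    return out

W = np.random.randn(32, 16)
W_lr = project_low_rank(W, rank=4)   # low-rank block update
W_sp = project_sparse(W, k=100)      # pruning block update
```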
Linear RNNs Provably Learn Linear Dynamic Systems
Wang, Lifu, Wang, Tianyu, Yi, Shengwei, Shen, Bo, Hu, Bo, Cao, Xing
We study the learning ability of linear recurrent neural networks trained with gradient descent. We prove the first theoretical guarantee that linear RNNs can learn any stable linear dynamic system under a large class of loss functions. For an arbitrary stable linear system with a parameter $\rho_C$ related to the transition matrix $C$, we show that, despite the non-convexity of the parameter optimization loss, if the width of the RNN is large enough (and the required width in hidden layers does not depend on the length of the input sequence), a linear RNN can provably learn any stable linear dynamic system with sample and time complexity polynomial in $\frac{1}{1-\rho_C}$. Our results provide the first theoretical guarantee for learning a linear RNN and demonstrate how the recurrent structure can help learn a dynamic system.
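For context, a standard way to write this setting (the paper's exact notation may differ; this formulation is assumed for illustration) is that the target is a stable linear dynamic system whose transition matrix $C$ has spectral radius bounded by $\rho_C < 1$, and the learner is a wide linear RNN trained by gradient descent:

```latex
% Target: stable linear dynamic system, \rho(C) \le \rho_C < 1.
h_{t+1} = C\, h_t + B\, x_t, \qquad y_t = D\, h_t
% Learner: linear RNN of width m with parameters (W, U, V),
% trained by gradient descent (no nonlinearity).
\tilde{h}_{t+1} = W \tilde{h}_t + U x_t, \qquad \tilde{y}_t = V \tilde{h}_t
```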
Can Programming Languages Boost Each Other via Instruction Tuning?
Zan, Daoguang, Yu, Ailun, Shen, Bo, Zhang, Jiaxin, Chen, Taihong, Geng, Bing, Chen, Bei, Ji, Jichuan, Yao, Yafen, Wang, Yongji, Wang, Qianxiang
When human programmers have mastered one programming language, learning a new one becomes easier. In this report, we explore whether programming languages can boost each other during the instruction fine-tuning phase of code large language models. We conduct extensive experiments with 8 popular programming languages (Python, JavaScript, TypeScript, C, C++, Java, Go, HTML) on StarCoder. Results demonstrate that programming languages can significantly boost each other. For example, CodeM-Python 15B, trained on Python, improves Java by an absolute 17.95% pass@1 on HumanEval-X. More surprisingly, we find that CodeM-HTML 7B, trained on the HTML corpus, can improve Java by an absolute 15.24% pass@1. Our training data is released at https://github.com/NL2Code/CodeM.
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
Shen, Bo, Zhang, Jiaxin, Chen, Taihong, Zan, Daoguang, Geng, Bing, Fu, An, Zeng, Muhan, Yu, Ailun, Ji, Jichuan, Zhao, Jingyang, Guo, Yuenan, Wang, Qianxiang
Large Language Models for Code (Code LLMs) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, and reinforcement learning. In this paper, we propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark. Furthermore, through an extensive evaluation on the CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.
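The abstract does not spell out the RRTF objective; one plausible illustration is a pairwise ranking loss over candidate responses scored by test and teacher feedback, as sketched below. The formulation and scoring are assumptions, not PanGu-Coder2's exact training objective.

```python
import torch
import torch.nn.functional as F

def pairwise_rank_loss(logprobs: torch.Tensor,
                       scores: torch.Tensor) -> torch.Tensor:
    """Illustrative rank loss: for every response pair where i outranks
    j (scores[i] > scores[j]), penalize the model if its log-probability
    of i is not higher than that of j. NOTE: assumed formulation.
    logprobs: (n,) model log-probs of n candidate responses.
    scores:   (n,) feedback scores (e.g., from tests and a teacher)."""
    diff = logprobs.unsqueeze(0) - logprobs.unsqueeze(1)  # diff[i, j] = lp_j - lp_i
    mask = scores.unsqueeze(1) > scores.unsqueeze(0)      # mask[i, j]: i outranks j
    if not mask.any():
        return logprobs.new_zeros(())
    return F.relu(diff[mask]).mean()

# Hypothetical usage: 4 candidate patches with feedback scores.
loss = pairwise_rank_loss(torch.tensor([-1.2, -0.8, -2.0, -1.5]),
                          torch.tensor([0.0, 1.0, 0.0, 0.5]))
```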
Anomaly Detection in Additive Manufacturing Processes using Supervised Classification with Imbalanced Sensor Data based on Generative Adversarial Network
Chung, Jihoon, Shen, Bo, Kong, Zhenyu
Supervised classification methods have been widely utilized for quality assurance in advanced manufacturing processes, such as anomaly (defect) detection in additive manufacturing (AM). However, since abnormal states (with defects) occur much less frequently than normal ones (without defects) in a manufacturing process, the number of sensor data samples collected from a normal state is usually much larger than that from an abnormal state. This results in imbalanced training data for classification analysis, which deteriorates the performance of detecting abnormal states in the process. It is therefore beneficial to generate effective artificial samples of the abnormal states to build a more balanced training set. To achieve this goal, this paper proposes a novel data augmentation method based on a generative adversarial network (GAN) using image sensor data from additive manufacturing processes. The novelty of our approach is that a standard GAN and a classifier are jointly optimized, with techniques to stabilize the learning process of the standard GAN. The diverse, high-quality generated samples provide balanced training data to the classifier, and the iterative optimization between the GAN and the classifier yields a high-performance classifier. The effectiveness of the proposed method is validated on both open-source data and real-world case studies in polymer and metal AM processes.
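A minimal sketch of the joint optimization idea follows: a GAN generates minority-class (abnormal) samples while the classifier trains on the resulting balanced mixture. The MLP architectures, Gaussian toy data, and hyperparameters are placeholders, and the stabilization techniques from the paper are omitted.

```python
import torch
import torch.nn as nn

# Placeholder networks; the paper works on image sensor data.
z_dim, x_dim = 16, 64
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, 1))
C = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, 2))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_c = torch.optim.Adam(C.parameters(), lr=1e-3)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

normal = torch.randn(256, x_dim)           # majority class (label 0)
abnormal = torch.randn(16, x_dim) + 2.0    # scarce minority class (label 1)

for step in range(200):
    # Discriminator: real abnormal vs. generated abnormal samples.
    fake = G(torch.randn(32, z_dim))
    real = abnormal[torch.randint(len(abnormal), (32,))]
    d_loss = (bce(D(real), torch.ones(32, 1))
              + bce(D(fake.detach()), torch.zeros(32, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator.
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # Classifier: balanced batch of real normal + generated abnormal.
    x_n = normal[torch.randint(len(normal), (32,))]
    x_a = G(torch.randn(32, z_dim)).detach()
    x = torch.cat([x_n, x_a])
    y = torch.cat([torch.zeros(32), torch.ones(32)]).long()
    c_loss = ce(C(x), y)
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
```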