AITopics

2501.12023

Country: North America > United States > Oklahoma > Payne County > Cushing (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJan-6-2025

Detecting Defective Wafers Via Modular Networks

Zhang, Yifeng, Baker, Bryan, Chen, Shi, Zhang, Chao, Huang, Yu, Zhao, Qi, Bom, Sthitie

The growing availability of sensors within semiconductor manufacturing processes makes it feasible to detect defective wafers with data-driven models. Without directly measuring the quality of semiconductor devices, they capture the modalities between diverse sensor readings and can be used to predict key quality indicators (KQI, \textit{e.g.}, roughness, resistance) to detect faulty products, significantly reducing the capital and human cost in maintaining physical metrology steps. Nevertheless, existing models pay little attention to the correlations among different processes for diverse wafer products and commonly struggle with generalizability issues. To enable generic fault detection, in this work, we propose a modular network (MN) trained using time series stage-wise datasets that embodies the structure of the manufacturing process. It decomposes KQI prediction as a combination of stage modules to simulate compositional semiconductor manufacturing, universally enhancing faulty wafer detection among different wafer types and manufacturing processes. Extensive experiments demonstrate the usefulness of our approach, and shed light on how the compositional design provides an interpretable interface for more practical applications.

artificial intelligence, machine learning, manufacturing process, (18 more...)

2501.03368

Genre: Research Report (1.00)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Hardware (0.56)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.95)

arXiv.org Machine LearningJan-3-2025

Residual connections provably mitigate oversmoothing in graph neural networks

Chen, Ziang, Lin, Zhengjiang, Chen, Shi, Polyanskiy, Yury, Rigollet, Philippe

Graph neural networks (GNNs) have achieved remarkable empirical success in processing and representing graph-structured data across various domains. However, a significant challenge known as "oversmoothing" persists, where vertex features become nearly indistinguishable in deep GNNs, severely restricting their expressive power and practical utility. In this work, we analyze the asymptotic oversmoothing rates of deep GNNs with and without residual connections by deriving explicit convergence rates for a normalized vertex similarity measure. Our analytical framework is grounded in the multiplicative ergodic theorem. Furthermore, we demonstrate that adding residual connections effectively mitigates or prevents oversmoothing across several broad families of parameter distributions. The theoretical findings are strongly supported by numerical experiments.

artificial intelligence, machine learning, theorem 2, (13 more...)

2501.00762

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningJan-9-2024

A Good Score Does not Lead to A Good Generative Model

Li, Sixu, Chen, Shi, Li, Qin

Score-based Generative Models (SGMs) is one leading method in generative modeling, renowned for their ability to generate high-quality samples from complex, high-dimensional data distributions. The method enjoys empirical success and is supported by rigorous theoretical convergence properties. In particular, it has been shown that SGMs can generate samples from a distribution that is close to the ground-truth if the underlying score function is learned well, suggesting the success of SGM as a generative model. We provide a counter-example in this paper. Through the sample complexity argument, we provide one specific setting where the score function is learned well. Yet, SGMs in this setting can only output samples that are Gaussian blurring of training data points, mimicking the effects of kernel density estimation. The finding resonates a series of recent finding that reveal that SGMs can demonstrate strong memorization effect and fail to generate.

artificial intelligence, machine learning, score function, (16 more...)

2401.04856

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.81)

arXiv.org Artificial IntelligenceOct-20-2023

No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution

Zhang, Mengxiao, Chen, Shi, Luo, Haipeng, Wang, Yingfei

Supply chain management (SCM) has been recognized as an important discipline with applications to many industries, where the two-echelon stochastic inventory model, involving one downstream retailer and one upstream supplier, plays a fundamental role for developing firms' SCM strategies. In this work, we aim at designing online learning algorithms for this problem with an unknown demand distribution, which brings distinct features as compared to classic online optimization problems. Specifically, we consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings: the centralized setting, where a planner decides both agents' strategy simultaneously, and the decentralized setting, where two agents decide their strategy independently and selfishly. We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings, and additionally for individual regret in the decentralized setting. Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem. We also implement our algorithms and show their empirical effectiveness.

agent 1, artificial intelligence, machine learning, (17 more...)

2210.12663

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry:

Education > Educational Setting (0.34)
Retail (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Artificial IntelligenceOct-9-2023

Accelerating optimization over the space of probability measures

Chen, Shi, Li, Qin, Tse, Oliver, Wright, Stephen J.

Acceleration of gradient-based optimization methods is an issue of significant practical and theoretical interest, particularly in machine learning applications. Most research has focused on optimization over Euclidean spaces, but given the need to optimize over spaces of probability measures in many machine learning problems, it is of interest to investigate accelerated gradient methods in this context too. To this end, we introduce a Hamiltonian-flow approach that is analogous to moment-based approaches in Euclidean space. We demonstrate that algorithms based on this approach can achieve convergence rates of arbitrarily high order. Numerical examples illustrate our claim.

artificial intelligence, machine learning, probability measure, (15 more...)

2310.04006

Country:

Europe (0.67)
North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report (0.64)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

arXiv.org Artificial IntelligenceJun-3-2023

Correcting auto-differentiation in neural-ODE training

Xu, Yewei, Chen, Shi, Li, Qin, Wright, Stephen J.

Does the use of auto-differentiation yield reasonable updates to deep neural networks that represent neural ODEs? Through mathematical analysis and numerical evidence, we find that when the neural network employs high-order forms to approximate the underlying ODE flows (such as the Linear Multistep Method (LMM)), brute-force computation using auto-differentiation often produces non-converging artificial oscillations. In the case of Leapfrog, we propose a straightforward post-processing technique that effectively eliminates these oscillations, rectifies the gradient computation and thus respects the updates of the underlying flow.

artificial intelligence, machine learning, order derivative, (18 more...)

2306.02192

Country: North America > United States > Wisconsin > Dane County > Madison (0.15)

Genre: Research Report (0.50)

Industry: Energy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Artificial IntelligenceMar-27-2023

Learning Harmonic Molecular Representations on Riemannian Manifold

Wang, Yiqun, Shen, Yuning, Chen, Shi, Wang, Lihao, Ye, Fei, Zhou, Hao

Molecular representation learning plays a crucial role in AI-assisted drug discovery research. Encoding 3D molecular structures through Euclidean neural networks has become the prevailing method in the geometric deep learning community. However, the equivariance constraints and message passing in Euclidean space may limit the network expressive power. In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of its molecular surface. HMR offers a multi-resolution representation of molecular geometric and chemical features on 2D Riemannian manifold. We also introduce a harmonic message passing method to realize efficient spectral message passing over the surface manifold for better molecular encoding. Our proposed method shows comparable predictive power to current models in small molecule property prediction, and outperforms the state-of-the-art deep learning models for ligand-binding protein pocket classification and the rigid protein docking challenge, demonstrating its versatility in molecular representation learning.

artificial intelligence, eigenfunction, machine learning, (20 more...)

2303.1552

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningOct-6-2021

On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime

Ding, Zhiyan, Chen, Shi, Li, Qin, Wright, Stephen

Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime. We study this phenomenon with mean-field analysis, by translating the training process of ResNet to a gradient-flow partial differential equation (PDE) and examining the convergence properties of this limiting process. The activation function is assumed to be $2$-homogeneous or partially $1$-homogeneous; the regularized ReLU satisfies the latter condition. We show that if the ResNet is sufficiently large, with depth and width depending algebraically on the accuracy and confidence levels, first-order optimization methods can find global minimizers that fit the training data.

artificial intelligence, machine learning, neural network, (17 more...)

2110.02926

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

arXiv.org Machine LearningMay-29-2021

Overparameterization of deep ResNet: zero loss and mean-field analysis

Ding, Zhiyan, Chen, Shi, Li, Qin, Wright, Stephen

Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many practical situations. We examine this phenomenon for the case of Residual Neural Networks (ResNet) with smooth activation functions in a limiting regime in which both the number of layers (depth) and the number of neurons in each layer (width) go to infinity. First, we use a mean-field-limit argument to prove that the gradient descent for parameter training becomes a gradient flow for a probability distribution that is characterized by a partial differential equation (PDE) in the large-NN limit. Next, we show that the solution to the PDE converges in the training time to a zero-loss solution. Together, these results imply that the training of the ResNet gives a near-zero loss if the ResNet is large enough. We give estimates of the depth and width needed to reduce the loss below a given threshold, with high probability.

deep learning, gradient descent, neural network, (18 more...)

2105.14417

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)