Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training
The mean field theory of multilayer neural networks centers around a particular infinite-width scaling, in which the learning dynamics is shown to be closely tracked by the mean field limit. A random fluctuation around this infinite-width limit is expected from a large-width expansion to the next order. This fluctuation has so far been studied only in the case of shallow networks, where previous works employ heavily technical notions or additional formulation ideas amenable only to that case. A treatment of the multilayer case has been missing, the chief difficulty being to find a formulation that captures the stochastic dependency across not only time but also depth.

In this work, we initiate the study of the fluctuation in the case of multilayer networks, at any network depth. Leveraging the neuronal embedding framework recently introduced by Nguyen and Pham, we systematically derive a system of dynamical equations, called the second-order mean field limit, that captures the limiting fluctuation distribution. Through this framework we demonstrate the complex interaction among neurons in the second-order mean field limit, the stochasticity with cross-layer dependency, and the nonlinear time evolution inherent in the limiting fluctuation. A limit theorem is proven that quantitatively relates this limit to the fluctuation realized by large-width networks.

We apply the result to show a stability property of gradient descent mean field training: in the large-width regime, along the training trajectory, it progressively biases towards a solution with minimal (in fact, vanishing) fluctuation in the learned output function, even after the network has been initialized at, or has converged sufficiently fast to, a global optimum. This extends a phenomenon previously shown only for shallow networks with a squared loss in the empirical risk minimization setting to multilayer networks with a loss function that is not necessarily convex, in a more general setting.
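The next-order large-width expansion the abstract describes can be written schematically as follows (the notation here is illustrative, not the paper's own):

```latex
% f_N: output of the width-N network; \bar f: mean field limit;
% g: the limiting fluctuation, whose law the second-order mean
% field limit is derived to capture.
f_N(t) \;=\; \bar f(t) \;+\; \frac{1}{\sqrt{N}}\, g(t) \;+\; o\!\left(N^{-1/2}\right)
```

The first-order mean field theory characterizes \(\bar f\); the fluctuation term \(g\) is the object studied here, with dependency across both time \(t\) and network depth.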
Probabilistic Kernel Function for Fast Angle Testing
Lu, Kejing, Xiao, Chuan, Ishikawa, Yoshiharu
In this paper, we study the angle testing problem in the context of similarity search in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Unlike existing approaches that rely on random projection vectors drawn from Gaussian distributions, our approach leverages reference angles and employs a deterministic structure for the projection vectors. Notably, our kernel functions do not require asymptotic assumptions, such as the number of projection vectors tending to infinity, and can be both theoretically and experimentally shown to outperform Gaussian-distribution-based kernel functions. We apply the proposed kernel function to Approximate Nearest Neighbor Search (ANNS) and demonstrate that our approach achieves 2.5x to 3x higher queries-per-second (QPS) throughput compared to the widely used graph-based search algorithm HNSW.
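For context, here is a minimal sketch of the Gaussian-projection baseline that the paper improves upon: the classical sign-agreement (SimHash-style) estimator, which uses the fact that for a standard Gaussian vector w and unit vectors x, y at angle theta, P[sign(w·x) != sign(w·y)] = theta/pi. The function name and parameters below are illustrative, not from the paper.

```python
import numpy as np

def angle_estimate_gaussian(x, y, num_projections=10_000, seed=0):
    """Estimate the angle between x and y (radians) from Gaussian projections.

    For standard Gaussian w: P[sign(w.x) != sign(w.y)] = theta / pi,
    so the empirical disagreement rate times pi estimates theta.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((num_projections, len(x)))
    disagree = np.mean(np.sign(w @ x) != np.sign(w @ y))
    return disagree * np.pi

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0]) / np.sqrt(2.0)  # true angle: pi/4
est = angle_estimate_gaussian(x, y)
print(est)  # close to pi/4 (~0.785), up to Monte Carlo error
```

Note the estimator's guarantee is only asymptotic in the number of projections, which is exactly the kind of assumption the proposed deterministic kernel functions avoid.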
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.35)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)
Understand the Effectiveness of Shortcuts through the Lens of DCA
Sun, Youran, Liu, Yihua, Niu, Yi-Shuai
The Difference-of-Convex Algorithm (DCA) is a well-known nonconvex optimization algorithm for minimizing a nonconvex function that can be expressed as the difference of two convex ones. Many famous existing optimization algorithms, such as SGD and proximal point methods, can be viewed as special DCAs with specific DC decompositions, making DCA a powerful framework for optimization. On the other hand, shortcuts are a key architectural feature in modern deep neural networks, facilitating both training and optimization. We show that the gradient of a shortcut neural network can be obtained by applying DCA to a vanilla neural network, i.e., one without shortcut connections. Therefore, from the perspective of DCA, we can better understand the effectiveness of networks with shortcuts. Moreover, we propose a new architecture called NegNet that does not fit the previous interpretation but performs on par with ResNet and can be included in the DCA framework.
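To make the DCA iteration concrete, here is a minimal sketch on a toy problem (not the paper's neural-network setting): minimize f(x) = g(x) - h(x) with convex g(x) = x^2 and convex h(x) = 2|x|. DCA linearizes h at the current iterate and minimizes the resulting convex surrogate.

```python
import numpy as np

def dca(x0, num_iters=20):
    """Toy DCA for f(x) = x^2 - 2|x| (local minimizers at x = +/-1).

    Each step: pick a subgradient s of h(x) = 2|x| at x_k, then solve
    the convex surrogate  x_{k+1} = argmin_x  x^2 - s * x  =  s / 2.
    """
    x = x0
    for _ in range(num_iters):
        s = 2.0 * np.sign(x)  # subgradient of h(x) = 2|x| at x
        x = s / 2.0           # closed-form argmin of x^2 - s*x
    return x

x_star = dca(x0=0.3)
print(x_star)  # converges to the local minimizer x = 1.0 (f(1) = -1)
```

The view advocated in the abstract is that the gradient computation of a shortcut network arises as such a DCA step applied to a suitable DC decomposition of the vanilla network's objective.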
Machine Learning System Design Interview, by Khang Pham (ISBN 9798813031571)
Khang Pham is a software engineer with 12 years of experience in machine learning and big data. Since 2019, he has helped hundreds of engineers get jobs at big tech companies like Google, Meta (Facebook), Amazon, Apple, LinkedIn, Twitter, and Microsoft. He created Machine Learning System Design (https://rebrand.ly/mlsd_launch) in 2021, and his course became the #1 machine learning course on Educative.
Students work with NASA to launch early wildfire detection technology
Artificial intelligence could help catch wildfires sooner. It's far from a wildfire, but the small flame of a candle could help catch something much bigger. California State Polytechnic University, Pomona students rigged a special setup to train the artificial intelligence they developed, called Bronco Ember. The AI uses an infrared camera to decide if something is burning from afar. "Our generation really does want to tackle the big problems in the world," said Michael Pham, a student researcher.
Machine learning will be one of the best ways to identify habitable exoplanets
The field of extrasolar planet studies is undergoing a seismic shift. To date, 4,940 exoplanets have been confirmed in 3,711 planetary systems, with another 8,709 candidates awaiting confirmation. With so many planets available for study and improvements in telescope sensitivity and data analysis, the focus is transitioning from discovery to characterization. Instead of simply looking for more planets, astrobiologists will examine "potentially-habitable" worlds for potential "biosignatures." This refers to the chemical signatures associated with life and biological processes, one of the most important of which is water.
Digital stethoscope with artificial intelligence may detect aortic stenosis
Screening for significant aortic stenosis was fast and effective through the assessment of phonocardiograms by a digital stethoscope and machine learning, according to results presented at the American Society of Echocardiography Scientific Sessions. "A machine-learning algorithm trained on heart sounds can rapidly and accurately detect a murmur in patients with clinically significant aortic stenosis," Steve Pham, MD, vice president of clinical and research affairs at Eko Devices, told Cardiology Today. "Front-line clinicians may be able to use Eko stethoscopes (Eko CORE) with this algorithm to refer patients for echocardiography to confirm aortic stenosis." Brent E. White, MD, of the Bluhm Cardiovascular Institute at Northwestern Memorial Hospital in Chicago, and colleagues analyzed 639 recordings from 161 patients who were undergoing transthoracic echocardiography. The 15-second phonocardiogram recordings were obtained from the digital stethoscope, which is wirelessly paired with a mobile app (Eko Mobile).
A Difference-of-Convex Programming Approach With Parallel Branch-and-Bound For Sentence Compression Via A Hybrid Extractive Model
Niu, Yi-Shuai, You, Yu, Xu, Wenxu, Ding, Wentao, Hu, Junpeng
Sentence compression is an important problem in natural language processing, with wide applications in text summarization, search engines, human-AI interaction systems, etc. In this paper, we design a hybrid extractive sentence compression model combining a probability language model and a parse tree language model, compressing sentences while guaranteeing the syntactic correctness of the compression results. Our compression model is formulated as an integer linear programming problem, which can be rewritten as a Difference-of-Convex (DC) programming problem based on the exact penalty technique. We use a well-known efficient DC algorithm, DCA, to handle the penalized problem and obtain locally optimal solutions. Then a hybrid global optimization algorithm combining DCA with a parallel branch-and-bound framework, namely PDCABB, is used to find globally optimal solutions. Numerical results demonstrate that our sentence compression model provides excellent compression results as evaluated by F-score, and indicate that PDCABB is a promising algorithm for solving our sentence compression model.
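To illustrate the extractive formulation in miniature (words and scores here are hypothetical): choose binary indicators z_i to keep or drop each word, maximizing a word-level importance score subject to a length budget, as in the paper's integer linear program. Real systems replace these scores with language-model and parse-tree probabilities and solve with DCA/PDCABB; brute-force enumeration stands in for the solver on this toy instance.

```python
from itertools import product

words  = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
scores = [0.1,   0.6,     0.5,    0.9,   0.8,     0.3,    0.1,   0.4,    0.9]
budget = 4  # keep at most 4 words

# Enumerate all binary selections z in {0,1}^n and keep the feasible
# selection maximizing the total score (the ILP objective).
best_val, best_z = float("-inf"), None
for z in product((0, 1), repeat=len(words)):
    if sum(z) <= budget:
        val = sum(s * zi for s, zi in zip(scores, z))
        if val > best_val:
            best_val, best_z = val, z

compressed = [w for w, zi in zip(words, best_z) if zi]
print(compressed)  # ['quick', 'fox', 'jumps', 'dog']
```

The exact penalty technique mentioned in the abstract relaxes the binary constraint z_i in {0,1} to z_i in [0,1] and penalizes non-integrality, yielding the continuous DC program that DCA operates on.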
Canvs AI Appoints Michel Tuan Pham to Its Board of Advisors
Canvs connects consumer input to research insights with patented semantic AI technology that helps companies deeply understand and empathize with their audiences. New York: Canvs AI, which offers a patented semantic AI technology to boost understanding and empathy, is pleased to announce the addition of Michel Tuan Pham to the company's diverse Board of Advisors. Canvs sought to refine its AI-powered platform to give customers a deeper understanding of how their audiences think and feel, optimizing the decision-making process. Canvs connected with Pham because of his deep expertise in understanding how emotion drives consumer behavior, making him the perfect candidate to help the company. Pham is the Kravis Professor of Business in Marketing at Columbia Business School.