Researchers use AI to estimate focal mechanism parameters of earthquakes

#artificialintelligence

The research team led by Prof. Zhang Jie at the University of Science and Technology of China (USTC) of the Chinese Academy of Sciences has made progress on the real-time determination of earthquake focal mechanisms through deep learning. The work was published in Nature Communications. Because the characteristics of the rupture surface of the source fault are connected to the seismic waves radiated by the source, rapid determination of the focal mechanism, which is inferred from multiple ground seismic records, is vital for earthquake monitoring. However, the mechanism is difficult to calculate from the records alone, so focal mechanism parameters are often not reported at all, or are reported only after a few minutes or even longer.


Learning to Optimize: A Primer and A Benchmark

arXiv.org Machine Learning

Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods, aiming to reduce the laborious iterations of hand engineering. It automates the design of an optimization method based on its performance on a set of training problems. This data-driven procedure generates methods that can efficiently solve problems similar to those in the training set. In sharp contrast, the typical and traditional designs of optimization methods are theory-driven, so they obtain performance guarantees over the classes of problems specified by the theory. The difference makes L2O suitable for repeatedly solving a certain type of optimization problem over a specific distribution of data, while it typically fails on out-of-distribution problems. The practicality of L2O depends on the type of target optimization, the chosen architecture of the method to learn, and the training procedure. This new paradigm has motivated a community of researchers to explore L2O and report their findings. This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization. We set up taxonomies, categorize existing works and research directions, present insights, and identify open challenges.
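
To make the idea concrete, the following minimal sketch (not the survey's benchmark code; names such as UpdateRule and unrolled_loss, the step scaling, and the quadratic problem distribution are illustrative assumptions) meta-trains a tiny coordinate-wise network that maps gradients to update steps by unrolling it on random quadratic problems:

import torch
import torch.nn as nn

class UpdateRule(nn.Module):
    """Coordinate-wise learned update: delta = f(grad)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, grad):
        # grad: (n,) -> update: (n,)
        return self.net(grad.unsqueeze(-1)).squeeze(-1)

def unrolled_loss(rule, steps=20, dim=10):
    """Optimize a random quadratic f(x) = ||A x - b||^2 with the learned rule."""
    A, b = torch.randn(dim, dim), torch.randn(dim)
    x = torch.zeros(dim, requires_grad=True)
    total = 0.0
    for _ in range(steps):
        f = ((A @ x - b) ** 2).sum()
        g, = torch.autograd.grad(f, x, create_graph=True)
        x = x + 0.01 * rule(g)          # apply the learned update
        total = total + f               # meta-loss: sum of losses along the trajectory
    return total / steps

rule = UpdateRule()
meta_opt = torch.optim.Adam(rule.parameters(), lr=1e-3)
for it in range(200):                   # meta-training over a distribution of problems
    meta_opt.zero_grad()
    loss = unrolled_loss(rule)
    loss.backward()
    meta_opt.step()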


Deconvolution-and-convolution Networks

arXiv.org Artificial Intelligence

Recent findings suggest that CNNs may not be the best option for 1D pattern recognition, especially for datasets with over 1M training samples; for example, existing CNN-based methods for 1D signals rely heavily on human pre-processing, commonly using the discrete Fourier transform (DFT) to reconstruct a 1D signal as a 2D array. To add to extant knowledge, in this paper a novel 1D data processing algorithm is proposed for 1D big data analysis by learning a deep deconvolutional-convolutional network (DCNet). Rather than resorting to hand-crafted techniques, we employ deconvolution layers to convert 1D signals into 2D data, and a 2D CNN on top of the deconvolution model performs the recognition. Compared with existing 1D signal processing algorithms, DCNet boasts the advantages of less human intervention and higher generalization performance. Our experimental results on classification and regression tasks with varying numbers of training patterns (50K to 11M) demonstrate the desirability of the new approach.
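
As a rough illustration of this deconvolution-then-convolution idea (a hedged sketch, not the authors' DCNet implementation; the layer sizes and the class name are assumptions), a PyTorch module might lift a 1D signal into a 2D feature map with transposed convolutions and then classify it with an ordinary 2D CNN:

import torch
import torch.nn as nn

class DeconvConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Treat the 1D signal as a (1 x 1 x L) image and grow the "height" axis
        # with transposed convolutions, replacing hand-crafted 1D-to-2D transforms.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(1, 8, kernel_size=(4, 3), stride=(2, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.ConvTranspose2d(8, 8, kernel_size=(4, 3), stride=(2, 1), padding=(1, 1)),
            nn.ReLU(),
        )
        # A standard 2D CNN then recognizes patterns in the lifted representation.
        self.cnn = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):                   # x: (batch, length) 1D signals
        x = x.unsqueeze(1).unsqueeze(2)     # -> (batch, 1, 1, length)
        return self.cnn(self.deconv(x))

logits = DeconvConvNet()(torch.randn(4, 256))   # e.g. four signals of length 256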


Artificial intelligence for detection and quantification of rust and leaf miner in coffee crop

arXiv.org Artificial Intelligence

Pest and disease control plays a key role in agriculture, since the damage caused by these agents is responsible for huge economic losses every year. Motivated by this, we create an algorithm capable of detecting rust (Hemileia vastatrix) and leaf miner (Leucoptera coffeella) in coffee leaves (Coffea arabica) and quantifying disease severity, using a mobile application as a high-level interface for the model inferences. We used different convolutional neural network architectures to create the object detector and, for severity quantification, the OpenCV library with k-means clustering under three treatments (the RGB channels, the value channel, and the AFSoft software), comparing the three methods with analysis of variance. The results show an average precision of 81.5% in detection and no statistically significant difference between the treatments in quantifying the severity on coffee leaves, which supports adopting the computationally less costly method. The application, together with the trained model, can detect the pest and the disease under different image conditions and infection stages, and can also estimate the stage of infection.
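
For the quantification step, a hedged sketch of one possible OpenCV k-means treatment is shown below; the image path, the cluster count, and the lowest-green-centroid heuristic are illustrative assumptions rather than the paper's exact procedure:

import cv2
import numpy as np

def severity_from_leaf(image_path, k=3):
    """Cluster leaf pixels by colour and report the lesion cluster as % of leaf area."""
    img = cv2.imread(image_path)                       # BGR image of a segmented leaf
    pixels = img.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_PP_CENTERS)
    # Heuristic: the cluster with the lowest green component is taken as diseased tissue.
    lesion_cluster = int(np.argmin(centers[:, 1]))
    lesion_pixels = np.count_nonzero(labels.ravel() == lesion_cluster)
    return 100.0 * lesion_pixels / labels.size

# Example (hypothetical file): print(severity_from_leaf("leaf.jpg"))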


Transferable Model for Shape Optimization subject to Physical Constraints

arXiv.org Artificial Intelligence

The interaction of neural networks with physical equations offers a wide range of applications. We provide a method that enables a neural network to transform objects subject to given physical constraints. To this end, a U-Net architecture is used to learn the underlying physical behaviour of fluid flows. The network is used to infer the solution of flow simulations, which is shown for a wide range of generic channel flow simulations. Physically meaningful quantities can be computed from the obtained solution, e.g. the total pressure difference or the forces on the objects. A Spatial Transformer Network with thin-plate splines handles the interaction between the physical constraints and the geometric representation of the objects. Thus, a transformation from an initial to a target geometry is performed such that the object fulfils the given constraints. The method is fully differentiable, i.e., gradient information can be used for the transformation, which can be seen as an inverse design process. The advantage of this method over many other proposed methods is that the physical constraints are based on the inferred flow field solution. Thus, we have a transferable model which can be applied to varying problem setups and is not limited to a given set of geometry parameters or physical quantities.
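
The following much-simplified sketch conveys the inverse-design loop in spirit only: a frozen, differentiable surrogate (a stand-in for the paper's U-Net, here an untrained placeholder called flow_net) maps geometry parameters to a flow quantity, and gradients of a constraint loss are used to update the geometry. All names, dimensions, and the target value are illustrative assumptions.

import torch
import torch.nn as nn

flow_net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))  # stand-in surrogate
for p in flow_net.parameters():
    p.requires_grad_(False)        # the surrogate stays fixed; only the geometry is optimized

geometry = torch.zeros(8, requires_grad=True)   # e.g. control-point displacements
target_pressure_drop = torch.tensor([0.5])      # the physical constraint to satisfy

opt = torch.optim.Adam([geometry], lr=1e-2)
for step in range(500):
    opt.zero_grad()
    predicted = flow_net(geometry)              # physics quantity inferred from the geometry
    loss = (predicted - target_pressure_drop).pow(2).mean()
    loss.backward()                             # gradients flow back to the geometry
    opt.step()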


Deep learning: a statistical viewpoint

arXiv.org Machine Learning

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
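
A small numerical illustration of minimum-norm interpolation in an overparametrized linear model is given below; the dimensions, noise level, and signal are chosen arbitrarily and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                  # overparametrized: d >> n
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = 1.0                                # a simple "signal" component
y = X @ w_true + 0.1 * rng.normal(size=n)       # noisy labels

w_min_norm = np.linalg.pinv(X) @ y              # minimum-norm solution of X w = y

print("training error:", np.max(np.abs(X @ w_min_norm - y)))   # ~0: perfect fit
X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_true
print("test RMSE:", np.sqrt(np.mean((X_test @ w_min_norm - y_test) ** 2)))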


Partial Differential Equations is All You Need for Generating Neural Architectures -- A Theory for Physical Artificial Intelligence Systems

arXiv.org Artificial Intelligence

In this work, we generalize the reaction-diffusion equation in statistical physics, the Schrödinger equation in quantum mechanics, and the Helmholtz equation in paraxial optics into neural partial differential equations (NPDE), which can be considered the fundamental equations in the field of artificial intelligence research. We use the finite difference method to discretize the NPDE and find numerical solutions, from which the basic building blocks of deep neural network architectures, including the multi-layer perceptron, convolutional neural networks and recurrent neural networks, are generated. Learning strategies such as adaptive moment estimation, L-BFGS, pseudoinverse learning algorithms, and partial differential equation constrained optimization are also presented. We believe the clear physical picture of interpretable deep neural networks presented here is significant, as it makes it possible to apply them to the design of analog computing devices and paves the road to physical artificial intelligence.
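
One way to see the PDE-to-architecture connection is that a single explicit finite-difference step of a diffusion equation is exactly a 2D convolution with a fixed Laplacian stencil. The short sketch below demonstrates this building block; the grid size and time step are illustrative, not taken from the paper.

import torch
import torch.nn.functional as F

laplacian = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)   # standard 5-point stencil

def diffusion_step(u, dt=0.1):
    """u_{t+1} = u_t + dt * Laplacian(u_t), implemented as a convolution."""
    return u + dt * F.conv2d(u, laplacian, padding=1)

u = torch.rand(1, 1, 32, 32)     # initial field on a 32x32 grid
for _ in range(100):
    u = diffusion_step(u)        # iterating the layer integrates the PDE in time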


Injecting Knowledge in Data-driven Vehicle Trajectory Predictors

arXiv.org Artificial Intelligence

Vehicle trajectory prediction tasks have commonly been tackled from two distinct perspectives: either with knowledge-driven methods or, more recently, with data-driven ones. On the one hand, we can explicitly implement domain knowledge or physical priors, such as anticipating that vehicles will follow the middle of the road. While this perspective leads to feasible outputs, its performance is limited by the difficulty of hand-crafting complex interactions in urban environments. On the other hand, recent works use data-driven approaches which can learn complex interactions from the data, leading to superior performance. However, generalization, i.e., producing accurate predictions on unseen data, is an issue that leads to unrealistic outputs. In this paper, we propose to learn a "Realistic Residual Block" (RRB), which effectively connects these two perspectives. Our RRB takes any off-the-shelf knowledge-driven model and finds the residuals required to add to the knowledge-aware trajectory. Our proposed method outputs realistic predictions by confining the residual range and taking its uncertainty into account. We also constrain our output with Model Predictive Control (MPC) to satisfy kinematic constraints. Using a publicly available dataset, we show that our method outperforms previous works in terms of accuracy and generalization to new scenes. We will release our code and data split here: https://github.com/vita-epfl/RRB.
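
A schematic sketch of the bounded-residual idea (not the code from the linked repository; the feature dimension, prediction horizon, and residual bound are illustrative assumptions) could look as follows:

import torch
import torch.nn as nn

class ResidualRefiner(nn.Module):
    def __init__(self, feat_dim=64, horizon=12, max_residual=1.0):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, horizon * 2))
        self.horizon, self.max_residual = horizon, max_residual

    def forward(self, scene_features, prior_trajectory):
        # prior_trajectory: (batch, horizon, 2) from an off-the-shelf knowledge-driven model
        raw = self.head(scene_features).view(-1, self.horizon, 2)
        residual = self.max_residual * torch.tanh(raw)   # confine the residual range
        return prior_trajectory + residual               # stay close to the feasible prior

refiner = ResidualRefiner()
features = torch.randn(8, 64)                 # encoded scene context (assumed upstream)
prior = torch.randn(8, 12, 2)                 # e.g. lane-centre-following predictions
refined = refiner(features, prior)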


Physics-aware deep neural networks for surrogate modeling of turbulent natural convection

arXiv.org Machine Learning

Recent works have explored the potential of machine learning as data-driven turbulence closures for RANS and LES techniques. Beyond these advances, the high expressivity and agility of physics-informed neural networks (PINNs) make them promising candidates for full fluid flow PDE modeling. An important question is whether this new paradigm, exempt from the traditional notion of discretization of the underlying operators, which is closely tied to the resolution of the flow scales, is capable of sustaining high levels of turbulence characterized by multi-scale features. We investigate the use of PINN surrogate modeling for turbulent Rayleigh-Bénard (RB) convection flows in rough and smooth rectangular cavities, mainly relying on DNS temperature data from the fluid bulk. We carefully quantify the computational requirements under which the formulation is capable of accurately recovering the hidden flow quantities. We then propose a new padding technique that distributes some of the scattered coordinates, at which PDE residuals are minimized, around the region of labeled data acquisition. We show how it acts as a regularization close to the training boundaries, which are zones of poor accuracy for standard PINNs, and results in a noticeable global accuracy improvement at iso-budget. Finally, we propose for the first time to relax the incompressibility condition in such a way that it drastically benefits the optimization search and results in a much improved convergence of the composite loss function. The RB results obtained at high Rayleigh number Ra = 2 × 10^9 are particularly impressive: the predictive accuracy of the surrogate over the entire half-billion DNS coordinates yields errors for all flow variables ranging between 0.3% and 4% in the relative L2 norm, with training relying on only 1.6% of the DNS data points.
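
A condensed sketch of such a composite PINN loss, with the incompressibility condition entering as a weighted penalty rather than a hard constraint, is given below. The network size, the penalty weight, the random placeholder data, and the fact that only the continuity residual is shown are simplifying assumptions, not the paper's full formulation.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 4))          # (x, y, t) -> (u, v, p, T)

def composite_loss(xyt_data, T_data, xyt_colloc, w_div=0.1):
    # Data term: match temperature labels in the fluid bulk.
    T_pred = net(xyt_data)[:, 3]
    loss_data = ((T_pred - T_data) ** 2).mean()

    # Physics term at scattered collocation points: continuity residual du/dx + dv/dy.
    xyt = xyt_colloc.clone().requires_grad_(True)
    out = net(xyt)
    u, v = out[:, 0], out[:, 1]
    du = torch.autograd.grad(u.sum(), xyt, create_graph=True)[0][:, 0]
    dv = torch.autograd.grad(v.sum(), xyt, create_graph=True)[0][:, 1]
    loss_div = ((du + dv) ** 2).mean()

    return loss_data + w_div * loss_div        # relaxed, weighted incompressibility

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
xyt_d, T_d = torch.rand(256, 3), torch.rand(256)   # placeholder labeled samples
xyt_c = torch.rand(1024, 3)                        # collocation points
for _ in range(100):
    opt.zero_grad()
    loss = composite_loss(xyt_d, T_d, xyt_c)
    loss.backward()
    opt.step()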


LQResNet: A Deep Neural Network Architecture for Learning Dynamic Processes

arXiv.org Artificial Intelligence

With the rapid development of sensor and measurement technology, time-series data of processes have become available in large amounts and with high accuracy. Machine learning and data science play an important role in analyzing these data and extracting information about the underlying process dynamics. Building a model describing the dynamics is vital for designing and optimizing various processes, as well as for predicting their long-term transient behavior. Inferring a dynamic process model from data, often called system identification, has a rich history; see, e.g., [30, 46]. While linear system identification is well established, nonlinear system identification is still far from being as well understood, despite a similarly long research history; see, e.g., [25, 44]. Nonlinear system identification often relies on a good hypothesis of the model; thus, it is not entirely a black-box technology. Fortunately, there are several scenarios in which one can hypothesize a model structure based on a good understanding of the underlying dynamic behavior using expert knowledge or experience. Towards nonlinear system identification, a promising approach based on symbolic regression was proposed [4] to determine the potential structure of a nonlinear system.
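
As a compact illustration of the sparse-regression route to nonlinear system identification mentioned above (this is a generic sketch, not the LQResNet architecture or the cited method; the toy system, term library, and threshold are illustrative), consider fitting the right-hand side of dx/dt as a sparse combination of candidate terms:

import numpy as np

# Simulate a toy damped oscillator: dx/dt = y, dy/dt = -x - 0.1*y.
dt, steps = 0.01, 5000
state = np.zeros((steps, 2))
state[0] = [2.0, 0.0]
for k in range(steps - 1):
    x, y = state[k]
    state[k + 1] = state[k] + dt * np.array([y, -x - 0.1 * y])

deriv = np.gradient(state, dt, axis=0)                  # numerical dx/dt, dy/dt

# Candidate library: [1, x, y, x^2, x*y, y^2].
x, y = state[:, 0], state[:, 1]
library = np.column_stack([np.ones_like(x), x, y, x**2, x*y, y**2])

# Least squares followed by hard thresholding of small coefficients.
coeffs, *_ = np.linalg.lstsq(library, deriv, rcond=None)
coeffs[np.abs(coeffs) < 0.05] = 0.0
print(coeffs)   # rows: library terms, columns: (dx/dt, dy/dt); expect y and -x - 0.1*y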