neufeld
Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks
Liang, Luxu, Neufeld, Ariel, Zhang, Ying
In this paper, we provide a non-asymptotic analysis of the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm to a target measure in Wasserstein-1 and Wasserstein-2 distance. Crucially, compared to the existing literature on SGHMC, we allow its stochastic gradient to be discontinuous. This allows us to provide explicit upper bounds, which can be controlled to be arbitrarily small, for the expected excess risk of non-convex stochastic optimization problems with discontinuous stochastic gradients, including, among others, the training of neural networks with ReLU activation function. To illustrate the applicability of our main results, we consider numerical experiments on quantile estimation and on several optimization problems involving ReLU neural networks relevant in finance and artificial intelligence.
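As an illustration of the quantile-estimation setting mentioned in the abstract above, the following minimal sketch runs an SGHMC-style recursion on the pinball loss, whose stochastic gradient is discontinuous. The update rule is a common Euler-type discretization of underdamped Langevin dynamics and is an assumption here; the paper's exact scheme, any regularization term, and its parameter choices may differ. The function name `sghmc_quantile` and all default values are hypothetical.

```python
import math
import random


def sghmc_quantile(q=0.95, step=0.01, friction=1.0, beta=1e4,
                   n_iter=50_000, seed=0):
    """Estimate the q-quantile of N(0, 1) with an SGHMC-style recursion.

    Assumed update (not taken from the paper):
        theta_{n+1} = theta_n + step * v_n
        v_{n+1}     = v_n - step * (friction * v_n + grad)
                      + sqrt(2 * friction * step / beta) * xi
    """
    rng = random.Random(seed)
    theta, v = 0.0, 0.0  # position (parameter) and momentum
    noise_scale = math.sqrt(2.0 * friction * step / beta)
    for _ in range(n_iter):
        x = rng.gauss(0.0, 1.0)  # fresh data sample
        # Stochastic gradient of the pinball loss in theta.
        # It jumps from -q to 1 - q as theta crosses x (discontinuous).
        grad = (1.0 if x < theta else 0.0) - q
        theta_next = theta + step * v
        v = v - step * (friction * v + grad) + noise_scale * rng.gauss(0.0, 1.0)
        theta = theta_next
    return theta


if __name__ == "__main__":
    # The 0.95-quantile of the standard normal is roughly 1.645.
    print(sghmc_quantile())
```

The last iterate is a noisy estimate: it fluctuates around the true quantile, with a spread that depends on `step` and `beta`.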
Langevin dynamics based algorithm e-TH$\varepsilon$O POULA for stochastic optimization problems with discontinuous stochastic gradient
Lim, Dong-Young, Neufeld, Ariel, Sabanis, Sotirios, Zhang, Ying
We introduce a new Langevin dynamics based algorithm, called e-TH$\varepsilon$O POULA, to solve optimization problems with discontinuous stochastic gradients which naturally appear in real-world applications such as quantile estimation, vector quantization, CVaR minimization, and regularized optimization problems involving ReLU neural networks. We demonstrate both theoretically and numerically the applicability of the e-TH$\varepsilon$O POULA algorithm. More precisely, under the conditions that the stochastic gradient is locally Lipschitz in average and satisfies a certain convexity at infinity condition, we establish non-asymptotic error bounds for e-TH$\varepsilon$O POULA in Wasserstein distances and provide a non-asymptotic estimate for the expected excess risk, which can be controlled to be arbitrarily small. Three key applications in finance and insurance are provided, namely, multi-period portfolio optimization, transfer learning in multi-period portfolio optimization, and insurance claim prediction, which involve neural networks with (Leaky)-ReLU activation functions. Numerical experiments conducted using real-world datasets illustrate the superior empirical performance of e-TH$\varepsilon$O POULA compared to SGLD, TUSLA, ADAM, and AMSGrad in terms of model accuracy.
Neufeld
Intelligent autonomous agents that are acting in dynamic environments in real-time are often required to follow long-term strategies while also remaining reactive and being able to act deliberately. In order to create intelligent behaviors for video game characters, there are two common approaches: planners are used for long-term strategical planning, whereas Behavior Trees allow for reactive acting. Although both methodologies have their advantages, when used on their own, they fail to fully achieve both requirements described above. In this work, we propose a hybrid approach combining a Hierarchical Task Network planner for high-level planning while delegating low-level decision making and acting to Behavior Trees. Furthermore, we compare this approach with a pure planner in a multi-agent environment.
Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function
Lim, Dong-Young, Neufeld, Ariel, Sabanis, Sotirios, Zhang, Ying
We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2021). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for the aforementioned example which support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve the optimization problem involving neural networks with ReLU activation function. Besides, we provide simulation results for synthetic examples where popular algorithms, e.g. ADAM, AMSGrad, RMSProp, and (vanilla) SGD, may fail to find the minimizer of the objective functions due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution.
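The taming idea behind TUSLA can be sketched on a toy problem. The example below is not from the paper: it minimizes the hypothetical objective u(theta) = theta^4/4 - theta * E[X] with X ~ N(1, 0.5^2), whose stochastic gradient theta^3 - X grows super-linearly (it is not discontinuous, so it illustrates only the growth aspect). The tamed update, with the gradient divided by 1 + sqrt(step) * |theta|^(2r), is written from memory of the TUSLA literature and should be read as an assumption; the function name `tusla` and all defaults are illustrative.

```python
import math
import random


def tusla(theta0=10.0, step=0.05, beta=1e4, r=1, n_iter=20_000, seed=0):
    """Tamed Langevin-type recursion on u(theta) = theta^4/4 - theta*E[X].

    Assumed update (a sketch, not the paper's exact statement):
        theta_{n+1} = theta_n
                      - step * H(theta_n, x) / (1 + sqrt(step) * |theta_n|^(2r))
                      + sqrt(2 * step / beta) * xi
    """
    rng = random.Random(seed)
    theta = theta0
    noise_scale = math.sqrt(2.0 * step / beta)
    for _ in range(n_iter):
        x = rng.gauss(1.0, 0.5)  # data sample with mean 1
        grad = theta ** 3 - x    # super-linearly growing stochastic gradient
        # Taming factor: damps the step when |theta| is large.
        taming = 1.0 + math.sqrt(step) * abs(theta) ** (2 * r)
        theta = theta - step * grad / taming + noise_scale * rng.gauss(0.0, 1.0)
    return theta


if __name__ == "__main__":
    # The minimizer of the toy objective is theta = 1.
    print(tusla())
```

With the same start `theta0=10.0` and `step=0.05`, an untamed gradient step would move by 0.05 * (1000 - x), overshooting to roughly -40 and then growing in magnitude on each subsequent step, which is the kind of failure the abstract attributes to super-linear growth. The taming factor reduces the first step to about 2.1.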
How AI could solve the U.S. construction industry's productivity puzzle
The days of construction projects running behind schedule and over budget could soon be over as AI technology tries to solve the U.S. productivity puzzle. Disperse, an AI-powered construction firm, has raised fresh finance to expand into the U.S. in a bid to tackle inefficiencies on building sites. The company's technology uses visual snapshots of construction projects to alert managers about potential problems before they happen. The construction sector has been grappling with low levels of productivity for decades, with underinvestment in technology one of the key factors. Closing the productivity gap in global construction could be worth $1.6 trillion a year, with a third of that coming in the U.S., according to the McKinsey Global Institute.
AI in the Workplace: What it Means to the Gender Wage Gap in 2019
As we saw in Minding the Gender Gap, women still lag far behind men in the tech field, both in terms of representation (which hovers around 25% in the United States) and in terms of pay, where the gap between men and women is close to 12%. While figures for pay disparity in tech don't focus on specialists in artificial intelligence (AI), female representation there is even lower. According to the report Discriminating Systems: Gender, Race, and Power, women make up only 18% of the authors at AI conferences and less than 20% of AI professors. They fare even worse in corporations, where they make up only 15% of research staff at Facebook and a mere 10% at Google.
Mace employs AI issue detection to track onsite progress BIM
Mace has become the latest contractor to adopt an artificial intelligence (AI)-powered issue detection system which tracks onsite progress. The move comes after construction tech start-up Disperse piloted its product concurrently with Canary Wharf Contractors (CWCL) and Kier. Disperse's system employs safety-trained site scanners which use 360-degree cameras in every room across all floors to capture progress on a project, before the firm's Computer Vision technology detects changes week-on-week, measures progress, and identifies anomalies. So far, the pilots have covered a 327-unit residential tower in London for CWCL and a 120-room hotel in Reading for Kier, with the system analysing the projects using the 360-degree imagery. Disperse said its goal is to create an issue detection system comparable to those used in manufacturing plants.
Conditioning on Disjunctive Knowledge: Defaults and Probabilities
Many writers have observed that default logics appear to contain the "lottery paradox" of probability theory. This arises when a default "proof by contradiction" lets us conclude that a typical X is not a Y where Y is an unusual subclass of X. We show that there is a similar problem with default "proof by cases" and construct a setting where we might draw a different conclusion knowing a disjunction than we would knowing any particular disjunct. Though Reiter's original formalism is capable of representing this distinction, other approaches are not. To represent and reason about this case, default logicians must specify how a "typical" individual is selected. The problem is closely related to Simpson's paradox of probability theory. If we accept a simple probabilistic account of defaults based on the notion that one proposition may favour or increase belief in another, the "multiple extension problem" for both conjunctive and disjunctive knowledge vanishes.