South America
Things to Consider for Implementing Voice-enabled AI
The era of voice-enabled search and commerce has arrived, and organizations need to prepare for this new paradigm. By embedding their brands in conversations, organizations can offer customized solutions to almost any problem. Voice systems have quickly advanced from simple voice commands to entire ecosystems of applications and interactions. Smart speakers such as Amazon Echo, Apple HomePod, and Google Home offer a range of entertainment options in the home. These voice-AI-enabled systems have opened a new way to carry out tasks that were previously limited to a screen.
Evaluating Progress on Machine Learning for Longitudinal Electronic Healthcare Data
Bellamy, David, Celi, Leo, Beam, Andrew L.
The Large Scale Visual Recognition Challenge, based on the well-known ImageNet dataset, catalyzed an intense flurry of progress in computer vision. Benchmark tasks have propelled other sub-fields of machine learning forward at an equally impressive pace, but in healthcare it has primarily been image processing tasks, such as in dermatology and radiology, that have experienced similar benchmark-driven progress. In the present study, we performed a comprehensive review of benchmarks in medical machine learning for structured data, identifying one based on the Medical Information Mart for Intensive Care (MIMIC-III) that allows the first direct comparison of predictive performance and thus the evaluation of progress on four clinical prediction tasks: mortality, length of stay, phenotyping, and patient decompensation. We find that little meaningful progress has been made over a three-year period on these tasks, despite significant community engagement. Through our meta-analysis, we find that the performance of deep recurrent models is only superior to logistic regression on certain tasks. We conclude with a synthesis of these results, possible explanations, and a list of desirable qualities for future benchmarks in medical machine learning.
Differentiable Weighted Finite-State Transducers
Hannun, Awni, Pratap, Vineel, Kahn, Jacob, Hsu, Wei-Ning
E B. (2) The primary difference between ASG and CTC is the inclusion of a blank token, b, represented by the graph in Figure 3a. Constructing CTC amounts to including the blank token graph when constructing the full token graph T. The intersection of T and Y then results in the CTC alignment graph (Figure 1b). Note that this version of CTC does not force transitions on b between repeated tokens. Forcing them requires remembering the previous state and hence is more involved (see Appendix A.1 for details). A benefit of constructing sequence-level criteria by composing operations on simpler graphs is the access to a large design space of loss functions with which we can encode useful priors. For example, we could construct a "spike" CTC, a "duration-limited" CTC, or an "equally spaced" CTC by substituting the appropriate token graphs into Equation 2 (see Appendix A.2 for details).
Group Equivariant Stand-Alone Self-Attention For Vision
Romero, David W., Cordonnier, Jean-Baptiste
We provide a general self-attention formulation to impose group equivariance to arbitrary symmetry groups. This is achieved by defining positional encodings that are invariant to the action of the group considered. Since the group acts on the positional encoding directly, group equivariant self-attention networks (GSA-Nets) are steerable by nature. Our experiments on vision benchmarks demonstrate consistent improvements of GSA-Nets over non-equivariant self-attention networks.
Variance-Reduced Methods for Machine Learning
Gower, Robert M., Schmidt, Mark, Bach, Francis, Richtarik, Peter
Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving a faster convergence than SGD in theory as well as practice. These speedups underline the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at non-expert readers. We focus mainly on the convex setting, and leave pointers to readers interested in extensions for minimizing non-convex functions.
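To make the variance-reduction idea concrete, here is a minimal sketch of one well-known VR method, SVRG, on a one-dimensional least-squares problem. The abstract does not single out SVRG; the choice of method, the toy data, the step size, and every function name below are assumptions made for illustration only.

```python
import random


def grad_i(w, x, y):
    """Gradient of the single-example loss 0.5 * (w * x - y) ** 2."""
    return (w * x - y) * x


def full_grad(w, data):
    """Average gradient over the whole (finite) data set."""
    return sum(grad_i(w, x, y) for x, y in data) / len(data)


def least_squares_1d(data):
    """Closed-form minimizer, used only to check the result."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def svrg(data, step=0.1, epochs=40, inner_steps=None, seed=0):
    """SVRG sketch: each epoch stores a reference point and its full
    gradient, then takes stochastic steps whose noise is corrected by
    the reference gradient."""
    rng = random.Random(seed)
    n = len(data)
    m = inner_steps or n
    w = 0.0
    for _ in range(epochs):
        w_ref = w                      # snapshot of the iterate
        mu = full_grad(w_ref, data)    # one full pass over the data
        for _ in range(m):
            x, y = data[rng.randrange(n)]
            # Unbiased gradient estimate whose variance shrinks as
            # w and w_ref both approach the minimizer.
            g = grad_i(w, x, y) - grad_i(w_ref, x, y) + mu
            w -= step * g
    return w


# Hypothetical toy data: y is roughly 3 * x.
DATA = [(0.5, 1.6), (1.0, 2.9), (1.5, 4.6), (2.0, 5.9)]
```

Calling `svrg(DATA)` should land very close to `least_squares_1d(DATA)` even though the step size is constant. Plain SGD with the same constant step would keep fluctuating around the minimizer on noisy data such as this, because its gradient noise does not vanish.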
Effective Regularization Through Loss-Function Metalearning
Gonzalez, Santiago, Miikkulainen, Risto
Loss-function metalearning can be used to discover novel, customized loss functions for deep neural networks, resulting in improved performance, faster training, and improved data utilization. A likely explanation is that such functions discourage overfitting, leading to effective regularization. This paper theoretically demonstrates that this is indeed the case: decomposition of learning rules makes it possible to characterize the training dynamics and show that loss functions evolved through TaylorGLO regularize both in the beginning and end of learning, and maintain an invariant in between. The invariant can be utilized to make the metalearning process more efficient in practice, and the regularization can train networks that are robust against adversarial attacks. Loss-function optimization can thus be seen as a well-founded new aspect of metalearning in neural networks.
Learning Potentials of Quantum Systems using Deep Neural Networks
Sehanobish, Arijit, Corzo, Hector H., Kara, Onur, van Dijk, David
Machine learning has wide applications in a broad range of subjects, including physics. Recent works have shown that neural networks can learn classical Hamiltonian mechanics. The results of these works motivate the following question: Can we endow neural networks with inductive biases coming from quantum mechanics and provide insights for quantum phenomena? In this work, we try to answer this question by investigating possible approximations for reconstructing the Hamiltonian of a quantum system given one of its wave functions. Instead of handcrafting the Hamiltonian and a solution of the Schrödinger equation, we design neural networks that aim to learn it directly from our observations. We show that our method, termed Quantum Potential Neural Networks (QPNN), can learn potentials in an unsupervised manner with remarkable accuracy for a wide range of quantum systems, such as the quantum harmonic oscillator, particle in a box perturbed by an external potential, hydrogen atom, Pöschl-Teller potential, and a solitary wave system. Furthermore, in the case of a particle perturbed by an external force, we also learn the perturbed wave function in a joint end-to-end manner.
Covariate Shift Adaptation in High-Dimensional and Divergent Distributions
Polo, Felipe Maia, Vicente, Renato
In real-world applications of supervised learning methods, training and test sets are often sampled from distinct distributions, and we must resort to domain adaptation techniques. One special class of techniques is Covariate Shift Adaptation, which allows practitioners to obtain good generalization performance in the distribution of interest when domains differ only by the marginal distribution of features. Traditionally, Covariate Shift Adaptation is implemented using Importance Weighting, which may fail in high-dimensional settings due to small Effective Sample Sizes (ESS). In this paper, we propose (i) a connection between ESS, high-dimensional settings and generalization bounds and (ii) a simple, general and theoretically sound approach to combine feature selection and Covariate Shift Adaptation. The new approach yields good performance with improved ESS.
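As a rough illustration of why a small ESS is a problem, the sketch below computes one common definition of effective sample size for a set of importance weights, Kish's formula (sum of weights squared, divided by the sum of squared weights). The abstract does not state which definition the authors use, so the formula and the function name are assumptions.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w) ** 2 / sum(w ** 2).

    Equals len(weights) when all weights are equal and approaches 1
    when a single weight dominates the rest.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq
```

With uniform weights every training point counts fully, so four equal weights give an ESS of 4. If one point carries all the weight, the ESS drops to 1 and the weighted estimate effectively rests on a single example.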
National ambitions = dodgy AI policy and deployment?
An annual report by a consulting firm to world governments uncovers an interesting pattern when it comes to the responsible use of AI. According to data developed by the consultancy Oxford Insights, with one debatable exception, none of the top 15 countries ranked for their responsible use of AI could reasonably be considered to have strategic ambitions to dominate globally or even in their region. The report, which looked broadly at the AI readiness of world governments, created a sub-ranking focused on how responsible governments are being in four "dimensions": inclusivity, accountability, privacy and transparency. Rank was measured using nine indicators grouped under each dimension. Japan, which recently began debating an outward-facing military that operates independently of the United States, ranked No. 15.
Learning to be safe, in finite time
Castellano, Agustin, Bazerque, Juan, Mallada, Enrique
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials, provided that one is willing to relax one's optimality requirements mildly. We focus on the canonical multi-armed bandit problem and seek to study the exploration-preservation trade-off intrinsic to safe learning. More precisely, by defining a handicap metric that counts the number of unsafe actions, we provide an algorithm for discarding unsafe machines (or actions), with probability one, that achieves constant handicap. Our algorithm is rooted in the classical sequential probability ratio test, redefined here for continuing tasks. Under standard assumptions on sufficient exploration, our rule provably detects all unsafe machines in an (expected) finite number of rounds. The analysis also unveils a trade-off between the number of rounds needed to secure the environment and the probability of discarding safe machines. Our decision rule can wrap around any other algorithm to optimize a specific auxiliary goal since it provides a safe environment to search for (approximately) optimal policies. Simulations corroborate our theoretical findings and further illustrate the aforementioned trade-offs.
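For readers unfamiliar with the sequential probability ratio test mentioned above, the following is a minimal sketch of the classical (Wald) version for a single machine with binary outcomes. It is not the redefined rule for continuing tasks that the paper proposes. The hypotheses p0 (failure rate of a safe machine) and p1 (failure rate of an unsafe one), the error levels, and all names are assumptions chosen for illustration.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Classical Wald SPRT on 0/1 outcomes (1 = unsafe event observed).

    Tests H0: failure rate p0 ("safe") against H1: failure rate p1
    ("unsafe"), with p1 > p0. Returns (decision, rounds_used).
    """
    upper = math.log((1 - beta) / alpha)   # cross it: accept H1
    lower = math.log(beta / (1 - alpha))   # cross it: accept H0
    llr = 0.0                              # running log-likelihood ratio
    for t, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "unsafe", t
        if llr <= lower:
            return "safe", t
    return "undecided", len(observations)
```

The test stops as soon as the accumulated evidence crosses either threshold, so the number of pulls adapts to how clearly the machine behaves. Tightening `alpha` or `beta` moves the thresholds outward, which requires more rounds before a decision is reached.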