From Post-it Notes To Algorithms: How Automation Is Changing Legal Work

NPR Technology

While document review used to be tedious work for lawyers, Kirk says they can now sift through gigabytes of data within days with the help of artificial intelligence. This is part of an occasional series: Is My Job Safe? These stories look at jobs that might be at risk because of technology and automation. Shannon Capone Kirk's first job as a young lawyer in the late '90s was "document review."


Stephen-Hawking-says-technology-end-poverty-urges-caution.html?ITO=1490&ns_mchannel=rss&ns_campaign=1490

Daily Mail

A report by Human Rights Watch and the Harvard Law School International Human Rights Clinic calls for humans to remain in control over all weapons systems at a time of rapid technological advances. It says that requiring humans to remain in control of critical functions during combat, including the selection of targets, saves lives and ensures that fighters comply with international law. 'Machines have long served as instruments of war, but historically humans have directed how they are used,' said Bonnie Docherty, senior arms division researcher at Human Rights Watch, in a statement. 'Now there is a real threat that humans would relinquish their control and delegate life-and-death decisions to machines.' Some have argued in favour of robots on the battlefield, saying their use could save lives.


Tangent: Automatic Differentiation Using Source Code Transformation in Python

arXiv.org Machine Learning

Automatic differentiation (AD) is an essential primitive for machine learning programming systems. Tangent is a new library that performs AD using source code transformation (SCT) in Python. It takes numeric functions written in a syntactic subset of Python and NumPy as input, and generates new Python functions which calculate a derivative. This approach to automatic differentiation is different from existing packages popular in machine learning, such as TensorFlow and Autograd. Advantages are that Tangent generates gradient code in Python which is readable by the user, easy to understand and debug, and has no runtime overhead. Tangent also introduces abstractions for easily injecting logic into the generated gradient code, further improving usability.
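
To make the idea of source code transformation concrete, here is a minimal sketch that differentiates a Python expression by rewriting its syntax tree and emitting new, readable Python source. It is not Tangent's implementation or API: the helper names `derivative`, `grad_source`, and `make_grad` are hypothetical, only `+`, `-`, and `*` are handled, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast


def derivative(node, var):
    """Build an AST for d(node)/d(var) using the sum and product rules."""
    if isinstance(node, ast.Constant):
        return ast.Constant(value=0)
    if isinstance(node, ast.Name):
        return ast.Constant(value=1 if node.id == var else 0)
    if isinstance(node, ast.BinOp):
        left, right = node.left, node.right
        d_left, d_right = derivative(left, var), derivative(right, var)
        if isinstance(node.op, (ast.Add, ast.Sub)):
            # (u +/- v)' = u' +/- v'
            return ast.BinOp(left=d_left, op=node.op, right=d_right)
        if isinstance(node.op, ast.Mult):
            # (u * v)' = u' * v + u * v'
            return ast.BinOp(
                left=ast.BinOp(left=d_left, op=ast.Mult(), right=right),
                op=ast.Add(),
                right=ast.BinOp(left=left, op=ast.Mult(), right=d_right),
            )
    raise NotImplementedError(f"unsupported syntax: {ast.dump(node)}")


def grad_source(expr_src, var):
    """Return Python source text for the derivative of `expr_src`."""
    tree = ast.parse(expr_src, mode="eval")
    return ast.unparse(derivative(tree.body, var))


def make_grad(expr_src, var):
    """Compile the generated derivative source into a one-argument function."""
    src = f"lambda {var}: {grad_source(expr_src, var)}"
    return eval(src), src


df, df_src = make_grad("x * x + 3 * x", "x")
print(df_src)   # the generated (unsimplified) derivative source
print(df(2.0))  # derivative of x*x + 3*x at x = 2
```

Here `df(2.0)` evaluates to 7.0, matching the hand-derived 2x + 3. Because the derivative is ordinary source text, it can be printed, read, and stepped through in a debugger, which is the usability advantage the abstract attributes to this approach.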


A Tutorial on Canonical Correlation Methods

arXiv.org Machine Learning

Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.
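
For orientation, the standard formulation that these variants build on can be written as follows. The notation is an assumption chosen for this note, not taken from the tutorial: $C_{aa}$ and $C_{bb}$ are the within-set covariance matrices of the two views, $C_{ab}$ is the between-set covariance, and $w_a$, $w_b$ are the canonical weight vectors.

```latex
\rho = \max_{w_a,\, w_b}
  \frac{w_a^{\top} C_{ab}\, w_b}
       {\sqrt{w_a^{\top} C_{aa}\, w_a}\;\sqrt{w_b^{\top} C_{bb}\, w_b}}
```

In the regularised variant, a common choice is to replace $C_{aa}$ and $C_{bb}$ with $C_{aa} + c_a I$ and $C_{bb} + c_b I$ for small $c_a, c_b > 0$, which keeps the problem well posed when the sample size is small relative to the dimensionality.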


Finding Heavily-Weighted Features in Data Streams

arXiv.org Machine Learning

We introduce a new sub-linear space data structure, the Weight-Median Sketch, that captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. In contrast with related sketches that capture the most commonly occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis of this approach that establishes recovery guarantees in the online learning setting, and demonstrate substantial empirical improvements in accuracy-memory trade-offs over alternatives, including count-based sketches and feature hashing.
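
The idea of sketching gradient updates rather than counts can be illustrated with a simplified sketch. This is not the authors' algorithm: the class name `GradientCountSketch`, the hash family, the learning rate, and the use of median estimates for prediction are all assumptions made for this example, and details such as scaling and regularisation are omitted.

```python
import math
import random
from statistics import median


class GradientCountSketch:
    """Count-Sketch-style table holding sketched weights of a linear classifier."""

    P = 2_147_483_647  # prime modulus for the illustrative hash family

    def __init__(self, depth=5, width=64, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.depth, self.width, self.lr = depth, width, lr
        self.table = [[0.0] * width for _ in range(depth)]
        # One (bucket hash, sign hash) parameter set per row.
        self.params = [
            (rng.randrange(1, self.P), rng.randrange(self.P),
             rng.randrange(1, self.P), rng.randrange(self.P))
            for _ in range(depth)
        ]

    def _slot(self, row, feature):
        a, b, c, e = self.params[row]
        bucket = ((a * feature + b) % self.P) % self.width
        sign = 1 if ((c * feature + e) % self.P) % 2 == 0 else -1
        return bucket, sign

    def weight(self, feature):
        """Estimate a feature's weight as the median of its signed buckets."""
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._slot(row, feature)
            estimates.append(sign * self.table[row][bucket])
        return median(estimates)

    def update(self, x, y):
        """One logistic-loss SGD step; x maps integer feature ids to values, y is +1 or -1."""
        score = sum(self.weight(f) * v for f, v in x.items())
        grad = -y / (1.0 + math.exp(y * score))  # d/d(score) of log(1 + exp(-y * score))
        for f, v in x.items():
            for row in range(self.depth):
                bucket, sign = self._slot(row, f)
                self.table[row][bucket] -= self.lr * grad * v * sign


sketch = GradientCountSketch()
for _ in range(200):
    sketch.update({1: 1.0, 3: 1.0}, +1)  # feature 1 appears only in the positive stream
    sketch.update({2: 1.0, 3: 1.0}, -1)  # feature 2 appears only in the negative stream
print(sketch.weight(1), sketch.weight(2), sketch.weight(3))
```

Feature 3 is the most frequent item in this toy stream, yet its estimated weight stays close to zero, while features 1 and 2 receive large weights of opposite sign. A count-based sketch would instead rank feature 3 highest, which is the contrast the abstract draws.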


Identification of Gaussian Process State Space Models

arXiv.org Machine Learning

The Gaussian process state space model (GPSSM) is a non-linear dynamical system, where unknown transition and/or measurement mappings are described by GPs. Most research in GPSSMs has focussed on the state estimation problem, i.e., computing a posterior of the latent state given the model. However, the key challenge in GPSSMs has not been satisfactorily addressed yet: system identification, i.e., learning the model. To address this challenge, we impose a structured Gaussian variational posterior distribution over the latent states, which is parameterised by a recognition model in the form of a bi-directional recurrent neural network. Inference with this structure allows us to recover a posterior smoothed over sequences of data. We provide a practical algorithm for efficiently computing a lower bound on the marginal likelihood using the reparameterisation trick. This further allows for the use of arbitrary kernels within the GPSSM. We demonstrate that the learnt GPSSM can efficiently generate plausible future trajectories of the identified system after only observing a small number of episodes from the true system.


Mathematics for Machine Learning

@machinelearnbot

Would you like to learn the mathematics behind machine learning? There aren't many resources out there that give simple detailed examples and that walk you through the topics step by step. If you're looking to gain a solid foundation in machine learning, allowing you to study on your own schedule at a fraction of the cost it would take at a traditional university, to further your career goals, this online course is for you. If you're a working professional needing a refresher on machine learning or a complete beginner who needs to learn machine learning for the first time, this online course is for you. Why you should take this online course: You need to refresh your knowledge of machine learning for your career to earn a higher salary.



Online Tool Condition Monitoring Based on Parsimonious Ensemble+

arXiv.org Artificial Intelligence

Accurate diagnosis of tool wear in the metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches, which cannot cope with the fast sampling rate of the metal cutting process. Furthermore, they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly flexible principle, where both the ensemble structure and the base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, an online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents an advancement of a newly developed ensemble learning algorithm, pENsemble+, in which an online active learning scenario is incorporated to reduce operator labelling effort. An ensemble merging scenario is proposed that allows reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well-known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort.


Online Learning for Changing Environments using Coin Betting

arXiv.org Machine Learning

A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments. Recent studies have proposed "meta" algorithms that convert any online learning algorithm to one that is adaptive to changing environments, where the adaptivity is analyzed in a quantity called the strongly-adaptive regret. This paper describes a new meta algorithm that has a strongly-adaptive regret bound that is a factor of $\sqrt{\log(T)}$ better than other algorithms with the same time complexity, where $T$ is the time horizon. We also extend our algorithm to achieve a first-order (i.e., dependent on the observed losses) strongly-adaptive regret bound for the first time, to our knowledge. At its heart is a new parameter-free algorithm for the learning with expert advice (LEA) problem in which experts sometimes do not output advice for consecutive time steps (i.e., sleeping experts). This algorithm is derived by a reduction from optimal algorithms for the so-called coin betting problem. Empirical results show that our algorithm outperforms state-of-the-art methods in both learning with expert advice and metric learning scenarios.
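
As background on the coin betting problem mentioned above, the following minimal sketch shows a Krichevsky-Trofimov style bettor, a standard parameter-free strategy from the coin betting literature. It is only the underlying primitive, not the paper's meta algorithm or its sleeping-experts reduction, and the function name `kt_coin_betting` is hypothetical.

```python
def kt_coin_betting(outcomes, initial_wealth=1.0):
    """Bet on a sequence of outcomes in [-1, 1] with no learning rate to tune.

    In round t the bettor stakes a signed fraction of its current wealth equal
    to the sum of past outcomes divided by t, then gains outcome * stake.
    """
    wealth = initial_wealth
    outcome_sum = 0.0
    bets = []
    for t, outcome in enumerate(outcomes, start=1):
        fraction = outcome_sum / t   # lies in (-1, 1), so wealth stays positive
        bet = fraction * wealth      # signed stake for this round
        bets.append(bet)
        wealth += outcome * bet
        outcome_sum += outcome
    return bets, wealth


bets, wealth = kt_coin_betting([1.0, 1.0, 1.0])
print(bets, wealth)
```

In the usual reduction to one-dimensional online learning, the stake plays the role of the prediction and the outcome is the negative gradient of the loss, so a bettor whose wealth grows quickly corresponds to a learner with low regret.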