Explaining Memorization and Generalization: A Large-Scale Study with Coherent Gradients

arXiv.org Machine Learning

Coherent Gradients is a recently proposed hypothesis to explain why over-parameterized neural networks trained with gradient descent generalize well even though they have sufficient capacity to memorize the training set. Inspired by random forests, Coherent Gradients proposes that (Stochastic) Gradient Descent (SGD) finds common patterns amongst examples (if such common patterns exist) since descent directions that are common to many examples add up in the overall gradient, and thus the biggest changes to the network parameters are those that simultaneously help many examples. The original Coherent Gradients paper validated the theory through causal intervention experiments on shallow, fully connected networks on MNIST. In this work, we perform similar intervention experiments on more complex architectures (such as VGG, Inception and ResNet) on more complex datasets (such as CIFAR-10 and ImageNet). Our results are in good agreement with the small-scale study in the original paper, thus providing the first validation of Coherent Gradients in more practically relevant settings. We also confirm in these settings that suppressing incoherent updates by natural modifications to SGD can significantly reduce overfitting, lending credence to the hypothesis that memorization occurs when few examples are responsible for most of the gradient used in the update. Furthermore, we use the Coherent Gradients theory to explore a new characterization of why some examples are learned earlier than other examples, i.e., "easy" and "hard" examples.
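The idea of suppressing updates that only a few examples support can be illustrated with a small sketch. The code below is not taken from the paper: it assumes a coordinate-wise median of group means as one possible robust replacement for the plain average of per-example gradients, and the function names and toy gradients are hypothetical.

```python
from statistics import median


def mean_gradient(per_example_grads):
    """Plain average of per-example gradients (what an ordinary SGD step uses)."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[d] for g in per_example_grads) / n for d in range(dim)]


def median_of_means_gradient(per_example_grads, num_groups):
    """Average within groups, then take the coordinate-wise median across groups.

    Assumption: median of means stands in for 'suppressing incoherent updates';
    a coordinate driven by a single example is damped, a shared one survives.
    """
    groups = [per_example_grads[i::num_groups] for i in range(num_groups)]
    group_means = [mean_gradient(g) for g in groups if g]
    dim = len(group_means[0])
    return [median(m[d] for m in group_means) for d in range(dim)]


# Toy data: six examples, two parameters.
# Coordinate 0 is coherent (every example agrees);
# coordinate 1 is driven entirely by the last example.
grads = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
         [1.0, 0.0], [1.0, 0.0], [1.0, 60.0]]

plain = mean_gradient(grads)
robust = median_of_means_gradient(grads, num_groups=3)
print(plain, robust)
```

In this toy case the plain average still moves the second parameter because of one example, while the robust aggregate keeps only the direction all examples share.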


Difference Between Data Mining, Machine Learning and Big Data

#artificialintelligence

The amount of digital data in existence is growing rapidly. It doubles every two years and is transforming how we live. According to a paper from IBM, about 2.5 billion gigabytes of data were generated every day in 2012. An article from Forbes reports that data is growing faster than ever. The same article suggests that this year, 2020, about 1.7 billion of new information will be developed per second for all the human inhabitants on this planet.


The future of construction – Intel RealSense Depth and Tracking Cameras

#artificialintelligence

In the U.S. alone, the construction industry creates around $1.3 trillion worth of structures every year and employs around 7 million people. We spend much of our lives surrounded by the fruits of this labor, usually without thinking about what it takes to produce them, and without a real awareness of how much we rely on those structures to be safe. We rarely enter buildings worrying that they are going to collapse, for example, or drive across a bridge fearful that it will crumble beneath our tires. That safety and public trust are important to maintain, and as the number of structures increases every year, so does the number that must be inspected regularly. A large number of the structures built rely upon concrete, either in part or for the vast majority of their construction.


Evaluating Logical Generalization in Graph Neural Networks

arXiv.org Machine Learning

Recent research has highlighted the role of relational inductive biases in building learning agents that can generalize and reason in a compositional manner. However, while relational learning algorithms such as graph neural networks (GNNs) show promise, we do not understand how effectively these approaches can adapt to new tasks. In this work, we study the task of logical generalization using GNNs by designing a benchmark suite grounded in first-order logic. Our benchmark suite, GraphLog, requires that learning algorithms perform rule induction in different synthetic logics, represented as knowledge graphs. GraphLog consists of relation prediction tasks on 57 distinct logical domains. We use GraphLog to evaluate GNNs in three different setups: single-task supervised learning, multi-task pretraining, and continual learning. Unlike previous benchmarks, our approach allows us to precisely control the logical relationship between the different tasks. We find that the ability for models to generalize and adapt is strongly determined by the diversity of the logical rules they encounter during training, and our results highlight new challenges for the design of GNN models. We publicly release the dataset and code used to generate and interact with the dataset at https://www.cs.mcgill.ca/~ksinha4/graphlog.
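To make the task format concrete, here is a toy sketch of relation prediction on a knowledge graph using composition rules of the form r1(x, y) and r2(y, z) imply r3(x, z). It is not GraphLog code: the rule table is given by hand rather than induced, and the relation names, graph, and function name are all hypothetical. In the benchmark, the model has to induce such rules from data.

```python
def predict_relation(edges, rules, source, target):
    """Predict relations between source and target from two-hop paths.

    edges: iterable of (head, relation, tail) triples.
    rules: dict mapping (r1, r2) to the implied relation r3.
    Returns the set of relations implied by any matching two-hop path.
    """
    outgoing = {}
    for head, relation, tail in edges:
        outgoing.setdefault(head, []).append((relation, tail))

    predictions = set()
    for r1, mid in outgoing.get(source, []):
        for r2, end in outgoing.get(mid, []):
            if end == target and (r1, r2) in rules:
                predictions.add(rules[(r1, r2)])
    return predictions


# Hypothetical toy "logic" with a single composition rule.
toy_rules = {("r1", "r2"): "r3"}
toy_edges = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r2", "d")]

print(predict_relation(toy_edges, toy_rules, "a", "c"))
```

Changing the rule table while keeping the graph fixed gives a different "logic" over the same entities, which is the kind of controlled variation between tasks the abstract describes.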


AI, machine learning to deliver 'wave of discoveries'

#artificialintelligence

The past 20 years have seen remarkable advances in the mining industry, particularly in mineral exploration technologies with vast volumes of data generated from geologic, geophysical, geochemical, satellite and other surveying techniques. However, the abundance of data has not necessarily translated into the discovery of new deposits, according to Colin Barnett, co-founder of BW Mining, a Boulder, Colorado-based data mining and mineral exploration company. "One of the problems we're facing in exploration is the huge increase in the amounts of data we have to look at," said Barnett, in his presentation at the Managing and exploring big data through artificial intelligence and machine learning session at the recent PDAC 2020 convention in Toronto. "And although it's high-quality data, the sheer volume is becoming almost overwhelming for human interpreters, and so we need help in getting to the bottom of it." By integrating hundreds or even thousands of interdependent layers of data, with each layer making its own statistically determined contribution, machine learning offers a solution to the problem of tackling the massive amounts of data generated, and a powerful new tool in the search for mineral deposits. But, in an interview with The Northern Miner, he cautioned that to fully exploit the potential of machine learning in mineral exploration, "prospectors will still need to devote considerable time and effort to the preparation of data before machine learning techniques can add value for companies."


Artificial intelligence, machine learning primed to deliver 'a wave of discoveries'

#artificialintelligence

The past 20 years have seen remarkable advances in the mining industry, particularly in mineral exploration technologies with vast volumes of data generated from geologic, geophysical, geochemical, satellite and other surveying techniques. However, the abundance of data has not necessarily translated into the discovery of new deposits, according to Colin Barnett, co-founder of BW Mining, a Boulder, Colorado-based data mining and mineral exploration company. "One of the problems we're facing in exploration is the huge increase in the amounts of data we have to look at," said Barnett, in his presentation at the Managing and exploring big data through artificial intelligence and machine learning session at the recent PDAC 2020 convention in Toronto. "And although it's high-quality data, the sheer volume is becoming almost overwhelming for human interpreters, and so we need help in getting to the bottom of it." By integrating hundreds or even thousands of interdependent layers of data, with each layer making its own statistically determined contribution, machine learning offers a solution to the problem of tackling the massive amounts of data generated, and a powerful new tool in the search for mineral deposits. But, in an interview with The Northern Miner, he cautioned that to fully exploit the potential of machine learning in mineral exploration, "prospectors will still need to devote considerable time and effort to the preparation of data before machine learning techniques can add value for companies."


Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology

arXiv.org Machine Learning

We propose a process model for the development of machine learning applications. It guides machine learning practitioners and project organizations from industry and academia with a checklist of tasks that spans the complete project life-cycle, ranging from the very first idea to the continuous maintenance of any machine learning application. With each task, we propose quality assurance methodology that is drawn from practical experience and scientific literature and that has proven to be general and stable enough to be included in best practices. We expand on CRISP-DM, a data mining process model that enjoys strong industry support but does not address machine-learning-specific tasks.


ENTMOOT: A Framework for Optimization over Ensemble Tree Models

arXiv.org Artificial Intelligence

Gradient boosted trees and other regression tree models perform well in a wide range of real-world, industrial applications. These tree models (i) offer insight into important prediction features, (ii) effectively manage sparse data, and (iii) have excellent prediction capabilities. Despite their advantages, they are generally unpopular for decision-making tasks and black-box optimization, which is due to their difficult-to-optimize structure and the lack of a reliable uncertainty measure. ENTMOOT is our new framework for integrating (already trained) tree models into larger optimization problems. The contributions of ENTMOOT include: (i) explicitly introducing a reliable uncertainty measure that is compatible with tree models, (ii) solving the larger optimization problems that incorporate these uncertainty-aware tree models, and (iii) proving that the solutions are globally optimal, i.e. no better solution exists. In particular, we show how the ENTMOOT approach allows a simple integration of tree models into decision-making and black-box optimization, where it proves to be a strong competitor to commonly used frameworks.


Directional Message Passing for Molecular Graphs

arXiv.org Machine Learning

Graph neural networks have recently achieved great successes in predicting quantum mechanical properties of molecules. These models represent a molecule as a graph using only the distance between atoms (nodes). They do not, however, consider the spatial direction from one atom to another, despite directional information playing a central role in empirical potentials for molecules, e.g. in angular potentials. To alleviate this limitation we propose directional message passing, in which we embed the messages passed between atoms instead of the atoms themselves. Each message is associated with a direction in coordinate space. These directional message embeddings are rotationally equivariant since the associated directions rotate with the molecule. We propose a message passing scheme analogous to belief propagation, which uses the directional information by transforming messages based on the angle between them. Additionally, we use spherical Bessel functions and spherical harmonics to construct theoretically well-founded, orthogonal representations that achieve better performance than the currently prevalent Gaussian radial basis representations while using fewer than 1/4 of the parameters. We leverage these innovations to construct the directional message passing neural network (DimeNet). DimeNet outperforms previous GNNs on average by 76% on MD17 and by 31% on QM9. Our implementation is available online.
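The angle between two messages, and why it is a safe quantity to feed into a model, can be shown with a short sketch. This is not the DimeNet implementation: it assumes the angle is measured at the shared atom j between its neighbours k and i, and the coordinates and function names are hypothetical. The check at the end illustrates that the angle does not change when the whole molecule is rotated.

```python
import math


def subtract(p, q):
    """Component-wise difference p - q of two 3D points."""
    return tuple(a - b for a, b in zip(p, q))


def angle_at_atom(x_k, x_j, x_i):
    """Angle (radians) at atom j between the directions j->k and j->i."""
    u = subtract(x_k, x_j)
    v = subtract(x_i, x_j)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding
    return math.acos(cosine)


def rotate_z(point, theta):
    """Rotate a 3D point about the z-axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


# Hypothetical three-atom geometry (arbitrary units).
atom_k = (0.96, 0.0, 0.0)
atom_j = (0.0, 0.0, 0.0)
atom_i = (-0.24, 0.93, 0.0)

original = angle_at_atom(atom_k, atom_j, atom_i)
rotated = angle_at_atom(*(rotate_z(p, 0.7) for p in (atom_k, atom_j, atom_i)))
print(math.degrees(original), math.degrees(rotated))
```

Distances alone would not distinguish geometries that share all pairwise bond lengths along a path but differ in this angle, which is the limitation the abstract points to.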


A Bayesian algorithm for retrosynthesis

arXiv.org Machine Learning

The identification of synthetic routes that end with a desired product has been an inherently time-consuming process that is largely dependent on expert knowledge regarding a limited fraction of the entire reaction space. At present, emerging machine-learning technologies are overturning the process of retrosynthetic planning. The objective of this study is to discover synthetic routes backward from a given desired molecule to commercially available compounds. The problem is reduced to a combinatorial optimization task with the solution space subject to the combinatorial complexity of all possible pairs of purchasable reactants. We address this issue within the framework of Bayesian inference and computation. The workflow consists of two steps: a deep neural network is trained to predict, in the forward direction, the product of the given reactants with a high level of accuracy, following which this forward model is inverted into the backward one via Bayes' law of conditional probability. Using the backward model, a diverse set of highly probable reaction sequences ending with a given synthetic target is exhaustively explored using a Monte Carlo search algorithm. The Bayesian retrosynthesis algorithm could successfully rediscover 80.3% and 50.0% of known synthetic routes of single-step and two-step reactions within top-10 accuracy, respectively, thereby outperforming state-of-the-art algorithms in terms of the overall accuracy. Remarkably, the Monte Carlo method, which was specifically designed for the presence of diverse multiple routes, often revealed a ranked list of hundreds of reaction routes to the same synthetic target. We investigated the potential applicability of such diverse candidates based on expert knowledge from synthetic organic chemistry.
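The inversion step can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes a tiny, fully enumerable set of candidate reactant pairs, a hand-written table standing in for the trained forward model, and a uniform prior, all of which are hypothetical. It applies Bayes' rule, P(reactants | product) proportional to P(product | reactants) times P(reactants), by direct enumeration, whereas the abstract describes a Monte Carlo search because the real candidate space is far too large to enumerate.

```python
def backward_posterior(forward_prob, prior, product):
    """Posterior over candidate reactant sets for a given product.

    forward_prob: dict reactants -> dict product -> P(product | reactants),
                  a stand-in for a trained forward model.
    prior:        dict reactants -> P(reactants).
    """
    unnormalized = {
        reactants: forward_prob[reactants].get(product, 0.0) * p
        for reactants, p in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("no candidate reactant set yields this product")
    return {reactants: w / total for reactants, w in unnormalized.items()}


# Hypothetical forward model over two candidate reactant pairs.
toy_forward = {
    "A+B": {"P": 0.9, "Q": 0.1},
    "C+D": {"P": 0.3, "Q": 0.7},
}
toy_prior = {"A+B": 0.5, "C+D": 0.5}

posterior = backward_posterior(toy_forward, toy_prior, "P")
ranked = sorted(posterior, key=posterior.get, reverse=True)
print(ranked, posterior)
```

Ranking candidates by posterior probability mirrors the ranked list of routes mentioned in the abstract, here for a single step only.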