Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

Kossen, Jannik, Band, Neil, Lyle, Clare, Gomez, Aidan N., Rainforth, Tom, Gal, Yarin

arXiv.org Machine Learning

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
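The core idea above is that attention is computed across the dataset axis, so each datapoint's representation is a learned mixture of the other datapoints' representations. A minimal sketch of that mechanism (not the paper's full architecture, which also attends across attributes and uses learned embeddings; the function names and shapes here are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_between_datapoints(X, Wq, Wk, Wv):
    """One self-attention layer applied across the *dataset* axis:
    each row of X is a datapoint, and every datapoint attends to
    every other datapoint, rather than only to its own features."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])    # (n, n) datapoint-to-datapoint
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # each output mixes other rows

rng = np.random.default_rng(0)
n, d = 8, 4                                   # 8 datapoints, 4 features each
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = attention_between_datapoints(X, Wq, Wk, Wv)
print(out.shape)                              # one representation per datapoint
```

Because the attention matrix is n-by-n over datapoints, the layer can realize lookup-style behavior, retrieving information from related rows, which is what the authors describe as a parametric realization of non-parametric prediction.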


How UPS uses AI to outsmart bad weather

MIT Technology Review

If a snowstorm hits Denver, it can delay thousands of packages that travel through the city before reaching their final destinations on the other side of the country. But if UPS knows a storm is coming, what is the most efficient way to divert all those online orders and holiday gifts around the bad weather? UPS grapples with this question every winter. Identifying the facility best equipped to handle a large, unplanned shipment and the most efficient way to transport those packages is a tough call for even experienced UPS employees. The variables--among them the types of packages, their destinations, and the deadlines by which they need to be delivered--add complexity that could slow down UPS engineers and make it harder to nimbly shift resources.


Combining local and global smoothing in multivariate density estimation

Azzalini, Adelchi

arXiv.org Machine Learning

Nonparametric estimation of a multivariate density is tackled via a method which combines traditional local smoothing with a form of global smoothing, but without imposing a rigid structure. Simulation work delivers encouraging indications of the effectiveness of the method. An application to density-based clustering illustrates a possible usage. Consider estimation of the probability density function f(·) of a continuous random variable in cases when a parametric formulation for f is not considered appropriate. Given a random sample drawn from f, a variety of nonparametric estimation methods are available.
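The "traditional local smoothing" the abstract builds on is kernel density estimation, where the estimate at a point is an average of kernels centered on the sample points and a bandwidth h controls how local the smoothing is. A minimal one-dimensional sketch of that baseline (the combined local-plus-global method itself is the paper's contribution and is not reproduced here):

```python
import numpy as np

def gaussian_kde(x_eval, sample, h):
    """Classic local smoothing: Gaussian kernel density estimate.
    Small h follows the data closely; large h smooths more globally."""
    diffs = (x_eval[:, None] - sample[None, :]) / h
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / h           # average kernel, rescaled by h

rng = np.random.default_rng(1)
sample = rng.normal(size=500)                 # draws from a standard normal
grid = np.linspace(-3.0, 3.0, 61)
fhat = gaussian_kde(grid, sample, h=0.3)      # h = 0.3 chosen by hand here
print(fhat.shape)
```

The estimate integrates to roughly one over the grid, and the bandwidth choice is exactly the local-versus-global trade-off the paper addresses by adding a global smoothing component.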