Cvitkovic, Milan
TabTransformer: Tabular Data Modeling Using Contextual Embeddings
Huang, Xin, Khetan, Ashish, Cvitkovic, Milan, Karnin, Zohar
We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models. Furthermore, we demonstrate that the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. Lastly, for the semi-supervised setting we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.
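The abstract describes the core mechanism only at a high level: each categorical feature receives its own embedding, a stack of Transformer layers turns those embeddings into contextual embeddings, and the result is used for prediction. The following PyTorch sketch illustrates that idea; the layer sizes, the handling of continuous features, and the MLP head are assumptions chosen for illustration, not the paper's exact configuration.

# Minimal sketch of a TabTransformer-style model: per-column embeddings for
# categorical features are contextualized by a Transformer encoder, then fed,
# together with continuous features, to an MLP head. Hyperparameters and the
# head design are illustrative, not the paper's exact setup.
import torch
import torch.nn as nn

class TabTransformerSketch(nn.Module):
    def __init__(self, cat_cardinalities, n_continuous, d_model=32,
                 n_heads=8, n_layers=6, n_classes=2):
        super().__init__()
        # One embedding table per categorical column.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, d_model) for card in cat_cardinalities]
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.norm = nn.LayerNorm(n_continuous)
        mlp_in = d_model * len(cat_cardinalities) + n_continuous
        self.head = nn.Sequential(
            nn.Linear(mlp_in, 4 * mlp_in), nn.ReLU(),
            nn.Linear(4 * mlp_in, n_classes),
        )

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_cat) integer-coded categories
        # x_cont: (batch, n_continuous) continuous features
        tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )                                        # (batch, n_cat, d_model)
        contextual = self.transformer(tokens)    # contextual embeddings
        flat = contextual.flatten(1)             # (batch, n_cat * d_model)
        return self.head(torch.cat([flat, self.norm(x_cont)], dim=1))

A call such as TabTransformerSketch(cat_cardinalities=[12, 7, 4], n_continuous=5)(x_cat, x_cont) would return class logits for a batch of rows.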
A General Method for Amortizing Variational Filtering
Marino, Joseph, Cvitkovic, Milan, Yue, Yisong
We introduce the variational filtering EM algorithm, a simple, general-purpose method for performing variational inference in dynamical latent variable models using information from only past and present variables, i.e. filtering. The algorithm is derived from the variational objective in the filtering setting and consists of an optimization procedure at each time step. By performing each inference optimization procedure with an iterative amortized inference model, we obtain a computationally efficient implementation of the algorithm, which we call amortized variational filtering. We present experiments demonstrating that this general-purpose method improves inference performance across several recent deep dynamical latent variable models.
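The per-time-step optimization the abstract describes can be made concrete with a short sketch. The interfaces below (a prior over the current latent computed from past latents, a decoder log-likelihood, and a learned inference-update network) are hypothetical stand-ins, and the loop only illustrates iterative amortized inference applied to a per-step filtering objective, not the authors' exact parameterization.

# Illustrative sketch of amortized variational filtering: at each time step the
# approximate filtering posterior q(z_t) is refined for a few iterations by a
# learned inference model that consumes the gradient of the per-step objective.
# `decoder_log_prob` and `inference_update` are hypothetical callables standing
# in for the generative model and the amortized inference network.
import torch

def kl_diag_gaussian(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, exp(logvar_q)) || N(mu_p, exp(logvar_p)) ), summed over dims.
    return 0.5 * (
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0
    ).sum(-1)

def filtering_step(x_t, prior, decoder_log_prob, inference_update,
                   mu, logvar, n_iters=2):
    """Refine q(z_t | x_{<=t}) with a few amortized inference iterations."""
    mu_p, logvar_p = prior                      # prior built from past latents only
    for _ in range(n_iters):
        mu = mu.detach().requires_grad_(True)
        logvar = logvar.detach().requires_grad_(True)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        # Per-step free energy: -E_q[log p(x_t | z_t)] + KL(q || prior).
        free_energy = (-decoder_log_prob(x_t, z)
                       + kl_diag_gaussian(mu, logvar, mu_p, logvar_p)).sum()
        grad_mu, grad_logvar = torch.autograd.grad(free_energy, (mu, logvar))
        # Learned update maps the current estimate and its gradients to a new estimate.
        mu, logvar = inference_update(mu, logvar, grad_mu, grad_logvar)
    return mu, logvar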
Minimal Achievable Sufficient Statistic Learning
Cvitkovic, Milan, Koliander, Günther
We introduce Minimal Achievable Sufficient Statistic (MASS) Learning, a training method for machine learning models that attempts to produce minimal sufficient statistics with respect to a class of functions (e.g. deep networks) being optimized over. In deriving MASS Learning, we also introduce Conserved Differential Information (CDI), an information-theoretic quantity that - unlike standard mutual information - can be usefully applied to deterministically-dependent continuous random variables like the input and output of a deep network. In a series of experiments, we show that deep networks trained with MASS Learning achieve competitive performance on supervised learning and uncertainty quantification benchmarks.
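For reference, the standard information-theoretic notions the abstract builds on (textbook definitions, not the paper's contributions) can be stated briefly: a representation is sufficient for a target when it preserves all predictive information, and minimal when it contains nothing beyond that.

\[
T = f(X) \text{ is sufficient for } Y \iff I(Y; T) = I(Y; X),
\]
\[
T \text{ is minimal sufficient} \iff \text{for every sufficient } S = g(X) \text{ there exists } h \text{ with } T = h(S) \text{ almost surely.}
\]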
Some Requests for Machine Learning Research from the East African Tech Scene
Cvitkovic, Milan
Based on 46 in-depth interviews with scientists, engineers, and CEOs, this document presents a list of concrete machine learning research problems, progress on which would directly benefit tech ventures in East Africa. The goal of this work is to give machine learning researchers a fuller picture of where and how their efforts as scientists can be useful. The goal is thus not to highlight research problems that are unique to East Africa -- indeed many of the problems listed below are of general interest in machine learning. The problems on the list are united solely by the fact that technology practitioners and organizations in East Africa reported a pressing need for their solution. The author is aware that listing machine learning problems without also providing data for them is not a recipe for getting those problems solved.
Open Vocabulary Learning on Source Code with a Graph-Structured Cache
Cvitkovic, Milan, Singh, Badal, Anandkumar, Anima
Often models that operate on source code consume ASTs by linearizing them (usually with a depth-first traversal) (Amodio et al., 2017; Liu et al., 2017; Li et al., 2017), but they can also be processed by deep learning models that take graphs as input, as in White et al. (2016) and Chen et al. (2018) who use Recursive Neural Networks (RveNNs) (Goller & Kuchler, 1996) on ASTs. RveNNs are models that operate on tree-topology graphs, and have been used extensively for language modeling (Socher et al., 2013) and on domains similar to source code, like mathematical expressions (Zaremba et al., 2014; Arabshahi et al., 2018). They can be considered a special case of Message Passing Neural Networks (MPNNs) in the framework of Gilmer et al. (2017): in this analogy RveNNs are to Belief Propagation as MPNNs are to Loopy Belief Propagation. They can also be considered a special case of Graph Networks in the framework of Battaglia et al. (2018). ASTs also serve as a natural basis for models that generate code as output, as in Maddison & Tarlow (2014), Yin & Neubig (2017), Rabinovich et al. (2017), Chen et al. (2018), and Brockschmidt et al. (2018).
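The passage notes that many models consume ASTs by linearizing them with a depth-first traversal. As a purely illustrative example (using Python's standard-library ast module; the cited works use their own parsers and target languages), such a linearization might look like:

# Illustrative only: linearizing a Python AST by depth-first (pre-order)
# traversal with the standard-library `ast` module, yielding the kind of node
# sequence that sequence models over code would consume.
import ast

def linearize(node):
    """Depth-first traversal yielding AST node type names."""
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from linearize(child)

source = "def add(a, b):\n    return a + b\n"
print(list(linearize(ast.parse(source))))
# e.g. ['Module', 'FunctionDef', 'arguments', 'arg', 'arg', 'Return',
#       'BinOp', 'Name', 'Load', 'Add', 'Name', 'Load']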