automatic statistician
Top 10 Automation Data Science And Machine Learning Platforms In 2020
Adoption of data science and machine learning technologies is at a peak. The market offers many software tools with innovative features that bring the efficiency of new-age data technologies to businesses and can increase their efficiency and value proposition. As the field evolves at scale, these solutions are revamped over time. Now is the era of automated data science and machine learning software, which not only enhances the operational proficiency of such tools but also assists data scientists. These tools help automate the repetitive and mundane tasks within ML or data science processes without compromising model performance or productivity. Here, therefore, is a list of the top 10 automated data science and machine learning software products from key players in this market.
Differentiable Compositional Kernel Learning for Gaussian Processes
Sun, Shengyang, Zhang, Guodong, Wang, Chaoqi, Zeng, Wenyuan, Li, Jiaman, Grosse, Roger
The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this choice remains a dark art. We present the Neural Kernel Network (NKN), a flexible family of kernels represented by a neural network. The NKN's architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel. It can compactly approximate compositional kernel structures such as those used by the Automatic Statistician (Lloyd et al., 2014), but because the architecture is differentiable, it is end-to-end trainable with gradient-based optimization. We show that the NKN is universal for the class of stationary kernels. Empirically we demonstrate NKN's pattern discovery and extrapolation abilities on several tasks that depend crucially on identifying the underlying structure, including time series and texture extrapolation, as well as Bayesian optimization.
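The composition rules mentioned in the abstract can be illustrated with a minimal NumPy sketch: a nonnegative weighted sum of valid kernels is a valid kernel, and so is an elementwise product. The helper names (`rbf`, `periodic`, `linear_unit`, `product_unit`), the weights, and the hyperparameters below are all assumptions made for illustration. This is not the NKN implementation, and it contains no training loop.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel matrix for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic(x1, x2, period=1.0, lengthscale=1.0):
    """Standard periodic kernel matrix for 1-D inputs."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def linear_unit(kernels, weights, bias=0.0):
    """Nonnegative weighted sum of kernel matrices plus a nonnegative constant.

    Nonnegativity is what keeps the result a valid (PSD) kernel.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or bias < 0:
        raise ValueError("weights and bias must be nonnegative")
    return sum(w * k for w, k in zip(weights, kernels)) + bias

def product_unit(kernels):
    """Elementwise product of kernel matrices (valid by the Schur product theorem)."""
    out = np.ones_like(kernels[0])
    for k in kernels:
        out = out * k
    return out

# Hypothetical two-layer composition over two base kernels.
x = np.linspace(0.0, 5.0, 40)
k_rbf = rbf(x, x, lengthscale=1.5)
k_per = periodic(x, x, period=1.0, lengthscale=0.8)

hidden = [
    linear_unit([k_rbf, k_per], [0.7, 0.3]),
    linear_unit([k_rbf, k_per], [0.1, 0.9], bias=0.05),
]
k_out = product_unit(hidden)

# Smallest eigenvalue should be nonnegative up to floating-point error.
min_eig = np.linalg.eigvalsh(k_out).min()
```

Because every unit maps valid kernels to a valid kernel, the weights could in principle be tuned by gradient descent without ever leaving the space of valid kernels, which is the property the abstract relies on.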
Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes
Automating statistical modelling is a challenging problem in artificial intelligence. The Automatic Statistician takes a first step in this direction, by employing a kernel search algorithm with Gaussian Processes (GP) to provide interpretable statistical models for regression problems. However, this does not scale due to its $O(N^3)$ running time for the model selection. We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to bigger data sets. In doing so, we derive a cheap upper bound on the GP marginal likelihood that sandwiches the marginal likelihood with the variational lower bound. We show that the upper bound is significantly tighter than the lower bound and thus useful for model selection.
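To make the $O(N^3)$ cost concrete, the following sketch scores a few candidate kernels by the exact GP log marginal likelihood, whose Cholesky factorization is the cubic step. It is a simplified illustration, not the SKC algorithm or the Automatic Statistician's search: the candidate set, the fixed hyperparameters, the noise variance, and the synthetic data are all assumptions, and no bounds are computed.

```python
import numpy as np

def rbf(x, lengthscale):
    """Squared-exponential kernel matrix for 1-D inputs."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic(x, period, lengthscale):
    """Standard periodic kernel matrix for 1-D inputs."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def log_marginal_likelihood(K, y, noise_var):
    """Exact GP log marginal likelihood for a zero-mean GP.

    The Cholesky factorization of the N x N matrix is the O(N^3) step.
    """
    n = y.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2.0 * np.pi))

# Synthetic periodic data (an assumption, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(2.0 * np.pi * x / 2.5) + 0.1 * rng.standard_normal(60)

# One round of a kernel search: score each candidate, keep the best.
candidates = {
    "rbf": rbf(x, 1.0),
    "periodic": periodic(x, 2.5, 1.0),
    "rbf*periodic": rbf(x, 5.0) * periodic(x, 2.5, 1.0),
}
noise_var = 0.01
scores = {name: log_marginal_likelihood(K, y, noise_var)
          for name, K in candidates.items()}
best = max(scores, key=scores.get)
```

A full search would repeat this scoring for every expansion of the current best kernel and would optimize hyperparameters for each candidate, so the cubic cost is paid many times over. That is why cheap bounds on the marginal likelihood are attractive for larger data sets.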
Automatic Statistician
Making sense of data is one of the great challenges of the information age we live in. While it is becoming easier to collect and store all kinds of data, from personal medical data, to scientific data, to public data, and commercial data, there are relatively few people trained in the statistical and machine learning methods required to test hypotheses, make predictions, and otherwise create interpretable knowledge from this data. The Automatic Statistician project aims to build an artificial intelligence for data science, helping people make sense of their data. The project is at an early stage, but please have a look at our example analyses and feel free to contact us or subscribe to our mailing list.
The Automatic Statistician: A Relational Perspective
Hwang, Yunseong, Tong, Anh, Choi, Jaesik
Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by treating unknown time-series data nonparametrically using a GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel from a single time-series data set often results in a less informative kernel that may not give qualitative, distinctive descriptions of the data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data and currency exchange rate data.
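The idea of one kernel structure shared across several time series can be sketched as follows. This is a loose illustration under stated assumptions, not the authors' method: each series is assumed to use the same kernel with its own fixed scale, and the joint score is simply the sum of the per-series GP log marginal likelihoods. The function `joint_score`, the scales, the noise variance, and the synthetic data are all hypothetical.

```python
import numpy as np

def rbf(x, lengthscale):
    """Squared-exponential kernel matrix for 1-D inputs."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def log_marginal_likelihood(K, y, noise_var):
    """Exact zero-mean GP log marginal likelihood via Cholesky."""
    n = y.shape[0]
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2.0 * np.pi))

def joint_score(shared_K, scales, series, noise_var):
    """Sum of per-series scores when all series share one kernel structure.

    Each series j uses scales[j]**2 * shared_K (an assumed parameterization).
    """
    return sum(log_marginal_likelihood(s ** 2 * shared_K, y, noise_var)
               for s, y in zip(scales, series))

# Three synthetic series driven by one common signal (illustrative assumption).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
common = np.sin(x)
amplitudes = [1.0, 2.0, 0.5]
series = [a * common + 0.1 * rng.standard_normal(50) for a in amplitudes]

# Compare two candidate shared kernels on all series at once.
score_smooth = joint_score(rbf(x, 1.5), amplitudes, series, noise_var=0.01)
score_rough = joint_score(rbf(x, 0.05), amplitudes, series, noise_var=0.01)
```

Scoring a candidate on all series jointly means a structure is preferred only if it explains the whole collection, which is one way to read the abstract's "common, shared causes of changes".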