AITopics | boston housing data

Recent advances in Deep Gaussian Processes (DGPs) show the potential to have more expressive representation than that of traditional Gaussian Processes (GPs). However, there exists a pathology of deep Gaussian processes that their learning capacities reduce significantly when the number of layers increases. In this paper, we present a new analysis in DGPs by studying its corresponding nonlinear dynamic systems to explain the issue. Existing work reports the pathology for the squared exponential kernel function. We extend our investigation to four types of common stationary kernel functions. The recurrence relations between layers are analytically derived, providing a tighter bound and the rate of convergence of the dynamic systems. We demonstrate our finding with a number of experimental results.

artificial intelligence, machine learning, recurrence relation, (18 more...)

arXiv.org Machine Learning

2010.09301

Country: Asia > South Korea > Ulsan > Ulsan (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.73)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

Add feedback

Explaining black box decisions by Shapley cohort refinement

Mase, Masayoshi, Owen, Art B., Seiler, Benjamin

arXiv.org Artificial IntelligenceNov-1-2019

Black box prediction models used in statistics, machine learning and artificial intelligence have been able to make increasingly accurate predictions, but it remains hard to understand those predictions. See for example, ˇ Strumbelj and Kononenko (2010, 2014), Ribeiro et al. (2016), Sundararajan and Najmi (2019) and the book of Molnar (2018). Part of understanding predictions is understanding which variables are important. A variable could be important because changing it makes a causal difference, or because changing it makes a large change to our predictions or because leaving it out of a model reduces that model's prediction accuracy (Jiang and Owen, 2003). Importance by one of these criteria need not imply importance by another, though additional assumptions may allow a causal implication to be made from one of the other measures (Pearl, 2009; Zhao and Hastie, 2019). We could be interested in variables that are important overall or in variables that explain one single prediction, such as why a given person was or was not approved for a loan, or why a given patient was or was not placed in an intensive care unit.

decomposition, predictor, shapley value, (15 more...)

arXiv.org Artificial Intelligence

1911.00467

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report (0.83)

Industry:

Transportation > Air (0.61)
Health & Medicine (0.54)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Tree Ensembles with Rule Structured Horseshoe Regularization

Nalenz, Malte, Villani, Mattias

arXiv.org Machine LearningFeb-15-2018

We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu (2008) where rules from decision trees and linear terms are used in a L1-regularized regression. We modify RuleFit by replacing the L1-regularization by a horseshoe prior, which is well known to give aggressive shrinkage of noise predictor while leaving the important signal essentially untouched. This is especially important when a large number of rules are used as predictors as many of them only contribute noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are only satisfied by a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in Friedman and Popescu (2008) with an additional set of trees from random forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and random forest on 16 datasets. The model and its interpretation is demonstrated on the well known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.

artificial intelligence, horserule, machine learning, (19 more...)

arXiv.org Machine Learning

1702.05008

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
(2 more...)

Add feedback

Nonlinear Markov Networks for Continuous Variables

Hofmann, Reimar, Tresp, Volker

Neural Information Processing SystemsDec-31-1998

We address the problem oflearning structure in nonlinear Markov networks with continuous variables. This can be viewed as non-Gaussian multidimensional density estimation exploiting certain conditional independencies in the variables. Markov networks are a graphical way of describing conditional independencies well suited to model relationships which do not exhibit a natural causal ordering. We use neural network structures to model the quantitative relationships between variables. The main focus in this paper will be on learning the structure for the purpose of gaining insight into the underlying process. Using two data sets we show that interesting structures can be found using our approach. Inference will be briefly addressed.

boston housing data, markov boundary, markov network, (12 more...)

Neural Information Processing Systems

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > Japan (0.04)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Nonlinear Markov Networks for Continuous Variables

Hofmann, Reimar, Tresp, Volker

Neural Information Processing SystemsDec-31-1998

We address the problem oflearning structure in nonlinear Markov networks with continuous variables. This can be viewed as non-Gaussian multidimensional density estimation exploiting certain conditional independencies in the variables. Markov networks are a graphical way of describing conditional independencies well suited to model relationships which do not exhibit a natural causal ordering. We use neural network structures to model the quantitative relationships between variables. The main focus in this paper will be on learning the structure for the purpose of gaining insight into the underlying process. Using two data sets we show that interesting structures can be found using our approach. Inference will be briefly addressed.

boston housing data, markov boundary, markov network, (12 more...)

Neural Information Processing Systems

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > Japan (0.04)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Nonlinear Markov Networks for Continuous Variables

Hofmann, Reimar, Tresp, Volker

Neural Information Processing SystemsDec-31-1998

We address the problem oflearning structure in nonlinear Markov networks with continuous variables. This can be viewed as non-Gaussian multidimensional densityestimation exploiting certain conditional independencies in the variables. Markov networks are a graphical way of describing conditional independencieswell suited to model relationships which do not exhibit a natural causal ordering. We use neural network structures to model the quantitative relationships between variables.

boston housing data, markov boundary, markov network, (12 more...)

Neural Information Processing Systems

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > Japan (0.04)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Early Brain Damage

Tresp, Volker, Neuneier, Ralph, Zimmermann, Hans-Georg

Neural Information Processing SystemsDec-31-1997

Optimal Brain Damage (OBD) is a method for reducing the number of weights in a neural network. OBD estimates the increase in cost function if weights are pruned and is a valid approximation if the learning algorithm has converged into a local minimum. On the other hand it is often desirable to terminate the learning process before a local minimum is reached (early stopping). In this paper we show that OBD estimates the increase in cost function incorrectly if the network is not in a local minimum. We also show how OBD can be extended such that it can be used in connection with early stopping. We call this new approach Early Brain Damage, EBD. EBD also allows to revive already pruned weights. We demonstrate the improvements achieved by EBD using three publicly available data sets.

cost function, early stopping, obd, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
Europe > Germany (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Early Brain Damage

Tresp, Volker, Neuneier, Ralph, Zimmermann, Hans-Georg

Neural Information Processing SystemsDec-31-1997

Optimal Brain Damage (OBD) is a method for reducing the number of weights in a neural network. OBD estimates the increase in cost function if weights are pruned and is a valid approximation if the learning algorithm has converged into a local minimum. On the other hand it is often desirable to terminate the learning process before a local minimum is reached (early stopping). In this paper we show that OBD estimates the increase in cost function incorrectly if the network is not in a local minimum. We also show how OBD can be extended such that it can be used in connection with early stopping. We call this new approach Early Brain Damage, EBD. EBD also allows to revive already pruned weights. We demonstrate the improvements achieved by EBD using three publicly available data sets.

cost function, early stopping, obd, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
Europe > Germany (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Early Brain Damage

Tresp, Volker, Neuneier, Ralph, Zimmermann, Hans-Georg

Neural Information Processing SystemsDec-31-1997

Optimal Brain Damage (OBD) is a method for reducing the number ofweights in a neural network. OBD estimates the increase in cost function if weights are pruned and is a valid approximation if the learning algorithm has converged into a local minimum. On the other hand it is often desirable to terminate the learning process beforea local minimum is reached (early stopping). In this paper we show that OBD estimates the increase in cost function incorrectly if the network is not in a local minimum. We also show how OBD can be extended such that it can be used in connection withearly stopping.

artificial intelligence, cost function, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback