AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

#artificialintelligence, Dec-9-2019, 04:19:04 GMT

Connections: Log Likelihood, Cross Entropy, KL Divergence, Logistic Regression, and Neural Networks

Maximizing the (log) likelihood is equivalent to minimizing the binary cross entropy. The two objective functions are identical up to sign and a constant scaling factor, so there can be no difference between the resulting models or their characteristics. This can be extended quite simply to the multiclass case using softmax cross-entropy and the so-called multinoulli likelihood, so the same equivalence holds for multiclass problems as is typical in, say, neural networks. The distinction between MLE and cross-entropy is one of framing: MLE represents a structured and principled approach to modeling and training, and binary/softmax cross-entropy are special cases of it applied to problems that people typically care about. After that aside on maximum likelihood estimation, let's delve more into the relationship between negative log likelihood and cross entropy.
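The equivalence described above can be checked numerically. The following is a minimal sketch, assuming a Bernoulli model whose probabilities come from a sigmoid; the labels, logits, and function names are made up for illustration and are not taken from the linked article.

```python
import math

def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bernoulli_log_likelihood(labels, probs):
    """Sum of log p(y_i) under a Bernoulli model with P(y_i = 1) = probs[i]."""
    return sum(math.log(p if y == 1 else 1.0 - p) for y, p in zip(labels, probs))

def binary_cross_entropy(labels, probs):
    """Mean binary cross entropy between labels and predicted probabilities."""
    total = sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
                for y, p in zip(labels, probs))
    return -total / len(labels)

# Hypothetical toy data: five labels and five model logits.
labels = [1, 0, 1, 1, 0]
logits = [2.0, -1.0, 0.5, -0.3, 0.1]
probs = [sigmoid(z) for z in logits]

nll = -bernoulli_log_likelihood(labels, probs)   # negative log likelihood (a sum)
bce = binary_cross_entropy(labels, probs)        # cross entropy (a mean)

# The two differ only by the 1/n averaging factor.
print(abs(nll / len(labels) - bce) < 1e-12)
```

Because the two quantities differ only by a positive constant, any parameter setting that minimizes one also minimizes the other.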

cross entropy, likelihood, log likelihood, (11 more...)

Genre:

Research Report > New Finding (0.42)
Research Report > Experimental Study (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.59)

Miot, Alexandre, Drigout, Gilles

An empirical study of neural networks for trend detection in time series

arXiv.org Machine Learning, Dec-9-2019

We have derived theoretical maximum likelihood estimators of trends for standard dynamics and implemented them. We have reframed the problem of trend detection as a classification problem amenable to machine learning methods. We have shown that RNNs are, in a way, a generalization of simple moving average techniques and motivated this by theory. In a simple case, we have shown that this generalization transforms the trend estimation problem into simply locating the state vector within convex polytope cells. Finally, we have shown empirically that GRU or LSTM cells are on average the best building blocks to use, compared to a broad range of estimators, in order to detect trends in time series. Putting the emphasis on learning stylized data and then transferring to real data, rather than building complex structures fitted to data, is also an important takeaway of this paper. Ongoing preliminary research seems to validate our approach for financial applications. This might pave the way to building efficient market estimators protected against over-fitting.
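One way to see why a recurrent network can be read as a generalization of a simple moving average is to write the moving average itself as a linear recurrence. The sketch below is an illustration under that assumption only; the matrices `W`, `U`, `v` and the function names are hypothetical and are not the construction used in the paper.

```python
def simple_moving_average(xs, n):
    """Direct SMA: mean of the last n observations, once n points have been seen."""
    return [sum(xs[t - n + 1 : t + 1]) / n for t in range(n - 1, len(xs))]

def linear_rnn_sma(xs, n):
    """Same quantity from the linear recurrence h_t = W h_{t-1} + U x_t, y_t = v . h_t.

    W shifts the state by one slot, U writes the new input into slot 0,
    and v averages the slots, so the state is a buffer of the last n inputs.
    """
    W = [[1.0 if j == i - 1 else 0.0 for j in range(n)] for i in range(n)]
    U = [1.0] + [0.0] * (n - 1)
    v = [1.0 / n] * n
    h = [0.0] * n
    outputs = []
    for t, x in enumerate(xs):
        h = [sum(W[i][j] * h[j] for j in range(n)) + U[i] * x for i in range(n)]
        if t >= n - 1:  # emit only once the buffer is full
            outputs.append(sum(v[i] * h[i] for i in range(n)))
    return outputs

# Hypothetical toy series.
series = [1.0, 2.0, 4.0, 7.0, 11.0]
direct = simple_moving_average(series, 3)
recurrent = linear_rnn_sma(series, 3)
print(all(abs(a - b) < 1e-9 for a, b in zip(direct, recurrent)))
```

A trained RNN replaces these fixed matrices with learned ones and adds a nonlinearity, which is the sense in which the moving average is a special case.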

estimator, rnn baseline, time series, (12 more...)

1912.04009

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry: Banking & Finance > Trading (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Agrawal, Amritanshu, Menzies, Tim

Is AI different for SE?

arXiv.org Artificial Intelligence, Dec-9-2019

What AI tools are needed for SE? Ideally, we should have simple rules that peek at data, then say "use this tool" or "use that tool". To find such a rule, we explored 120 different data sets addressing numerous problems, including bad smell detection, predicting GitHub issue close time, bug report analysis, defect prediction and dozens of other non-SE problems. To this data, we apply an SE-based tool that (a) outperforms the state-of-the-art for these SE problems yet (b) fails very badly on standard AI problems. In those results, we can find a simple rule for when to use or avoid the SE-based tool. SE data is often about infrequent issues, like the occasional defect, the rarely exploited security violation, or the requirement that holds for one special case. But as we show, standard AI tools work best when the target is relatively more frequent. We can also exploit these special properties of SE to great effect (to rapidly find better optimizations for SE tasks via a tactic called "dodging", explained in this paper). More generally, this result says we need a new kind of SE research for developing new AI tools that are better suited to SE problems.

ai tool, international conference, software engineering, (13 more...)

1912.04061

Country:

North America > United States > New York > New York County > New York City (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > North Carolina (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

#artificialintelligence, Dec-8-2019, 21:21:54 GMT

Learning Apache Mahout - Programmer Books

In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark. Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout. You will learn about Mahout building blocks, covering feature extraction, dimensionality reduction and the curse of dimensionality, before delving into classification use cases with the random forest and Naive Bayes classifiers, and into item-based and user-based recommendation.

learning apache mahout, programmer book, use case

Genre: Instructional Material (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.65)

arXiv.org Machine Learning, Dec-8-2019

Contrast Trees and Distribution Boosting

Friedman, Jerome H.

Often machine learning methods are applied and results reported in cases where there is little to no information concerning the accuracy of the output. Simply because a computer program returns a result does not ensure its validity. If decisions are to be made based on such results, it is important to have some notion of their veracity. Contrast trees represent a new approach for assessing the accuracy of many types of machine learning estimates that are not amenable to standard (cross) validation methods. In situations where inaccuracies are detected, boosted contrast trees can often improve performance. A special case, distribution boosting, provides an assumption-free method for estimating the full probability distribution of an outcome variable given any set of joint input predictor variable values.

contrast tree, discrepancy, gradient, (14 more...)

1912.03785

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Portugal > Coimbra > Coimbra (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Kumar, Abhishek, Chatterjee, Sunabha, Rai, Piyush

Nonparametric Bayesian Structure Adaptation for Continual Learning

arXiv.org Machine Learning, Dec-8-2019

Continual learning is a learning paradigm where machine learning models are trained with sequential or streaming tasks. Two notable directions among the recent advances in continual learning with neural networks are (i) variational Bayes based regularization by learning priors from previous tasks, and (ii) learning the structure of deep networks to adapt to new tasks. So far, these two approaches have been orthogonal. We present a principled nonparametric Bayesian approach for learning the structure of feed-forward neural networks, addressing the shortcomings of both these approaches. In our model, the number of nodes in each hidden layer can automatically grow with the introduction of each new task, and inter-task transfer occurs through the overlapping of different sparse subsets of weights learned by different tasks. On benchmark datasets, our model performs comparably or better than the state-of-the-art approaches, while also being able to adaptively infer the evolving network structure in the continual learning setting.

continual learning, neural network, new task, (12 more...)

1912.03624

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

#artificialintelligence, Dec-7-2019, 00:08:07 GMT

Invited Talk: Symbolic Reasoning About Machine Learning Systems (PADL 2020 : 22nd Symposium on Practical Aspects of Declarative Languages) - POPL 2020

I will discuss a line of work in which we compile common machine learning systems into symbolic representations that have the same input-output behavior to facilitate formal reasoning about these systems. We have targeted Bayesian network classifiers, random forests and some types of neural networks, compiling each into tractable Boolean circuits, including Ordered Binary Decision Diagrams (OBDDs). Once the machine learning system is compiled into a tractable Boolean circuit, reasoning can commence using classical AI and computer science techniques. This includes generating explanations for decisions, quantifying robustness and verifying properties such as monotonicity. I will particularly discuss a new theory for unveiling the reasons behind the decisions made by classifiers, which can sometimes detect classifier bias from the reasons behind unbiased decisions.
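To make the idea of verifying a property such as monotonicity concrete, here is a minimal sketch that checks it by brute-force enumeration over Boolean inputs. This is a stand-in for illustration only: the talk describes compiling classifiers into tractable circuits such as OBDDs, which this code does not do, and both example classifiers are hypothetical.

```python
from itertools import product

def threshold_classifier(bits):
    """Hypothetical classifier: weighted vote over three Boolean features."""
    weights = (2, 1, 1)
    return sum(w * b for w, b in zip(weights, bits)) >= 2

def xor_classifier(bits):
    """Hypothetical non-monotone classifier over two Boolean features."""
    return bits[0] != bits[1]

def is_monotone(f, n):
    """True if raising any single input from 0 to 1 never flips a positive decision to negative."""
    for bits in product((0, 1), repeat=n):
        for i in range(n):
            if bits[i] == 0:
                raised = bits[:i] + (1,) + bits[i + 1:]
                if f(bits) and not f(raised):
                    return False
    return True

print(is_monotone(threshold_classifier, 3))
print(is_monotone(xor_classifier, 2))
```

Enumeration costs 2^n evaluations, which is why a tractable compiled representation matters for anything beyond a handful of features.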

declarative language, machine learning system, symbolic reasoning, (8 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.08)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Shalaeva, Vera, Esfahani, Alireza Fakhrizadeh, Germain, Pascal, Petreczky, Mihaly

Improved PAC-Bayesian Bounds for Linear Regression

arXiv.org Machine Learning, Dec-6-2019

In this paper, we improve the PAC-Bayesian error bound for linear regression derived in Germain et al. [10]. The improvements are twofold. First, the proposed error bound is tighter, and converges to the generalization loss with a well-chosen temperature parameter. Second, the error bound also holds for training data that are not independently sampled. In particular, the error bound applies to certain time series generated by well-known classes of dynamical models, such as ARX models.

assumption, generalization, theorem 7, (14 more...)

1912.03036

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Su, Jiahao, Cvitkovic, Milan, Huang, Furong

Sampling-Free Learning of Bayesian Quantized Neural Networks

arXiv.org Machine Learning, Dec-6-2019

Bayesian learning of model parameters in neural networks matters in scenarios where estimates with well-calibrated uncertainty are required. In this paper, we propose Bayesian quantized networks (BQNs), quantized neural networks (QNNs) for which we learn a posterior distribution over their discrete parameters. We provide a set of efficient algorithms for learning and prediction in BQNs without the need to sample from their parameters or activations, which not only allows for differentiable learning in QNNs, but also reduces the variance in gradients. We demonstrate that BQNs achieve both lower predictive errors and better-calibrated uncertainties than E-QNN (with less than 20% of the negative log-likelihood).

A Bayesian approach to deep learning considers the network's parameters to be random variables and seeks to infer their posterior distribution given the training data. Models trained this way, called Bayesian neural networks (BNNs) (Wang & Yeung, 2016), in principle have well-calibrated uncertainties when they make predictions, which is important in scenarios such as active learning and reinforcement learning (Gal, 2016). Furthermore, the posterior distribution over the model parameters provides valuable information for evaluation and compression of neural networks. There are three main challenges in using BNNs: (1) Intractable posterior: Computing and storing the exact posterior distribution over the network weights is intractable due to the complexity and high-dimensionality of deep networks. These challenges are typically addressed either by making simplifying assumptions about the distributions of the parameters and activations, or by using sampling-based approaches, which are expensive and unreliable (likely to overestimate the uncertainties in predictions). Our goal is to propose a sampling-free method which uses probabilistic propagation to deterministically learn BNNs.

A seemingly unrelated area of deep learning research is that of quantized neural networks (QNNs), which offer advantages of computational and memory efficiency compared to continuous-valued models.
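The phrase "sampling-free" can be illustrated on the smallest possible case: a single linear unit with independent weights in {-1, +1} and a deterministic input. The sketch below computes the output mean and variance in closed form and checks them against exhaustive enumeration. It is an illustration of moment propagation under those assumptions, not the BQN algorithm from the paper; the function names and numbers are hypothetical.

```python
from itertools import product

def propagate_linear(x, p_plus):
    """Closed-form mean and variance of sum_i w_i * x_i.

    Each w_i is +1 with probability p_plus[i] and -1 otherwise, independently.
    """
    mean = 0.0
    var = 0.0
    for xi, p in zip(x, p_plus):
        w_mean = 2.0 * p - 1.0         # E[w] for w in {-1, +1}
        w_var = 1.0 - w_mean ** 2      # Var[w] = E[w^2] - E[w]^2, with E[w^2] = 1
        mean += xi * w_mean
        var += xi * xi * w_var
    return mean, var

def enumerate_linear(x, p_plus):
    """Same moments by summing over every weight configuration (2^n terms)."""
    mean = 0.0
    second_moment = 0.0
    for ws in product((-1.0, 1.0), repeat=len(x)):
        prob = 1.0
        for w, p in zip(ws, p_plus):
            prob *= p if w > 0 else 1.0 - p
        s = sum(w * xi for w, xi in zip(ws, x))
        mean += prob * s
        second_moment += prob * s * s
    return mean, second_moment - mean ** 2

# Hypothetical input and weight probabilities.
x = [0.5, -1.0, 2.0]
p_plus = [0.9, 0.5, 0.2]
analytic = propagate_linear(x, p_plus)
exact = enumerate_linear(x, p_plus)
print(all(abs(a - b) < 1e-9 for a, b in zip(analytic, exact)))
```

The closed-form version needs one pass over the weights and involves no random draws, so its result carries no sampling variance.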

bqn, neural network, probabilistic propagation, (11 more...)

1912.02992

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
(3 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)