 Directed Networks


Text classification using Naive Bayes classifier

#artificialintelligence

In this article, we explore how to classify text into different categories using a Naive Bayes classifier. We use the News20 dataset and develop the demo in Python. As the name suggests, text classification means assigning texts to categories. Usually, we classify texts for ease of access and understanding. Automating the task means no one has to sit all day reading texts and labelling them by hand.
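As a rough illustration of the idea, here is a minimal multinomial Naive Bayes sketch in plain Python. It is not the article's demo: the four-sentence corpus, the whitespace tokenizer, the add-one smoothing and the function names `train_naive_bayes` and `predict` are all assumptions made for this example, and the News20 dataset is not used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        denom = total + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return {"vocab": vocab, "log_prior": log_prior, "log_likelihood": log_likelihood}


def predict(model, text):
    """Return the class with the highest posterior log-score."""
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    scores = {}
    for c, prior in model["log_prior"].items():
        # Words outside the training vocabulary are simply ignored.
        scores[c] = prior + sum(model["log_likelihood"][c][t] for t in tokens)
    return max(scores, key=scores.get)


# Hypothetical toy corpus standing in for a real dataset.
docs = [
    "the team won the match",
    "the player scored a goal",
    "new gpu speeds up training",
    "the compiler emits faster code",
]
labels = ["sports", "sports", "tech", "tech"]

model = train_naive_bayes(docs, labels)
print(predict(model, "the team scored a goal"))
print(predict(model, "faster gpu code"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word unseen in one class from zeroing out that class's score.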


Connections: Log Likelihood, Cross Entropy, KL Divergence, Logistic Regression, and Neural Networks

#artificialintelligence

Maximizing the (log) likelihood is equivalent to minimizing the binary cross entropy. There is literally no difference between the two objective functions, so there can be no difference between the resulting models or their characteristics. This, of course, extends quite simply to the multiclass case using softmax cross-entropy and the so-called multinoulli likelihood, so there is no difference in the multiclass setting typical of, say, neural networks. The difference between MLE and cross-entropy is that MLE represents a structured and principled approach to modeling and training, while binary and softmax cross-entropy are simply special cases of that approach applied to problems that people typically care about. After that aside on maximum likelihood estimation, let's delve further into the relationship between negative log likelihood and cross entropy.
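The equivalence can be checked numerically. The sketch below, with made-up labels and predicted probabilities, computes the Bernoulli likelihood as a plain product and the binary cross entropy as a sum, then confirms that the negative log of the first equals the second. The function names are illustrative only.

```python
import math


def bernoulli_likelihood(y_true, p_pred):
    """Likelihood of binary labels under predicted probabilities P(y = 1)."""
    likelihood = 1.0
    for y, p in zip(y_true, p_pred):
        likelihood *= p if y == 1 else (1.0 - p)
    return likelihood


def binary_cross_entropy(y_true, p_pred):
    """Binary cross entropy summed (not averaged) over the examples."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


# Made-up labels and model outputs, purely for illustration.
y_true = [1, 0, 1, 1]
p_pred = [0.9, 0.2, 0.6, 0.7]

negative_log_likelihood = -math.log(bernoulli_likelihood(y_true, p_pred))
cross_entropy = binary_cross_entropy(y_true, p_pred)

print(negative_log_likelihood, cross_entropy)
print(math.isclose(negative_log_likelihood, cross_entropy))
```

Many libraries report the cross entropy averaged over examples rather than summed; dividing by the number of examples rescales the objective but does not change which parameters minimize it.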


An empirical study of neural networks for trend detection in time series

arXiv.org Machine Learning

We have derived theoretical maximum likelihood estimators of trends for standard dynamics and implemented them. We have reframed the problem of trend detection as a classification problem amenable to machine learning methods. We have shown that RNNs are in a way a generalization of simple moving average techniques and motivated this by theory. In a simple case, we have shown that this generalization transforms the trend estimation problem into simply locating the state vector within convex polytope cells. Finally, we have shown empirically that GRU or LSTM cells are on average the best building blocks to use, compared to a broad range of estimators, for detecting trends in time series. Putting the emphasis on learning stylized data and then transferring to real data, rather than building complex structures fitted to data, is also an important takeaway of this paper. Ongoing preliminary research seems to validate our approach for financial applications. This might pave the way to building efficient market estimators protected against over-fitting.
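One way to see the link between recurrent networks and moving averages is that a simple moving average can be written as a linear recurrence with hand-picked weights. The sketch below is not the paper's construction. It assumes a zero initial state (so early outputs are zero-padded), and the names `linear_rnn_sma` and `zero_padded_sma` are hypothetical.

```python
import math


def zero_padded_sma(xs, k):
    """Simple moving average over the last k inputs, treating missing history as 0."""
    padded = [0.0] * (k - 1) + list(xs)
    return [sum(padded[t:t + k]) / k for t in range(len(xs))]


def linear_rnn_sma(xs, k):
    """Linear recurrence h_t = A h_{t-1} + B x_t with readout y_t = c . h_t."""
    # A shifts the state down by one slot, discarding the oldest input.
    A = [[1.0 if j == i - 1 else 0.0 for j in range(k)] for i in range(k)]
    # B writes the newest input into slot 0.
    B = [1.0] + [0.0] * (k - 1)
    # c averages the k stored inputs.
    c = [1.0 / k] * k

    h = [0.0] * k
    outputs = []
    for x in xs:
        h = [sum(A[i][j] * h[j] for j in range(k)) + B[i] * x for i in range(k)]
        outputs.append(sum(ci * hi for ci, hi in zip(c, h)))
    return outputs


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
direct = zero_padded_sma(xs, 3)
recurrent = linear_rnn_sma(xs, 3)
print(all(math.isclose(a, b) for a, b in zip(direct, recurrent)))
```

A trained RNN replaces these fixed matrices with learned weights and adds nonlinearities, which is the sense in which it can be read as a generalization of the moving average.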


Is AI different for SE?

arXiv.org Artificial Intelligence

What AI tools are needed for SE? Ideally, we should have simple rules that peek at data, then say "use this tool" or "use that tool". To find such a rule, we explored 120 different data sets addressing numerous problems, including bad smell detection, predicting Github issue close time, bug report analysis, defect prediction and dozens of other non-SE problems. To this data, we apply an SE-based tool that (a) outperforms the state-of-the-art for these SE problems yet (b) fails very badly on standard AI problems. In those results, we can find a simple rule for when to use or avoid the SE-based tool. SE data is often about infrequent issues, like the occasional defect, the rarely exploited security violation, or the requirement that holds for one special case. But as we show, standard AI tools work best when the target is relatively more frequent. We can also exploit these special properties of SE to great effect, rapidly finding better optimizations for SE tasks via a tactic called "dodging", explained in this paper. More generally, this result says we need a new kind of SE research for developing new AI tools that are better suited to SE problems.


Learning Apache Mahout - Programmer Books

#artificialintelligence

In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and for people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark. Starting with the basics of Mahout and machine learning, you will explore prominent algorithms and their implementation in Mahout development. You will learn about Mahout's building blocks, address feature extraction, reduction and the curse of dimensionality, and delve into classification use cases with the random forest and Naive Bayes classifiers, as well as item- and user-based recommendation.


Contrast Trees and Distribution Boosting

arXiv.org Machine Learning

Often machine learning methods are applied and results reported in cases where there is little to no information concerning the accuracy of the output. Simply because a computer program returns a result does not ensure its validity. If decisions are to be made based on such results, it is important to have some notion of their veracity. Contrast trees represent a new approach for assessing the accuracy of many types of machine learning estimates that are not amenable to standard (cross) validation methods. In situations where inaccuracies are detected, boosted contrast trees can often improve performance. A special case, distribution boosting, provides an assumption-free method for estimating the full probability distribution of an outcome variable given any set of joint input predictor variable values.


Nonparametric Bayesian Structure Adaptation for Continual Learning

arXiv.org Machine Learning

Continual Learning is a learning paradigm where machine learning models are trained with sequential or streaming tasks. Two notable directions among the recent advances in continual learning with neural networks are (i) variational Bayes based regularization by learning priors from previous tasks, and (ii) learning the structure of deep networks to adapt to new tasks. So far, these two approaches have been orthogonal. We present a principled nonparametric Bayesian approach for learning the structure of feed-forward neural networks, addressing the shortcomings of both these approaches. In our model, the number of nodes in each hidden layer can automatically grow with the introduction of each new task, and inter-task transfer occurs through the overlapping of different sparse subsets of weights learned by different tasks. On benchmark datasets, our model performs comparably or better than the state-of-the-art approaches, while also being able to adaptively infer the evolving network structure in the continual learning setting.


Invited Talk: Symbolic Reasoning About Machine Learning Systems (PADL 2020 : 22nd Symposium on Practical Aspects of Declarative Languages) - POPL 2020

#artificialintelligence

I will discuss a line of work in which we compile common machine learning systems into symbolic representations that have the same input-output behavior, to facilitate formal reasoning about these systems. We have targeted Bayesian network classifiers, random forests and some types of neural networks, compiling each into tractable Boolean circuits, including Ordered Binary Decision Diagrams (OBDDs). Once the machine learning system is compiled into a tractable Boolean circuit, reasoning can commence using classical AI and computer science techniques. This includes generating explanations for decisions, quantifying robustness and verifying properties such as monotonicity. I will particularly discuss a new theory for unveiling the reasons behind the decisions made by classifiers, which can sometimes detect classifier bias from the reasons behind unbiased decisions.


Improved PAC-Bayesian Bounds for Linear Regression

arXiv.org Machine Learning

In this paper, we improve the PAC-Bayesian error bound for linear regression derived in Germain et al. [10]. The improvements are twofold. First, the proposed error bound is tighter, and converges to the generalization loss with a well-chosen temperature parameter. Second, the error bound also holds for training data that are not independently sampled. In particular, the error bound applies to certain time series generated by well-known classes of dynamical models, such as ARX models.


Sampling-Free Learning of Bayesian Quantized Neural Networks

arXiv.org Machine Learning

Bayesian learning of model parameters in neural networks matters in scenarios that require estimates with well-calibrated uncertainty. In this paper, we propose Bayesian quantized networks (BQNs), quantized neural networks (QNNs) for which we learn a posterior distribution over their discrete parameters. We provide a set of efficient algorithms for learning and prediction in BQNs without the need to sample from their parameters or activations, which not only allows for differentiable learning in QNNs, but also reduces the variance in gradients. We demonstrate that BQNs achieve both lower predictive errors and better-calibrated uncertainties than E-QNN (with less than 20% of the negative log-likelihood). A Bayesian approach to deep learning considers the network's parameters to be random variables and seeks to infer their posterior distribution given the training data. Models trained this way, called Bayesian neural networks (BNNs) (Wang & Yeung, 2016), in principle have well-calibrated uncertainties when they make predictions, which is important in scenarios such as active learning and reinforcement learning (Gal, 2016). Furthermore, the posterior distribution over the model parameters provides valuable information for evaluation and compression of neural networks. There are three main challenges in using BNNs: (1) Intractable posterior: Computing and storing the exact posterior distribution over the network weights is intractable due to the complexity and high-dimensionality of deep networks. These challenges are typically addressed either by making simplifying assumptions about the distributions of the parameters and activations, or by using sampling-based approaches, which are expensive and unreliable (likely to overestimate the uncertainties in predictions). Our goal is to propose a sampling-free method which uses probabilistic propagation to deterministically learn BNNs.
A seemingly unrelated area of deep learning research is that of quantized neural networks (QNNs), which offer advantages of computational and memory efficiency compared to continuous-valued models.
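To give a feel for what propagating distributions deterministically, rather than sampling, can look like, the sketch below computes the exact mean and variance of a single linear unit whose weights are independent and take values in {-1, +1}. This is not the BQN algorithm from the paper. The binary weight set, the independence assumption and the function names are all assumptions chosen to keep the example small, and the result is checked against brute-force enumeration of every weight configuration.

```python
import itertools
import math


def propagate_moments(x, weight_probs):
    """Mean and variance of y = sum_i w_i * x_i without sampling.

    weight_probs[i] is P(w_i = +1); each w_i is in {-1, +1} and the
    weights are assumed independent (an assumption of this sketch).
    """
    mean, var = 0.0, 0.0
    for xi, p in zip(x, weight_probs):
        w_mean = 2.0 * p - 1.0      # E[w_i]
        w_var = 1.0 - w_mean ** 2   # Var[w_i], since w_i^2 is always 1
        mean += xi * w_mean
        var += xi * xi * w_var
    return mean, var


def exact_moments(x, weight_probs):
    """Same quantities by enumerating all 2^n weight configurations."""
    mean, second_moment = 0.0, 0.0
    for ws in itertools.product((-1, 1), repeat=len(x)):
        prob = math.prod(p if w == 1 else 1.0 - p for w, p in zip(ws, weight_probs))
        y = sum(w * xi for w, xi in zip(ws, x))
        mean += prob * y
        second_moment += prob * y * y
    return mean, second_moment - mean ** 2


x = [0.5, -1.0, 2.0]
weight_probs = [0.9, 0.3, 0.6]

analytic = propagate_moments(x, weight_probs)
brute_force = exact_moments(x, weight_probs)
print(analytic, brute_force)
```

The analytic version costs time linear in the number of weights, while enumeration grows exponentially, which hints at why closed-form propagation is attractive when parameters are discrete.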