Bayesian Learning
Semi-supervised Learning by Entropy Minimization
Grandvalet, Yves, Bengio, Yoshua
We consider the semi-supervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which enables to incorporate unlabeled data in the standard supervised learning. Our approach includes otherapproaches to the semi-supervised problem as particular or limiting cases. A series of experiments illustrates that the proposed solution benefitsfrom unlabeled data. The method challenges mixture models when the data are sampled from the distribution class spanned by the generative model. The performances are definitely in favor of minimum entropy regularization when generative models are misspecified, and the weighting of unlabeled data provides robustness to the violation of the "cluster assumption". Finally, we also illustrate that the method can also be far superior to manifold learning in high dimension spaces.
Bayesian inference in spiking neurons
We propose a new interpretation of spiking neurons as Bayesian integrators accumulatingevidence over time about events in the external world or the body, and communicating to other neurons their certainties about these events. In this model, spikes signal the occurrence of new information, i.e.what cannot be predicted from the past activity. As a result, firing statistics are close to Poisson, albeit providing a deterministic representation ofprobabilities. We proceed to develop a theory of Bayesian inference in spiking neural networks, recurrent interactions implementing avariant of belief propagation. Many perceptual and motor tasks performed by the central nervous system are probabilistic, andcan be described in a Bayesian framework [4, 3].
Similarity and Discrimination in Classical Conditioning: A Latent Variable Account
Courville, Aaron C., Daw, Nathaniel D., Touretzky, David S.
We propose a probabilistic, generative account of configural learning phenomena in classical conditioning. Configural learning experiments probe how animals discriminate and generalize between patterns of simultaneously presentedstimuli (such as tones and lights) that are differentially predictive of reinforcement. Previous models of these issues have been successful more on a phenomenological than an explanatory level: they reproduce experimental findings but, lacking formal foundations, providescant basis for understanding why animals behave as they do. We present a theory that clarifies seemingly arbitrary aspects of previous modelswhile also capturing a broader set of data.
A Machine Learning Approach to Conjoint Analysis
Chapelle, Olivier, Harchaoui, Zaรฏd
Choice-based conjoint analysis builds models of consumer preferences over products with answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve this problem moreefficiently. Thus, we propose two algorithms to quickly and accurately estimate consumer preferences.
A Suffix Tree Approach to Email Filtering
Pampapathi, Rajesh M., Mirkin, Boris, Levene, Mark
Just as email traffic has increased over the years since its in ception, so has the proportion that is unsolicited; some estimations have plac ed the proportion as high as 60%, and the average cost of this to business at arou nd $2000 per year, per employee (see [29] for a range of numbers and statis tics on spam). Unsolicited emails - commonly know as spam - have thereby become a daily feature of every email user's inbox; and regardless of advan ces in email filtering, spam continues to be a problem in a similar way to comp uter viruses which constantly reemerge in new guises. This leaves the res earch community with the task of continually investigating new approac hes to sorting the welcome emails (known as ham) from the unwelcome spam. W e present just such an approach to email classification and fi ltering based on a well studied data structure, the suffix tree (see [1 6] for a brief introduction). The approach is similar to many existing one s, in that it uses training examples to construct a model or profile of the class and its features, then uses this to make decisions as to the class of new example s; but it differs in the depth and extent of the anaysis. For a good overview of a number of text classification methods, see [26, 1, 31]. Using a suffix tree, we are able to compare not only single word s, as in most current approaches, but substrings of an arbitrary len gth.
Relational Dynamic Bayesian Networks
Sanghai, S., Domingos, P., Weld, D.
Stochastic processes that involve the creation of objects and relations over time are widespread, but relatively poorly studied. For example, accurate fault diagnosis in factory assembly processes requires inferring the probabilities of erroneous assembly operations, but doing this efficiently and accurately is difficult. Modeled as dynamic Bayesian networks, these processes have discrete variables with very large domains and extremely high dimensionality. In this paper, we introduce relational dynamic Bayesian networks (RDBNs), which are an extension of dynamic Bayesian networks (DBNs) to first-order logic. RDBNs are a generalization of dynamic probabilistic relational models (DPRMs), which we had proposed in our previous work to model dynamic uncertain domains. We first extend the Rao-Blackwellised particle filtering described in our earlier work to RDBNs. Next, we lift the assumptions associated with Rao-Blackwellization in RDBNs and propose two new forms of particle filtering. The first one uses abstraction hierarchies over the predicates to smooth the particle filter's estimates. The second employs kernel density estimation with a kernel function specifically designed for relational domains. Experiments show these two methods greatly outperform standard particle filtering on the task of assembly plan execution monitoring.
Robust Inference of Trees
Zaffalon, Marco, Hutter, Marcus
This paper is concerned with the reliable inference of optimal tree-approximations to the dependency structure of an unknown distribution generating data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley's imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time O(m^4), m being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees.
Monotone Conditional Complexity Bounds on Future Prediction Errors
Chernov, Alexey, Hutter, Marcus
We bound the future loss when predicting any (computably) stochastic sequence online. Solomonoff finitely bounded the total deviation of his universal predictor M from the true distribution m by the algorithmic complexity of m. Here we assume we are at a time t>1 and already observed x=x_1...x_t. We bound the future prediction performance on x_{t+1}x_{t+2}... by a new variant of algorithmic complexity of m given x, plus the complexity of the randomness deficiency of x. The new complexity is monotone in its condition in the sense that this complexity can only decrease if the condition is prolonged. We also briefly discuss potential generalizations to Bayesian model classes and to classification problems.
Semantic Integration in Text: From Ambiguous Names to Identifiable Entities
Li, Xin, Morie, Paul, Roth, Dan
Semantic integration focuses on discovering, representing, and manipulating correspondences between entities in disparate data sources. The topic has been widely studied in the context of structured data, with problems being considered including ontology and schema matching, matching relational tuples, and reconciling inconsistent data values. In recent years, however, semantic integration over text has also received increasing attention. This article studies a key challenge in semantic integration over text: identifying whether different mentions of real-world entities, such as "JFK" and "John Kennedy," within and across natural language text documents, actually represent the same concept. We present a machine-learning study of this problem. The first approach is a discriminative approach -- a pairwise local classifier is trained in a supervised way to determine whether two given mentions represent the same real-world entity. This is followed, potentially, by a global clustering algorithm that uses the classifier as its similarity metric. Our second approach is a global generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes (1) a joint distribution over entities (for example, a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"), and (2) an "author" model that assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an "appearance" model that governs how mentions are transformed from the "representative" mention. We show that both approaches perform very accurately, in the range of 90-95 percent. F1 measure for different entity types, much better than previous approaches to some aspects of this problem. Finally, we discuss how our solution for mention matching in text can be potentially applied to matching relational tuples, as well as to linking entities across databases and text.