South America
Direct 0-1 Loss Minimization and Margin Maximization with Boosting
Zhai, Shaodan, Xia, Tian, Tan, Ming, Wang, Shaojun
We propose a boosting method, DirectBoost, a greedy coordinate descent algorithm that builds an ensemble classifier of weak classifiers through directly minimizing empirical classification error over labeled training examples; once the training classification error is reduced to a local coordinatewise minimum, DirectBoost runs a greedy coordinate ascent algorithm that continuously adds weak classifiers to maximize any targeted arbitrarily defined margins until reaching a local coordinatewise maximum of the margins in a certain sense. Experimental results on a collection of machine-learning benchmark datasets show that DirectBoost gives consistently better results than AdaBoost, LogitBoost, LPBoost with column generation and BrownBoost, and is noise tolerant when it maximizes an n'th order bottom sample margin.
Summary Statistics for Partitionings and Feature Allocations
Fidaner, Isik B., Cemgil, Taylan
Infinite mixture models are commonly used for clustering. One can sample from the posterior of mixture assignments by Monte Carlo methods or find its maximum a posteriori solution by optimization. However, in some problems the posterior is diffuse and it is hard to interpret the sampled partitionings. In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. We develop an element-based definition of entropy to quantify segmentation among their elements. Then we propose a simple algorithm called entropy agglomeration (EA) to summarize and visualize this information. Experiments on various infinite mixture posteriors as well as a feature allocation dataset demonstrate that the proposed statistics are useful in practice.
Introduction to Intelligent Systems in Traffic and Transportation
Bazzan, Ana L.C., Klgl, Franziska
Urban mobility is not only one of the pillars of modern economic systems, but also a key issue in the quest for equality of opportunity, once it can improve access to other services. Currently, however, there are a number of negative issues related to traffic, especially in mega-cities, such as economical issues (cost of opportunity caused by delays), environmental (externalities related to emissions of pollutants), and social (traffic accidents). Solutions to these issues are more and more closely tied to information and communication technology. Indeed, a search in the technical literature (using the keyword urban traffic" to filter out articles on data network traffic) retrieved the following number of articles (as of December 3, 2013): 9,443 (ACM Digital Library), 26,054 (Scopus), and 1,730,000 (Google Scholar). Moreover, articles listed in the ACM query relate to conferences as diverse as MobiCom, CHI, PADS, and AAMAS.
Viterbi training in PRISM
Sato, Taisuke, Kubota, Keiichi
VT (Viterbi training), or hard EM, is an efficient way of parameter learning for probabilistic models with hidden variables. Given an observation $y$, it searches for a state of hidden variables $x$ that maximizes $p(x,y \mid \theta)$ by coordinate ascent on parameters $\theta$ and $x$. In this paper we introduce VT to PRISM, a logic-based probabilistic modeling system for generative models. VT improves PRISM in three ways. First VT in PRISM converges faster than EM in PRISM due to the VT's termination condition. Second, parameters learned by VT often show good prediction performance compared to those learned by EM. We conducted two parsing experiments with probabilistic grammars while learning parameters by a variety of inference methods, i.e.\ VT, EM, MAP and VB. The result is that VT achieved the best parsing accuracy among them in both experiments. Also we conducted a similar experiment for classification tasks where a hidden variable is not a prediction target unlike probabilistic grammars. We found that in such a case VT does not necessarily yield superior performance. Third since VT always deals with a single probability of a single explanation, Viterbi explanation, the exclusiveness condition that is imposed on PRISM programs is no more required if we learn parameters by VT. Last but not least we can say that as VT in PRISM is general and applicable to any PRISM program, it largely reduces the need for the user to develop a specific VT algorithm for a specific model. Furthermore since VT in PRISM can be used just by setting a PRISM flag appropriately, it makes VT easily accessible to (probabilistic) logic programmers. To appear in Theory and Practice of Logic Programming (TPLP).
Summary Statistics for Partitionings and Feature Allocations
Fidaner, Işık Barış, Cemgil, Ali Taylan
Infinite mixture models are commonly used for clustering. One can sample from the posterior of mixture assignments by Monte Carlo methods or find its maximum a posteriori solution by optimization. However, in some problems the posterior is diffuse and it is hard to interpret the sampled partitionings. In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. We develop an element-based definition of entropy to quantify segmentation among their elements. Then we propose a simple algorithm called entropy agglomeration (EA) to summarize and visualize this information. Experiments on various infinite mixture posteriors as well as a feature allocation dataset demonstrate that the proposed statistics are useful in practice.
Horn Clause Contraction Functions
Delgrande, J. P., Wassermann, R.
In classical, AGM-style belief change, it is assumed that the underlying logic contains classical propositional logic. This is clearly a limiting assumption, particularly in Artificial Intelligence. Consequently there has been recent interest in studying belief change in approaches where the full expressivity of classical propositional logic is not obtained. In this paper we investigate belief contraction in Horn knowledge bases. We point out that the obvious extension to the Horn case, involving Horn remainder sets as a starting point, is problematic. Not only do Horn remainder sets have undesirable properties, but also some desirable Horn contraction functions are not captured by this approach. For Horn belief set contraction, we develop an account in terms of a model-theoretic characterisation involving weak remainder sets. Maxichoice and partial meet Horn contraction is specified, and we show that the problems arising with earlier work are resolved by these approaches. As well, constructions of the specific operators and sets of postulates are provided, and representation results are obtained. We also examine Horn package contraction, or contraction by a set of formulas. Again, we give a construction and postulate set, linking them via a representation result. Last, we investigate the closely-related notion of forgetting in Horn clauses. This work is arguably interesting since Horn clauses have found widespread use in AI; as well, the results given here may potentially be extended to other areas which make use of Horn-like reasoning, such as logic programming, rule-based systems, and description logics. Finally, since Horn reasoning is weaker than classical reasoning, this work sheds light on the foundations of belief change
Convergence analysis of kernel LMS algorithm with pre-tuned dictionary
Chen, Jie, Gao, Wei, Richard, Cédric, Bermudez, Jose-Carlos M.
The kernel least-mean-square (KLMS) algorithm is an appealing tool for online identification of nonlinear systems due to its simplicity and robustness. In addition to choosing a reproducing kernel and setting filter parameters, designing a KLMS adaptive filter requires to select a so-called dictionary in order to get a finite-order model. This dictionary has a significant impact on performance, and requires careful consideration. Theoretical analysis of KLMS as a function of dictionary setting has rarely, if ever, been addressed in the literature. In an analysis previously published by the authors, the dictionary elements were assumed to be governed by the same probability density function of the input data. In this paper, we modify this study by considering the dictionary as part of the filter parameters to be set. This theoretical analysis paves the way for future investigations on KLMS dictionary design.
SAR Image Despeckling Algorithms using Stochastic Distances and Nonlocal Means
Torres, Leonardo, Frery, Alejandro C.
This paper presents two approaches for filter design based on stochastic distances for intensity speckle reduction. A window is defined around each pixel, overlapping samples are compared and only those which pass a goodness-of-fit test are used to compute the filtered value. The tests stem from stochastic divergences within the Information Theory framework. The technique is applied to intensity Synthetic Aperture Radar (SAR) data with homogeneous regions using the Gamma model. The first approach uses a Nagao-Matsuyama-type procedure for setting the overlapping samples, and the second uses the nonlocal method. The proposals are compared with the Improved Sigma filter and with anisotropic diffusion for speckled data (SRAD) using a protocol based on Monte Carlo simulation. Among the criteria used to quantify the quality of filters, we employ the equivalent number of looks, and line and edge preservation. Moreover, we also assessed the filters by the Universal Image Quality Index and by the Pearson correlation between edges. Applications to real images are also discussed. The proposed methods show good results.
Pattern recognition issues on anisotropic smoothed particle hydrodynamics
This is a preliminary theoretical discussion on the computational requirements of the state of the art smoothed particle hydrodynamics (SPH) from the optics of pattern recognition and artificial intelligence. It is pointed out in the present paper that, when including anisotropy detection to improve resolution on shock layer, SPH is a very peculiar case of unsupervised machine learning. On the other hand, the free particle nature of SPH opens an opportunity for artificial intelligence to study particles as agents acting in a collaborative framework in which the timed outcomes of a fluid simulation forms a large knowledge base, which might be very attractive in computational astrophysics phenomenological problems like self-propagating star formation.
Topic Segmentation and Labeling in Asynchronous Conversations
Joty, S., Carenini, G., Ng, R. T.
Topic segmentation and labeling is often considered a prerequisite for higher-level conversation analysis and has been shown to be useful in many Natural Language Processing (NLP) applications. We present two new corpora of email and blog conversations annotated with topics, and evaluate annotator reliability for the segmentation and labeling tasks in these asynchronous conversations. We propose a complete computational framework for topic segmentation and labeling in asynchronous conversations. Our approach extends state-of-the-art methods by considering a fine-grained structure of an asynchronous conversation, along with other conversational features by applying recent graph-based methods for NLP. For topic segmentation, we propose two novel unsupervised models that exploit the fine-grained conversational structure, and a novel graph-theoretic supervised model that combines lexical, conversational and topic features. For topic labeling, we propose two novel (unsupervised) random walk models that respectively capture conversation specific clues from two different sources: the leading sentences and the fine-grained conversational structure. Empirical evaluation shows that the segmentation and the labeling performed by our best models beat the state-of-the-art, and are highly correlated with human annotations.