Department of Computer and Information Science, Linköping University, S-581 83 Linköping, Sweden

Abstract

We investigate the computational properties of the spatial algebra RCC-5, which is a restricted version of the RCC framework for spatial reasoning. The satisfiability problem for RCC-5 is known to be NP-complete, but not much is known about its approximately four billion subclasses. In the process, we identify all maximal tractable subalgebras, of which there are four in total. The main reason for this is, probably, that spatial reasoning has proved to be applicable to real-world problems in, for example, geographical database systems (Egenhofer, 1991; Grigni, Papadias, & Papadimitriou, 1995) and molecular biology (Cui, 1994). In both of these applications, the size of the problem instances can be huge, so the complexity of problems and algorithms is a highly relevant area to study. However, questions of computational complexity have not received much attention in the literature; two notable exceptions are the results reported by Nebel (1995) and by Renz and Nebel (1997). A well-known framework for qualitative spatial reasoning is the so-called RCC approach (Randell & Cohn, 1989; Randell, Cui, & Cohn, 1992). This approach is based on modelling qualitative spatial relations between regions using first-order logic. Of special interest, from a complexity-theoretic standpoint, are the two subclasses RCC-5 and RCC-8. It is well-known that both RCC-5 and RCC-8 have quite weak expressive power.
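To make the reasoning problem concrete, the following is a minimal sketch (not taken from the paper) of constraint-based reasoning over RCC-5's five basic relations: DR (discrete), PO (partial overlap), PP (proper part), PPI (its inverse), and EQ (equal). The composition table below follows the standard set-based semantics of these relations, and the check below detects a three-region inconsistency by intersecting an asserted relation with a composed one.

```python
# Sketch of RCC-5 reasoning: basic relations plus their standard
# composition table, used to check a three-region constraint.

U = frozenset({"DR", "PO", "PP", "PPI", "EQ"})  # all five basic relations

# COMP[(r1, r2)] = possible relations between X and Z,
# given X r1 Y and Y r2 Z (set-based semantics).
COMP = {
    ("DR", "DR"): U,                    ("DR", "PO"): {"DR", "PO", "PP"},
    ("DR", "PP"): {"DR", "PO", "PP"},   ("DR", "PPI"): {"DR"},
    ("DR", "EQ"): {"DR"},
    ("PO", "DR"): {"DR", "PO", "PPI"},  ("PO", "PO"): U,
    ("PO", "PP"): {"PO", "PP"},         ("PO", "PPI"): {"DR", "PO", "PPI"},
    ("PO", "EQ"): {"PO"},
    ("PP", "DR"): {"DR"},               ("PP", "PO"): {"DR", "PO", "PP"},
    ("PP", "PP"): {"PP"},               ("PP", "PPI"): U,
    ("PP", "EQ"): {"PP"},
    ("PPI", "DR"): {"DR", "PO", "PPI"}, ("PPI", "PO"): {"PO", "PPI"},
    ("PPI", "PP"): {"PO", "PP", "PPI", "EQ"},
    ("PPI", "PPI"): {"PPI"},            ("PPI", "EQ"): {"PPI"},
    ("EQ", "DR"): {"DR"}, ("EQ", "PO"): {"PO"}, ("EQ", "PP"): {"PP"},
    ("EQ", "PPI"): {"PPI"}, ("EQ", "EQ"): {"EQ"},
}

def compose(rs1, rs2):
    """Compose two disjunctive RCC-5 relations via the basic-relation table."""
    out = set()
    for a in rs1:
        for b in rs2:
            out |= set(COMP[(a, b)])
    return out

def refine(rxy, ryz, rxz):
    """Tighten r(X,Z) against the composition of r(X,Y) and r(Y,Z)."""
    return set(rxz) & compose(rxy, ryz)

# X inside Y and Y inside Z force X inside Z, so X DR Z is inconsistent:
print(refine({"PP"}, {"PP"}, {"DR"}))  # -> set()
```

A full satisfiability algorithm would run this refinement to a fixed point over all triples (path consistency); this snippet shows only the single propagation step that such algorithms repeat.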
This paper presents new experimental evidence against the utility of Occam's razor. A systematic procedure is presented for post-processing decision trees produced by C4.5. This procedure was derived by rejecting Occam's razor and instead attending to the assumption that similar objects are likely to belong to the same class. It increases a decision tree's complexity without altering the performance of that tree on the training data from which it is inferred. The resulting more complex decision trees are demonstrated to have, on average, for a variety of common learning tasks, higher predictive accuracy than the less complex original decision trees. This result raises considerable doubt about the utility of Occam's razor as it is commonly applied in modern machine learning.
Ramírez-Corona, Mallinali (Instituto Nacional de Astrofísica Óptica y Electrónica) | Sucar, L. Enrique (Instituto Nacional de Astrofísica Óptica y Electrónica) | Morales, Eduardo F. (Instituto Nacional de Astrofísica Óptica y Electrónica)
Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (Cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique for not only automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods.
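As a hypothetical illustration of the induction step described above, the sketch below learns a one-level rule (a decision stump) from pre-classified cue phrases. The feature names ("initial" for phrase-initial position, "pause" for a preceding pause) and the tiny data set are invented for this example; the paper itself uses the full Cgrendel and C4.5 learners on real annotated corpora.

```python
# Hypothetical sketch: induce a one-feature rule from labeled cue phrases.
from collections import Counter

# (features, label) pairs; labels are "discourse" vs. "sentential".
data = [
    ({"initial": True,  "pause": True},  "discourse"),
    ({"initial": True,  "pause": False}, "discourse"),
    ({"initial": False, "pause": False}, "sentential"),
    ({"initial": False, "pause": True},  "sentential"),
]

def stump(data):
    """Pick the single feature whose value split misclassifies fewest examples."""
    best = None
    for f in data[0][0]:
        split = {True: Counter(), False: Counter()}
        for x, y in data:
            split[x[f]][y] += 1
        # errors: examples not in the majority class on their side of the split
        errors = sum(sum(c.values()) - max(c.values()) for c in split.values() if c)
        rule = {v: c.most_common(1)[0][0] for v, c in split.items() if c}
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    return best[1], best[2]

feature, rule = stump(data)
print(feature, rule)  # -> initial {True: 'discourse', False: 'sentential'}
```

Real learners like C4.5 grow full trees recursively with an information-theoretic split criterion; the stump above shows only the core idea of selecting a feature from pre-classified examples.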
In this study, machine learning models were constructed to predict whether judgments made by the European Court of Human Rights (ECHR) would lead to a violation of an Article in the Convention on Human Rights. The problem is framed as a binary classification task where a judgment can lead to a "violation" or "non-violation" of a particular Article. Using auto-sklearn, an automated algorithm selection package, models were constructed for 12 Articles in the Convention. To train these models, textual features were obtained from the ECHR Judgment documents using N-grams, word embeddings and paragraph embeddings. Additional documents, from the ECHR, were incorporated into the models through the creation of a word embedding (echr2vec) and a doc2vec model. The features obtained using the echr2vec embedding provided the highest cross-validation accuracy for 5 of the Articles. The overall test accuracy, across the 12 Articles, was 68.83%. As far as we could tell, this is the first estimate of the accuracy of such machine learning models using a realistic test set. This provides an important benchmark for future work. As a baseline, a simple heuristic of always predicting the most common outcome in the past was used. The heuristic achieved an overall test accuracy of 86.68% which is 29.7% higher than the models. Again, this was seemingly the first study that included such a heuristic with which to compare model results. The higher accuracy achieved by the heuristic highlights the importance of including such a baseline.
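The baseline described above is easy to state precisely, and since the abstract argues it should accompany any reported model accuracy, a minimal sketch follows. The data values here are invented; the heuristic simply predicts, for each Article, the most common outcome in past judgments.

```python
# Minimal sketch (invented data) of the majority-outcome baseline:
# always predict the most frequent past outcome for an Article.
from collections import Counter

def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _judgment: most_common

past = ["violation"] * 8 + ["non-violation"] * 2   # hypothetical Article history
predict = majority_baseline(past)

test = ["violation"] * 7 + ["non-violation"] * 3   # hypothetical test judgments
accuracy = sum(predict(t) == t for t in test) / len(test)
print(accuracy)  # -> 0.7
```

Because ECHR outcomes are heavily skewed toward "violation" for many Articles, such a predictor can score well despite ignoring the judgment text entirely, which is exactly why the paper treats it as an essential point of comparison.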