Nearest Neighbor Methods
Shortest path distance in random k-nearest neighbor graphs
Alamgir, Morteza, von Luxburg, Ulrike
Consider a weighted or unweighted k-nearest neighbor graph that has been built on n data points drawn randomly according to some density p on R^d. We study the convergence of the shortest path distance in such graphs as the sample size tends to infinity. We prove that for unweighted kNN graphs, this distance converges to an unpleasant distance function on the underlying space whose properties are detrimental to machine learning. We also study the behavior of the shortest path distance in weighted kNN graphs.
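As a minimal sketch of the two constructions being compared (not the paper's analysis), the following builds an unweighted kNN graph, where shortest paths count hops, and a weighted one, where edges carry Euclidean lengths. The point set, k, and library choices are illustrative assumptions.

```python
# Minimal sketch: shortest-path distances in unweighted vs. weighted kNN graphs.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))   # n = 500 points from a uniform density on [0, 1]^2
k = 10

# Unweighted kNN graph: every edge counts 1, so distances are hop counts.
G_unw = kneighbors_graph(X, k, mode='connectivity')
D_unw = shortest_path(G_unw, directed=False, unweighted=True)

# Weighted kNN graph: edges carry Euclidean lengths.
G_w = kneighbors_graph(X, k, mode='distance')
D_w = shortest_path(G_w, directed=False)

print(D_unw[0, 1], D_w[0, 1])
```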
Leveraging Usage Data for Linked Data Movie Entity Summarization
Thalhammer, Andreas, Toma, Ioan, Roa-Valverde, Antonio, Fensel, Dieter
Novel research in the field of Linked Data focuses on the problem of entity summarization, i.e., ranking features according to their importance for identifying a particular entity. Besides enabling a more human-friendly presentation, such summarizations can play a central role in semantic search engines and semantic recommender systems. Current approaches attempt entity summarization based on patterns that are inherent to the data at hand. The approach proposed in this paper focuses on the movie domain. It utilizes usage data to measure the similarity between movie entities, which in turn makes it possible to determine the k-nearest neighbors of an entity. This leads to the idea that features an entity shares with its nearest neighbors can be considered significant or important for that entity. Additionally, we introduce a downgrading factor (similar to TF-IDF) to counteract the high number of commonly occurring features. We exemplify the approach on a movie-ratings dataset that has been linked to Freebase entities.
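A toy sketch of the idea as described: neighbors under usage-based similarity vote for shared features, and an IDF-like factor downgrades globally common ones. The data, feature names, and scoring details below are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Toy usage data: rows = users, columns = movies (ratings, 0 = unrated).
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [0, 1, 5, 4],
              [1, 0, 4, 5]], dtype=float)

# Per-movie feature sets (stand-ins for Freebase properties).
feats = [{"genre:crime", "dir:coppola"},
         {"genre:crime", "dir:scorsese"},
         {"genre:scifi", "dir:cameron"},
         {"genre:scifi", "dir:scott"}]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def summarize(m, k=1):
    # k-nearest neighbor movies under usage-based cosine similarity
    sims = sorted(((cosine(R[:, m], R[:, j]), j)
                   for j in range(R.shape[1]) if j != m), reverse=True)
    neighbors = [j for _, j in sims[:k]]
    # score each feature by neighbor co-occurrence, downgraded IDF-style
    df = {f: sum(f in fs for fs in feats) for f in set().union(*feats)}
    scores = {f: sum(f in feats[j] for j in neighbors) * np.log(len(feats) / df[f])
              for f in feats[m]}
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(summarize(0))   # ranks "genre:crime" (shared with the nearest neighbor) first
```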
Empirical estimation of entropy functionals with confidence
Sricharan, Kumar, Raich, Raviv, Hero, Alfred O. III
This paper introduces a class of k-nearest neighbor ($k$-NN) estimators called bipartite plug-in (BPI) estimators for estimating integrals of non-linear functions of a probability density, such as Shannon entropy and R\'enyi entropy. The density is assumed to be smooth, have bounded support, and be uniformly bounded from below on this set. Unlike previous $k$-NN estimators of non-linear density functionals, the proposed estimator uses data splitting and boundary correction to achieve lower mean square error. Specifically, we assume that $T$ i.i.d. samples $X_i \in \mathbb{R}^d$ from the density are split into two pieces of cardinality $M$ and $N$ respectively, with the $M$ samples used for computing a k-nearest-neighbor density estimate and the remaining $N$ samples used for empirical estimation of the integral of the density functional. By studying the statistical properties of k-NN balls, explicit rates for the bias and variance of the BPI estimator are derived in terms of the sample size, the dimension of the samples, and the underlying probability distribution. Based on these results, it is possible to specify the optimal choice of the tuning parameters $M/T$ and $k$ that maximizes the rate of decrease of the mean square error (MSE). The resulting optimized BPI estimator converges faster and achieves lower mean squared error than previous $k$-NN entropy estimators. In addition, a central limit theorem is established for the BPI estimator that allows us to specify tight asymptotic confidence intervals.
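A minimal sketch of the bipartite plug-in construction for Shannon entropy $-\mathbb{E}[\log p(X)]$, assuming a standard kNN density estimate and omitting the paper's boundary correction; the sample sizes and k below are illustrative.

```python
# Sketch: kNN density estimate from M samples, averaged over N held-out samples.
import numpy as np
from math import gamma, log, pi

def bpi_shannon_entropy(X, M, k=5):
    d = X.shape[1]
    XM, XN = X[:M], X[M:]                     # data splitting into M and N parts
    c_d = pi ** (d / 2) / gamma(d / 2 + 1)    # volume of the unit d-ball
    H = 0.0
    for x in XN:
        r = np.sort(np.linalg.norm(XM - x, axis=1))[k - 1]  # k-NN radius in XM
        p_hat = k / (M * c_d * r ** d)                      # kNN density estimate
        H -= log(p_hat)
    return H / len(XN)

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))   # true entropy of Uniform([0,1]^2) is 0
print(bpi_shannon_entropy(X, M=1000, k=8))
```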
Simple Algorithm Portfolio for SAT
Nikolic, Mladen, Maric, Filip, Janicic, Predrag
The importance of algorithm portfolio techniques for SAT has long been noted, and a number of very successful systems have been devised, including the most successful one, SATzilla. However, all these systems are quite complex (to understand, reimplement, or modify). In this paper we propose a new algorithm portfolio for SAT that is extremely simple, yet so efficient that it outperforms SATzilla. For a new SAT instance to be solved, our portfolio finds its k-nearest neighbors in the training set and invokes the solver that performs best on those instances. The main distinguishing feature of our algorithm portfolio is the locality of the selection procedure: the choice of a SAT solver is based only on a few instances similar to the input one.
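The selection rule described in the abstract can be sketched in a few lines. The feature vectors, runtime matrix, and solver names below are placeholder assumptions.

```python
# Local selection: the solver with the best aggregate runtime on the k
# training instances nearest to the new instance in feature space.
import numpy as np

def select_solver(x_new, train_feats, runtimes, solvers, k=10):
    """train_feats: (n, d) instance features; runtimes: (n, s) seconds per solver."""
    dists = np.linalg.norm(train_feats - x_new, axis=1)
    nn = np.argsort(dists)[:k]                # k nearest training instances
    total = runtimes[nn].sum(axis=0)          # aggregate runtime per solver
    return solvers[int(np.argmin(total))]     # best solver on the neighborhood

rng = np.random.default_rng(0)
feats, times = rng.random((200, 30)), rng.random((200, 3)) * 100
print(select_solver(rng.random(30), feats, times, ["solverA", "solverB", "solverC"]))
```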
Multi-Instance Multi-Label Learning
Zhou, Zhi-Hua, Zhang, Min-Ling, Huang, Sheng-Jun, Li, Yu-Feng
In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework, in which an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects that have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Since the degeneration process may lose information, we also propose the D-MimlSvm algorithm, which tackles MIML problems directly in a regularization framework. Moreover, we show that MIML remains useful even when we do not have access to the real objects and thus cannot capture additional information from them through the MIML representation. For this setting we propose the InsDif and SubCod algorithms: InsDif transforms single instances into the MIML representation for learning, while SubCod transforms single-label examples into the MIML representation for learning. Experiments show that in some tasks they achieve better performance than learning from the single instances or single-label examples directly.
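An illustrative sketch of the MIML representation and of the degeneration idea the abstract mentions: reducing an MIML task to one multi-instance (single-label) task per class. The names and structure below are assumptions, not the authors' code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MIMLExample:
    bag: np.ndarray   # (n_instances, n_features): multiple instances
    labels: set       # multiple class labels

def degenerate_to_mil(examples, classes):
    """One multi-instance dataset per class, with binary bag labels."""
    return {c: [(ex.bag, c in ex.labels) for ex in examples] for c in classes}

ex = MIMLExample(bag=np.ones((3, 4)), labels={"lion", "grass"})
tasks = degenerate_to_mil([ex], classes=["lion", "grass", "sky"])
print(tasks["sky"])   # the bag appears with label False in the "sky" task
```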
Unsupervised K-Nearest Neighbor Regression
In many scientific disciplines, structures in high-dimensional data have to be found, e.g., in stellar spectra, in genome data, or in face recognition tasks. In this work we present a novel approach to non-linear dimensionality reduction. It is based on fitting K-nearest neighbor regression into the unsupervised regression framework for learning low-dimensional manifolds. Similar to related approaches that are mostly based on kernel methods, unsupervised K-nearest neighbor (UNN) regression optimizes latent variables w.r.t. the data space reconstruction error, employing the K-nearest neighbor heuristic. The problem of optimizing latent neighborhoods is difficult to solve, but the UNN formulation allows the design of efficient strategies that iteratively embed latent points into fixed neighborhood topologies. UNN is well suited to the sorting of high-dimensional data. The iterative variants are analyzed experimentally.
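A rough sketch of the iterative strategy in one latent dimension: each point is inserted at the slot of a growing latent ordering that minimizes the data space reconstruction error of kNN regression over latent neighbors. This insertion rule is our reading of the abstract, not the paper's exact algorithm.

```python
import numpy as np

def reconstruction_error(X, order, k):
    err = 0.0
    for pos, i in enumerate(order):
        # latent neighbors of i: the k points closest in ordering position
        nbr_pos = sorted((p for p in range(len(order)) if p != pos),
                         key=lambda p: abs(p - pos))[:k]
        nbrs = [order[p] for p in nbr_pos]
        # reconstruct X[i] as the mean of its latent neighbors' data vectors
        err += np.sum((X[i] - X[nbrs].mean(axis=0)) ** 2)
    return err

def unn_1d(X, k=2):
    order = [0]
    for i in range(1, len(X)):
        errs = [reconstruction_error(X, order[:p] + [i] + order[p:], k)
                for p in range(len(order) + 1)]   # try every insertion slot
        order.insert(int(np.argmin(errs)), i)
    return order                                  # 1D latent ordering of the points

X = np.random.default_rng(0).random((30, 5))
print(unn_1d(X))
```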
Multi-Label Classification: Inconsistency and Class Balanced K-Nearest Neighbor
Wang, Hua (University of Texas at Arlington) | Ding, Chris (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)
Many existing approaches employ the one-vs-rest method to decompose a multi-label classification problem into a set of 2-class classification problems, one for each class. While valid in traditional single-label classification, this method incurs training inconsistency in multi-label classification, because there a data point can belong to more than one class. To deal with this problem, we further develop the classical K-Nearest Neighbor classifier and propose a novel Class Balanced K-Nearest Neighbor approach for multi-label classification that emphasizes balanced usage of data from all the classes. In addition, we propose a Class Balanced Linear Discriminant Analysis approach to address high-dimensional multi-label input data. Promising experimental results on three broadly used multi-label data sets demonstrate the effectiveness of our approach.
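A hedged sketch of the class-balanced idea: rather than pooling k neighbors from the whole training set (which favors large classes), draw k nearest neighbors from each class separately and score classes by proximity. The concrete scoring rule (inverse mean distance) is our illustrative assumption, not necessarily the paper's.

```python
import numpy as np

def class_balanced_knn(x, X_train, Y_train, k=5):
    """Y_train: (n, c) binary multi-label indicators. Returns per-class scores."""
    d = np.linalg.norm(X_train - x, axis=1)
    scores = []
    for c in range(Y_train.shape[1]):
        dc = np.sort(d[Y_train[:, c] == 1])[:k]   # class c's own k nearest points
        scores.append(1.0 / (dc.mean() + 1e-12) if dc.size else 0.0)
    return np.array(scores)   # threshold or rank these to predict the label set
```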
Optimal construction of k-nearest neighbor graphs for identifying noisy clusters
Maier, Markus, Hein, Matthias, von Luxburg, Ulrike
We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, the mutual k-nearest neighbor graph or the symmetric k-nearest neighbor graph? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. These bounds lead to several conclusions. First, k has to be chosen surprisingly high (of the order n rather than of the order log n) to maximize the probability of cluster identification. Second, the major difference between the mutual and the symmetric k-nearest neighbor graph occurs when one attempts to detect only the most significant cluster.
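The two graph constructions contrasted in the abstract can be sketched as follows: the symmetric kNN graph joins i and j if either is among the other's k nearest neighbors, the mutual kNN graph only if both are; clusters are then read off as connected components. The data and parameters are illustrative.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)),   # two well-separated clusters
               rng.normal(3, 0.3, (100, 2))])
k = 8

A = kneighbors_graph(X, k, mode='connectivity')   # directed kNN relation
sym = ((A + A.T) > 0).astype(int)                 # symmetric: i->j OR j->i
mut = (A.multiply(A.T) > 0).astype(int)           # mutual:    i->j AND j->i

for name, G in [("symmetric", sym), ("mutual", mut)]:
    n_comp, _ = connected_components(G, directed=False)
    print(name, "components:", n_comp)
```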
Solar radiation forecasting using ad-hoc time series preprocessing and neural networks
Paoli, Christophe, Voyant, Cyril, Muselli, Marc, Nivet, Marie-Laure
In this paper, we present an application of neural networks in the renewable energy domain. We have developed a methodology for the daily prediction of global solar radiation on a horizontal surface. We use ad-hoc time series preprocessing and a Multi-Layer Perceptron (MLP) to predict solar radiation at a daily horizon. First results are promising, with nRMSE < 21% and RMSE < 998 Wh/m². Our optimized MLP yields predictions similar to or even better than those of conventional methods such as ARIMA techniques, Bayesian inference, Markov chains, and k-Nearest-Neighbors approximators. Moreover, we found that our data preprocessing approach can significantly reduce forecasting errors.
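A sketch of the pipeline's general shape: deseasonalize a daily radiation series, build lagged inputs, and fit an MLP to predict the next day. The sinusoidal deseasonalization and all numbers below are generic stand-ins for the paper's ad-hoc preprocessing, not its actual method.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_lagged(series, n_lags=8):
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    return X, series[n_lags:]

# toy series with annual seasonality plus noise (placeholder for measured data)
t = np.arange(3 * 365)
season = 4000 + 2500 * np.sin(2 * np.pi * t / 365)       # crude seasonal model
g = season + np.random.default_rng(0).normal(0, 300, len(t))

r = g / season                                            # stationarized ratio series
X, y = make_lagged(r)
model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                     random_state=0).fit(X[:-100], y[:-100])
pred = model.predict(X[-100:]) * season[-100:]            # re-seasonalize forecasts
```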
Multi-Instance Learning by Treating Instances As Non-I.I.D. Samples
Zhou, Zhi-Hua, Sun, Yu-Yin, Li, Yu-Feng
Multi-instance learning attempts to learn from a training set consisting of labeled bags, each containing many unlabeled instances. Previous studies typically treat the instances in the bags as independently and identically distributed. However, the instances in a bag are rarely independent, and better performance can therefore be expected if the instances are treated in a non-i.i.d. way that exploits the relations among them. In this paper, we propose a simple yet effective multi-instance learning method that regards each bag as a graph and uses a specific kernel to distinguish the graphs, considering the features of the nodes as well as the features of the edges, which convey relations among instances. The effectiveness of the proposed method is validated by experiments.
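A hedged sketch of the bag-as-graph idea: each bag becomes a graph whose edges connect nearby instances, and bags are compared with a simple kernel over instance features weighted by local edge structure. This mirrors the spirit of the method described; the exact kernel and all parameters are our assumptions.

```python
import numpy as np

def bag_graph(bag, eps=1.0):
    """Adjacency among a bag's instances: edge if pairwise distance below eps."""
    D = np.linalg.norm(bag[:, None, :] - bag[None, :, :], axis=2)
    A = (D < eps).astype(float)
    np.fill_diagonal(A, 0)
    return A

def graph_kernel(bag_a, bag_b, eps=1.0, gamma=1.0):
    Aa, Ab = bag_graph(bag_a, eps), bag_graph(bag_b, eps)
    # weight each instance down by its degree so clusters of redundant,
    # non-independent instances do not dominate the kernel value
    wa = 1.0 / (1.0 + Aa.sum(axis=1))
    wb = 1.0 / (1.0 + Ab.sum(axis=1))
    # RBF similarity between all instance pairs across the two bags
    K = np.exp(-gamma * np.linalg.norm(bag_a[:, None] - bag_b[None, :], axis=2) ** 2)
    return (wa[:, None] * wb[None, :] * K).sum() / (wa.sum() * wb.sum())

rng = np.random.default_rng(0)
print(graph_kernel(rng.random((5, 3)), rng.random((7, 3))))
```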