Reasoning about Cardinal Directions between Extended Objects: The Hardness Result
The cardinal direction calculus (CDC) proposed by Goyal and Egenhofer is a very expressive qualitative calculus for the directional information of extended objects. Early work has shown that consistency checking of complete networks of basic CDC constraints is tractable, while reasoning with the CDC in general is NP-hard. This paper shows, however, that if some constraints may be left unspecified, then consistency checking of possibly incomplete networks of basic CDC constraints is already intractable. This draws a sharp boundary between the tractable and intractable subclasses of the CDC. The result is achieved by a reduction from the well-known 3-SAT problem.
The Lasso under Heteroscedasticity
Jia, Jinzhu, Rohe, Karl, Yu, Bin
The performance of the Lasso is well understood under the assumptions of the standard linear model with homoscedastic noise. However, in several applications, the standard model does not describe the important features of the data. This paper examines how the Lasso performs on a non-standard model that is motivated by medical imaging applications. In these applications, the variance of the noise scales linearly with the expectation of the observation. Like all heteroscedastic models, the noise terms in this Poisson-like model are \textit{not} independent of the design matrix. More specifically, this paper studies the sign consistency of the Lasso under a sparse Poisson-like model. In addition to studying sufficient conditions for the sign consistency of the Lasso estimate, this paper also gives necessary conditions for sign consistency. Both sets of conditions are comparable to results for the homoscedastic model, showing that when a measure of the signal-to-noise ratio is large, the Lasso performs well on both Poisson-like data and homoscedastic data. Simulations reveal that the Lasso performs equally well in terms of model selection on both Poisson-like data and homoscedastic data (with properly scaled noise variance), across a range of parameterizations. Taken as a whole, these results suggest that the Lasso is robust to Poisson-like heteroscedastic noise.
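A minimal simulation sketch (not from the paper) of the Poisson-like setting described above, in which the noise standard deviation scales with the square root of the mean of each observation; the dimensions, sparsity level, and regularization weight are arbitrary illustrative choices, and the comment on the last line only states what sign consistency would require.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 200, 50, 5                      # samples, features, true non-zeros

X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = 3.0                            # sparse true coefficient vector

mu = X @ beta + 10.0                      # shift so the mean is mostly positive
# Poisson-like heteroscedastic noise: Var(eps_i) proportional to the mean mu_i
eps = rng.normal(scale=np.sqrt(np.abs(mu)))
y = mu + eps

lasso = Lasso(alpha=0.5).fit(X, y)
support_hat = np.flatnonzero(lasso.coef_ != 0)
print("estimated support:", support_hat)  # sign consistency asks this to match {0,...,k-1}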
Significance of Classification Techniques in Prediction of Learning Disabilities
David, Julie M., Balakrishnan, Kannan
The aim of this study is to show the importance of two classification techniques, viz. decision trees and clustering, in the prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in data mining. Different rules extracted from the decision tree are used for the prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in an LD-affected child. In this paper, the J48 algorithm is used for constructing the decision tree and the K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.
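The paper uses Weka's J48 (a C4.5 implementation) together with K-means. The sketch below is only a rough illustration of that two-step workflow on synthetic data, substituting scikit-learn's CART-style DecisionTreeClassifier as a stand-in for J48; the symptom attributes and labelling rule are invented for the example.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic stand-in data: each row is a child, each column a binary sign/symptom,
# and the label says whether a learning disability was diagnosed (toy rule below).
X = rng.integers(0, 2, size=(120, 6)).astype(float)
y = (X[:, 0] + X[:, 2] + X[:, 5] >= 2).astype(int)

# Decision tree (CART here; the paper uses J48/C4.5) -> human-readable rules
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[f"symptom_{i}" for i in range(6)]))

# K-means clustering to group children with similar symptom patterns
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))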
Random Graph Generator for Bipartite Networks Modeling
Chojnacki, Szymon, Kłopotek, Mieczysław
The purpose of this article is to introduce a new iterative algorithm for generating random graphs with properties resembling real-life bipartite graphs. The algorithm enables us to generate a wide range of random bigraphs whose features are determined by a set of parameters. We adapt the advances of the last decade in unipartite complex network modeling to the bigraph setting. This data structure can be observed in several situations. However, only a few datasets are freely available for testing the algorithms (e.g. community detection, influential node identification, information retrieval) that operate on such data. Therefore, artificial datasets are needed to enhance the development and testing of the algorithms. We are particularly interested in applying the generator to the analysis of recommender systems, and therefore focus on two characteristics that, besides simple statistics, are in our opinion responsible for the performance of neighborhood-based collaborative filtering algorithms: the node degree distribution and the local clustering coefficient.
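As a rough illustration of the kind of iterative, parameter-driven bigraph generation described above (a generic preferential-attachment sketch, not the authors' algorithm), one might mix degree-proportional and uniform item choices for each arriving user node; the function name, parameters, and values below are arbitrary.

import random
from collections import defaultdict

def random_bigraph(n_users=200, edges_per_user=3, pref=0.7, seed=42):
    """Generic iterative bipartite generator (illustrative; not the paper's algorithm).

    Each arriving 'user' connects to up to `edges_per_user` 'items': with
    probability `pref` an item is picked proportionally to its current degree
    (preferential attachment), otherwise uniformly at random. A fresh item also
    arrives each step, so both node sets grow over time.
    """
    rng = random.Random(seed)
    degree = defaultdict(int)
    stubs = [0]                     # multiset of item ids, one entry per unit of degree
    items = [0]
    degree[0] = 1
    edges = []
    for user in range(n_users):
        items.append(len(items))    # fresh item
        stubs.append(items[-1])
        degree[items[-1]] += 1
        targets = set()
        while len(targets) < min(edges_per_user, len(items)):
            item = rng.choice(stubs) if rng.random() < pref else rng.choice(items)
            targets.add(item)
        for item in targets:
            edges.append((f"u{user}", f"i{item}"))
            degree[item] += 1
            stubs.append(item)      # keep the stub list in sync with degrees
    return edges

print(len(random_bigraph()), "edges generated")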
CUR from a Sparse Optimization Viewpoint
Bien, Jacob, Xu, Ya, Mahoney, Michael W.
The CUR decomposition provides an approximation of a matrix $X$ that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of $X$. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.
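For concreteness, the sketch below builds a CUR-style approximation by leverage-score sampling of columns and rows, a standard randomized construction in this literature, though not necessarily the exact variant analysed in the paper; the rank parameter and sample sizes are illustrative.

import numpy as np

def cur_decomposition(X, k=5, c=10, r=10, seed=0):
    """Randomized CUR via leverage-score sampling (an illustrative sketch)."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    col_lev = (Vt[:k] ** 2).sum(axis=0) / k        # column leverage scores (sum to 1)
    row_lev = (U[:, :k] ** 2).sum(axis=1) / k      # row leverage scores (sum to 1)

    cols = rng.choice(X.shape[1], size=c, replace=False, p=col_lev)
    rows = rng.choice(X.shape[0], size=r, replace=False, p=row_lev)

    C, R = X[:, cols], X[rows, :]
    U_mid = np.linalg.pinv(C) @ X @ np.linalg.pinv(R)   # best middle factor for chosen C, R
    return C, U_mid, R

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 40))  # approximately low-rank test matrix
C, U_mid, R = cur_decomposition(X)
err = np.linalg.norm(X - C @ U_mid @ R) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}")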
A Very Fast Algorithm for Matrix Factorization
Nikulin, Vladimir, Huang, Tian-Hsiang, Ng, Shu-Kay, Rathnayake, Suren I, McLachlan, Geoffrey J
We present a very fast algorithm for general matrix factorization of a data matrix for use in the statistical analysis of high-dimensional data via latent factors. Such data are prevalent across many application areas and generate an ever-increasing demand for methods of dimension reduction in order to undertake the statistical analysis of interest. Our algorithm uses a gradient-based approach which can be used with an arbitrary loss function provided the latter is differentiable. The speed and effectiveness of our algorithm for dimension reduction is demonstrated in the context of supervised classification of some real high-dimensional data sets from the bioinformatics literature.
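The following is a minimal sketch of gradient-based matrix factorization with a squared-error loss, illustrating the general idea of fitting $X \approx AB$ by following the gradient of a differentiable loss; it is a plain full-gradient loop, not the paper's fast algorithm, and the step size, rank, and iteration count are arbitrary.

import numpy as np

def factorize(X, k=5, lr=0.01, n_iter=2000, seed=0):
    """Plain gradient descent on 0.5*||X - A @ B||_F^2 (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = 0.1 * rng.normal(size=(m, k))
    B = 0.1 * rng.normal(size=(k, n))
    for _ in range(n_iter):
        resid = A @ B - X                # residual of the current approximation
        A -= lr * (resid @ B.T)          # gradient step in A
        B -= lr * (A.T @ resid)          # gradient step in B
    return A, B

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20)) @ rng.normal(size=(20, 50)) / np.sqrt(20)
A, B = factorize(X, k=20)
print("relative error:", np.linalg.norm(X - A @ B) / np.linalg.norm(X))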
Qualitative Reasoning about Relative Direction on Adjustable Levels of Granularity
Mossakowski, Till, Moratz, Reinhard
An important issue in Qualitative Spatial Reasoning is the representation of relative direction. In this paper we present simple geometric rules that enable reasoning about relative direction between oriented points. This framework, the Oriented Point Relation Algebra OPRA_m, has a scalable granularity m. We develop a simple algorithm for computing the OPRA_m composition tables and prove its correctness. Using a composition table, algebraic closure for a set of OPRA_m statements is sufficient to solve spatial navigation tasks, and it turns out that scalable granularity is useful in these navigation tasks.
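As an illustration of granularity in OPRA_m, the sketch below computes which of the 4m regions around an oriented point A contains another point B, assuming the common convention that half-lines at angle multiples of pi/m from A's orientation receive even indices and the open sectors between them odd indices, counted counter-clockwise; this convention and the helper name are assumptions made for the example, not code from the paper.

import math

def opra_sector(ax, ay, a_theta, bx, by, m=2, eps=1e-9):
    """Index of the OPRA_m region around oriented point A that contains B.

    Assumed convention: half-lines at angles k*pi/m from A's orientation get even
    indices 2k, the open sectors between them get odd indices 2k+1, counted
    counter-clockwise. Returns None if B coincides with A (the 'same position'
    case is handled separately in OPRA).
    """
    dx, dy = bx - ax, by - ay
    if abs(dx) < eps and abs(dy) < eps:
        return None
    # angle of B as seen from A, relative to A's own orientation, in [0, 2*pi)
    phi = (math.atan2(dy, dx) - a_theta) % (2 * math.pi)
    step = math.pi / m                      # angular spacing between consecutive half-lines
    k = phi / step
    if abs(k - round(k)) < eps:             # B lies exactly on a half-line
        return (2 * int(round(k))) % (4 * m)
    return (2 * int(math.floor(k)) + 1) % (4 * m)

# With m=2: straight ahead of A -> region 0, to A's left -> region 2, in between -> region 1
print(opra_sector(0, 0, 0.0, 5, 0, m=2))   # -> 0
print(opra_sector(0, 0, 0.0, 0, 5, m=2))   # -> 2
print(opra_sector(0, 0, 0.0, 3, 3, m=2))   # -> 1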
Sparse Inverse Covariance Selection via Alternating Linearization Methods
Scheinberg, Katya, Ma, Shiqian, Goldfarb, Donald
Gaussian graphical models are of great interest in statistical learning. Because the conditional independencies between different nodes correspond to zero entries in the inverse covariance matrix of the Gaussian distribution, one can learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, by solving a convex maximum likelihood problem with an $\ell_1$-regularization term. In this paper, we propose a first-order method based on an alternating linearization technique that exploits the problem's special structure; in particular, the subproblems solved in each iteration have closed-form solutions. Moreover, our algorithm obtains an $\epsilon$-optimal solution in $O(1/\epsilon)$ iterations. Numerical experiments on both synthetic and real data from gene association networks show that a practical version of this algorithm outperforms other competitive algorithms.
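The underlying convex problem can be reproduced with an off-the-shelf solver; the sketch below fits the same $\ell_1$-regularized maximum-likelihood estimator using scikit-learn's GraphicalLasso (not the alternating linearization method proposed in the paper), on a small synthetic precision matrix chosen only for illustration.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Sparse ground-truth precision (inverse covariance) matrix
p = 10
prec = np.eye(p)
prec[0, 1] = prec[1, 0] = 0.4
prec[2, 3] = prec[3, 2] = 0.4
cov = np.linalg.inv(prec)

X = rng.multivariate_normal(np.zeros(p), cov, size=500)

# l1-penalised maximum-likelihood estimation of the precision matrix
model = GraphicalLasso(alpha=0.05).fit(X)
est_prec = model.precision_
print("estimated non-zero pattern:\n", (np.abs(est_prec) > 1e-3).astype(int))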
Concentration inequalities of the cross-validation estimator for Empirical Risk Minimiser
In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for empirical risk minimizers. In the general setting, we prove sanity-check bounds in the spirit of \cite{KR99}: "bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate". General loss functions and classes of predictors with finite VC-dimension are considered. We closely follow the formalism introduced by \cite{DUD03} to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, $k$-fold cross-validation, hold-out cross-validation (or split sample), and leave-$\upsilon$-out cross-validation. In particular, we focus on proving the consistency of the various cross-validation procedures. We point out the interest of each cross-validation procedure in terms of rate of convergence. An estimation curve with transition phases, depending on the cross-validation procedure and not only on the percentage of observations in the test sample, gives a simple rule on how to choose the cross-validation procedure. An interesting consequence is that the size of the test sample is not required to grow to infinity for the consistency of the cross-validation procedure.
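As a small numeric illustration of the quantities discussed above, the sketch below compares the training (resubstitution) error, a $k$-fold cross-validation estimate, and a held-out test error for a simple empirical risk minimiser (logistic regression); the data-generating process, fold count, and split are arbitrary choices, not the paper's setting.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
n, p = 400, 10
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)             # an empirical risk minimiser
train_err = 1 - clf.score(X_train, y_train)                  # training (resubstitution) error
cv_err = 1 - cross_val_score(LogisticRegression(), X_train, y_train, cv=10).mean()
test_err = 1 - clf.score(X_test, y_test)                     # held-out proxy for generalization error

print(f"training error: {train_err:.3f}  10-fold CV estimate: {cv_err:.3f}  test error: {test_err:.3f}")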
Analysing the behaviour of robot teams through relational sequential pattern mining
Bombini, Grazia, Ros, Raquel, Ferilli, Stefano, de Mantaras, Ramon Lopez
This report outlines the use of a relational representation in a Multi-Agent domain to model the behaviour of the whole system. A desired property in such systems is the ability of the team members to work together to achieve a common goal in a cooperative manner. The aim is to define a systematic method to verify the effective collaboration among the members of a team and to compare different multi-agent behaviours. Using external observations of a Multi-Agent System to analyse, model, and recognize agent behaviour can be very useful for directing team actions. In particular, this report focuses on the challenge of autonomous, unsupervised sequential learning of the team's behaviour from observations. Our approach learns a symbolic, relational representation that translates raw multi-agent, multi-variate observations of a dynamic, complex environment into a set of sequential behaviours that are characteristic of the team in question, represented by sequences of first-order logic atoms. We propose to use a relational learning algorithm to mine meaningful frequent patterns among the relational sequences to characterise team behaviours. We compared the performance of two teams in the RoboCup four-legged league environment that take very different approaches to the game: one uses a Case-Based Reasoning approach, the other purely reactive behaviour.
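A much simplified stand-in for the relational sequential pattern mining described above: the sketch below counts contiguous subsequences of symbolic action atoms that recur across several observation traces; the atoms, trace contents, and support threshold are invented for the example and do not reflect the RoboCup data.

from collections import Counter

def frequent_subsequences(sequences, length=2, min_support=2):
    """Contiguous subsequences of symbolic atoms occurring in at least
    `min_support` traces; a simplified stand-in for relational sequential
    pattern mining over first-order atoms."""
    support = Counter()
    for seq in sequences:
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        support.update(seen)
    return {pat: cnt for pat, cnt in support.items() if cnt >= min_support}

# Toy symbolic traces of observed team actions (illustrative atoms only)
traces = [
    ["search_ball(r1)", "approach_ball(r1)", "pass(r1,r2)", "kick(r2)"],
    ["search_ball(r1)", "approach_ball(r1)", "kick(r1)"],
    ["search_ball(r2)", "approach_ball(r2)", "pass(r2,r1)", "kick(r1)"],
]
print(frequent_subsequences(traces, length=2, min_support=2))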