AITopics

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Wisconsin (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

Bradley, Joseph K., Schapire, Robert E.

FilterBoost: Regression and Classification on Large Datasets

We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm's strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification.

algorithm, booster, filterboost, (15 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Colorado (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > New Finding (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Feature Selection Methods for Improving Protein Structure Prediction with Rosetta

Blum, Ben, Baker, David, Jordan, Michael I., Bradley, Philip, Das, Rhiju, Kim, David E.

Rosetta is one of the leading algorithms for protein structure prediction today. It is a Monte Carlo energy minimization method requiring many random restarts to find structures with low energy. In this paper we present a resampling technique for structure prediction of small alpha/beta proteins using Rosetta. From an initial round of Rosetta sampling, we learn properties of the energy landscape that guide a subsequent round of sampling toward lower-energy structures. Rather than attempt to fit the full energy landscape, we use feature selection methods--both L1-regularized linear regression and decision trees--to identify structural features that give rise to low energy. We then enrich these structural features in the second sampling round. Results are presented across a benchmark set of nine small alpha/beta proteins demonstrating that our methods seldom impair, and frequently improve, Rosetta's performance.

prediction, protein, rosetta, (17 more...)

Country:

Asia > Middle East > Jordan (0.14)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Szafranski, Marie, Grandvalet, Yves, Morizet-mahoudeaux, Pierre

Hierarchical Penalization

Hierarchical penalization is a generic framework for incorporating prior information in the fitting of statistical models, when the explicative variables are organized in a hierarchical structure. The penalizer is a convex functional that performs soft selection at the group level, and shrinks variables within each group. This favors solutions with few leading terms in the final combination. The framework, originally derived for taking prior knowledge into account, is shown to be useful in linear regression, when several parameters are used to model the influence of one feature, or in kernel regression, for learning multiple kernels. Keywords - Optimization: constrained and convex optimization.

convex, hierarchical penalization, regression, (14 more...)

Country:

Europe > France > Hauts-de-France > Oise > Compiègne (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Wasserman, Larry, Lafferty, John D.

Statistical Analysis of Semi-Supervised Regression

Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning.

assumption, estimator, unlabeled data, (12 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Wisconsin (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

Bradley, Joseph K., Schapire, Robert E.

FilterBoost: Regression and Classification on Large Datasets

We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm's strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification.

algorithm, booster, filterboost, (15 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Colorado (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > New Finding (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

Wasserman, Larry, Lafferty, John D.

Statistical Analysis of Semi-Supervised Regression

Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors.While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus,the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning.

artificial intelligence, estimator, machine learning, (14 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

Strehl, Alexander L., Littman, Michael L.

Online Linear Regression and Its Application to Model-Based Reinforcement Learning

We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a model-based approach and show that a special type of online linear regression allows us to learn MDPs with (possibly kernalized) linearly parameterized dynamics. This result builds on Kearns and Singh's work that provides a provably efficient algorithm for finite state MDPs. Our approach is not restricted to the linear setting, and is applicable to other classes of continuous MDPs.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: North America > United States (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Ghebreab, Sennay, Smeulders, Arnold, Adriaans, Pieter

Predicting Brain States from fMRI Data: Incremental Functional Principal Component Regression

We propose a method for reconstruction of human brain states directly from functional neuroimaging data. The method extends the traditional multivariate regression analysis of discretized fMRI data to the domain of stochastic functional measurements, facilitating evaluation of brain responses to naturalistic stimuli and boosting the power of functional imaging. The method searches for sets of voxel timecourses that optimize a multivariate functional linear model in terms of Rsquare-statistic. Population based incremental learning is used to search for spatially distributed voxel clusters, taking into account the variation in Haemodynamic lag across brain areas and among subjects by voxel-wise non-linear registration of stimuli to fMRI data. The method captures spatially distributed brain responses to naturalistic stimuli without attempting to localize function. Application of the method for prediction of naturalistic stimuli from new and unknown fMRI data shows that the approach is capable of identifying distributed clusters of brain locations that are highly predictive of a specific stimuli.

artificial intelligence, fmri data, machine learning, (16 more...)

Country: Europe > Netherlands (0.15)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Bradley, Joseph K., Schapire, Robert E.

FilterBoost: Regression and Classification on Large Datasets

We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, requires fewer assumptions to achieve bounds equivalent to or better than previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs demonstrate the algorithm's strong theoretical properties forboth classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification.

artificial intelligence, filterboost, machine learning, (18 more...)

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)