AITopics

doi: 10.1016/j.hlpt.2012.03.001

1204.4927

Country:

North America > United States > Tennessee (0.35)
North America > United States > Indiana (0.34)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Biomedical Informatics > Clinical Informatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Allen, Genevera I., Peterson, Christine, Vannucci, Marina, Maletic-Savatic, Mirjana

Regularized Partial Least Squares with an Application to NMR Spectroscopy

arXiv.org Machine Learning, Apr-17-2012

Department of Statistics, Rice University

Abstract: High-dimensional data common in genomics, proteomics, and chemometrics often contains complicated correlation structures. Recently, partial least squares (PLS) and Sparse PLS methods have gained attention in these areas as dimension reduction techniques in the context of supervised data analysis. We introduce a framework for Regularized PLS by solving a relaxation of the SIMPLS optimization problem with penalties on the PLS loadings vectors. Our approach enjoys many advantages including flexibility, general penalties, easy interpretation of results, and fast computation in high-dimensional settings. We also outline extensions of our methods leading to novel methods for Nonnegative PLS and Generalized PLS, an adaptation of PLS for structured data. We demonstrate the utility of our methods through simulations and a case study on proton Nuclear Magnetic Resonance (NMR) spectroscopy data.

To whom correspondence should be addressed: Department of Statistics, Rice University, MS 138, 6100 Main St., Houston, TX 77005 (email: gallen@rice.edu)

1 Introduction: Technologies to measure high-throughput biomedical data in proteomics, chemometrics, and genomics have led to a proliferation of high-dimensional data that pose many statistical challenges. As genes, proteins, and metabolites are biologically interconnected, the variables in these data sets are often highly correlated. In this context, several authors have recently advocated using partial least squares (PLS) for dimension reduction of supervised data, or data with a response or labels (Nguyen and Rocke, 2002b; Boulesteix and Strimmer, 2007; Rossouw et al., 2008; Chun and Keleş, 2010). PLS was first introduced by Wold (1966) as a regression method that uses least squares on a set of derived inputs accounting for multicollinearities; others have since proposed alternative methods for PLS with multiple responses (de Jong, 1993) and for classification (Marx, 1996; Barker and Rayens, 2003).
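As a rough illustration of what a penalty on a PLS loadings vector can do, the Python sketch below computes a first PLS direction for a single response as the normalized covariance vector between the centered predictors and the centered response, then soft-thresholds it to obtain a sparse version. The function names, the toy data, and the choice of an L1-style soft-threshold are assumptions made for illustration only; the abstract does not describe the algorithm at this level of detail, so this should not be read as the authors' method.

```python
import math

def covariance_vector(X, y):
    """Return X^T y after centering the columns of X and centering y."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]

def soft_threshold(v, lam):
    """Shrink each entry toward zero by lam; entries smaller than lam become zero."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in v]

def normalize(v):
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def first_loading(X, y, lam=0.0):
    """First loading direction; lam > 0 applies the assumed L1-style penalty."""
    return normalize(soft_threshold(covariance_vector(X, y), lam))

# Toy data (hypothetical): 4 samples, 3 predictors, 1 response.
X = [[1.0, 0.1, 5.0],
     [2.0, 0.0, 3.0],
     [3.0, 0.2, 4.0],
     [4.0, 0.1, 4.0]]
y = [1.0, 2.0, 3.0, 4.0]

dense = first_loading(X, y)            # unpenalized direction
sparse = first_loading(X, y, lam=2.0)  # penalized direction with zeroed entries
print(dense)
print(sparse)
```

With a larger `lam`, more entries of the loading are set exactly to zero, which is the sense in which penalized loadings can be easier to interpret.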

artificial intelligence, loading, machine learning, (18 more...)

1204.3942

Country: North America > United States > Texas > Harris County > Houston (0.24)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

AAAI Conferences, Mar-25-2012

Harnessing the Crowds for Automating the Identification of Web APIs

Pedrinaci, Carlos (The Open University) | Liu, Dong (The Open University) | Lin, Chenghua (The Open University) | Domingue, John (The Open University)

Supporting the efficient discovery and use of Web APIs is increasingly important as their use and popularity grow. Yet a simple task like finding potentially interesting APIs and their related documentation turns out to be hard and time-consuming, even when using the best resources currently available on the Web. In this paper we describe our research towards an automated Web API documentation crawler and search engine. This paper presents two main contributions. First, we have devised and exploited crowdsourcing techniques to generate a curated dataset of Web API documentation. Second, thanks to this dataset, we have devised an engine able to automatically detect documentation pages. Our preliminary experiments have shown that we obtain an accuracy of 80% and a precision increase of 15 points over a keyword-based heuristic we have used as a baseline.

api, dataset, documentation, (17 more...)

2012 AAAI Spring Symposium Series

Country:

Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Europe > United Kingdom > England > Buckinghamshire > Milton Keynes (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Information Management > Search (0.88)
Information Technology > Communications > Web > Semantic Web (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

AAAI Conferences, Mar-25-2012

Meditation Training and Neurofeedback Using a Personal EEG Device

Dixit, Rohan (BrainBot)

Over the past several years, a host of simple consumer electroencephalography (EEG) devices have been released at relatively inexpensive price points. These devices allow single or multi-channel recording of EEG, generally employing user-friendly design, e.g. ...

Baseline and meditation data was obtained from 31 longterm meditation practitioners using the single-sensor right prefrontal EEG system produced by Neurosky, Inc. Each subject was asked to complete a 5 minute resting period in which they were asked to close their eyes and let their mind wander (without meditating). This was followed by ...

dataset, meditation, meditation training and neurofeedback, (8 more...)

2012 AAAI Spring Symposium Series

Country: North America > United States > California > San Francisco County > San Francisco (0.17)

Industry: Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.31)

arXiv.org Machine Learning, Mar-15-2012

Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario

Yan, Yan, Rosales, Romer, Fung, Glenn, Dy, Jennifer

Learning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and on-line collaboration, multiple annotations may be available. In either case, obtaining labels for data points can be expensive and time-consuming (in some circumstances ground-truth may not exist). Semi-supervised learning approaches have shown that utilizing the unlabeled data is often beneficial in these cases. This paper presents a probabilistic semi-supervised model and algorithm that allows for learning from both unlabeled and labeled data in the presence of multiple annotators. We assume that it is known which annotator labeled which data points. The proposed approach produces annotator models that allow us to provide (1) estimates of the true label and (2) annotator variable expertise for both labeled and unlabeled data. We provide numerical comparisons under various scenarios and with respect to standard semi-supervised learning. Experiments showed that the presented approach provides clear advantages over multi-annotator methods that do not use the unlabeled data and over methods that do not use multi-labeler information.

annotator, artificial intelligence, machine learning, (17 more...)

1203.3529

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)
Health & Medicine > Diagnostic Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Qi, Yuan, Abdel-Gawad, Ahmed H., Minka, Thomas P.

Sparse-posterior Gaussian Processes for general likelihoods

arXiv.org Machine Learning, Mar-15-2012

Gaussian processes (GPs) provide a probabilistic nonparametric representation of functions in regression, classification, and other problems. Unfortunately, exact learning with GPs is intractable for large datasets. A variety of approximate GP methods have been proposed that essentially map the large dataset into a small set of basis points. Among them, two state-of-the-art methods are the sparse pseudo-input Gaussian process (SPGP) (Snelson and Ghahramani, 2006) and the variable-sigma GP (VSGP) (Walder et al., 2008), which generalizes SPGP and allows each basis point to have its own length scale. However, VSGP was only derived for regression. In this paper, we propose a new sparse GP framework that uses expectation propagation (EP) to directly approximate general GP likelihoods using a sparse and smooth basis. It includes both SPGP and VSGP for regression as special cases. Moreover, as an EP algorithm, it inherits the ability to process data online. As a particular choice of approximating family, we blur each basis point with a Gaussian distribution that has a full covariance matrix representing the data distribution around that basis point; as a result, we can summarize local data manifold information with a small set of basis points. Our experiments demonstrate that this framework outperforms previous GP classification methods on benchmark datasets, both in minimizing divergence to the non-sparse GP solution and in achieving a lower misclassification rate.

artificial intelligence, basis point, machine learning, (17 more...)

1203.3507

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Cederborg, Thomas, Oudeyer, Pierre-Yves

Imitation learning of motor primitives and language bootstrapping in robots

arXiv.org Artificial Intelligence, Mar-11-2012

Imitation learning in robots, also called programming by demonstration, has made important advances in recent years, allowing humans to teach context-dependent motor skills/tasks to robots. We propose to extend the usual contexts investigated to also include acoustic linguistic expressions that might denote a given motor skill, and thus we target joint learning of the motor skills and their potential acoustic linguistic name. In addition to this, a modification of a class of existing algorithms within the imitation learning framework is made so that they can handle the unlabeled demonstration of several tasks/motor primitives without having to inform the imitator of which task is being demonstrated or how many tasks there are, which is a necessity for language learning, i.e., if one wants to teach naturally an open number of new motor skills together with their acoustic names. Finally, a mechanism for detecting whether or not linguistic input is relevant to the task is also proposed, and our architecture also allows the robot to find the right framing for a given identified motor primitive. With these additions it becomes possible to build an imitator that bridges the gap between imitation learning and language learning by being able to learn linguistic expressions using methods from the imitation learning community. In this sense the imitator can learn a word by guessing whether a certain speech pattern present in the context means that a specific task is to be executed. The imitator is, however, not assumed to know that speech is relevant and has to figure this out on its own by looking at the demonstrations: indeed, the architecture allows the robot to transparently also learn tasks which should not be triggered by an acoustic word, but for example by the color or position of an object or a gesture made by someone in the environment. To demonstrate this ability to find the ...

artificial intelligence, machine learning, motor primitive and language, (2 more...)

arXiv.org Artificial Intelligence

1011.033

Genre: Research Report (0.40)

Industry: Education > Curriculum > Subject-Specific Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

AAAI Conferences, Feb-22-2012

Towards Analyzing Micro-Blogs for Detection and Classification of Real-Time Intentions

Banerjee, Nilanjan (IBM Research - India) | Chakraborty, Dipanjan (IBM Research - India) | Joshi, Anupam (IBM Research - India) | Mittal, Sumit (IBM Research - India, New Delhi) | Rai, Angshu (IBM Research - India) | Ravindran, Balaraman (Indian Institute of Technology, Madras)

Micro-blog forums, such as Twitter, constitute a powerful medium today that people use to express their thoughts and intentions on a daily, and in many cases hourly, basis. Extracting the ‘Real-Time Intention’ (RTI) of a user from such short text updates is a huge opportunity for web personalization and social networking around dynamic user context. In this paper, we explore the novel problem of detecting and classifying RTIs from micro-blogs. We find that employing a heuristic-based ensemble approach on a reduced dimension of the feature space, based on a wide spectrum of linguistic and statistical features of RTI expressions, achieves significant improvement in detecting RTIs compared to word-level features used in many social media classification tasks today. Our solution approach takes into account various salient characteristics of micro-blogs relevant to such classification: high dimensionality, sparseness of data, limited context, grammatical incorrectness, etc.

classification, extractor, intention, (12 more...)

Sixth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > India > NCT > New Delhi (0.04)
Asia > India > NCT > Delhi (0.04)

Industry: Information Technology (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)

AAAI Conferences, Feb-22-2012

Network Sampling Designs for Relational Classification

Ahmed, Nesreen K. (Purdue University) | Neville, Jennifer (Purdue University) | Kompella, Ramana (Purdue University)

Relational classification has been extensively studied recently due to its applications in social, biological, technological, and information networks. Much of the work in relational learning has focused on analyzing input data that comprise a single network. Although machine learning researchers have considered the issue of how to sample training and test sets from the input network (for evaluation), the mechanisms which are used to construct the input networks have largely been ignored. In most cases, the input network has itself been sampled from a larger target network (e.g., Facebook) and often the researcher is unaware of how the input network was constructed or what impact that may have on evaluation of the relational models. Since the goal in evaluating relational classification algorithms is to accurately assess their performance on the larger target network, it is critical to understand what impact the initial sampling method may have on our estimates of classification accuracy. In this paper, we present different sampling methods and systematically study their impact on evaluation of relational classification. Our results indicate that the choice of sampling method can impact classification performance, and consequently affects the accuracy of evaluation.
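To make concrete how an input network can differ depending on how it was drawn from a target network, the Python sketch below contrasts two simple sampling schemes on a toy graph. The two schemes (uniform node sampling with the induced subgraph, and uniform edge sampling), the toy ring-with-hub graph, and all function names are assumptions chosen for illustration; the abstract does not name the sampling methods it studies.

```python
import random

def build_ring_with_hub(n):
    """Toy target network: a ring of n nodes plus a hub (node n) linked to every ring node."""
    ring = [(i, (i + 1) % n) for i in range(n)]
    spokes = [(i, n) for i in range(n)]
    return ring + spokes

def node_sample(edges, fraction, rng):
    """Keep a random fraction of the nodes and only the edges between kept nodes."""
    nodes = sorted({u for edge in edges for u in edge})
    k = max(1, round(fraction * len(nodes)))
    kept = set(rng.sample(nodes, k))
    return [(u, v) for u, v in edges if u in kept and v in kept]

def edge_sample(edges, fraction, rng):
    """Keep a random fraction of the edges (and, implicitly, their endpoints)."""
    k = max(1, round(fraction * len(edges)))
    return rng.sample(edges, k)

def mean_degree(edges):
    """Average degree over the nodes that appear in at least one edge."""
    nodes = {u for edge in edges for u in edge}
    return 2 * len(edges) / len(nodes) if nodes else 0.0

target = build_ring_with_hub(10)
rng = random.Random(0)  # fixed seed so the run is repeatable
by_nodes = node_sample(target, 0.5, rng)
by_edges = edge_sample(target, 0.5, rng)
print("target mean degree:", mean_degree(target))
print("node-sampled mean degree:", mean_degree(by_nodes))
print("edge-sampled mean degree:", mean_degree(by_edges))
```

Comparing a structural statistic such as mean degree across the samples gives a quick sense of how far each sampled input network departs from the target.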

algorithm, classifier, node, (15 more...)

Sixth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report > New Finding (0.35)

Industry: Information Technology > Services (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

AAAI Conferences, Feb-22-2012

Unsupervised Real-Time Company Name Disambiguation in Twitter

Muñoz, Agustín D. Delgado (UNED University) | Unanue, Raquel Martínez (UNED University) | García-Plaza, Alberto Pérez (UNED University) | Fresno, Víctor (UNED University)

This paper presents a new approach to disambiguating company names in the Twitter social network. We have focused on making the comparison of company profiles with tweets lightweight, in order to obtain a competitive real-time system. With this aim, we use only the home page of each company as an information source to create a unique profile. We then compute the similarity of a tweet to a profile by comparing the content of the tweet with the profile. Neither step uses any other external information source, and the whole process is unsupervised. We have tested our application on the WePS-3 CLEF ORM test corpus, obtaining encouraging results.
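The tweet-to-profile comparison described above can be sketched as a bag-of-words similarity check. In the Python sketch below, the cosine similarity measure, the tokenizer, the 0.2 threshold, and the example profile and tweets are all assumptions introduced for illustration; the abstract does not say which similarity function or decision rule the system uses.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def tweet_similarity(tweet, profile_text):
    """Similarity of a tweet to a company profile built from its home page text."""
    return cosine_similarity(Counter(tokenize(tweet)), Counter(tokenize(profile_text)))

def refers_to_company(tweet, profile_text, threshold=0.2):
    """Hypothetical decision rule: accept the tweet if similarity reaches the threshold."""
    return tweet_similarity(tweet, profile_text) >= threshold

# Hypothetical home page text and tweets, for illustration only.
profile = "Apple designs iPhone, iPad and Mac computers and software"
tweet_company = "my new apple iphone and mac are great"
tweet_fruit = "baked an apple pie with cinnamon"

print(tweet_similarity(tweet_company, profile))
print(tweet_similarity(tweet_fruit, profile))
```

Because the profile is built once per company and each tweet needs only a token count and a dot product, a comparison of this kind is cheap enough to run as tweets arrive, which is in the spirit of the real-time goal stated in the abstract.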

machine learning, real time system, tweet, (18 more...)

Sixth International AAAI Conference on Weblogs and Social Media

Country:

Europe > Spain > Galicia > Madrid (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.95)