Goto

Collaborating Authors

 Performance Analysis


Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World

arXiv.org Artificial Intelligence

In this paper, we focus on online representation learning in non-stationary environments which may require continuous adaptation of model architecture. We propose a novel online dictionary-learning (sparse-coding) framework which incorporates the addition and deletion of hidden units (dictionary elements), and is inspired by the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, known to be associated with improved cognitive function and adaptation to new environments. In the online learning setting, where new input instances arrive sequentially in batches, the neuronal-birth is implemented by adding new units with random initial weights (random dictionary elements); the number of new units is determined by the current performance (representation error) of the dictionary, higher error causing an increase in the birth rate. Neuronal-death is implemented by imposing l1/l2-regularization (group sparsity) on the dictionary within the block-coordinate descent optimization at each iteration of our online alternating minimization scheme, which iterates between the code and dictionary updates. Finally, hidden unit connectivity adaptation is facilitated by introducing sparsity in dictionary elements. Our empirical evaluation on several real-life datasets (images and language) as well as on synthetic data demonstrates that the proposed approach can considerably outperform the state-of-art fixed-size (nonadaptive) online sparse coding of Mairal et al. (2009) in the presence of nonstationary data. Moreover, we identify certain properties of the data (e.g., sparse inputs with nearly non-overlapping supports) and of the model (e.g., dictionary sparsity) associated with such improvements.


Probing for sparse and fast variable selection with model-based boosting

arXiv.org Machine Learning

We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting lies in the need of multiple model fits on slightly altered data (e.g. cross-validation or bootstrap) to find the optimal number of boosting iterations and prevent overfitting. In our proposed approach, we augment the data set with randomly permuted versions of the true variables, so called shadow variables, and stop the step-wise fitting as soon as such a variable would be added to the model. This allows variable selection in a single fit of the model without requiring further parameter tuning. We show that our probing approach can compete with state-of-the-art selection methods like stability selection in a high-dimensional classification benchmark and apply it on gene expression data for the estimation of riboflavin production of Bacillus subtilis.


Case-Study: Better HAAR feature-based Eye Detector using OpenCV ยป CV-Tricks.com

#artificialintelligence

Opencv object detectors which are built using Haar feature-based cascade classifiers is at least a decade old. OpenCV framework provides a default pre-built haar and lbp based cascade classifiers for face and eye detection which are very good quality detectors. However, I had never measured the accuracy of these face and eye detectors. I recently discovered that pre-built haar/lbp cascades have a relatively higher false positive rates which might make them unsuitable for many use-cases. It's possible to build an eye detector with very high accuracy and low false positive rates for many cases with OpenCV.


State-of-the-Art Machine Learning Automation with HDT

@machinelearnbot

The number of "feature values" is the total number of key-value pairs found, including the small unstable ones, regardless as to whether they are classified as good or bad. Any article with a pv above the arbitrary value pv_threshold 7.1 (see source code) is considered as good. This corresponds to articles having about 1.3 times more traffic than average, since we use a log scale and the average pv is 6.81. The traffic for articles classified as good by the algorithm (pv 8.23) is about 4.2 times above the traffic that an average article receives. Also note that we correctly identify the vast majority of good articles, but this is because we work with small nodes. Finally an article is marked as good if it triggers at least one node marked as good (that is, satisfying the criterion defined in the next sub-section.) Besides pv_threshold, the algorithm uses 12 parameters to identify a usable, stable node classified as good.


Improving Efficiency of SVM k -Fold Cross-Validation by Alpha Seeding

AAAI Conferences

The k-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with the selected hyper-parameters. It is known that the SVM k-fold cross-validation is expensive, since it requires training k SVMs. However, little work has explored reusing the h-th SVM for training the (h+1)-th SVM for improving the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the h-th SVM for improving the efficiency of training the (h+1)-th SVM. Our key idea is to efficiently identify the support vectors and to accurately estimate their associated weights (also called alpha values) of the next SVM by using the previous SVM. Our experimental results show that our algorithms are several times faster than the k-fold cross-validation which does not make use of the previously trained SVM. Moreover, our algorithms produce the same results (hence same accuracy) as the k-fold cross-validation which does not make use of the previously trained SVM.


Cost-Sensitive Feature Selection via F-Measure Optimization Reduction

AAAI Conferences

Feature selection aims to select a small subset from the high-dimensional features which can lead to better learning performance, lower computational complexity, and better model readability. The class imbalance problem has been neglected by traditional feature selection methods, therefore the selected features will be biased towards the majority classes. Because of the superiority of F-measure to accuracy for imbalanced data, we propose to use F-measure as the performance measure for feature selection algorithms. As a pseudo-linear function, the optimization of F-measure can be achieved by minimizing the total costs. In this paper, we present a novel cost-sensitive feature selection (CSFS) method which optimizes F-measure instead of accuracy to take class imbalance issue into account. The features will be selected according to optimal F-measure classifier after solving a series of cost-sensitive feature selection sub-problems. The features selected by our method will fully represent the characteristics of not only majority classes, but also minority classes. Extensive experimental results conducted on synthetic, multi-class and multi-label datasets validate the efficiency and significance of our feature selection method.


Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation

AAAI Conferences

Adverse drug reaction (ADR) is a major burden for patients and healthcare industry. It usually causes preventable hospitalizations and deaths, while associated with a huge amount of cost. Traditional preclinical in vitro safety profiling and clinical safety trials are restricted in terms of small scale, long duration, huge financial costs and limited statistical signifi- cance. The availability of large amounts of drug and ADR data potentially allows ADR predictions during the drugsโ€™ early preclinical stage with data analytics methods to inform more targeted clinical safety tests. Despite their initial success, existing methods have trade-offs among interpretability, predictive power and efficiency. This urges us to explore methods that could have all these strengths and provide practical solutions for real world ADR predictions. We cast the ADR-drug relation structure into a three-layer hierarchical Bayesian model. We interpret each ADR as a symbolic word and apply latent Dirichlet allocation (LDA) to learn topics that may represent certain biochemical mechanism that relates ADRs with drug structures. Based on LDA, we designed an equivalent regularization term to incorporate the hierarchical ADR domain knowledge. Finally, we developed a mixed input model leveraging a fast collapsed Gibbs sampling method that the complexity of each iteration of Gibbs sampling proportional only to the number of positive ADRs. Experiments on real world data show our models achieved higher prediction accuracy and shorter running time than the state-of-the-art alternatives.


Bootstrapping Distantly Supervised IE Using Joint Learning and Small Well-Structured Corpora

AAAI Conferences

We propose a framework to improve the performance of distantly-supervised relation extraction, by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We further extend this framework to make a novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly-labeled examples from such sections tend to have good precision. Using these as seeds we extract additional relation examples by applying label propagation on a graph composed of noisy examples extracted from a large unstructured testing corpus. Combined with the soft constraint that concept examples should have the same type as the second argument of the relation, we get significant improvements over several state-of-the-art approaches to distantly-supervised relation extraction, and reasonable extraction performance even with very small set of distant labels.


Multimodal Fusion of EEG and Musical Features in Music-Emotion Recognition

AAAI Conferences

Multimodality has been recently exploited to overcome the challenges of emotion recognition. In this paper, we present a study of fusion of electroencephalogram (EEG) features and musical features extracted from musical stimuli at decision level in recognizing the time-varying binary classes of arousal and valence. Our empirical results demonstrate that EEG modality was suffered from the non-stability of EEG signals, yet fusing with music modality could alleviate the issue and enhance the performance of emotion recognition.


Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

AAAI Conferences

In many reinforcement learning applications, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data. We empirically evaluate the proposed methods in a standard policy evaluation tasks.