Accuracy
A Differentiable Ranking Metric Using Relaxed Sorting Operation for Top-K Recommender Systems
Lee, Hyunsung, Jang, Yeongjae, Kim, Jaekwang, Woo, Honguk
A recommender system generates personalized recommendations for a user by computing the preference score of items, sorting the items according to the score, and filtering the top-K items with high scores. While sorting and ranking items are integral to this recommendation procedure, it is nontrivial to incorporate them in the process of end-to-end model training since sorting is non-differentiable and hard to optimize with gradient-based updates. This creates an inconsistency between the existing learning objectives and the ranking-based evaluation metrics of recommendation models. In this work, we present DRM (differentiable ranking metric), which mitigates the inconsistency and improves recommendation performance by employing a differentiable relaxation of ranking-based evaluation metrics. Via experiments with several real-world datasets, we demonstrate that jointly learning the DRM cost function on top of existing factor-based recommendation models significantly improves the quality of recommendations in comparison with other state-of-the-art recommendation methods.
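To make the idea of a differentiable relaxation of ranking concrete, here is a minimal sketch of a smooth rank and a smooth top-K indicator. It uses a pairwise-sigmoid relaxation, which is an assumption chosen for brevity and not necessarily the relaxed sorting operation used in DRM; the function names and the `temperature` parameter are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_ranks(scores, temperature=0.1):
    """Smooth approximation of 1-based descending ranks.

    rank_i ~= 1 + sum over j != i of sigmoid((s_j - s_i) / temperature).
    As temperature -> 0 this approaches the hard rank.
    """
    ranks = []
    for i, s_i in enumerate(scores):
        r = 1.0
        for j, s_j in enumerate(scores):
            if j != i:
                r += sigmoid((s_j - s_i) / temperature)
        ranks.append(r)
    return ranks


def soft_top_k(scores, k, temperature=0.1):
    """Smooth indicator (between 0 and 1) that each item is in the top-k."""
    return [sigmoid((k + 0.5 - r) / temperature)
            for r in soft_ranks(scores, temperature)]


if __name__ == "__main__":
    scores = [3.0, 1.0, 2.0]
    print(soft_ranks(scores))     # close to [1, 3, 2]
    print(soft_top_k(scores, 2))  # close to [1, 0, 1]
```

Because every operation is smooth in the scores, a ranking metric built from these quantities can in principle be optimized with gradient-based updates, at the cost of a bias that depends on the temperature.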
Antifragility Predicts the Robustness and Evolvability of Biological Networks through Multi-class Classification with a Convolutional Neural Network
Kim, Hyobin, Muñoz, Stalin, Osuna, Pamela, Gershenson, Carlos
Robustness and evolvability are essential properties for the evolution of biological networks. To determine whether a biological network is robust and/or evolvable, its functions must be compared before and after mutations. However, this comparison can become computationally expensive as the network size grows. Here we develop a predictive method to estimate the robustness and evolvability of biological networks without an explicit comparison of functions. We measure antifragility in Boolean network models of biological systems and use this as the predictor. Antifragility occurs when a system benefits from external perturbations. Using the differences in antifragility between the original and mutated biological networks, we train a convolutional neural network (CNN) and test it to classify the properties of robustness and evolvability. We found that our CNN model successfully classified the properties. Thus, we conclude that our antifragility measure can be used as a predictor of the robustness and evolvability of biological networks.
A Heaviside Function Approximation for Neural Network Binary Classification
Tsoi, Nathan, Milkessa, Yofti, Vázquez, Marynel
Neural network binary classifiers are often evaluated on metrics like accuracy and $F_1$-Score, which are based on confusion matrix values (True Positives, False Positives, False Negatives, and True Negatives). However, these classifiers are commonly trained with a different loss, e.g. log loss. While it is preferable to perform training on the same loss as the evaluation metric, this is difficult in the case of confusion matrix based metrics because set membership is a step function without a derivative useful for backpropagation. To address this challenge, we propose an approximation of the step function that adheres to the properties necessary for effective training of binary networks using confusion matrix based metrics. This approach allows for end-to-end training of binary deep neural classifiers via batch gradient descent. We demonstrate the flexibility of this approach in several applications with varying levels of class imbalance. We also demonstrate how the approximation allows balancing between precision and recall in the appropriate ratio for the task at hand.
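The following sketch illustrates how a smooth stand-in for the step function yields "soft" confusion matrix entries and therefore a differentiable $F_1$ surrogate. A steep sigmoid is used here purely as an assumption for illustration; it is not claimed to be the approximation proposed in the paper, and `soft_step`, `soft_f1`, and `steepness` are hypothetical names.

```python
import math


def soft_step(p, threshold=0.5, steepness=20.0):
    """Smooth stand-in for the Heaviside step H(p - threshold)."""
    x = steepness * (p - threshold)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_f1(probs, labels, threshold=0.5, steepness=20.0):
    """F1 computed from soft confusion matrix counts.

    probs  : predicted probabilities of the positive class
    labels : ground-truth labels in {0, 1}
    """
    tp = fp = fn = 0.0
    for p, y in zip(probs, labels):
        h = soft_step(p, threshold, steepness)  # soft "predicted positive"
        tp += h * y
        fp += h * (1 - y)
        fn += (1.0 - h) * y
    denom = 2.0 * tp + fp + fn
    return 2.0 * tp / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # The hard F1 for these predictions at threshold 0.5 is 0.5.
    print(soft_f1([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0]))
```

A training loss could then be taken as `1 - soft_f1(...)`; the steepness trades off fidelity to the true metric against the usefulness of the gradient.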
Detecting Parkinson's Disease from Speech-task in an accessible and interpretable manner
Rahman, Wasifur, Lee, Sangwu, Islam, Md. Saiful, Mamun, Abdullah Al, Antony, Victor, Ratnu, Harshil, Ali, Mohammad Rafayet, Hoque, Ehsan
Every nine minutes a person is diagnosed with Parkinson's Disease (PD) in the United States. However, studies have shown that between 25 and 80\% of individuals with PD remain undiagnosed. An online, in-the-wild audio recording application has the potential to help screen for the disease if risk can be accurately assessed. In this paper, we collect data from 726 unique subjects (262 PD and 464 Non-PD) uttering the sentence "quick brown fox jumps over the lazy dog ...." to conduct automated PD assessment. We extracted both standard acoustic features and deep learning based embedding features from the speech data and trained several machine learning algorithms on them. Our models achieved 0.75 AUC by modeling the standard acoustic features with XGBoost. We also provide an explanation of our model's decisions and show that it focuses mostly on the widely used MFCC features and a subset of dysphonia features previously used for detecting PD from a verbal phonation task.
Programming Fairness in Algorithms
Being good is easy, what is difficult is being just. We need to defend the interests of those whom we've never met and never will. Note: This article is intended for a general audience to try and elucidate the complicated nature of unfairness in machine learning algorithms. As such, I have tried to explain concepts in an accessible way with minimal use of mathematics, in the hope that everyone can get something out of reading this. Supervised machine learning algorithms are inherently discriminatory. They are discriminatory in the sense that they use information embedded in the features of data to separate instances into distinct categories -- indeed, this is their designated purpose in life. This is reflected in the name for these algorithms which are often referred to as discriminative algorithms (splitting data into categories), in contrast to generative algorithms (generating data from a given category). When we use supervised machine learning, this "discrimination" is used as an aid to help us categorize our data into distinct categories within the data distribution, as illustrated below. Whilst this occurs when we apply discriminative algorithms -- such as support vector machines, forms of parametric regression (e.g. For example, using last week's weather data to try and predict the weather tomorrow has no moral valence attached to it.
More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings
Iana, Andreea, Paulheim, Heiko
RDF2vec is an embedding technique for representing knowledge graph entities in a continuous vector space. In this paper, we investigate the effect of materializing implicit A-box axioms induced by subproperties, as well as symmetric and transitive properties. While it might be a reasonable assumption that such a materialization before computing embeddings leads to better embeddings, we conduct a set of experiments on DBpedia which demonstrate that the materialization actually has a negative effect on the performance of RDF2vec. In our analysis, we argue that despite the huge body of work devoted to completing missing information in knowledge graphs, such missing implicit information is actually a signal, not a defect, and we show examples illustrating that assumption.
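As an illustration of what materializing implicit A-box axioms means, the sketch below closes a small set of triples under subproperty, symmetric, and transitive rules until no new triple appears. It is a toy fixpoint computation under stated assumptions, not the reasoner or pipeline used in the paper; the function name and all entity and property names are hypothetical.

```python
def materialize(triples, sub_property_of=None, symmetric=(), transitive=()):
    """Return the triples closed under three kinds of inference rules.

    triples         : iterable of (subject, predicate, object)
    sub_property_of : dict mapping a property to its super-property
    symmetric       : properties p where (s, p, o) implies (o, p, s)
    transitive      : properties p where (a, p, b) and (b, p, c) imply (a, p, c)
    """
    sub_property_of = sub_property_of or {}
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            if p in sub_property_of:
                new.add((s, sub_property_of[p], o))
            if p in symmetric:
                new.add((o, p, s))
        for p in transitive:
            edges = [(s, o) for s, q, o in closed if q == p]
            for s1, o1 in edges:
                for s2, o2 in edges:
                    if o1 == s2:
                        new.add((s1, p, o2))
        if new <= closed:  # fixpoint reached: nothing new was inferred
            return closed
        closed |= new


if __name__ == "__main__":
    base = {
        ("alice", "spouse", "bob"),
        ("a", "ancestor", "b"),
        ("b", "ancestor", "c"),
        ("x", "motherOf", "y"),
    }
    full = materialize(
        base,
        sub_property_of={"motherOf": "parentOf"},
        symmetric={"spouse"},
        transitive={"ancestor"},
    )
    for triple in sorted(full - base):
        print("inferred:", triple)
```

The inferred triples add edges to the graph, which changes the walks that a walk-based embedding method would sample; this is the kind of effect the abstract refers to.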
Practical Cross-modal Manifold Alignment for Grounded Language
Nguyen, Andre T., Richards, Luke E., Kebe, Gaoussou Youssouf, Raff, Edward, Darvish, Kasra, Ferraro, Frank, Matuszek, Cynthia
We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
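For readers unfamiliar with triplet loss, a minimal sketch of the standard hinge form on (anchor, positive, negative) embeddings follows. The Euclidean distance and the `margin` value are assumptions made for illustration; the abstract does not state which distance or margin the authors use.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings, e.g. an image anchor with a matching
    # and a non-matching language description.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5]))  # 0.5
```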
Improved Weighted Random Forest for Classification Problems
Shahhosseini, Mohsen, Hu, Guiping
Several studies have shown that combining machine learning models in an appropriate way improves on the individual predictions made by the base models. The key to building a well-performing ensemble model is the diversity of the base models. Among the most common solutions for introducing diversity into decision trees are bagging and random forest. Bagging enhances diversity by sampling with replacement and generating many training data sets, while random forest additionally selects a random subset of features. This has made the random forest a winning candidate for many machine learning applications. However, assuming equal weights for all base decision trees does not seem reasonable, as the randomization of sampling and input feature selection may lead to different levels of decision-making ability across base decision trees. Therefore, we propose several algorithms that modify the weighting strategy of the regular random forest and consequently make better predictions. The designed weighting frameworks include optimal weighted random forest based on accuracy, optimal weighted random forest based on the area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models are able to introduce significant improvements compared to regular random forest.
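The contrast between equal and unequal tree weights can be sketched as follows. Each tree's vote is weighted by its accuracy on held-out data, which is one simple performance-based scheme assumed here for illustration; it does not reproduce the optimization-based or stacking-based weightings proposed in the paper. The trees are represented only by their predictions, and all names are hypothetical.

```python
from collections import defaultdict


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def weighted_vote(tree_preds, weights):
    """Combine per-tree class predictions by weighted voting.

    tree_preds : list (one entry per tree) of lists of predicted labels
    weights    : one non-negative weight per tree
    """
    n_samples = len(tree_preds[0])
    combined = []
    for i in range(n_samples):
        tally = defaultdict(float)
        for preds, w in zip(tree_preds, weights):
            tally[preds[i]] += w
        combined.append(max(tally, key=tally.get))
    return combined


if __name__ == "__main__":
    val_labels = [1, 0, 1, 0]
    # Validation predictions of three hypothetical trees.
    val_preds = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
    weights = [accuracy(p, val_labels) for p in val_preds]  # [1.0, 0.25, 0.25]

    test_preds = [[1], [0], [0]]  # tree 1 disagrees with trees 2 and 3
    print(weighted_vote(test_preds, [1.0, 1.0, 1.0]))  # equal weights: [0]
    print(weighted_vote(test_preds, weights))          # accuracy weights: [1]
```

With equal weights the two weaker trees outvote the stronger one; with accuracy weights the stronger tree prevails, which is the intuition behind reweighting the forest.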
Time-Varying Parameters as Ridge Regressions
Time-varying parameters (TVPs) models are frequently used in economics to model structural change. I show that they are in fact ridge regressions. Instantly, this makes computations, tuning, and implementation much easier than in the state-space paradigm. Among other things, solving the equivalent dual ridge problem is computationally very fast even in high dimensions, and the crucial "amount of time variation" is tuned by cross-validation. Evolving volatility is dealt with using a two-step ridge regression. I consider extensions that incorporate sparsity (the algorithm selects which parameters vary and which do not) and reduced-rank restrictions (variation is tied to a factor model). To demonstrate the usefulness of the approach, I use it to study the evolution of monetary policy in Canada. The application requires the estimation of about 4600 TVPs, a task well within the reach of the new method.
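A rough sketch of how a time-varying parameter regression can be written as a ridge problem is given below. It assumes random-walk coefficients, beta_t = beta_{t-1} + u_t, so that beta_t equals beta_0 plus the cumulative sum of the increments, and it penalizes the squared increments. This is the primal problem solved naively with NumPy, not the dual solution, the two-step volatility treatment, or the extensions described in the abstract; `tvp_ridge` and `lam` are hypothetical names.

```python
import numpy as np


def tvp_ridge(X, y, lam=1.0):
    """Estimate a path of coefficients beta_t by ridge regression.

    Model (assumed): y_t = x_t' beta_t + e_t, beta_t = beta_0 + sum_{s<=t} u_s.
    Minimizes ||y - Z theta||^2 + lam * sum_{s>=1} ||u_s||^2, where
    theta stacks beta_0 and the increments u_1, ..., u_{T-1}.
    Returns an array of shape (T, K) with one coefficient vector per period.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T, K = X.shape

    # Expanded regressor matrix: row t loads x_t on beta_0 and on every
    # increment u_s with s <= t.
    Z = np.zeros((T, T * K))
    for t in range(T):
        for s in range(t + 1):
            Z[t, s * K:(s + 1) * K] = X[t]

    # Penalize only the increments; the initial level beta_0 is left free.
    penalty = np.ones(T * K)
    penalty[:K] = 0.0

    theta = np.linalg.solve(Z.T @ Z + lam * np.diag(penalty), Z.T @ y)
    # Cumulative sums turn (beta_0, u_1, ..., u_{T-1}) into the beta_t path.
    return theta.reshape(T, K).cumsum(axis=0)


if __name__ == "__main__":
    x = np.linspace(1.0, 2.0, 30).reshape(-1, 1)
    y = 2.0 * x[:, 0]  # constant coefficient, no noise
    betas = tvp_ridge(x, y, lam=1.0)
    print(betas[:3].ravel(), betas[-3:].ravel())  # all close to 2
```

Here `lam` plays the role of the "amount of time variation": a large value forces the path toward a constant coefficient, a small value lets it move freely, and in practice it would be chosen by cross-validation as the abstract describes. The expanded matrix has T*K columns, so this naive version is only practical for small problems.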
Machine learning for cybersecurity: only as effective as your implementation
We recently launched Elastic Security, combining the threat hunting and analytics tools from Elastic SIEM with the prevention and response features of Elastic Endpoint Security. This combined solution focuses on detecting and flexibly responding to security threats, with machine learning providing core capabilities for real-time protections, detections, and interactive hunting. But why are machine learning tools so important in information security? How is machine learning being applied? In this first of a two-part blog series, we'll motivate the "why" and explore the "how," highlighting malware prevention via supervised machine learning in Elastic Endpoint Security.