logistic loss function
Clinical Deterioration Prediction in Brazilian Hospitals Based on Artificial Neural Networks and Tree Decision Models
Yazdanpanah, Hamed, Silva, Augusto C. M., Guedes, Murilo, Morales, Hugo M. P., Coelho, Leandro dos S., Moro, Fernando G.
Early recognition of clinical deterioration (CD) is vital to patients' survival, helping to prevent exacerbation or death. Electronic health records (EHRs) data have been widely employed in Early Warning Scores (EWS) to measure CD risk in hospitalized patients. Recently, EHRs data have been utilized in Machine Learning (ML) models to predict mortality and CD. The ML models have shown superior performance in CD prediction compared to EWS. Since EHRs data are structured and tabular, conventional ML models are generally applied to them, and less effort has been put into evaluating artificial neural networks' performance on EHRs data. Thus, in this article, an extremely boosted neural network (XBNet) is used to predict CD, and its performance is compared to the eXtreme Gradient Boosting (XGBoost) and random forest (RF) models. For this purpose, 103,105 samples from thirteen Brazilian hospitals are used to generate the models. Moreover, principal component analysis (PCA) is employed to verify whether it can improve the adopted models' performance. The performance of the ML models and the Modified Early Warning Score (MEWS), an EWS candidate, is evaluated for CD prediction in terms of the accuracy, precision, recall, F1-score, and geometric mean (G-mean) metrics in a 10-fold cross-validation approach. According to the experiments, the XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Singapore (0.05)
- South America > Brazil > Paraná > Curitiba (0.05)
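The abstract above scores models on accuracy, precision, recall, F1-score, and G-mean. As a minimal sketch of how these confusion-matrix metrics are computed (generic textbook definitions; the toy labels below are invented, not taken from the hospital data):

```python
import math

def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for a binary deterioration label (1 = CD)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn)            # sensitivity: CD cases caught
    specificity = tn / (tn + fp)       # non-CD cases correctly cleared
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # G-mean balances the two error rates, useful on imbalanced wards
        "gmean": math.sqrt(recall * specificity),
    }

# invented toy labels for 8 patients (1 = deteriorated)
print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0]))
```

In a 10-fold cross-validation setup, these metrics would be computed on each held-out fold and averaged.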
Multiple Run Ensemble Learning with Low-Dimensional Knowledge Graph Embeddings
Xu, Chengjin, Nayyeri, Mojtaba, Vahdati, Sahar, Lehmann, Jens
Among the top approaches of recent years, link prediction using knowledge graph embedding (KGE) models has gained significant attention for knowledge graph completion. Various embedding models have been proposed so far, among which some recent KGE models obtain state-of-the-art performance on link prediction tasks by using embeddings of high dimension (e.g. 1000), which increases the costs of training and evaluation considering the large scale of KGs. In this paper, we propose a simple but effective performance boosting strategy for KGE models by using multiple low dimensions in different repetition rounds of the same model. For example, instead of training a model once with a large embedding size of 1200, we repeat the training of the model 6 times in parallel with an embedding size of 200 and then combine the 6 separate models for testing, while the overall number of adjustable parameters is the same (6*200=1200) and the total memory footprint remains the same. We show that our approach enables different models to better cope with their expressiveness issues on modeling various graph patterns such as symmetric, 1-n, n-1 and n-n. In order to justify our findings, we conduct experiments on various KGE models. Experimental results on the standard benchmark datasets FB15K, FB15K-237 and WN18RR show that multiple low-dimensional models of the same kind outperform the corresponding single high-dimensional models on link prediction within a certain dimension range, and have advantages in training efficiency through parallel training while the overall number of adjustable parameters stays the same.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Germany > Berlin (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
- Asia > China (0.04)
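The multiple-run strategy above can be sketched in a few lines. Real KGE training is stubbed out with seeded random TransE-style embeddings (the entities, the scoring function, and the score-averaging rule are illustrative assumptions, not the paper's exact setup); the point is the parameter-budget split and the combination step:

```python
import random

DIM, RUNS = 200, 6  # 6 models * 200 dims = same budget as one 1200-dim model

def train_low_dim_model(seed, entities, relations, dim=DIM):
    """Stand-in for one low-dimensional KGE run: real training is replaced
    by seeded random embeddings so the ensembling step stays runnable."""
    rng = random.Random(seed)
    emb = {x: [rng.uniform(-1, 1) for _ in range(dim)]
           for x in entities + relations}

    def score(h, r, t):
        # TransE-style score: -||e_h + e_r - e_t||_1 (higher = more plausible)
        return -sum(abs(a + b - c)
                    for a, b, c in zip(emb[h], emb[r], emb[t]))
    return score

def ensemble_score(models, h, r, t):
    # combine the separately trained runs by averaging their triple scores
    return sum(m(h, r, t) for m in models) / len(models)

entities, relations = ["paris", "france", "berlin"], ["capital_of"]
models = [train_low_dim_model(seed, entities, relations)
          for seed in range(RUNS)]
print(ensemble_score(models, "paris", "capital_of", "france"))
```

Because the runs are independent, they can be trained in parallel, which is where the training-efficiency advantage in the abstract comes from.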
Research Guide: Advanced Loss Functions for Machine Learning Models
Logistic loss functions don't perform very well during training when the data in question is very noisy. Such noise can be caused by outliers and mislabeled data. In this paper, Google Brain authors aim to solve the shortcomings of the logistic loss function by replacing the logarithm and exponential functions with their corresponding "tempered" versions. The authors introduce a temperature into the exponential function and replace the softmax output layer of neural nets with a high-temperature generalization. The logarithm used in the log loss is replaced by a low-temperature logarithm.
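For reference, the tempered logarithm and exponential used in the bi-tempered logistic loss literature are one-parameter generalizations of log and exp that recover the originals as the temperature t approaches 1; a minimal sketch:

```python
import math

def log_t(x, t):
    """Tempered logarithm: bounded below for t < 1; recovers log(x) as t -> 1."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential, the functional inverse of log_t;
    heavier-tailed than exp for t > 1."""
    if t == 1.0:
        return math.exp(x)
    return max(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

# the two functions invert each other, just like log and exp
print(exp_t(log_t(2.0, 0.5), 0.5))
```

A bounded log (t < 1) caps the loss contributed by badly mislabeled points, and a heavier-tailed exp (t > 1) makes the softmax-style output less sensitive to outliers, which is the intuition behind the paper's robustness claims.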
How Do Machines Learn To Make Predictions?
The craze behind machine learning is fueled by its ability to make predictions that come in handy in running a business. This post tries to provide an intuitive understanding of how machines learn to spit out probabilities. Suppose you run a bubble tea shop and want to create a machine learning model to predict whether customers will like pieces of mock coconut in their bubble tea. Step #1: One day you make a note of all the customers who ordered mock coconut pieces, and whether they liked it or not (by rummaging through the garbage can later and checking how many of them actually finished all those white little cubes of industrial waste). This allows you to make a chart like the one below, where 0 means the customer didn't like the mock coconut pieces, and 1 means they really liked them.
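To make this concrete, here is a toy version of the model that such 0/1 notes could feed: logistic regression fitted by gradient descent on the log loss. The feature (mock-coconut pieces relative to the house average) and all the numbers are invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# invented toy data: x = pieces of mock coconut relative to the house
# average, y = 1 if the customer liked them (per step #1), 0 otherwise
data = [(-3, 0), (-2, 0), (-1, 0), (1, 1), (2, 1), (3, 1)]

# fit the weights by gradient descent on the logistic (log) loss
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    gw = gb = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        gw += (p - y) * x  # derivative of the log loss w.r.t. w
        gb += (p - y)      # ... and w.r.t. b
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

# the fitted model turns a new order into a probability of "liked it"
print(round(sigmoid(w * 2 + b), 2))
```

The sigmoid is what squashes the raw score w*x + b into a number between 0 and 1, which is why the model "spits out probabilities" rather than hard yes/no answers.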
Logitron: Perceptron-augmented classification model based on an extended logistic loss function
Classification is one of the most important processes in data analysis. However, due to the inherent non-convex and non-smooth structure of the zero-one loss function of the classification model, various convex surrogate loss functions, such as hinge loss, squared hinge loss, logistic loss, and exponential loss, have been introduced. These loss functions have been used for decades in diverse classification models, such as SVM (support vector machine) with hinge loss, logistic regression with logistic loss, and Adaboost with exponential loss. In this work, we present a Perceptron-augmented convex classification framework, {\it Logitron}. Its loss function is a smoothly stitched function of the extended logistic loss and the famous Perceptron loss function. The extended logistic loss function is a parameterized function established based on the extended logarithmic function and the extended exponential function. The main advantage of the proposed Logitron classification model is that it shows the connection between SVM and logistic regression via polynomial parameterization of the loss function. In more detail, depending on the choice of parameters, we have the Hinge-Logitron, which has the generalized $k$-th order hinge loss with an additional $k$-th root stabilization function, and the Logistic-Logitron, which has a logistic-like loss function with relatively large $|k|$. Interestingly, even with $k=-1$, Hinge-Logitron satisfies the classification-calibration condition and shows reasonable classification performance with low computational cost. The numerical experiments in the linear classifier framework demonstrate that Hinge-Logitron with $k=4$ (the fourth-order SVM with the fourth root stabilization function) outperforms logistic regression, SVM, and other Logitron models in terms of classification accuracy.
- Europe > Switzerland (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.89)
- Research Report > Experimental Study (0.75)
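The convex surrogates named in the abstract can be compared directly as functions of the margin m = y * f(x). A quick sketch using the standard textbook definitions (not the Logitron parameterization itself):

```python
import math

# the zero-one loss and its convex surrogates, as functions of the margin
def zero_one(m):      return 0.0 if m > 0 else 1.0
def hinge(m):         return max(0.0, 1.0 - m)          # SVM
def squared_hinge(m): return max(0.0, 1.0 - m) ** 2
def logistic(m):      return math.log(1.0 + math.exp(-m))  # logistic regression
def exponential(m):   return math.exp(-m)                  # AdaBoost

# each surrogate upper-bounds (a scaled version of) the zero-one loss
# and is convex, which is what makes the training problem tractable
for m in (-1.0, 0.0, 1.0, 2.0):
    print(m, [round(f(m), 3) for f in
              (zero_one, hinge, squared_hinge, logistic, exponential)])
```

Plotting these over a range of margins shows how differently they penalize confidently wrong predictions (large negative m), which is where their robustness properties diverge.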
Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods
Freund, Robert M., Grigas, Paul, Mazumder, Rahul
Logistic regression is one of the most popular methods in binary classification, wherein estimation of model parameters is carried out by solving the maximum likelihood (ML) optimization problem, and the ML estimator is defined to be the optimal solution of this problem. It is well known that the ML estimator exists when the data is non-separable, but fails to exist when the data is separable. First-order methods are the algorithms of choice for solving large-scale instances of the logistic regression problem. In this paper, we introduce a pair of condition numbers that measure the degree of non-separability or separability of a given dataset in the setting of binary classification, and we study how these condition numbers relate to and inform the properties and the convergence guarantees of first-order methods. When the training data is non-separable, we show that the degree of non-separability naturally enters the analysis and informs the properties and convergence guarantees of two standard first-order methods: steepest descent (for any given norm) and stochastic gradient descent. Expanding on the work of Bach, we also show how the degree of non-separability enters into the analysis of linear convergence of steepest descent (without needing strong convexity), as well as the adaptive convergence of stochastic gradient descent. When the training data is separable, first-order methods rather curiously have good empirical success, which is not well understood in theory. In the case of separable data, we demonstrate how the degree of separability enters into the analysis of $\ell_2$ steepest descent and stochastic gradient descent for delivering approximate-maximum-margin solutions with associated computational guarantees as well. This suggests that first-order methods can lead to statistically meaningful solutions in the separable case, even though the ML solution does not exist.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Alameda County > Berkeley (0.14)
- South America > Chile (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
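The separable/non-separable dichotomy discussed above can be seen in a few lines: on a non-separable toy set, steepest descent on the logistic loss settles at a finite ML estimator, while on a separable set the iterate keeps growing, drifting toward a margin-maximizing direction. The 1-D datasets and step size below are invented for illustration:

```python
import math

def grad_desc_logistic(data, steps=20000, lr=0.1):
    """Plain steepest descent on the average logistic loss of f(x) = w*x."""
    w = 0.0
    for _ in range(steps):
        # gradient of (1/n) * sum log(1 + exp(-y * w * x))
        g = sum(-y * x / (1.0 + math.exp(y * w * x)) for x, y in data) / len(data)
        w -= lr * g
    return w

# non-separable: conflicting labels -> the ML estimator exists, w stays bounded
non_sep = [(-2, -1), (-1, +1), (1, -1), (2, +1)]
# separable: the loss has no minimizer, so |w| grows without bound (slowly)
sep = [(-2, -1), (-1, -1), (1, +1), (2, +1)]

print(grad_desc_logistic(non_sep), grad_desc_logistic(sep))
```

On the separable set the final w is much larger and would keep growing with more iterations, mirroring the paper's point that the ML solution fails to exist yet the iterates are still statistically meaningful.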
Classifying Big Data over Networks via the Logistic Network Lasso
Ambos, Henrik, Tran, Nguyen, Jung, Alexander
We apply the network Lasso to solve binary classification (clustering) problems on network-structured data. To this end, we generalize ordinary logistic regression to non-Euclidean data defined over a complex network structure. A scalable classification algorithm is obtained by applying the alternating direction method of multipliers (ADMM) to solve this optimization problem.
Index terms: compressed sensing, big data over networks, semi-supervised learning, classification, clustering, complex networks, convex optimization.
From the introduction: We consider the problem of classifying or clustering a large set of data points which conform to an underlying network structure. Such network-structured datasets arise in a wide range of application domains, e.g., image and video processing as well as social networks [1].
- North America > United States > Massachusetts > Plymouth County > Hanover (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Finland (0.04)
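A rough sketch of the kind of objective described above: a per-node logistic loss plus a penalty on weight differences across edges, so connected nodes are nudged toward the same classifier. For brevity this sketch uses plain subgradient descent in place of the ADMM solver developed in the paper, and the toy graph, features, labels, and step size are all invented:

```python
import math

# invented toy graph: each node i has one scalar feature x[i], a label
# y[i] in {-1, +1}, and its own local weight w[i]; edges couple weights
x = [1.0, 2.0, 1.0, 2.0]
y = [+1, +1, -1, -1]
edges = [(0, 1), (1, 2), (2, 3)]
lam = 0.1          # strength of the network (edge) penalty
nodes = range(len(x))

def objective(w):
    # per-node logistic loss plus an l1 penalty on weight differences
    fit = sum(math.log(1 + math.exp(-y[i] * w[i] * x[i])) for i in nodes)
    net = lam * sum(abs(w[i] - w[j]) for i, j in edges)
    return fit + net

# plain subgradient descent stands in here for the paper's ADMM solver
w = [0.0] * len(x)
for _ in range(2000):
    g = [-y[i] * x[i] / (1 + math.exp(y[i] * w[i] * x[i])) for i in nodes]
    for i, j in edges:
        s = (w[i] > w[j]) - (w[i] < w[j])   # subgradient of |w_i - w_j|
        g[i] += lam * s
        g[j] -= lam * s
    w = [wi - 0.05 * gi for wi, gi in zip(w, g)]

print([round(wi, 2) for wi in w])
```

The edge penalty is what makes this a clustering method as well: nodes on the same side of the graph end up with similar weights, while the l1 form lets the weights jump sharply across the boundary between clusters.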
Piecewise-Linear Approximation for Feature Subset Selection in a Sequential Logit Model
Sato, Toshiki, Takano, Yuichi, Miyashiro, Ryuhei
This paper concerns a method of selecting a subset of features for a sequential logit model. Tanaka and Nakagawa (2014) proposed a mixed integer quadratic optimization formulation for solving the problem based on a quadratic approximation of the logistic loss function. However, since there is a significant gap between the logistic loss function and its quadratic approximation, their formulation may fail to find a good subset of features. To overcome this drawback, we apply a piecewise-linear approximation to the logistic loss function. Accordingly, we frame the feature subset selection problem of minimizing an information criterion as a mixed integer linear optimization problem. The computational results demonstrate that our piecewise-linear approximation approach found a better subset of features than the quadratic approximation approach.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
- Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
- Leisure & Entertainment (0.50)
- Education (0.46)
- Banking & Finance (0.46)
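Because the logistic loss is convex, the pointwise maximum of a few tangent lines gives a piecewise-linear under-approximation of it; this is one standard way to build the kind of approximation that keeps the feature-selection problem a mixed integer linear program. The breakpoints below are arbitrary choices for illustration, not the construction from the paper:

```python
import math

def logistic_loss(m):
    """Logistic loss as a function of the margin m = y * f(x)."""
    return math.log(1.0 + math.exp(-m))

def tangent(m0):
    """Tangent line to the logistic loss at m0. Because the loss is convex,
    every tangent lies below it, so a max of tangents under-approximates it."""
    slope = -1.0 / (1.0 + math.exp(m0))          # derivative at m0
    intercept = logistic_loss(m0) - slope * m0
    return lambda m: slope * m + intercept

breakpoints = [-2.0, -1.0, 0.0, 1.0, 2.0]        # arbitrary illustrative knots
pieces = [tangent(m0) for m0 in breakpoints]

def pwl_loss(m):
    """Piecewise-linear lower approximation: pointwise max of the tangents."""
    return max(p(m) for p in pieces)

for m in (-1.5, 0.0, 1.5):
    print(m, round(logistic_loss(m), 4), round(pwl_loss(m), 4))
```

Adding more tangent points tightens the approximation everywhere, whereas a single quadratic fit can deviate badly far from its expansion point, which matches the paper's motivation for preferring the piecewise-linear form over the quadratic one.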