Goto

Collaborating Authors

 Regression


An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features

arXiv.org Artificial Intelligence

Hit song prediction, one of the emerging fields in music information retrieval (MIR), remains a considerable challenge. Being able to understand what makes a given song a hit is clearly beneficial to the whole music industry. Previous approaches to hit song prediction have focused on using audio features of a record. This study aims to improve the prediction result of the top 10 hits among Billboard Hot 100 songs using more alternative metadata, including song audio features provided by Spotify, song lyrics, and novel metadata-based features (title topic, popularity continuity and genre class). Five machine learning approaches are applied, including: k-nearest neighbours, Naive Bayes, Random Forest, Logistic Regression and Multilayer Perceptron. Our results show that Random Forest (RF) and Logistic Regression (LR) with all features (including novel features, song audio features and lyrics features) outperforms other models, achieving 89.1% and 87.2% accuracy, and 0.91 and 0.93 AUC, respectively. Our findings also demonstrate the utility of our novel music metadata features, which contributed most to the models' discriminative performance.


Cutting Plane Selection with Analytic Centers and Multiregression

arXiv.org Artificial Intelligence

Cutting planes are a crucial component of state-of-the-art mixed-integer programming solvers, with the choice of which subset of cuts to add being vital for solver performance. We propose new distance-based measures to qualify the value of a cut by quantifying the extent to which it separates relevant parts of the relaxed feasible set. For this purpose, we use the analytic centers of the relaxation polytope or of its optimal face, as well as alternative optimal solutions of the linear programming relaxation. We assess the impact of the choice of distance measure on root node performance and throughout the whole branch-and-bound tree, comparing our measures against those prevalent in the literature. Finally, by a multi-output regression, we predict the relative performance of each measure, using static features readily available before the separation process. Our results indicate that analytic center-based methods help to significantly reduce the number of branch-and-bound nodes needed to explore the search space and that our multiregression approach can further improve on any individual method.


Multicalibration as Boosting for Regression

arXiv.org Artificial Intelligence

We revisit the problem of boosting for regression, and develop a new agnostic regression boosting algorithm via a connection to multicalibration. In doing so, we shed additional light on multicalibration, a recent learning objective that has emerged from the algorithmic fairness literature [Hébert-Johnson et al., 2018]. In particular, we characterize multicalibration in terms of a "swap-regret" like condition, and use it to answer the question "what property must a collection of functions H have so that multicalibration with respect to H implies Bayes optimality?", giving a complete answer to problem asked by Burhanpurkar et al. [2021]. Using our swap-regret characterization, we derive an especially simple algorithm for learning a multicalibrated predictor for a class of functions H by reduction to a standard squared-error regression algorithm for H. The same algorithm can also be analyzed as a boosting algorithm for squared error regression that makes calls to a weak learner for squared error regression on subsets of the original data distribution without the need to relabel examples (in contrast to Gradient Boosting as well as existing multicalibration algorithms). This lets us specify a weak learning condition that is sufficient for convergence to the Bayes optimal predictor (even if the Bayes optimal predictor does not have zero error), avoiding the kinds of realizability assumptions that are implicit in analyses of boosting algorithms that converge to zero error. We conclude that ensuring multicalibration with respect to H corresponds to boosting for squared error regression in which H forms the set of weak learners. Finally we define a weak learning condition for H relative to a constrained class of functions C (rather than with respect to the Bayes optimal predictor). We show that multicalibration with respect to H implies multicalibration with respect to C if H satisfies the weak learning condition with respect to C, which in turn implies accuracy at least that of the best function in C. Multicalibration Consider a distribution D P Z defined over a domain Z " X ˆ R of feature vectors x P X paired with real valued labels y.


Low Complexity Adaptive Machine Learning Approaches for End-to-End Latency Prediction

arXiv.org Artificial Intelligence

Software Defined Networks have opened the door to statistical and AI-based techniques to improve efficiency of networking. Especially to ensure a certain Quality of Service (QoS) for specific applications by routing packets with awareness on content nature (VoIP, video, files, etc.) and its needs (latency, bandwidth, etc.) to use efficiently resources of a network. Monitoring and predicting various Key Performance Indicators (KPIs) at any level may handle such problems while preserving network bandwidth. The question addressed in this work is the design of efficient, low-cost adaptive algorithms for KPI estimation, monitoring and prediction. We focus on end-to-end latency prediction, for which we illustrate our approaches and results on data obtained from a public generator provided after the recent international challenge on GNN [12]. In this paper, we improve our previously proposed low-cost estimators [6] by adding the adaptive dimension, and show that the performances are minimally modified while gaining the ability to track varying networks.


Robust Linear Regression: Gradient-descent, Early-stopping, and Beyond

arXiv.org Artificial Intelligence

In this work we study the robustness to adversarial attacks, of early-stopping strategies on gradient-descent (GD) methods for linear regression. More precisely, we show that early-stopped GD is optimally robust (up to an absolute constant) against Euclidean-norm adversarial attacks. However, we show that this strategy can be arbitrarily sub-optimal in the case of general Mahalanobis attacks. This observation is compatible with recent findings in the case of classification~\cite{Vardi2022GradientMP} that show that GD provably converges to non-robust models. To alleviate this issue, we propose to apply instead a GD scheme on a transformation of the data adapted to the attack. This data transformation amounts to apply feature-depending learning rates and we show that this modified GD is able to handle any Mahalanobis attack, as well as more general attacks under some conditions. Unfortunately, choosing such adapted transformations can be hard for general attacks. To the rescue, we design a simple and tractable estimator whose adversarial risk is optimal up to within a multiplicative constant of 1.1124 in the population regime, and works for any norm.


Near Optimal Private and Robust Linear Regression

arXiv.org Artificial Intelligence

We study the canonical statistical estimation problem of linear regression from $n$ i.i.d.~examples under $(\varepsilon,\delta)$-differential privacy when some response variables are adversarially corrupted. We propose a variant of the popular differentially private stochastic gradient descent (DP-SGD) algorithm with two innovations: a full-batch gradient descent to improve sample complexity and a novel adaptive clipping to guarantee robustness. When there is no adversarial corruption, this algorithm improves upon the existing state-of-the-art approach and achieves a near optimal sample complexity. Under label-corruption, this is the first efficient linear regression algorithm to guarantee both $(\varepsilon,\delta)$-DP and robustness. Synthetic experiments confirm the superiority of our approach.


Understanding Self-Distillation in the Presence of Label Noise

arXiv.org Artificial Intelligence

Self-distillation (SD) is the process of first training a \enquote{teacher} model and then using its predictions to train a \enquote{student} model with the \textit{same} architecture. Specifically, the student's objective function is $\big(\xi*\ell(\text{teacher's predictions}, \text{ student's predictions}) + (1-\xi)*\ell(\text{given labels}, \text{ student's predictions})\big)$, where $\ell$ is some loss function and $\xi$ is some parameter $\in [0,1]$. Empirically, SD has been observed to provide performance gains in several settings. In this paper, we theoretically characterize the effect of SD in two supervised learning problems with \textit{noisy labels}. We first analyze SD for regularized linear regression and show that in the high label noise regime, the optimal value of $\xi$ that minimizes the expected error in estimating the ground truth parameter is surprisingly greater than 1. Empirically, we show that $\xi > 1$ works better than $\xi \leq 1$ even with the cross-entropy loss for several classification datasets when 50\% or 30\% of the labels are corrupted. Further, we quantify when optimal SD is better than optimal regularization. Next, we analyze SD in the case of logistic regression for binary classification with random label corruption and quantify the range of label corruption in which the student outperforms the teacher in terms of accuracy. To our knowledge, this is the first result of its kind for the cross-entropy loss.


Data-driven soiling detection in PV modules

arXiv.org Artificial Intelligence

Soiling is the accumulation of dirt in solar panels which leads to a decreasing trend in solar energy yield and may be the cause of vast revenue losses. The effect of soiling can be reduced by washing the panels, which is, however, a procedure of non-negligible cost. Moreover, soiling monitoring systems are often unreliable or very costly. We study the problem of estimating the soiling ratio in photo-voltaic (PV) modules, i.e., the ratio of the real power output to the power output that would be produced if solar panels were clean. A key advantage of our algorithms is that they estimate soiling, without needing to train on labelled data, i.e., periods of explicitly monitoring the soiling in each park, and without relying on generic analytical formulas which do not take into account the peculiarities of each installation. We consider as input a time series comprising a minimum set of measurements, that are available to most PV park operators. Our experimental evaluation shows that we significantly outperform current state-of-the-art methods for estimating soiling ratio.


On mesoscale thermal dynamics of para- and ortho- isomers of water

arXiv.org Artificial Intelligence

This work describes experiments on thermal dynamics of pure H2O excited by hydrodynamic cavitation, which has been reported to facilitate the spin conversion of para- and ortho-isomers at water interfaces. Previous measurements by NMR and capillary methods of excited samples demonstrated changes of proton density by 12-15%, the surface tension up to 15.7%, which can be attributed to a non-equilibrium para-/ortho- ratio. Beside these changes, we also expect a variation of heat capacity. Experiments use a differential calorimetric approach with two devices: one with an active thermostat for diathermic measurements, another is fully passive for long-term measurements. Samples after excitation are degassed at -0.09MPa and thermally equalized in a water bath. Conducted attempts demonstrated changes in the heat capacity of experimental samples by 4.17%--5.72% measured in the transient dynamics within 60 min after excitation, which decreases to 2.08% in the steady-state dynamics 90-120 min after excitation. Additionally, we observed occurrence of thermal fluctuations at the level of 10^-3 C relative temperature on 20-40 min mesoscale dynamics and a long-term increase of such fluctuations in experimental samples. Obtained results are reproducible in both devices and are supported by previously published outcomes on four-photon scattering spectra in the range from -1.5 to 1.5 cm^-1 and electrochemical reactivity in CO2 and H2O2 pathways. Based on these results, we propose a hypothesis about ongoing spin conversion process on mesoscopic scales under weak influx of energy caused by thermal, EM or geomagnetic factors; this enables explaining electrochemical and thermal anomalies observed in long-term measurements.


Interpretable (not just posthoc-explainable) medical claims modeling for discharge placement to prevent avoidable all-cause readmissions or death

arXiv.org Artificial Intelligence

We developed an inherently interpretable multilevel Bayesian framework for representing variation in regression coefficients that mimics the piecewise linearity of ReLU-activated deep neural networks. We used the framework to formulate a survival model for using medical claims to predict hospital readmission and death that focuses on discharge placement, adjusting for confounding in estimating causal local average treatment effects. We trained the model on a 5% sample of Medicare beneficiaries from 2008 and 2011, based on their 2009--2011 inpatient episodes, and then tested the model on 2012 episodes. The model scored an AUROC of approximately 0.76 on predicting all-cause readmissions -- defined using official Centers for Medicare and Medicaid Services (CMS) methodology -- or death within 30-days of discharge, being competitive against XGBoost and a Bayesian deep neural network, demonstrating that one need-not sacrifice interpretability for accuracy. Crucially, as a regression model, we provide what blackboxes cannot -- the exact gold-standard global interpretation of the model, identifying relative risk factors and quantifying the effect of discharge placement. We also show that the posthoc explainer SHAP fails to provide accurate explanations.