 lambda 0


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The paper proposes new algorithms to address a set of problems falling under the umbrella term of 'submodular partitioning', including two distinct clustering problems: clustering to maximize homogeneity, and clustering to maximize the representation power of every cluster. The authors consider the case where a common cost is applied to every cluster (homogeneous case) and the case where distinct costs are applied to each cluster (heterogeneous case). In addition to this split, the authors also treat a mix of a robust and an average loss (e.g. a linear combination of the max distortion with the average distortion when clustering). Starting from the observation that the algorithms that work for the robust variants are not scalable, the authors propose (i) greedy algorithms for the robust homogeneous/heterogeneous problems with approximation guarantees, and (ii) a simple method for blending the solutions to the robust and the average cost optimizations. They then proceed to practical examples, demonstrating improvements over baseline algorithms for clustering.


Review for NeurIPS paper: Personalized Federated Learning with Moreau Envelopes

Neural Information Processing Systems

Weaknesses: I am not completely convinced by the theoretical results. Specifically, I am not sure that Theorems 1 and 2 prove the right notion of convergence. Conceptually, I think you want to show that sum_i f_i(theta_i) is small and/or sum_i f_i(w) is small. The theorem statements, I suppose, upper bound sum_i f_i(theta_i), but \|w - theta_i\| is involved, and I don't see why that quantity should be relevant. I don't really see any reason to care about \|w - theta_i\|; so what if you need to move far away from w to optimally personalize for one particular objective? In short, I think that the proposed objective pFedMe makes sense as a *training/surrogate objective*, but does not make as much sense as a criterion for evaluating a model.
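For context, the Moreau-envelope objective that the review refers to can be written as below. This is a sketch of the standard formulation, not a quotation from the paper: the symbols N (number of clients), d (parameter dimension), and the penalty weight \lambda are assumptions, since the review itself only names f_i, w, and theta_i.

```latex
\min_{w \in \mathbb{R}^d} \; F(w) = \frac{1}{N} \sum_{i=1}^{N} F_i(w),
\qquad
F_i(w) = \min_{\theta_i \in \mathbb{R}^d}
  \Big\{ f_i(\theta_i) + \frac{\lambda}{2} \, \| \theta_i - w \|^2 \Big\}
```

Under this reading, \|w - \theta_i\| enters the objective only through the penalty term, which is consistent with the reviewer's point that it is natural in a training objective but less obviously meaningful as an evaluation criterion.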


Review for NeurIPS paper: On the training dynamics of deep networks with L_2 regularization

Neural Information Processing Systems

Weaknesses: (1) If I had access to the test set, why would I bother tuning l2 regularisation to be optimal on the test set? Technically, I could run a brute-force algorithm to find an optimal set of parameters without tuning any other hyperparameters. I do think the submission violates the ethics of machine learning research. I understand that theoretical work generally considers the generalisation gap between the training set and the test set; however, the submission is an empirical work on hyperparameter tuning for the optimal l2 regularisation that gives the highest test set accuracy. Therefore, a validation set is required for tuning, and the model should only be evaluated on the test set afterward.


Reviews: Improving Black-box Adversarial Attacks with a Transfer-based Prior

Neural Information Processing Systems

One remaining regret concerns the estimation of cosine similarity (my concern 2). Although the response adds the specific value of S, it still does not explain **when** and **how often** to estimate cosine similarity (see lines 197–198). This should have an important impact on query complexity, but it is ignored in the experiments. I suggest making similarity estimation clear in the final version. The idea of combining transfer-based and query-based attacks is OK. The paper proposes a simple method where the gradient of the surrogate model is used as a prior of the true gradient.
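To make "the surrogate gradient as a prior" concrete, here is a minimal sketch of one plausible way to bias random finite-difference directions toward a transfer gradient. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the mixing weight `lam`, the query count `q`, and the step `sigma` are all hypothetical.

```python
import numpy as np

def prior_guided_direction(v, lam, rng):
    """Unit direction mixing the prior v with a random orthogonal component.

    lam in [0, 1] is an assumed mixing weight: lam=1 follows the prior
    exactly, lam=0 ignores it.
    """
    v = v / np.linalg.norm(v)
    xi = rng.standard_normal(v.shape)
    xi_perp = xi - v * (v @ xi)          # remove the component along v
    xi_perp /= np.linalg.norm(xi_perp)   # unit vector orthogonal to v
    return np.sqrt(lam) * v + np.sqrt(1.0 - lam) * xi_perp

def estimate_gradient(f, x, v, lam=0.5, q=20, sigma=1e-4, seed=0):
    """Average q forward-difference estimates along prior-guided directions.

    Each direction costs one extra query to the black-box function f.
    """
    rng = np.random.default_rng(seed)
    fx = f(x)
    g = np.zeros_like(x, dtype=float)
    for _ in range(q):
        u = prior_guided_direction(v, lam, rng)
        g += (f(x + sigma * u) - fx) / sigma * u
    return g / q
```

In a sketch like this, the cosine similarity between the prior and the true gradient would govern a sensible choice of `lam`, which is why the reviewer's question about when and how often that similarity is estimated bears directly on the query budget.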


Global Minima by Penalized Full-dimensional Scaling

arXiv.org Machine Learning

The full-dimensional (metric, Euclidean, least squares) multidimensional scaling stress loss function is combined with a quadratic external penalty function term. The trajectory of minimizers of stress for increasing values of the penalty parameter is then used to find (tentative) global minima for low-dimensional multidimensional scaling. This is illustrated with several one-dimensional and two-dimensional examples.
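The combination described in the abstract can be sketched as follows. This is a hedged illustration, assuming unit weights, raw stress summed over pairs i < j, and a quadratic penalty on the coordinates beyond the first p dimensions; the function names and the exact form of the penalty are assumptions, not taken from the paper.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of the configuration X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(X, delta):
    """Raw least-squares stress with unit weights, summed over pairs i < j."""
    d = pairwise_distances(X)
    iu = np.triu_indices(len(X), k=1)
    return ((delta[iu] - d[iu]) ** 2).sum()

def penalized_stress(X, delta, p, lam):
    """Stress plus a quadratic penalty on all columns after the first p.

    As lam grows, minimizers are pushed toward configurations that use
    only the first p dimensions (assumed penalty form).
    """
    return stress(X, delta) + lam * (X[:, p:] ** 2).sum()

# Usage: three points in two dimensions, targeting a one-dimensional solution.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 1.0]])
delta = pairwise_distances(X)   # dissimilarities reproduced exactly by X
print(penalized_stress(X, delta, p=1, lam=2.0))
```

Minimizing this function for an increasing sequence of `lam` values and following the minimizers is the kind of trajectory the abstract describes.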


Exclusive Lasso and Group Lasso using R code

#artificialintelligence

This post shows how to use R packages to estimate an exclusive lasso and a group lasso. Both lasso variants take a given grouping of the variables as input, but differ in how the grouping constraint operates when variable selection is performed.

Lasso, Group Lasso, and Exclusive Lasso

While LASSO (least absolute shrinkage and selection operator) has many variants and extensions, our focus is on two lasso models: Group Lasso and Exclusive Lasso. Before we dive into the specifics, let's go over the similarities and differences of these two lasso variants with the help of a figure. In the figure, 15 variables are categorized into 5 groups.
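The difference between the two penalties can be sketched numerically. The example below is in Python rather than R, and is only an illustration: it assumes the common textbook forms (group lasso as a sum of per-group L2 norms without group-size weights, exclusive lasso as a sum of squared per-group L1 norms) and assumes the 5 groups have 3 variables each, which the post does not state.

```python
import numpy as np

def group_lasso_penalty(beta, groups):
    """Sum over groups of the L2 norm of each group's coefficients."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    return float(sum(np.linalg.norm(beta[groups == g]) for g in np.unique(groups)))

def exclusive_lasso_penalty(beta, groups):
    """Sum over groups of the squared L1 norm of each group's coefficients."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    return float(sum(np.abs(beta[groups == g]).sum() ** 2 for g in np.unique(groups)))

# 15 variables in 5 groups (equal group sizes are an assumption).
groups = np.repeat(np.arange(5), 3)
beta = np.zeros(15)
beta[0], beta[1] = 3.0, 4.0   # two active variables in group 0
beta[5] = 2.0                 # one active variable in group 1

print(group_lasso_penalty(beta, groups))      # 5 + 2 = 7
print(exclusive_lasso_penalty(beta, groups))  # 7**2 + 2**2 = 53
```

The group lasso penalty vanishes only when a whole group is zero, so it tends to select or drop groups together. The exclusive lasso penalizes several active variables within the same group more heavily, so it tends to keep few variables per group.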


Lagrange Multiplier Approach with Inequality Constraints

#artificialintelligence

In a previous post, we introduced the method of Lagrange multipliers to find local minima or local maxima of a function with equality constraints. The same method can be applied to those with inequality constraints as well. In this tutorial, you will discover the method of Lagrange multipliers applied to find the local minimum or maximum of a function when inequality constraints are present, optionally together with equality constraints.
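As a small worked illustration of the approach (an example chosen here, not taken from the tutorial), consider minimizing x^2 + y^2 subject to x + y >= 1. The multiplier \mu and the sign convention below are assumptions of this sketch.

```latex
\begin{aligned}
&\text{minimize } f(x, y) = x^2 + y^2
  \quad \text{subject to } g(x, y) = x + y - 1 \ge 0, \\[4pt]
&L(x, y, \mu) = x^2 + y^2 - \mu \,(x + y - 1), \qquad \mu \ge 0, \\[4pt]
&\frac{\partial L}{\partial x} = 2x - \mu = 0, \qquad
 \frac{\partial L}{\partial y} = 2y - \mu = 0, \qquad
 \mu \,(x + y - 1) = 0. \\[4pt]
&\text{If } \mu = 0:\; x = y = 0, \text{ which violates } x + y \ge 1. \\
&\text{If } x + y = 1:\; x = y = \tfrac{1}{2},\; \mu = 1 \ge 0,\;
  f\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{2}.
\end{aligned}
```

The complementary slackness condition \mu (x + y - 1) = 0 is what distinguishes the inequality case: either the constraint is inactive and the multiplier is zero, or the constraint holds with equality and is handled as in the equality-constrained case.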


Multi-Task Learning and Adapted Knowledge Models for Emotion-Cause Extraction

arXiv.org Artificial Intelligence

Detecting what emotions are expressed in text is a well-studied problem in natural language processing. However, research on finer-grained emotion analysis, such as what causes an emotion, is still in its infancy. We present solutions that tackle both emotion recognition and emotion cause detection in a joint fashion. Considering that common-sense knowledge plays an important role in understanding implicitly expressed emotions and the reasons for those emotions, we propose novel methods that combine common-sense knowledge via adapted knowledge models with multi-task learning to perform joint emotion classification and emotion cause tagging. We show performance improvement on both tasks when including common-sense reasoning and a multitask framework. We provide a thorough analysis to gain insights into model performance.


On the Use of Minimum Penalties in Statistical Learning

arXiv.org Machine Learning

Modern multivariate machine learning and statistical methodologies estimate parameters of interest while leveraging prior knowledge of the association between outcome variables. The methods that do allow for estimation of relationships typically do so through an error covariance matrix in multivariate regression, which does not scale to other types of models. In this article we propose the MinPen framework to simultaneously estimate the regression coefficients associated with the multivariate regression model and the relationships between outcome variables using mild assumptions. The MinPen framework utilizes a novel penalty based on the minimum function to exploit detected relationships between responses. An iterative algorithm that generalizes current state-of-the-art methods is proposed as a solution to the non-convex optimization that is required to obtain estimates. Theoretical results such as high-dimensional convergence rates, model selection consistency, and a framework for post-selection inference are provided. We extend the proposed MinPen framework to other exponential family loss functions, with a specific focus on multiple binomial responses. Tuning parameter selection is also addressed. Finally, simulations and two data examples are presented to show the finite sample properties of this framework.


Group-sparse block PCA and explained variance

arXiv.org Machine Learning

The paper addresses the simultaneous determination of group-sparse loadings by block optimization, and the correlated problem of defining explained variance for a set of non-orthogonal components. We give in both cases a comprehensive mathematical presentation of the problem, which leads us to propose i) a new formulation/algorithm for group-sparse block PCA and ii) a framework for the definition of explained variance with the analysis of five definitions. The numerical results i) confirm the superiority of block optimization over deflation for the determination of group-sparse loadings, and the importance of group information when available, and ii) show that the ranking of algorithms according to explained variance is essentially independent of the definition of explained variance. These results lead us to propose a new optimal variance as the definition of choice for explained variance.