dykstra
Dyslexia and the Reading Wars
Proven methods for teaching the readers who struggle most have been known for decades. Why do we often fail to use them?

"There's a window of opportunity to intervene," Mark Seidenberg, a cognitive neuroscientist, said. "You don't want to let that go."

In 2024, my niece Caroline received a Ph.D. in gravitational-wave physics. Her research interests include "the impact of model inaccuracies on biases in parameters recovered from gravitational wave data" and "Petrov type, principal null directions, and Killing tensors of slowly rotating black holes in quadratic gravity." I watched a little of her dissertation defense, on Zoom, and was lost as soon as she'd finished introducing herself. She and her husband now live in Italy, where she has a postdoctoral appointment.

Caroline's academic achievements seem especially impressive if you know that until third grade she could barely read: to her, words on a page looked like a pulsing mass. She attended a private school in Connecticut, and there was a set time every day when students selected books to read on their own. "I can't remember how long that lasted, but it felt endless," she told me. She hid her disability by turning pages when her classmates did, and by volunteering to draw illustrations during group story-writing projects. One day, she told her grandmother that she could sound out individual letters, but when she got to "the end of a row" she couldn't remember what had come before. A psychologist eventually identified her condition as dyslexia.

Fluent readers sometimes think of dyslexia as a tendency to put letters in the wrong order or facing the wrong direction, but it's more complicated than that.
LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration
Qiu, Ruiyu, Wang, Rui, Yang, Guanghui, Li, Xiang, Shao, Zhijiang
Lexicographic multi-objective problems, which consist of multiple conflicting subtasks with explicit priorities, are common in real-world applications. Despite the advantages of Reinforcement Learning (RL) in single tasks, extending conventional RL methods to prioritized multiple objectives remains challenging. In particular, traditional Safe RL and Multi-Objective RL (MORL) methods have difficulty enforcing priority orderings efficiently. Lexicographic Multi-Objective RL (LMORL) methods have therefore been developed to address these challenges. However, existing LMORL methods either rely on heuristic threshold tuning with prior knowledge or are restricted to discrete domains. To overcome these limitations, we propose Lexicographically Projected Policy Gradient RL (LPPG-RL), a novel LMORL framework that leverages sequential gradient projections to identify feasible policy update directions, thereby making LPPG-RL broadly compatible with all policy-gradient algorithms in continuous spaces. LPPG-RL reformulates the projection step as an optimization problem and uses Dykstra's projection rather than generic solvers, delivering substantial speedups, especially on small- to medium-scale instances. In addition, LPPG-RL introduces Subproblem Exploration (SE) to prevent vanishing gradients, accelerate convergence, and enhance stability. We provide theoretical guarantees of convergence and establish a lower bound on policy improvement. Finally, through extensive experiments in a 2D navigation environment, we demonstrate the effectiveness of LPPG-RL, showing that it outperforms existing state-of-the-art continuous LMORL methods.
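The projection idea is easy to sketch. As a rough illustration of a lexicographic projection step (my reading of the general recipe, not the authors' implementation, with made-up two-dimensional gradients), one can strip from the low-priority gradient any component that conflicts with a higher-priority gradient:

```python
import numpy as np

def sequential_projection(grad_low, grads_high):
    """Project the low-priority gradient so it no longer conflicts with any
    higher-priority gradient (g @ d >= 0 for each), in priority order.
    A sketch of the idea only, not the authors' code."""
    d = grad_low.copy()
    for g in grads_high:                 # highest priority first
        val = g @ d
        if val < 0:                      # d would decrease objective i,
            d = d - (val / (g @ g)) * g  # so remove the conflicting component
    return d

g_high = np.array([1.0, 0.0])  # gradient of the priority-1 objective
g_low = np.array([-1.0, 1.0])  # gradient of the priority-2 objective
print(sequential_projection(g_low, [g_high]))  # [0. 1.]: improves objective 2
                                               # without harming objective 1
```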
LapSum -- One Method to Differentiate Them All: Ranking, Sorting and Top-k Selection
Struski, Łukasz, Bednarczyk, Michał B., Podolak, Igor T., Tabor, Jacek
We present a novel technique for constructing differentiable order-type operations, including soft ranking, soft top-k selection, and soft permutations. Our approach leverages an efficient closed-form formula for the inverse of the function LapSum, defined as the sum of Laplace distributions. This formulation ensures low computational and memory complexity in selecting the highest activations, enabling losses and gradients to be computed in $O(n\log{}n)$ time. Through extensive experiments, we demonstrate that our method outperforms state-of-the-art techniques for high-dimensional vectors and large $k$ values. Furthermore, we provide efficient implementations for both CPU and CUDA environments, underscoring the practicality and scalability of our method for large-scale ranking and differentiable ordering problems.
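To make the recipe concrete, here is a heavily hedged sketch of a Laplace-smoothed soft top-k: the threshold that the paper obtains from LapSum's closed-form inverse is found here by plain bisection, and the temperature and iteration counts are arbitrary choices of mine, not the authors'.

```python
import numpy as np

def laplace_cdf(z):
    """CDF of a standard Laplace distribution."""
    return np.where(z < 0, 0.5 * np.exp(z), 1.0 - 0.5 * np.exp(-z))

def soft_topk(x, k, temp=0.1, iters=60):
    """Soft top-k in the spirit of LapSum (my reading of the recipe, not the
    authors' code): choose a threshold b so the Laplace-smoothed indicators
    sum to k, then return those indicators. The paper inverts the sum in
    closed form; this sketch uses bisection instead."""
    lo, hi = x.min() - 10 * temp, x.max() + 10 * temp
    for _ in range(iters):                    # solve sum(sigma) = k for b
        b = 0.5 * (lo + hi)
        s = laplace_cdf((x - b) / temp).sum()
        lo, hi = (b, hi) if s > k else (lo, b)
    return laplace_cdf((x - b) / temp)

x = np.array([0.1, 2.0, -0.5, 1.5, 0.3])
w = soft_topk(x, k=2)
print(w, w.sum())  # weights near 1 on the two largest entries; sum ~= 2
```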
Dykstra's Algorithm, ADMM, and Coordinate Descent: Connections, Insights, and Extensions
Tibshirani, Ryan J.
We study connections between Dykstra's algorithm for projecting onto an intersection of convex sets, the augmented Lagrangian method of multipliers or ADMM, and block coordinate descent. We prove that coordinate descent for a regularized regression problem, in which the penalty is a separable sum of support functions, is exactly equivalent to Dykstra's algorithm applied to the dual problem. ADMM on the dual problem is also seen to be equivalent, in the special case of two sets, with one being a linear subspace. These connections, aside from being interesting in their own right, suggest new ways of analyzing and extending coordinate descent. For example, from existing convergence theory on Dykstra's algorithm over polyhedra, we discern that coordinate descent for the lasso problem converges at an (asymptotically) linear rate. We also develop two parallel versions of coordinate descent, based on the Dykstra and ADMM connections.
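Dykstra's algorithm itself, which all of these connections run through, is only a few lines. A minimal NumPy sketch, with an illustrative box and half-space as the two convex sets (closed-form projections of my choosing, not anything from the paper):

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n."""
    return np.clip(x, lo, hi)

def project_halfspace(x, a, b):
    """Euclidean projection onto {x : a @ x <= b}."""
    viol = a @ x - b
    return x if viol <= 0 else x - (viol / (a @ a)) * a

def dykstra(y, projections, n_iter=100):
    """Dykstra's algorithm: project y onto an intersection of convex sets.
    Unlike plain alternating projections, Dykstra keeps one correction term
    per set, which is what makes the limit the true Euclidean projection
    onto the intersection rather than just some feasible point."""
    x = y.copy()
    corrections = [np.zeros_like(y) for _ in projections]
    for _ in range(n_iter):
        for i, proj in enumerate(projections):
            z = proj(x + corrections[i])           # project corrected iterate
            corrections[i] = x + corrections[i] - z  # update the correction
            x = z
    return x

y = np.array([3.0, -2.5])
a, b = np.array([1.0, 1.0]), 0.5
x_star = dykstra(y, [project_box, lambda v: project_halfspace(v, a, b)])
print(x_star)  # lies in [-1,1]^2 and satisfies a @ x <= b
```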
Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Sander, Michael E., Puigcerver, Joan, Djolonga, Josip, Peyré, Gabriel, Blondel, Mathieu
The top-k operator returns a sparse vector, where the non-zero values correspond to the k largest values of the input. Unfortunately, because it is a discontinuous function, it is difficult to incorporate in neural networks trained end-to-end with backpropagation. Recent works have considered differentiable relaxations, based either on regularization or perturbation techniques. However, to date, no approach is fully differentiable and sparse. In this paper, we propose new differentiable and sparse top-k operators. We view the top-k operator as a linear program over the permutahedron, the convex hull of permutations. We then introduce a p-norm regularization term to smooth out the operator, and show that its computation can be reduced to isotonic optimization. Our framework is significantly more general than the existing one and allows for example to express top-k operators that select values in magnitude. On the algorithmic side, in addition to pool adjacent violator (PAV) algorithms, we propose a new GPU/TPU-friendly Dykstra algorithm to solve isotonic optimization problems. We successfully use our operators to prune weights in neural networks, to fine-tune vision transformers, and as a router in sparse mixture of experts.
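The isotonic subroutine the authors mention is classical; the textbook pool-adjacent-violators algorithm for the squared-error case (not the paper's p-norm-regularized variant, and not their GPU/TPU-friendly Dykstra solver) fits in a few lines:

```python
import numpy as np

def pav(y):
    """Pool Adjacent Violators: solve min ||f - y||^2 s.t. f non-decreasing.
    Textbook O(n) version for the squared-error case."""
    means, sizes = [], []            # one (mean, size) per pooled block
    for v in y:
        means.append(float(v)); sizes.append(1)
        while len(means) > 1 and means[-2] > means[-1]:
            # Adjacent blocks violate monotonicity: pool them.
            s = sizes[-2] + sizes[-1]
            m = (sizes[-2] * means[-2] + sizes[-1] * means[-1]) / s
            means[-2:] = [m]; sizes[-2:] = [s]
    return np.repeat(means, sizes)   # expand blocks back to full length

print(pav(np.array([1.0, 3.0, 2.0, 4.0])))  # [1.  2.5 2.5 4. ]
```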
An efficient algorithm for the $\ell_{p}$ norm based metric nearness problem
Tang, Peipei, Jiang, Bo, Wang, Chengjing
Given a dissimilarity matrix, the metric nearness problem is to find the nearest matrix of distances that satisfy the triangle inequalities. This problem has wide applications, such as sensor networks, image processing, and so on. But even a moderately accurate solution is hard to obtain, due to the $O(n^{3})$ metric constraints and the nonsmooth objective function, which is usually a weighted $\ell_{p}$ norm based distance. In this paper, we propose a delayed constraint generation method, with each subproblem solved by the semismooth Newton based proximal augmented Lagrangian method (PALM), for the metric nearness problem. Because storing the matrix related to the metric constraints would require prohibitive memory, we take advantage of its special structure and avoid storing the constraint matrix altogether. A pleasing aspect of our algorithm is that it can solve problems involving up to $10^{8}$ variables and $10^{13}$ constraints. Numerical experiments demonstrate the efficiency of our algorithm. In theory, firstly, under a mild condition, we establish a primal-dual error bound condition, which is essential for the analysis of the local convergence rate of PALM. Secondly, we prove the equivalence between the dual nondegeneracy condition and the nonsingularity of the generalized Jacobian for the inner subproblem of PALM. Thirdly, when $q(\cdot)=\|\cdot\|_{1}$ or $\|\cdot\|_{\infty}$, without the strict complementarity condition, we also prove the equivalence between the dual nondegeneracy condition and the uniqueness of the primal solution.
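For intuition about the constraint structure, here is a toy cyclic-projection baseline over the $O(n^{3})$ triangle inequalities. It is emphatically not the paper's PALM solver (which scales many orders of magnitude further), and plain alternating projection only returns a nearby feasible matrix, not the exact nearest one:

```python
import numpy as np
from itertools import combinations

def fix_triangles(D, n_sweeps=50):
    """Cyclically project onto each violated triangle inequality
    d[i,j] <= d[i,k] + d[k,j]. A toy baseline for intuition only."""
    X = (D + D.T) / 2.0                      # work on a symmetric copy
    n = X.shape[0]
    for _ in range(n_sweeps):
        for i, j in combinations(range(n), 2):
            for k in range(n):
                if k == i or k == j:
                    continue
                delta = X[i, j] - X[i, k] - X[k, j]
                if delta > 0:                # violated: orthogonal projection
                    X[i, j] -= delta / 3     # onto the half-space shifts each
                    X[i, k] += delta / 3     # of the three entries by delta/3
                    X[k, j] += delta / 3
                    X[j, i], X[k, i], X[j, k] = X[i, j], X[i, k], X[k, j]
    return X

D = np.array([[0., 1., 5.],
              [1., 0., 1.],
              [5., 1., 0.]])                 # 5 > 1 + 1: not a metric
print(fix_triangles(D))
```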
Informative Clusters for Multivariate Extremes
Clustering is essential to exploratory data mining and data-structure analysis, and is a common technique in statistical data analysis. It is widely used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. Many clustering approaches exist, with different intrinsic notions of what a cluster is. In the standard setup, the goal is to group objects into subsets, known as clusters, such that objects within a given cluster are more related to one another than to objects from a different cluster. Clustering is by now well understood (see [4, 27] and references therein); Extreme Value Theory (EVT), by contrast, is a newer field in the machine learning community, and has been used for anomaly detection [14, 28, 45, 51], classification [31, 32, 54], and clustering [10, 12, 13, 33] when attention is restricted to the most extreme regions of the sample space.
Stronger and Faster Wasserstein Adversarial Attacks
Wu, Kaiwen, Wang, Allen Houze, Yu, Yaoliang
Deep models, while being extremely flexible and accurate, are surprisingly vulnerable to "small, imperceptible" perturbations known as adversarial attacks. While the majority of existing attacks focus on measuring perturbations under the $\ell_p$ metric, Wasserstein distance, which takes geometry in pixel space into account, has long been known to be a suitable metric for measuring image quality and has recently risen as a compelling alternative to the $\ell_p$ metric in adversarial attacks. However, constructing an effective attack under the Wasserstein metric is computationally much more challenging and calls for better optimization algorithms. We address this gap in two ways: (a) we develop an exact yet efficient projection operator to enable a stronger projected gradient attack; (b) we show that the Frank-Wolfe method equipped with a suitable linear minimization oracle works extremely fast under Wasserstein constraints. Our algorithms not only converge faster but also generate much stronger attacks. For instance, we decrease the accuracy of a residual network on CIFAR-10 to $3.4\%$ within a Wasserstein perturbation ball of radius $0.005$, in contrast to $65.6\%$ using the previous Wasserstein attack based on an \emph{approximate} projection operator. Furthermore, employing our stronger attacks in adversarial training significantly improves the robustness of adversarially trained models.
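The appeal of Frank-Wolfe here is that it needs only a linear minimization oracle (LMO), never a projection. A generic sketch of that loop follows, with a closed-form $\ell_1$-ball oracle standing in for the paper's Wasserstein-ball oracle (which is the hard part and is not reproduced here):

```python
import numpy as np

def lmo_l1_ball(grad, radius):
    """LMO for the l1 ball: argmin_{||s||_1 <= r} <grad, s>. A closed-form
    stand-in for the paper's Wasserstein-ball oracle."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_fn, x0, lmo, n_iter=200):
    """Generic Frank-Wolfe: step toward the LMO vertex with rate 2/(t+2).
    Projection-free, which is the point when projections are expensive."""
    x = x0.copy()
    for t in range(n_iter):
        s = lmo(grad_fn(x))
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * s
    return x

# Toy stand-in loss: minimize ||x - y||^2 over the l1 ball of radius 1.
y = np.array([0.9, 0.6, -0.2])
x = frank_wolfe(lambda x: 2 * (x - y), np.zeros(3),
                lambda g: lmo_l1_ball(g, 1.0))
print(x)  # approaches the projection of y onto the l1 ball
```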
Machine Learning Optimization Algorithms & Portfolio Allocation
Perrin, Sarah, Roncalli, Thierry
Portfolio optimization emerged with the seminal paper of Markowitz (1952). The original mean-variance framework is appealing because it is very efficient from a computational point of view. However, it also has a well-established failing: it can lead to portfolios that are not optimal from a financial point of view. Nevertheless, very few models have succeeded in providing a real alternative to the Markowitz model. The main reason lies in the fact that most academic portfolio optimization models are intractable in real life, although they present solid theoretical properties. By intractable we mean that they can be implemented for an investment universe with a small number of assets, using a lot of computational resources and skills, but they are unable to manage a universe with dozens or hundreds of assets. However, the emergence and rapid development of robo-advisors mean that we need to rethink portfolio optimization and go beyond the traditional mean-variance approach. Another industry has faced similar issues concerning large-scale optimization problems. Machine learning was long associated with linear and logistic regression models; again, the reason was the inability of optimization algorithms to solve high-dimensional industrial problems. Nevertheless, the end of the 1990s marked an important turning point with the development and rediscovery of several methods that have since produced impressive results. The goal of this paper is to show how portfolio allocation can benefit from the development of these large-scale optimization algorithms. Not all of these algorithms are useful in our case, but four of them are essential when solving complex portfolio optimization problems: coordinate descent, the alternating direction method of multipliers, the proximal gradient method, and Dykstra's algorithm.
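Of the four, Dykstra's algorithm is the natural tool for the intersection-of-constraints structure that portfolio problems produce. A hedged sketch, with illustrative constraints of my own (a full-investment budget and a 40% position cap, not anything taken from the paper):

```python
import numpy as np

def project_budget(w):
    """Projection onto the full-investment hyperplane sum(w) = 1."""
    return w - (w.sum() - 1.0) / w.size

def project_limits(w, ub=0.4):
    """Projection onto the position limits 0 <= w_i <= ub (illustrative)."""
    return np.clip(w, 0.0, ub)

def dykstra_weights(w0, n_iter=200, ub=0.4):
    """Two-set Dykstra: the weights closest to w0 satisfying both the budget
    and the position limits. A sketch of how Dykstra's algorithm slots into
    portfolio allocation, not the paper's code."""
    w = w0.copy()
    p, q = np.zeros_like(w0), np.zeros_like(w0)  # per-set corrections
    for _ in range(n_iter):
        y = project_budget(w + p)
        p = w + p - y
        w = project_limits(y + q, ub)
        q = y + q - w
    return w

raw = np.array([0.7, 0.5, -0.1, 0.2])   # unconstrained optimizer's output
print(dykstra_weights(raw))             # feasible weights, sum ~= 1
```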