Relationship between H\"{o}lder Divergence and Functional Density Power Divergence: Intersection and Generalization
In this study, we discuss the relationship between two families of density-power-based divergences with functional degrees of freedom -- the H\"{o}lder divergence and the functional density power divergence (FDPD) -- based on their intersection and generalization. These divergence families include the density power divergence and the $\gamma$-divergence as special cases. First, we prove that the intersection of the H\"{o}lder divergence and the FDPD is limited to a general divergence family introduced by Jones et al. (Biometrika, 2001). Subsequently, motivated by the fact that H\"{o}lder's inequality is used in the proofs of nonnegativity for both the H\"{o}lder divergence and the FDPD, we define a generalized divergence family, referred to as the $\xi$-H\"{o}lder divergence. The nonnegativity of the $\xi$-H\"{o}lder divergence is established through a combination of the inequalities used to prove the nonnegativity of the H\"{o}lder divergence and the FDPD. Furthermore, we derive an inequality between the composite scoring rules corresponding to different FDPDs based on the $\xi$-H\"{o}lder divergence. Finally, we prove that imposing the mathematical structure of the H\"{o}lder score on a composite scoring rule results in the $\xi$-H\"{o}lder divergence.
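For reference, the two best-known members of the families discussed above have explicit forms; the expressions below follow the standard definitions in the literature (Basu et al. for the density power divergence, Fujisawa and Eguchi for the $\gamma$-divergence) and are reproduced here as background, not taken from this paper.

```latex
% Density power divergence (Basu et al., 1998), for \alpha > 0:
d_\alpha(g, f) = \int \Bigl\{ f^{1+\alpha}(x)
  - \tfrac{1+\alpha}{\alpha}\, g(x)\, f^{\alpha}(x)
  + \tfrac{1}{\alpha}\, g^{1+\alpha}(x) \Bigr\}\, dx

% \gamma-divergence (Fujisawa and Eguchi, 2008), for \gamma > 0;
% its nonnegativity follows from H\"older's inequality:
D_\gamma(g, f) = \frac{1}{\gamma(1+\gamma)} \log \int g^{1+\gamma}\, dx
  - \frac{1}{\gamma} \log \int g\, f^{\gamma}\, dx
  + \frac{1}{1+\gamma} \log \int f^{1+\gamma}\, dx
```

Both reduce (in the limit of the tuning parameter going to zero) to the Kullback–Leibler divergence, which is why they are natural special cases of the families compared in the abstract.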
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan (0.04)
Riemannian Multiplicative Update for Sparse Simplex constraint using oblique rotation manifold
Esposito, Flavia, Ang, Andersen
We propose a new manifold optimization method to solve low-rank problems with sparse simplex constraints (variables are simultaneously nonnegative, sparse, and sum to 1), which arise in many applications. The proposed approach exploits oblique rotation manifolds, rewrites the problem, and introduces a new Riemannian optimization method. Experiments on synthetic datasets show the effectiveness of the proposed method compared to the standard Euclidean method.
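The link between the oblique manifold (unit-norm columns on the sphere) and the simplex can be seen through the elementwise-squaring map, a standard reparametrization; the sketch below is an illustration of that connection, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# A point y on the unit sphere (an oblique-manifold column satisfies
# ||y||_2 = 1); its elementwise square x = y * y then lies on the
# probability simplex: x >= 0 and sum(x) = ||y||^2 = 1.
y = rng.standard_normal(5)
y /= np.linalg.norm(y)   # project onto the unit sphere
x = y * y                # map to the simplex

print(x.min() >= 0, np.isclose(x.sum(), 1.0))
```

Optimizing over y on the sphere therefore enforces the sum-to-1 and nonnegativity constraints on x by construction, which is one reason sphere-based (oblique) geometry is convenient for sparse simplex problems.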
Discretely Beyond $1/e$: Guided Combinatorial Algorithms for Submodular Maximization
Chen, Yixin, Nath, Ankur, Peng, Chunli, Kuhnle, Alan
For constrained, not necessarily monotone submodular maximization, all known approximation algorithms with ratio greater than $1/e$ require continuous ideas, such as queries to the multilinear extension of a submodular function and its gradient, which are typically expensive to simulate with the original set function. For combinatorial algorithms, the best known approximation ratios for both size and matroid constraints are obtained by a simple randomized greedy algorithm of Buchbinder et al. [9]: $1/e \approx 0.367$ for the size constraint and $0.281$ for the matroid constraint in $\mathcal O (kn)$ queries, where $k$ is the rank of the matroid. In this work, we develop the first combinatorial algorithms to break the $1/e$ barrier: we obtain an approximation ratio of $0.385$ in $\mathcal O (kn)$ queries to the submodular set function for the size constraint, and $0.305$ for a general matroid constraint. These are achieved by guiding the randomized greedy algorithm with a fast local search algorithm. Further, we develop deterministic versions of these algorithms, maintaining the same ratios and asymptotic time complexity. Finally, we develop a deterministic, nearly linear time algorithm with ratio $0.377$.
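The randomized greedy baseline that the paper improves on is easy to state: in each of $k$ rounds, pick uniformly at random among the $k$ elements of largest marginal gain. The sketch below is a simplified rendering of that idea (dummy "skip" options stand in for the padding in the original algorithm), not the guided variant from the paper:

```python
import random

def randomized_greedy(f, ground, k, seed=0):
    """Simplified randomized greedy for a size-k constraint: each round,
    choose uniformly among the k elements with largest marginal gain."""
    rng = random.Random(seed)
    S = set()
    for _ in range(k):
        gains = sorted(((f(S | {e}) - f(S), e) for e in ground - S),
                       reverse=True)[:k]
        gains += [(0.0, None)] * (k - len(gains))  # "skip" dummies
        g, e = rng.choice(gains)
        if e is not None and g >= 0:               # never take a loss
            S.add(e)
    return S

# Toy coverage function: f(S) = number of items covered by the union.
cover = {1: {0, 1}, 2: {1, 2}, 3: {3}, 4: {0}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
S = randomized_greedy(f, set(cover), 2)
print(S)
```

Each round costs $\mathcal O(n)$ oracle queries, giving the $\mathcal O(kn)$ total mentioned in the abstract; the paper's contribution is to guide these random choices with a local search so the ratio rises above $1/e$.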
Disentanglement with Biological Constraints: A Theory of Functional Cell Types
Whittington, James C. R., Dorrell, Will, Ganguli, Surya, Behrens, Timothy E. J.
Neurons in the brain are often finely tuned for specific task variables. Moreover, such disentangled representations are highly sought after in machine learning. Here we mathematically prove that simple biological constraints on neurons, namely nonnegativity and energy efficiency in both activity and weights, promote such sought-after disentangled representations by enforcing neurons to become selective for single factors of task variation. We demonstrate that these constraints lead to disentanglement in a variety of tasks and architectures, including variational autoencoders. We also use this theory to explain why the brain partitions its cells into distinct cell types, such as grid and object-vector cells, and when the brain instead entangles representations in response to entangled task factors. Overall, this work provides a mathematical understanding of why single neurons in the brain often represent single human-interpretable factors, and a step towards understanding how task structure shapes the structure of brain representations.
A Classical Math Problem Gets Pulled Into Self-Driving Cars
Long before robots could run or cars could drive themselves, mathematicians contemplated a simple mathematical question. They figured it out, then laid it to rest--with no way of knowing that the object of their mathematical curiosity would feature in machines of the far-off future. Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences. The future is now here. As a result of new work by Amir Ali Ahmadi and Anirudha Majumdar of Princeton University, a classical problem from pure mathematics is poised to provide iron-clad proof that drone aircraft and autonomous cars won't crash into trees or veer into oncoming traffic.
Quantifying multivariate redundancy with maximum entropy decompositions of mutual information
Williams and Beer (2010) proposed a nonnegative mutual information decomposition, based on the construction of redundancy lattices, which allows separating the information that a set of variables contains about a target variable into nonnegative components interpretable as the unique information of some variables not contained in others, as well as redundant and synergistic components. However, the definition of multivariate measures of redundancy that comply with nonnegativity and conform to certain axioms capturing conceptually desirable properties of redundancy has proven to be elusive. We here present a procedure to determine multivariate redundancy measures within the framework of maximum entropy models. In particular, we generalize existing bivariate maximum entropy-based measures of redundancy and unique information, defining measures of the redundant information that a group of variables has about a target, and of the unique redundant information that a group of variables has about a target that is not redundant with information from another group. The two key ingredients of this approach are: first, the identification of a type of constraint on entropy maximization that allows isolating components of redundancy and unique redundancy by mirroring them to synergy components; second, the construction of rooted tree-based decompositions to break down mutual information, ensuring nonnegativity by the local implementation of maximum entropy information projections at each binary unfolding of the tree nodes. Altogether, the proposed measures are nonnegative and conform to the desirable axioms for redundancy measures.
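The need for such decompositions is usually motivated by the textbook XOR example, where each input alone carries zero information about the target yet the pair determines it completely (pure synergy). The snippet below computes the relevant mutual information values from first principles; it illustrates the motivation only, not the maximum entropy measures proposed in the paper:

```python
import math
from collections import Counter

# XOR: Y = X1 ^ X2 with X1, X2 uniform and independent.
samples = [(x1, x2, x1 ^ x2) for x1 in (0, 1) for x2 in (0, 1)]

def mutual_info(pairs):
    """I(A; B) in bits from a list of equally likely (a, b) outcomes."""
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(c / n * math.log2((c / n) / (pa[a] / n * pb[b] / n))
               for (a, b), c in pab.items())

i1 = mutual_info([(x1, y) for x1, x2, y in samples])         # I(X1; Y)
i12 = mutual_info([((x1, x2), y) for x1, x2, y in samples])  # I(X1,X2; Y)
print(i1, i12)  # 0 bits alone, 1 bit jointly: all information is synergistic
```

Because $I(X_1;Y) = I(X_2;Y) = 0$ while $I(X_1,X_2;Y) = 1$ bit, the whole bit must be attributed to synergy; decompositions like the one proposed here aim to assign such components nonnegatively in the general multivariate case.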
Nonnegative Sparse PCA
We describe a nonnegative variant of the "Sparse PCA" problem. The goal is to create a low dimensional representation from a collection of points which on the one hand maximizes the variance of the projected points and on the other uses only parts of the original coordinates, thereby creating a sparse representation. What distinguishes our problem from other Sparse PCA formulations is that the projection involves only nonnegative weights of the original coordinates -- a desired quality in various fields, including economics, bioinformatics and computer vision. Adding nonnegativity contributes to sparseness, where it enforces a partitioning of the original coordinates among the new axes. We describe a simple yet efficient iterative coordinate-descent type of scheme which converges to a local optimum of our optimization criterion, giving good results on large real world datasets.
Generalized Nonnegative Matrix Approximations with Bregman Divergences
Sra, Suvrit, Dhillon, Inderjit S.
Nonnegative matrix approximation (NNMA) is a recent technique for dimensionality reduction and data analysis that yields a parts-based, sparse nonnegative representation for nonnegative input data. NNMA has found a wide variety of applications, including text analysis, document clustering, face/image recognition, language modeling, speech processing and many others. Despite these numerous applications, the algorithmic development for computing the NNMA factors has been relatively deficient. This paper makes algorithmic progress by modeling and solving (using multiplicative updates) new generalized NNMA problems that minimize Bregman divergences between the input matrix and its low-rank approximation. The multiplicative update formulae in the pioneering work by Lee and Seung [11] arise as a special case of our algorithms. In addition, the paper shows how to use penalty functions for incorporating constraints other than nonnegativity into the problem. Further, some interesting extensions to the use of "link" functions for modeling nonlinear relationships are also discussed.
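The Lee–Seung special case mentioned above is concrete enough to sketch: for the squared Frobenius objective (the Bregman divergence generated by $\phi(x) = x^2/2$), the multiplicative updates below keep both factors nonnegative by construction. This is the classical baseline the paper generalizes, not the paper's own generalized algorithm:

```python
import numpy as np

def nmf_multiplicative(V, r, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||V - W H||_F^2 over
    W, H >= 0; multiplying by nonnegative ratios preserves nonnegativity."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H, stays >= 0
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W, stays >= 0
    return W, H

V = np.random.default_rng(1).random((20, 10))  # nonnegative input
W, H = nmf_multiplicative(V, 3)
err = np.linalg.norm(V - W @ H)
print(err)
```

Replacing the Frobenius norm with another Bregman divergence changes the ratio terms in these updates; deriving those generalized update rules (and proving their monotonicity) is the contribution described in the abstract.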