Goto

Collaborating Authors

 concavity


Relaxed Sparse Eigenvalue Conditions for Sparse Estimation via Non-convex Regularized Regression

arXiv.org Machine Learning

Non-convex regularizers usually improve the performance of sparse estimation in practice. To prove this fact, we study the conditions of sparse estimations for the sharp concave regularizers which are a general family of non-convex regularizers including many existing regularizers. For the global solutions of the regularized regression, our sparse eigenvalue based conditions are weaker than that of L1-regularization for parameter estimation and sparseness estimation. For the approximate global and approximate stationary (AGAS) solutions, almost the same conditions are also enough. We show that the desired AGAS solutions can be obtained by coordinate descent (CD) based methods. Finally, we perform some experiments to show the performance of CD methods on giving AGAS solutions and the degree of weakness of the estimation conditions required by the sharp concave regularizers. Keywords: Sparse estimation, non-convex regularization, sparse eigenvalue, coordinate descent 1. Introduction High-dimensional estimation concerns the parameter estimation problems in which the dimensions of parameters are comparable to or larger than the sampling size.


A Hybrid Tsallis-Polarization Impurity Measure for Decision Trees: Theoretical Foundations and Empirical Evaluation

arXiv.org Machine Learning

We introduce the Integrated Tsallis Combination (ITC), a hybrid impurity measure for decision tree learning that combines normalized Tsallis entropy with an exponential polarization component. While many existing measures sacrifice theoretical soundness for computational efficiency or vice versa, ITC provides a mathematically principled framework that balances both aspects. The core innovation lies in the complementarity between Tsallis entropy's information-theoretic foundations and the polarization component's sensitivity to distributional asymmetry. We establish key theoretical properties-concavity under explicit parameter conditions, proper boundary conditions, and connections to classical measures-and provide a rigorous justification for the hybridization strategy. Through an extensive comparative evaluation on seven benchmark datasets comparing 23 impurity measures with five-fold repetition, we show that simple parametric measures (Tsallis $α=0.5$) achieve the highest average accuracy ($91.17\%$), while ITC variants yield competitive results ($88.38-89.16\%$) with strong theoretical guarantees. Statistical analysis (Friedman test: $χ^2=3.89$, $p=0.692$) reveals no significant global differences among top performers, indicating practical equivalence for many applications. ITC's value resides in its solid theoretical grounding-proven concavity under suitable conditions, flexible parameterization ($α$, $β$, $γ$), and computational efficiency $O(K)$-making it a rigorous, generalizable alternative when theoretical guarantees are paramount. We provide guidelines for measure selection based on application priorities and release an open-source implementation to foster reproducibility and further research.


Diminishing Returns Shape Constraints for Interpretability and Regularization

Neural Information Processing Systems

Similarly, a model that predicts the time it will take a customer to grocery shop should decrease in the number of cashiers, but each addedcashierreduces average wait time by less. In both cases, we would like to be able to incorporate this prior knowledge by constraining the machine learned model's output to have a diminishing returns response to the size of the apartment or number of cashiers.





Supplementary Material for " Variational Policy Gradient Method for Reinforcement Learning with General Utilities " A Related Work

Neural Information Processing Systems

We provide a more extension discussion for the context of this work. Firstly, when closed-form expressions for the optimizer of a function are unavailable, solving optimization problems requires iterative schemes such as gradient ascent [31]. Their convergence to global extrema is predicated on concavity and the tractability of computing ascent directions. When the objective takes the form of an expected value of a function parameterized by a random variable, stochastic approximations are required [36, 24]. The PG Theorem mentioned above gives a specific form for obtaining ascent directions with respect to a parameterized family of stationary policies via trajectories in a Markov decision process, when the objective is the expected cumulative return [44], which gives rise to the REINFORCE algorithm.


Towards aligned body representations in vision models

arXiv.org Artificial Intelligence

Human physical reasoning relies on internal "body" representations -- coarse, volumetric approximations that capture an object's extent and support intuitive predictions about motion and physics. While psychophysical evidence suggests humans use such coarse representations, their internal structure remains largely unknown. Here we test whether vision models trained for segmentation develop comparable representations. We adapt a psychophysical experiment conducted with 50 human participants to a semantic segmentation task and test a family of seven segmentation networks, varying in size. We find that smaller models naturally form human-like coarse body representations, whereas larger models tend toward overly detailed, fine-grain encodings. Our results demonstrate that coarse representations can emerge under limited computational resources, and that machine representations can provide a scalable path toward understanding the structure of physical reasoning in the brain.


Forecasting AI Time Horizon Under Compute Slowdowns

arXiv.org Artificial Intelligence

METR's time horizon metric has grown exponentially since 2019, along with compute. However, it is unclear whether compute scaling will persist at current rates through 2030, raising the question of how possible compute slowdowns might impact AI agent capability forecasts. Given a model of time horizon as a function of training compute and algorithms, along with a model of how compute investment spills into algorithmic progress (which, notably, precludes the possibility of a software-only singularity), and the empirical fact that both time horizon and compute have grown at constant rates over 2019--2025, we derive that time horizon growth must be proportional to compute growth. We provide additional, albeit limited, experimental evidence consistent with this theory. We use our model to project time horizon growth under OpenAI's compute projection, finding substantial projected delays in some cases. For example, 1-month time horizons at $80\%$ reliability occur $7$ years later than simple trend extrapolation suggests.