fisher
Simplex Deep Linear Discriminant Analysis
Tezekbayev, Maxat, Bolatov, Arman, Assylbekov, Zhenisbek
We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA head to a neural encoder raises the question of how to train the resulting deep classifier by maximum likelihood estimation (MLE). We first show that end-to-end MLE training of an unconstrained Deep LDA model ignores discrimination: when both the LDA parameters and the encoder parameters are learned jointly, the likelihood admits a degenerate solution in which some of the class clusters may heavily overlap or even collapse, and classification performance deteriorates. Batchwise moment re-estimation of the LDA parameters does not remove this failure mode. We then propose a constrained Deep LDA formulation that fixes the class means to the vertices of a regular simplex in the latent space and restricts the shared covariance to be spherical, leaving only the priors and a single variance parameter to be learned along with the encoder. Under these geometric constraints, MLE becomes stable and yields well-separated class clusters in the latent space. On images (Fashion-MNIST, CIFAR-10, CIFAR-100), the resulting Deep LDA models achieve accuracy competitive with softmax baselines while offering a simple, interpretable latent geometry that is clearly visible in two-dimensional projections.
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Discriminant Analysis (0.62)
Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation
Solving the quantum many-body Schrödinger equation is a fundamental and challenging problem in the fields of quantum physics, quantum chemistry, and material sciences. One of the common computational approaches to this problem is Quantum Variational Monte Carlo (QVMC), in which ground-state solutions are obtained by minimizing the energy of the system within a restricted family of parameterized wave functions. Deep learning methods partially address the limitations of traditional QVMC by representing a rich family of wave functions in terms of neural networks. However, the optimization objective in QVMC remains notoriously hard to minimize and requires second-order optimization methods such as natural gradient. In this paper, we first reformulate energy functional minimization in the space of Born distributions corresponding to particle-permutation (anti-)symmetric wave functions, rather than the space of wave functions. We then interpret QVMC as the Fisher--Rao gradient flow in this distributional space, followed by a projection step onto the variational manifold. This perspective provides us with a principled framework to derive new QMC algorithms, by endowing the distributional space with better metrics, and following the projected gradient flow induced by those metrics. More specifically, we propose Wasserstein Quantum Monte Carlo (WQMC), which uses the gradient flow induced by the Wasserstein metric, rather than the Fisher--Rao metric, and corresponds to the probability mass, rather than it. We demonstrate empirically that the dynamics of WQMC results in faster convergence to the ground state of molecular systems.
Merging Models with Fisher-Weighted Averaging
Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this paper, we take the perspective that this merging operation can be seen as choosing parameters that approximately maximize the joint likelihood of the posteriors of the models' parameters.
Interval Fisher's Discriminant Analysis and Visualisation
Pinheiro, Diogo, Oliveira, M. Rosário, Kravchenko, Igor, Oliveira, Lina
In Data Science, entities are typically represented by single valued measurements. Symbolic Data Analysis extends this framework to more complex structures, such as intervals and histograms, that express internal variability. We propose an extension of multiclass Fisher's Discriminant Analysis to interval-valued data, using Moore's interval arithmetic and the Mallows' distance. Fisher's objective function is generalised to consider simultaneously the contributions of the centres and the ranges of intervals and is numerically maximised. The resulting discriminant directions are then used to classify interval-valued observations.To support visual assessment, we adapt the class map, originally introduced for conventional data, to classifiers that assign labels through minimum distance rules. We also extend the silhouette plot to this setting and use stacked mosaic plots to complement the visual display of class assignments. Together, these graphical tools provide insight into classifier performance and the strength of class membership. Applications to real datasets illustrate the proposed methodology and demonstrate its value in interpreting classification results for interval-valued data.
- North America > United States > California > Los Angeles County > Los Angeles (0.05)
- Europe > Spain > Galicia > Madrid (0.05)
- Asia > China > Hong Kong (0.05)
- (7 more...)
CoGraM: Context-sensitive granular optimization method with rollback for robust model fusion
Merging neural networks without retraining is central to federated and distributed learning. Common methods such as weight averaging or Fisher merging often lose accuracy and are unstable across seeds. CoGraM (Contextual Granular Merging) is a multi-stage, context-sensitive, loss-based, and iterative optimization method across layers, neurons, and weight levels that aligns decisions with loss differences and thresholds and prevents harmful updates through rollback. CoGraM is an optimization method that addresses the weaknesses of methods such as Fisher and can significantly improve the merged network.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning
Sun, Wei, Yang, Wen, Jian, Pu, Du, Qianlong, Cui, Fuwei, Ren, Shuo, Zhang, Jiajun
Recent advances have demonstrated that integrating reinforcement learning with rule-based rewards can significantly enhance the reasoning capabilities of large language models, even without supervised fine-tuning. However, prevalent reinforcement learning algorithms such as GRPO and its variants like DAPO, suffer from a coarse granularity issue when computing the advantage. Specifically, they compute rollout-level advantages that assign identical values to every token within a sequence, failing to capture token-specific contributions and hindering effective learning. To address this limitation, we propose Key-token Advantage Estimation (KTAE) - a novel algorithm that estimates fine-grained, token-level advantages without introducing additional models. KTAE leverages the correctness of sampled rollouts and applies statistical analysis to quantify the importance of individual tokens within a sequence to the final outcome. This quantified token-level importance is then combined with the rollout-level advantage to obtain a more fine-grained token-level advantage estimation. Empirical results show that models trained with GRPO+KTAE and DAPO+KTAE outperform baseline methods across five mathematical reasoning benchmarks. Notably, they achieve higher accuracy with shorter responses and even surpass R1-Distill-Qwen-1.5B using the same base model.
- Asia > Middle East > Jordan (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.05)
- North America > United States > Indiana > Hamilton County > Fishers (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- (12 more...)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > Austria > Salzburg > Salzburg (0.04)
- North America > United States > New York (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > North Carolina (0.04)
- North America > United States > Indiana > Hamilton County > Fishers (0.04)