From Polynomials to Databases: Arithmetic Structures in Galois Theory
We develop a computational framework for classifying Galois groups of irreducible degree-7 polynomials over~$\mathbb{Q}$, combining explicit resolvent methods with machine learning techniques. A database of over one million normalized projective septics is constructed, each annotated with algebraic invariants~$J_0, \dots, J_4$ derived from binary transvections. For each polynomial, we compute resolvent factorizations to determine its Galois group among the seven transitive subgroups of~$S_7$ identified by Foulkes. Using this dataset, we train a neurosymbolic classifier that integrates invariant-theoretic features with supervised learning, yielding improved accuracy in detecting rare solvable groups compared to coefficient-based models. The resulting database provides a reproducible resource for constructive Galois theory and supports empirical investigations into group distribution under height constraints. The methodology extends to higher-degree cases and illustrates the utility of hybrid symbolic-numeric techniques in computational algebra.
- North America > United States > Michigan > Oakland County > Rochester (0.40)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
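As one illustration of the kind of invariant-based test such a pipeline builds on, here is a minimal Python sketch of the classical discriminant criterion (not the paper's resolvent code): the Galois group of an irreducible septic lies in $A_7$ exactly when its discriminant is a nonzero rational square. The example polynomial is our own choice, not one from the paper's database.

```python
from sympy import Poly, discriminant, symbols
from sympy.ntheory.primetest import is_square

x = symbols('x')
# Hypothetical example: x^7 - 2 is irreducible over Q (Eisenstein at p = 2)
f = Poly(x**7 - 2, x)
d = int(discriminant(f))
# Classical criterion: Gal(f) is contained in A7 iff disc(f) is a nonzero square
in_A7 = d > 0 and is_square(d)
print(in_A7)  # False: disc(x^7 - 2) is negative, so Gal(f) is not inside A7
```

Distinguishing the remaining transitive subgroups of $S_7$ requires the resolvent factorizations described in the abstract; the discriminant alone only settles the $A_7$ question.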
More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration
Yuan, Xiaoyang, Ding, Yujuan, Bin, Yi, Shao, Wenqi, Cai, Jinyu, Song, Jingkuan, Yang, Yang, Shen, Heng Tao
Reinforcement Learning with Verifiable Rewards (RLVR) is a promising paradigm for enhancing the reasoning ability in Large Language Models (LLMs). However, prevailing methods primarily rely on self-exploration or a single off-policy teacher to elicit long chain-of-thought (LongCoT) reasoning, which may introduce intrinsic model biases and restrict exploration, ultimately limiting reasoning diversity and performance. Drawing inspiration from multi-teacher strategies in knowledge distillation, we introduce Adaptive Multi-Guidance Policy Optimization (AMPO), a novel framework that adaptively leverages guidance from multiple proficient teacher models, but only when the on-policy model fails to generate correct solutions. This "guidance-on-demand" approach expands exploration while preserving the value of self-discovery. Moreover, AMPO incorporates a comprehension-based selection mechanism, prompting the student to learn from the reasoning paths that it is most likely to comprehend, thus balancing broad exploration with effective exploitation. Extensive experiments show AMPO substantially outperforms a strong baseline (GRPO), with a 4.3% improvement on mathematical reasoning tasks and 12.2% on out-of-distribution tasks, while significantly boosting Pass@k performance and enabling more diverse exploration. Notably, using four peer-sized teachers, our method achieves comparable results to approaches that leverage a single, more powerful teacher (e.g., DeepSeek-R1) with more data. These results demonstrate a more efficient and scalable path to superior reasoning and generalizability. Our code is available at https://github.com/SII-Enigma/AMPO. Recent advances in long chain-of-thought (LongCoT) Jaech et al. (2024); Guo et al. (2025); Team et al. (2025) have endowed Large Language Models (LLMs) with remarkable complex reasoning capabilities. A key driver behind this progress is the Reinforcement Learning with Verifiable Rewards (RLVR) paradigm.
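The "guidance-on-demand" loop described above can be sketched in a few lines. Everything below is a toy mock-up of an assumed interface, not the released AMPO code: `verify`, `student_sample`, `teacher_traces`, and `student_logprob` are invented stand-ins.

```python
def guidance_on_demand(prompt, student_sample, verify, teacher_traces,
                       student_logprob, n_rollouts=8):
    # Keep self-generated rollouts when any is correct (self-discovery);
    # only fall back to teacher guidance when all rollouts fail.
    rollouts = [student_sample(prompt) for _ in range(n_rollouts)]
    correct = [r for r in rollouts if verify(prompt, r)]
    if correct:
        return correct
    # Comprehension-based selection: among the teachers' solutions, pick the
    # path the student assigns the highest log-probability to.
    return [max(teacher_traces(prompt), key=lambda t: student_logprob(prompt, t))]

# Toy stand-ins for illustration only:
verify = lambda p, r: r.endswith("42")
student_sample = lambda p: "guess: 41"          # the student keeps failing here
teacher_traces = lambda p: ["short path -> 42",
                            "a much longer teacher derivation -> 42"]
student_logprob = lambda p, t: -len(t)          # shorter trace = easier to follow

result = guidance_on_demand("q", student_sample, verify, teacher_traces,
                            student_logprob)
print(result)  # ['short path -> 42']
```

In the real method the selected trace would feed into the policy-gradient update; this sketch only shows the selection logic.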
Testing the variety hypothesis
Lerario, A., Hoefgeest, P. Roos, Scolamiero, M., Tamai, A.
Given a probability measure on the unit disk, we study the problem of deciding whether, for some threshold probability, this measure is supported near a real algebraic variety of given dimension and bounded degree. We call this "testing the variety hypothesis". We prove an upper bound on the so-called "sample complexity" of this problem and show how it can be reduced to a semialgebraic decision problem. This is done by studying in a quantitative way the Hausdorff geometry of the space of real algebraic varieties of a given dimension and degree.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Generalist Reward Models: Found Inside Large Language Models
Li, Yi-Chen, Xu, Tian, Yu, Yang, Zhang, Xuqin, Chen, Xiong-Hui, Ling, Zhongxiang, Chao, Ningjing, Yuan, Lei, Zhou, Zhi-Hua
The alignment of Large Language Models (LLMs) is critically dependent on reward models trained on costly human preference data. While recent work explores bypassing this cost with AI feedback, these methods often lack a rigorous theoretical foundation. In this paper, we discover that a powerful generalist reward model is already latently present within any LLM trained via standard next-token prediction. We prove that this endogenous reward is not a heuristic, but is theoretically equivalent to a reward function learned through offline inverse reinforcement learning. This connection allows us to directly elicit a high-quality reward signal from a base (pre-trained or supervised fine-tuned) model without any further training. Critically, we also prove that subsequent reinforcement learning using this endogenous reward leads to a policy with a provably superior error bound compared to the base model. To the best of our knowledge, this is the first theoretical proof of the effectiveness of reinforcement learning for LLMs. Our experiments validate this theory, demonstrating that our method not only outperforms existing LLM-as-a-judge approaches but can also surpass explicitly trained reward models. These findings suggest that the reward modeling stage can be replaced by a principled method of eliciting the knowledge already captured during pre-training, heralding a more efficient, powerful, and scalable paradigm for LLM alignment, as well as for multi-modal models.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
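The core idea, eliciting a reward signal directly from the base model's own probabilities, can be illustrated with a toy scorer. This is a simplified proxy under our own assumptions (length-normalized log-likelihood of the response), not the paper's inverse-RL construction, and `toy_logprob` is a made-up stand-in for a real model.

```python
import math

def endogenous_reward(logprob_fn, prompt, response_tokens):
    # Score a response by its average token log-probability under the base
    # model: responses the model itself finds likely get higher reward.
    total = sum(logprob_fn(prompt, response_tokens[:i], tok)
                for i, tok in enumerate(response_tokens))
    return total / max(len(response_tokens), 1)

# Made-up stand-in model: strongly prefers the token "sky" after "the"
def toy_logprob(prompt, prefix, token):
    return math.log(0.9) if token == "sky" else math.log(0.05)

good = endogenous_reward(toy_logprob, "the", ["sky"])
bad = endogenous_reward(toy_logprob, "the", ["mud"])
print(good > bad)  # True
```

The point of the sketch is only the direction of the signal: no reward model is trained, yet the base model's likelihoods already rank responses.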
Galois groups of polynomials and neurosymbolic networks
This project embarks on a journey to merge the abstract realm of Galois theory with the practical capabilities of machine learning (ML). Our goal is to harness ML's pattern recognition and prediction abilities to address some of the most challenging aspects of Galois theory, potentially revolutionizing our understanding and approach to polynomial solvability and related problems. This paper introduces a novel approach to understanding Galois theory, one of the foundational areas of algebra, through the lens of machine learning. By analyzing polynomial equations with machine learning techniques, we aim to streamline the process of determining …
- North America > Mexico > Guanajuato (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Overview > Innovation (0.48)
- Research Report > Promising Solution (0.34)
MultiLingPoT: Enhancing Mathematical Reasoning with Multilingual Program Fine-tuning
Li, Nianqi, Liang, Zujie, Yuan, Siyu, Liang, Jiaqing, Wei, Feng, Xiao, Yanghua
Program-of-Thought (PoT), which aims to use programming language instead of natural language as an intermediate step in reasoning, is an important way for LLMs to solve mathematical problems. Since different programming languages excel in different areas, it is natural to use the most suitable language for solving specific problems. However, current PoT research only focuses on single-language PoT, ignoring the differences between programming languages. Therefore, this paper proposes a multilingual program reasoning method, MultiLingPoT. This method allows the model to answer questions using multiple programming languages by fine-tuning on multilingual data. Additionally, prior and posterior hybrid methods are used to help the model select the most suitable language for each problem. Our experimental results show that the training of MultiLingPoT improves each program's mathematical reasoning by about 2.5\%. Moreover, with proper mixing, the performance of MultiLingPoT can be further improved, achieving a 6\% increase compared to single-language PoT with data augmentation. Resources of this paper can be found at https://github.com/Nianqi-Li/MultiLingPoT.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Software > Programming Languages (0.99)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
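A posterior-style hybrid can be approximated by executing the generated program in each language and voting over the answers. This is a guessed simplification for illustration, not the paper's actual selection model:

```python
from collections import Counter

def posterior_vote(answers_by_lang):
    # answers_by_lang: {language: answer from executing that language's program}
    # Drop failed executions (None) and return the majority answer.
    answers = [a for a in answers_by_lang.values() if a is not None]
    value, _ = Counter(answers).most_common(1)[0]
    return value

ans = posterior_vote({"python": 42, "cpp": 42, "matlab": None, "java": 41})
print(ans)  # 42
```

A prior-style hybrid would instead pick the language before execution, e.g. from features of the problem statement; only the posterior variant is sketched here.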
Nash equilibria in scalar discrete-time linear quadratic games
Salizzoni, Giulio, Ouhamma, Reda, Kamgarpour, Maryam
An open problem in linear quadratic (LQ) games has been characterizing the Nash equilibria. This problem has renewed relevance given the surge of work on understanding the convergence of learning algorithms in dynamic games. This paper investigates scalar discrete-time infinite-horizon LQ games with two agents. Even in this arguably simple setting, there are no results for finding $\textit{all}$ Nash equilibria. By analyzing the best response map, we formulate a polynomial system of equations characterizing the linear feedback Nash equilibria. This enables us to bring in tools from algebraic geometry, particularly the Gr\"obner basis, to study the roots of this polynomial system. Consequently, we can not only compute all Nash equilibria numerically, but we can also characterize their number with explicit conditions. For instance, we prove that the LQ games under consideration admit at most three Nash equilibria. We further provide sufficient conditions for the existence of at most two Nash equilibria and sufficient conditions for the uniqueness of the Nash equilibrium. Our numerical experiments demonstrate the tightness of our bounds and showcase the increased complexity in settings with more than two agents.
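The Gröbner-basis step can be reproduced on a toy system with SymPy. The two polynomials below are placeholders standing in for the paper's coupled best-response conditions, not the actual system derived there:

```python
from sympy import symbols, groebner, solve

k1, k2 = symbols('k1 k2')
# Placeholder coupled conditions in the two feedback gains k1, k2
f1 = k1**2 + k1*k2 - 1
f2 = k2**2 + k1*k2 - 1
# A lex Groebner basis triangularizes the system, exposing its finitely many roots
G = groebner([f1, f2], k1, k2, order='lex')
sols = solve([f1, f2], [k1, k2], dict=True)
print(len(sols))  # 2 solution pairs for this toy system
```

For the paper's system the same machinery yields the count of Nash equilibria (at most three in the scalar two-agent setting) together with explicit conditions on the game parameters.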
Bayesian Event Categorization Matrix Approach for Nuclear Detonations
Koermer, Scott, Carmichael, Joshua D., Williams, Brian J.
Current efforts to detect nuclear detonations and correctly categorize explosion sources with ground- and space-collected discriminants present challenges that remain unaddressed by the Event Categorization Matrix (ECM) model. Smaller events (lower yield explosions) often include only sparse observations among few modalities and can therefore lack a complete set of discriminants. The covariance structures can also vary significantly between such observations of event (source-type) categories. Both obstacles are problematic for ``classic'' ECM. Our work addresses this gap and presents a Bayesian update to the previous ECM model, termed B-ECM, which can be trained on partial observations and does not rely on a pooled covariance structure. We further augment ECM with Bayesian Decision Theory so that false negative or false positive rates of an event categorization can be reduced in an intuitive manner. To demonstrate improved categorization rates with B-ECM, we compare an array of B-ECM and classic ECM models with multiple performance metrics that leverage Monte Carlo experiments. We use both synthetic and real data. Our B-ECM models show consistent gains in overall accuracy and lower false negative rates relative to the classic ECM model. We propose future avenues to improve B-ECM that expand its decision-making and predictive capability.
- North America > United States (1.00)
- Asia (0.14)
- Europe > United Kingdom > England (0.14)
- Government > Military (1.00)
- Energy > Oil & Gas > Upstream (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
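The decision-theoretic layer reduces to a standard posterior expected-loss minimization. A minimal generic sketch (the textbook Bayes decision rule, not the B-ECM implementation), in which an asymmetric loss table suppresses false negatives:

```python
def bayes_decision(posteriors, loss):
    # Choose the category whose declaration minimizes posterior expected loss.
    # loss[action][truth] = cost of declaring `action` when `truth` holds.
    def expected_loss(action):
        return sum(loss[action][truth] * p for truth, p in posteriors.items())
    return min(posteriors, key=expected_loss)

post = {"explosion": 0.3, "earthquake": 0.7}
# Missing an explosion (false negative) costs 10x a false alarm:
loss = {"explosion":  {"explosion": 0,  "earthquake": 1},
        "earthquake": {"explosion": 10, "earthquake": 0}}
decision = bayes_decision(post, loss)
print(decision)  # explosion
```

With symmetric (0/1) loss the rule would declare the most probable category, "earthquake"; the asymmetric costs flip the decision, which is the intuitive false-negative control the abstract describes.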
Hierarchical Quadratic Random Forest Classifier
In this paper, we propose a hierarchical quadratic random forest classifier for classifying multiresolution samples extracted from multichannel data. The forest incorporates a penalized multivariate linear discriminant in each of its decision nodes and processes squared features to realize quadratic decision boundaries in the original feature space. The penalized discriminant is based on multiclass sparse discriminant analysis, with a group-Lasso regularizer that is intermediate between the Lasso and the ridge regularizer. The classification probabilities estimated by the forest, and the features learned by its decision nodes, can be used standalone or can feed graph-based classifiers.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
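The squared-feature device is simple to state: augment each sample with its elementwise squares, so that a linear discriminant in the augmented space cuts a quadratic boundary in the original space. A minimal stand-alone sketch (the penalized group-Lasso discriminant itself is omitted):

```python
def quadratic_features(row):
    # Append elementwise squares; a linear split on the augmented vector
    # realizes a quadratic decision boundary in the original feature space.
    return list(row) + [v * v for v in row]

z = quadratic_features([1.0, 2.0])
print(z)  # [1.0, 2.0, 1.0, 4.0]
```

Note this augmentation covers axis-aligned quadratic terms only; cross terms like $x_1 x_2$ would require appending pairwise products as well.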
Taming AI Bots: Controllability of Neural States in Large Language Models
Soatto, Stefano, Tabuada, Paulo, Chaudhari, Pratik, Liu, Tian Yu
We tackle the question of whether an agent can, by suitable choice of prompts, control an AI bot to any state. To that end, we first introduce a formal definition of ``meaning'' that is amenable to analysis. Then, we characterize ``meaningful data'' on which large language models (LLMs) are ostensibly trained, and ``well-trained LLMs'' through conditions that are largely met by today's LLMs. While a well-trained LLM constructs an embedding space of meanings that is Euclidean, meanings themselves do not form a vector (linear) subspace, but rather a quotient space within. We then characterize the subset of meanings that can be reached by the state of the LLMs for some input prompt, and show that a well-trained bot can reach any meaning albeit with small probability. We then introduce a stronger notion of controllability as {\em almost certain reachability}, and show that, when restricted to the space of meanings, an AI bot is controllable. We do so after introducing a functional characterization of attentive AI bots, and finally derive necessary and sufficient conditions for controllability. The fact that AI bots are controllable means that an adversary could steer them towards any state. However, the sampling process can be designed to counteract adverse actions and avoid reaching undesirable regions of state space before their boundary is crossed.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)