Decision Tree Learning
Machines Learn Number Fields, But How? The Case of Galois Groups
By applying interpretable machine learning methods such as decision trees, we study how simple models can classify the Galois groups of Galois extensions over $\mathbb{Q}$ of degrees 4, 6, 8, 9, and 10, using Dedekind zeta coefficients. Our interpretation of the machine learning results allows us to understand how the distribution of zeta coefficients depends on the Galois group, and to prove new criteria for classifying the Galois groups of these extensions. Combined with previous results, this work provides another example of a new paradigm in mathematical research driven by machine learning.
A Generic Complete Anytime Beam Search for Optimal Decision Tree
Kiossou, Harold Silvรจre, Nijssen, Siegfried, Schaus, Pierre
Finding an optimal decision tree that minimizes classification error is known to be NP-hard. While exact algorithms based on MILP, CP, SAT, or dynamic programming guarantee optimality, they often suffer from poor anytime behavior -- meaning they struggle to find high-quality decision trees quickly when the search is stopped before completion -- due to unbalanced search space exploration. To address this, several anytime extensions of exact methods have been proposed, such as LDS-DL8.5, Top-k-DL8.5, and Blossom, but they have not been systematically compared, making it difficult to assess their relative effectiveness. In this paper, we propose CA-DL8.5, a generic, complete, and anytime beam search algorithm that extends the DL8.5 framework and unifies some existing anytime strategies. In particular, CA-DL8.5 generalizes previous approaches LDS-DL8.5 and Top-k-DL8.5, by allowing the integration of various heuristics and relaxation mechanisms through a modular design. The algorithm reuses DL8.5's efficient branch-and-bound pruning and trie-based caching, combined with a restart-based beam search that gradually relaxes pruning criteria to improve solution quality over time. Our contributions are twofold: (1) We introduce this new generic framework for exact and anytime decision tree learning, enabling the incorporation of diverse heuristics and search strategies; (2) We conduct a rigorous empirical comparison of several instantiations of CA-DL8.5 -- based on Purity, Gain, Discrepancy, and Top-k heuristics -- using an anytime evaluation metric called the primal gap integral. Experimental results on standard classification benchmarks show that CA-DL8.5 using LDS (limited discrepancy) consistently provides the best anytime performance, outperforming both other CA-DL8.5 variants and the Blossom algorithm while maintaining completeness and optimality guarantees.
A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents
We propose a hybrid architecture that integrates decision tree-based symbolic reasoning with the generative capabilities of large language models (LLMs) within a coordinated multi-agent framework. Unlike prior approaches that loosely couple symbolic and neural modules, our design embeds decision trees and random forests as callable oracles within a unified reasoning system. Tree-based modules enable interpretable rule inference and causal logic, while LLM agents handle abductive reasoning, generalization, and interactive planning. A central orchestrator maintains belief state consistency and mediates communication across agents and external tools, enabling reasoning over both structured and unstructured inputs. The system achieves strong performance on reasoning benchmarks. On \textit{ProofWriter}, it improves entailment consistency by +7.2\% through logic-grounded tree validation. On GSM8k, it achieves +5.3\% accuracy gains in multistep mathematical problems via symbolic augmentation. On \textit{ARC}, it boosts abstraction accuracy by +6.0\% through integration of symbolic oracles. Applications in clinical decision support and scientific discovery show how the system encodes domain rules symbolically while leveraging LLMs for contextual inference and hypothesis generation. This architecture offers a robust, interpretable, and extensible solution for general-purpose neuro-symbolic reasoning.
Canoe Paddling Quality Assessment Using Smart Devices: Preliminary Machine Learning Study
Parab, S., Lamelas, A., Hassan, A., Bhote, P.
Over 22 million Americans participate in paddling-related activities annually, contributing to a global paddlesports market valued at 2.4 billion US dollars in 2020. Despite its popularity, the sport has seen limited integration of machine learning (ML) and remains hindered by the cost of coaching and specialized equipment. This study presents a novel AI-based coaching system that uses ML models trained on motion data and delivers stroke feedback via a large language model (LLM). Participants were recruited through a collaboration with the NYU Concrete Canoe Team. Motion data were collected across two sessions, one with suboptimal form and one with corrected technique, using Apple Watches and smartphones secured in sport straps. The data underwent stroke segmentation and feature extraction. ML models, including Support Vector Classifier, Random Forest, Gradient Boosting, and Extremely Randomized Trees, were trained on both raw and engineered features. A web based interface was developed to visualize stroke quality and deliver LLM-based feedback. Across four participants, eight trials yielded 66 stroke samples. The Extremely Randomized Tree model achieved the highest performance with an F score of 0.9496 under five fold cross validation. The web interface successfully provided both quantitative metrics and qualitative feedback. Sensor placement near the wrists improved data quality. Preliminary results indicate that smartwatches and smartphones can enable low cost, accessible alternatives to traditional paddling instruction. While limited by sample size, the study demonstrates the feasibility of using consumer devices and ML to support stroke refinement and technique improvement.
Quantum Semi-Random Forests for Qubit-Efficient Recommender Systems
Alavi, Azadeh, Kouchmeshki, Fatemeh, Alavi, Abdolrahman, Ren, Yongli, Niu, Jiayang
First and second authors contributed equally to this work Abstract --Modern recommenders describe each item with hundreds of sparse semantic tags, yet most quantum pipelines still map one qubit per tag, demanding well beyond one hundred qubits, far out of reach for current noisy-intermediate-scale quantum (NISQ) devices and prone to deep, error-amplifying circuits. We close this gap with a three-stage hybrid machine learning algorithm that compresses tag profiles, optimizes feature selection under a fixed qubit budget via QAOA, and scores recommendations with a Quantum semi-Random Forest (QsRF) built on just five qubits, while performing similarly to the state-of-the-art methods. Leveraging SVD sketching and k-means, we learn a 1 000-atom dictionary ( >97 % variance), then solve a 20 20 QUBO via depth-3 QAOA to select 5 atoms. A 100-tree QsRF trained on these codes matches full-feature baselines on ICM-150/500. To compress this combinatorial explosion, recent hybrid pipelines formulate feature selection as a Q uadratic U nconstrained Binary O ptimisation (QUBO) and delegate the search to quantum annealers [1], [2] or shallow gate-based circuits [3].
A Machine Learning Approach for Honey Adulteration Detection using Mineral Element Profiles
Al-Awadhi, Mokhtar A., Deshmukh, Ratnadeep R.
This paper aims to develop a Machin e Learning (ML) - based system for detecting honey adulteration utilizing honey mineral element profiles. The proposed system comprises two phases: preprocessing and classification. The preprocessing phase involves the treatment of missing - value attributes a nd normalization. In the classification phase, we use three supervised ML models: logistic regression, d ecision tree, and random forest, to discriminate between authentic and adulterated honey. To evaluate the performance of the ML models, we use a public dataset comprising measurements of mineral element content of authentic honey, sugar syrups, and adulterated honey. Experimental findings show that mineral element content in honey provides robust discriminative information for detecting honey adulteration . Results also dem onstrate that the random forest - based classifier outperforms other classifiers on this dataset, achieving the highest cross - validation accuracy of 98.37%.
VAR: Visual Analysis for Rashomon Set of Machine Learning Models' Performance
Evaluating the performance of closely matched machine learning(ML) models under specific conditions has long been a focus of researchers in the field of machine learning. The Rashomon set is a collection of closely matched ML models, encompassing a wide range of models with similar accuracies but different structures. Traditionally, the analysis of these sets has focused on vertical structural analysis, which involves comparing the corresponding features at various levels within the ML models. However, there has been a lack of effective visualization methods for horizontally comparing multiple models with specific features. We propose the VAR visualization solution. VAR uses visualization to perform comparisons of ML models within the Rashomon set. This solution combines heatmaps and scatter plots to facilitate the comparison. With the help of VAR, ML model developers can identify the optimal model under specific conditions and better understand the Rashomon set's overall characteristics.