 graph theory




SKYLENAGE Technical Report: Mathematical Reasoning and Contest-Innovation Benchmarks for Multi-Level Math Evaluation

Wei, Hu, Xu, Ze, Yang, Boyu, Miao, Linlin, Zhai, Weiqi, Li, Yihan, Li, Zixuan, Wang, Zhijun, Wang, Boya, Yu, Jianwei, Yuan, Jialing, Zhang, Xiaoyue, He, Cheng, Chen, Minglei, Zhang, Zifan, Li, Qianhui, Wang, Wei, Xu, Xiang

arXiv.org Artificial Intelligence

Large language models (LLMs) now perform strongly on many public math suites, yet frontier separation within mathematics increasingly suffers from ceiling effects. We present two complementary benchmarks: SKYLENAGE-ReasoningMATH, a 100-item, structure-aware diagnostic set with per-item metadata on length, numeric density, and symbolic complexity; and SKYLENAGE-MATH, a 150-item contest-style suite spanning four stages from high school to doctoral under a seven-subject taxonomy. We evaluate fifteen contemporary LLM variants under a single setup and analyze subject × model and grade × model performance. On the contest suite, the strongest model reaches 44% while the runner-up reaches 37%; accuracy declines from high school to doctoral, and top systems exhibit a doctoral-to-high-school retention near 79%. On the reasoning set, the best model attains 81% overall, and hardest-slice results reveal clear robustness gaps between leaders and the mid-tier. In summary, we release SKYLENAGE-ReasoningMATH and report aggregate results for SKYLENAGE-MATH; together, SKYLENAGE provides a hard, reasoning-centered, broad-coverage math benchmark with calibrated difficulty and rich metadata, serving as a reference benchmark for future evaluations of mathematical reasoning.


Reinforcement learning for graph theory, Parallelizing Wagner's approach

Bouffard, Alix, Breen, Jane

arXiv.org Artificial Intelligence

Our work applies reinforcement learning to construct counterexamples concerning conjectured bounds on the spectral radius of the Laplacian matrix of a graph. We expand upon the re-implementation of Wagner's approach by Stevanovic et al. with the ability to train numerous unique models simultaneously and a novel redefinition of the action space that adjusts the influence of the current local optimum on the learning process.
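
As a rough illustration of the quantity such a search scores, the sketch below computes the Laplacian spectral radius of a small graph by power iteration and compares it with an upper bound. The abstract does not say which conjectured bounds are targeted, so the bound used here is an assumption: the proven Anderson-Morley bound (the maximum of deg(u) + deg(v) over edges uv) stands in for a conjecture. The function names and the `violation` score are hypothetical, not taken from the paper.

```python
import math
import random


def laplacian(n, edges):
    """Dense Laplacian L = D - A of a simple undirected graph on n vertices."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def spectral_radius(L, iters=500, seed=0):
    """Largest eigenvalue of a positive semidefinite matrix via power iteration."""
    n = len(L)
    rng = random.Random(seed)  # seeded so the run is reproducible
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        y = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in y))
        if norm == 0.0:  # edgeless graph: every eigenvalue is 0
            return 0.0
        x = [c / norm for c in y]
    y = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * y[i] for i in range(n))  # Rayleigh quotient of unit vector x


def degree_sum_bound(n, edges):
    """Stand-in bound (assumption): max over edges uv of deg(u) + deg(v)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return max(deg[u] + deg[v] for u, v in edges)


def violation(n, edges):
    """Hypothetical score: a positive value would mean the bound is violated."""
    return spectral_radius(laplacian(n, edges)) - degree_sum_bound(n, edges)


star = [(0, 1), (0, 2), (0, 3)]      # star K_{1,3}
triangle = [(0, 1), (1, 2), (0, 2)]  # complete graph K_3
```

Because the stand-in bound is a theorem, `violation` never exceeds zero beyond rounding error; for a genuinely open conjecture, a positive score would certify a counterexample.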


Physics-Informed EvolveGCN: Satellite Prediction for Multi Agent Systems

Huber, Timothy Jacob, Tiwari, Madhur, Riano-Rios, Camilo A.

arXiv.org Artificial Intelligence

In the rapidly evolving domain of autonomous systems, interaction among agents within a shared environment is both inevitable and essential for enhancing overall system capabilities. A key requirement in such multi-agent systems is the ability of each agent to reliably predict the future positions of its nearest neighbors. Traditionally, graphs and graph theory have served as effective tools for modeling inter-agent communication and relationships. While this approach is widely used, the present work proposes a novel method that leverages dynamic graphs in a forward-looking manner. Specifically, it employs EvolveGCN, a dynamic graph convolutional network, to forecast the evolution of inter-agent relationships over time. To improve prediction accuracy and ensure physical plausibility, this research incorporates physics-constrained loss functions based on the Clohessy-Wiltshire equations of motion. This integrated approach enhances the reliability of future state estimations in multi-agent scenarios.
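
To make the physics constraint concrete, here is a minimal sketch of a penalty built from the Clohessy-Wiltshire equations, written in the usual convention (x radial, y along-track, z cross-track, n the mean motion of the reference orbit): x'' - 2n y' - 3n^2 x = 0, y'' + 2n x' = 0, z'' + n^2 z = 0. The abstract does not specify how the loss is formed or weighted, so the mean-squared-residual form and all function names below are assumptions for illustration.

```python
import math


def cw_residual(pos, vel, acc, n):
    """Residuals of the three Clohessy-Wiltshire equations for one state."""
    x, y, z = pos
    vx, vy, vz = vel
    ax, ay, az = acc
    return (
        ax - 2.0 * n * vy - 3.0 * n * n * x,  # radial equation
        ay + 2.0 * n * vx,                    # along-track equation
        az + n * n * z,                       # cross-track equation
    )


def physics_loss(states, n):
    """Mean squared CW residual over (pos, vel, acc) triples (assumed loss form)."""
    total = 0.0
    for pos, vel, acc in states:
        total += sum(r * r for r in cw_residual(pos, vel, acc, n))
    return total / len(states)


def closed_orbit_state(t, n, amplitude):
    """One exact CW solution: x = A cos(nt), y = -2A sin(nt), z = 0."""
    c, s = math.cos(n * t), math.sin(n * t)
    pos = (amplitude * c, -2.0 * amplitude * s, 0.0)
    vel = (-amplitude * n * s, -2.0 * amplitude * n * c, 0.0)
    acc = (-amplitude * n * n * c, 2.0 * amplitude * n * n * s, 0.0)
    return pos, vel, acc
```

States sampled from `closed_orbit_state` give a loss of zero up to rounding, while a trajectory that ignores the relative dynamics is penalized. In training, a term like this would typically be added to the data-fitting loss with some weight.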


Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy

Taghanaki, Saeid Asgari, Monteiro, Joao

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable proficiency in generating detailed and coherent explanations of complex concepts. However, the extent to which these models truly comprehend the concepts they articulate remains unclear. To assess the level of comprehension of a model relative to the content it generates, we implemented a self-evaluation pipeline where models: (i) given a topic generate an excerpt with information about the topic, (ii) given an excerpt generate question-answer pairs, and finally (iii) given a question generate an answer. We refer to this self-evaluation approach as Explain-Query-Test (EQT). Interestingly, the accuracy on generated questions resulting from running the EQT pipeline correlates strongly with the model performance as verified by typical benchmarks such as MMLU-Pro. In other words, EQT's performance is predictive of MMLU-Pro's, and EQT can be used to rank models without the need for any external source of evaluation data other than lists of topics of interest. Moreover, our results reveal a disparity between the models' ability to produce detailed explanations and their performance on questions related to those explanations. This gap highlights fundamental limitations in the internal knowledge representation and reasoning abilities of current LLMs. We release the code at https://github.com/asgsaeid/EQT.
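
The three-step pipeline described above can be sketched as a small loop. This is not the authors' released code: the three callables (`explain`, `make_qa`, `answer`) are hypothetical wrappers around the same model, and case-insensitive exact match is an assumed scoring rule, since the abstract does not say how answers are graded.

```python
from typing import Callable, Iterable, List, Tuple


def eqt_accuracy(
    topics: Iterable[str],
    explain: Callable[[str], str],
    make_qa: Callable[[str], List[Tuple[str, str]]],
    answer: Callable[[str], str],
) -> float:
    """Run Explain -> Query -> Test and return the fraction of correct answers."""
    correct = 0
    total = 0
    for topic in topics:
        excerpt = explain(topic)                  # (i) topic -> excerpt
        for question, reference in make_qa(excerpt):  # (ii) excerpt -> QA pairs
            prediction = answer(question)         # (iii) question only -> answer
            total += 1
            # Assumed scoring: case-insensitive exact match.
            if prediction.strip().lower() == reference.strip().lower():
                correct += 1
    return correct / total if total else 0.0
```

Note that step (iii) sees only the question, not the excerpt, so the score reflects whether the model can answer questions about content it produced itself.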


The Optimist: Towards Fully Automated Graph Theory Research

Davila, Randy

arXiv.org Artificial Intelligence

This paper introduces the Optimist, an autonomous system developed to advance automated conjecture generation in graph theory. Leveraging mixed-integer programming (MIP) and heuristic methods, the Optimist generates conjectures that both rediscover established theorems and propose novel inequalities. Through a combination of memory-based computation and agent-like adaptability, the Optimist iteratively refines its conjectures by integrating new data, enabling a feedback process with minimal human (or machine) intervention. Initial experiments reveal the Optimist's potential to uncover foundational results in graph theory, as well as to produce conjectures of interest for future exploration. This work also outlines the Optimist's evolving integration with a counterpart agent, the Pessimist (a human or machine agent), to establish a dueling system that will drive fully automated graph theory research.


Automated conjecturing in mathematics with TxGraffiti

Davila, Randy

arXiv.org Artificial Intelligence

TxGraffiti is a data-driven, heuristic-based computer program developed to automate the process of generating conjectures across various mathematical domains. Since its creation in 2017, TxGraffiti has contributed to numerous mathematical publications, particularly in graph theory. In this paper, we present the design and core principles of TxGraffiti, including its roots in the original Graffiti program, which pioneered the automation of mathematical conjecturing. We describe the data collection process, the generation of plausible conjectures, and methods such as the Dalmatian heuristic for filtering out redundant or transitive conjectures. Additionally, we highlight its contributions to the mathematical literature and introduce a new web-based interface that allows users to explore conjectures interactively. While we focus on graph theory, the techniques demonstrated extend to other areas of mathematics.
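
A simplified sketch of a Dalmatian-style filter follows. It uses the commonly described form of the heuristic: a candidate upper bound is kept only if it holds on every stored object (truth test) and is strictly tighter than all previously kept bounds on at least one object (significance test). The data layout and function name are hypothetical, and the sketch omits details of the actual program, such as re-checking earlier conjectures after a new one is accepted.

```python
def dalmatian_filter(candidates, data):
    """Filter upper-bound conjectures of the form target(obj) <= bound(obj).

    candidates: list of (name, bound_function) pairs, tried in order.
    data: list of (obj, target_value) pairs, the known examples.
    Returns the names of the accepted conjectures.
    """
    accepted_names = []
    accepted_values = []  # per accepted conjecture: its bound on every object
    for name, bound in candidates:
        values = [bound(obj) for obj, _ in data]
        # Truth test: the bound must hold on every known object.
        if any(target > v for (_, target), v in zip(data, values)):
            continue
        # Significance test: strictly tighter than all kept bounds somewhere.
        significant = not accepted_values or any(
            v < min(prev[i] for prev in accepted_values)
            for i, v in enumerate(values)
        )
        if significant:
            accepted_names.append(name)
            accepted_values.append(values)
    return accepted_names


# Toy data: independence number of K_3, the path P_3, and the star K_{1,3}.
data = [({"n": 3}, 1), ({"n": 3}, 2), ({"n": 4}, 3)]
candidates = [
    ("n", lambda g: g["n"]),
    ("n-1", lambda g: g["n"] - 1),
    ("n again", lambda g: g["n"]),  # redundant: never tighter than n-1
    ("n-2", lambda g: g["n"] - 2),  # false on P_3, where the target is 2
]
kept = dalmatian_filter(candidates, data)
```

On this toy data the filter keeps the first two candidates, drops the repeated bound as insignificant, and rejects the last one as false.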


Extracting and Validating Explanatory Word Archipelagoes using Dual Entropy

Ohsawa, Yukio

arXiv.org Artificial Intelligence

The logical connectivity of text is represented by the connectivity of words that form archipelagoes. Here, each archipelago is a sequence of islands of the occurrences of a certain word. An island here means the local sequence of sentences where the word is emphasized, and an archipelago of a length comparable to the target text is extracted using the co-variation of entropy A (the window-based entropy) on the distribution of the word's occurrences with the width of each time window. Then, the logical connectivity of text is evaluated on entropy B (the graph-based entropy) computed on the distribution of sentences to connected word-clusters obtained on the co-occurrence of words. The results show that the parts of the target text with words forming archipelagoes extracted on entropy A, without learned or prepared knowledge, form an explanatory part of the text with smaller entropy B than the parts extracted by the baseline methods.


Artificial intelligence and machine learning generated conjectures with TxGraffiti

Davila, Randy

arXiv.org Artificial Intelligence

The ability of carefully designed computer programs to generate meaningful mathematical conjectures has been demonstrated since the late 1980s, notably by Fajtlowicz's GRAFFITI program [23]. Indeed, this heuristic-based program was the first artificial intelligence to make significant conjectures in matrices, number theory, and graph theory, attracting the attention of renowned mathematicians like Paul Erdős, Ronald Graham, and Odile Favaron. Inspired by the pioneering work of Fajtlowicz, and by interactions with mathematicians who considered conjectures of GRAFFITI, we developed the TxGraffiti program, a modern conjecturing artificial intelligence named in homage to this rich history of conjectures made by GRAFFITI and now available as an interactive website.