Goto

Collaborating Authors

 transposition



Onrankingviasortingbyestimatedexpectedutility

Neural Information Processing Systems

Since utilities can serveas target values to learn the scoring function through square loss regression, the optimality ofsorting byexpected utilities isequivalent tothe consistencyofregression.


On the Statistical Query Complexity of Learning Semiautomata: a Random Walk Approach

Giapitzakis, George, Fountoulakis, Kimon, Nichani, Eshaan, Lee, Jason D.

arXiv.org Artificial Intelligence

Semiautomata form a rich class of sequence-processing algorithms with applications in natural language processing, robotics, computational biology, and data mining. We establish the first Statistical Query hardness result for semiautomata under the uniform distribution over input words and initial states. We show that Statistical Query hardness can be established when both the alphabet size and input length are polynomial in the number of states. Unlike the case of deterministic finite automata, where hardness typically arises through the hardness of the language they recognize (e.g., parity), our result is derived solely from the internal state-transition structure of semiautomata. Our analysis reduces the task of distinguishing the final states of two semiautomata to studying the behavior of a random walk on the group $S_{N} \times S_{N}$. By applying tools from Fourier analysis and the representation theory of the symmetric group, we obtain tight spectral gap bounds, demonstrating that after a polynomial number of steps in the number of states, distinct semiautomata become nearly uncorrelated, yielding the desired hardness result.


On ranking via sorting by estimated expected utility

Neural Information Processing Systems

This paper addresses the question of which of these tasks are asymptotically solved by sorting by decreasing order of expected utility, for some suitable notion of utility, or, equivalently, when is square loss regression consistent for ranking via score-and-sort?


CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs (Brief version)

Chervov, A., Fedoriaka, D., Konstantinova, E., Naumov, A., Kiselev, I., Sheveleva, A., Koltsov, I., Lytkin, S., Smolensky, A., Soibelman, A., Levkovich-Maslyuk, F., Grimov, R., Volovich, D., Isakov, A., Kostin, A., Litvinov, M., Vilkin-Krom, N., Bidzhiev, A., Krasnyi, A., Evseev, M., Geraseva, E., Grunwald, L., Galkin, S., Koldunov, E., Diner, S., Chevychelov, A., Kudasheva, E., Sychev, A., Kravchenko, A., Kogan, Z., Natyrova, A., Shishina, L., Cheldieva, L., Zamkovoy, V., Kovalenko, D., Papulov, O., Kudashev, S., Shiltsov, D., Turtayev, R., Nikitina, O., Mamayeva, D., Nikolenko, S., Obozov, M., Titarenko, A., Dolgorukova, A., Aparnev, A., Debeaupuis, O., C., S. Alami, Isambert, H.

arXiv.org Artificial Intelligence

This is the third paper of the CayleyPy project applying artificial intelligence to problems in group theory. We announce the first public release of CayleyPy, an open source Python library for computations with Cayley and Schreier graphs. Compared with systems such as GAP and Sage, CayleyPy handles much larger graphs and performs several orders of magnitude faster. Using CayleyPy we obtained about 200 new conjectures on Cayley and Schreier graphs, focused on diameters and growth. For many Cayley graphs of symmetric groups Sn we observe quasi polynomial diameter formulas: a small set of quadratic or linear polynomials indexed by n mod s. We conjecture that this is a general phenomenon, giving efficient diameter computation despite the problem being NP hard. We propose a refinement of the Babai type conjecture on diameters of Sn: n^2/2 + 4n upper bounds in the undirected case, compared to previous O(n^2) bounds. We also provide explicit generator families, related to involutions in a square with whiskers pattern, conjectured to maximize the diameter; search confirms this for all n up to 15. We further conjecture an answer to a question posed by V M Glushkov in 1968 on directed Cayley graphs generated by a cyclic shift and a transposition. For nilpotent groups we conjecture an improvement of J S Ellenberg's results on upper unitriangular matrices over Z/pZ, showing linear dependence of diameter on p. Moreover. Some conjectures are LLM friendly, naturally stated as sorting problems verifiable by algorithms or Python code. To benchmark path finding we created more than 10 Kaggle datasets. CayleyPy works with arbitrary permutation or matrix groups and includes over 100 predefined generators. Our growth computation code outperforms GAP and Sage up to 1000 times in speed and size.


Checklist

Neural Information Processing Systems

Do the main claims made in the abstract and introduction accurately reflect the paper's Did you describe the limitations of your work? Did you discuss any potential negative societal impacts of your work? Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're We start with a lemma that we will repeatedly use in the proofs. If has only one infoset, then Lemma 11 applies.


FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs

Dege, Pengcuo, Luo, Qiuming, Mao, Rui, Kong, Chang

arXiv.org Artificial Intelligence

Efficient inference of Multi-Head Latent Attention (MLA) is challenged by deploying the DeepSeek-R1 671B model on a single Multi-GPU server. This paper introduces FlashMLA-ETAP, a novel framework that enhances MLA inference for the single-instance deployment scenario on NVIDIA H20 GPUs. We propose the Efficient Transpose Attention Pipeline (ETAP), which reconfigures attention computation through transposition to align the KV context length with the \(M\)-dimension in WGMMA operations, significantly reducing redundant computations. FlashMLA-ETAP achieves a 2.78x speedup over FlashMLA at 64K sequence length (batch size 16), with 5.24x and 4.94x improvements over FlashAttention-3 and FlashInfer, respectively, while maintaining numerical stability with a 15.2x lower RMSE (\(1.25 \times 10^{-5}\)) than FlashAttention-3. Furthermore, ETAP's design enables seamless integration into frameworks like FlashAttention-3 and FlashInfer, supported by a detailed theoretical analysis. Our work addresses a critical gap in resource-constrained inference, offering a scalable solution for mid-tier GPUs and paving the way for broader adoption in hardware-aware optimization. Code is available at https://github.com/pengcuo/FlashMLA-ETAP.


Synchronizing Process Model and Event Abstraction for Grounded Process Intelligence (Extended Version)

Benzin, Janik-Vasily, Park, Gyunam, Rinderle-Ma, Stefanie

arXiv.org Artificial Intelligence

Model abstraction (MA) and event abstraction (EA) are means to reduce complexity of (discovered) models and event data. Imagine a process intelligence project that aims to analyze a model discovered from event data which is further abstracted, possibly multiple times, to reach optimality goals, e.g., reducing model size. So far, after discovering the model, there is no technique that enables the synchronized abstraction of the underlying event log. This results in loosing the grounding in the real-world behavior contained in the log and, in turn, restricts analysis insights. Hence, in this work, we provide the formal basis for synchronized model and event abstraction, i.e., we prove that abstracting a process model by MA and discovering a process model from an abstracted event log yields an equivalent process model. We prove the feasibility of our approach based on behavioral profile abstraction as non-order preserving MA technique, resulting in a novel EA technique.


Learning the symmetric group: large from small

Petschack, Max, Garbali, Alexandr, de Gier, Jan

arXiv.org Artificial Intelligence

Machine learning explorations can make significant inroads into solving difficult problems in pure mathematics. One advantage of this approach is that mathematical datasets do not suffer from noise, but a challenge is the amount of data required to train these models and that this data can be computationally expensive to generate. Key challenges further comprise difficulty in a posteriori interpretation of statistical models and the implementation of deep and abstract mathematical problems. We propose a method for scalable tasks, by which models trained on simpler versions of a task can then generalize to the full task. Specifically, we demonstrate that a transformer neural-network trained on predicting permutations from words formed by general transpositions in the symmetric group $S_{10}$ can generalize to the symmetric group $S_{25}$ with near 100\% accuracy. We also show that $S_{10}$ generalizes to $S_{16}$ with similar performance if we only use adjacent transpositions. We employ identity augmentation as a key tool to manage variable word lengths, and partitioned windows for training on adjacent transpositions. Finally we compare variations of the method used and discuss potential challenges with extending the method to other tasks.


Expected Return Symmetries

Muglich, Darius, Forkel, Johannes, van der Pol, Elise, Foerster, Jakob

arXiv.org Artificial Intelligence

Symmetry is an important inductive bias that can improve model robustness and generalization across many deep learning domains. In multi-agent settings, a priori known symmetries have been shown to address a fundamental coordination failure mode known as mutually incompatible symmetry breaking; e.g. in a game where two independent agents can choose to move "left'' or "right'', and where a reward of +1 or -1 is received when the agents choose the same action or different actions, respectively. However, the efficient and automatic discovery of environment symmetries, in particular for decentralized partially observable Markov decision processes, remains an open problem. Furthermore, environmental symmetry breaking constitutes only one type of coordination failure, which motivates the search for a more accessible and broader symmetry class. In this paper, we introduce such a broader group of previously unexplored symmetries, which we call expected return symmetries, which contains environment symmetries as a subgroup. We show that agents trained to be compatible under the group of expected return symmetries achieve better zero-shot coordination results than those using environment symmetries. As an additional benefit, our method makes minimal a priori assumptions about the structure of their environment and does not require access to ground truth symmetries.