Goto

Collaborating Authors

 comparability


Invariant Price of Anarchy: a Metric for Welfarist Traffic Control

Shilov, Ilia, He, Mingjia, Nax, Heinrich H., Frazzoli, Emilio, Zardini, Gioele, Bolognani, Saverio

arXiv.org Artificial Intelligence

The Price of Anarchy (PoA) is a standard metric for quantifying inefficiency in socio-technical systems, widely used to guide policies like traffic tolling. Conventional PoA analysis relies on exact numerical costs. However, in many settings, costs represent agents' preferences and may be defined only up to possibly arbitrary scaling and shifting, representing informational and modeling ambiguities. We observe that while such transformations preserve equilibrium and optimal outcomes, they change the PoA value. To resolve this issue, we rely on results from Social Choice Theory and define the Invariant PoA. By connecting admissible transformations to degrees of comparability of agents' costs, we derive the specific social welfare functions which ensure that efficiency evaluations do not depend on arbitrary rescalings or translations of individual costs. Case studies on a toy example and the Zurich network demonstrate that identical tolling strategies can lead to substantially different efficiency estimates depending on the assumed comparability. Our framework thus demonstrates that explicit axiomatic foundations are necessary in order to define efficiency metrics and to appropriately guide policy in large-scale infrastructure design robustly and effectively.



Simulating Student Success in the Age of GenAI: A Kantian-Axiomatic Perspective

Kayadibi, Seyma Yaman

arXiv.org Artificial Intelligence

This study reinterprets a Monte Carlo simulation of students' perceived success with generative AI (GenAI) through a Kantian-axiomatic lens. Building on prior work, theme-level survey statistics Ease of Use and Learnability, System Efficiency and Learning Burden, and Perceived Complexity and Integration from a representative dataset are used to generate 10,000 synthetic scores per theme on the [1,5] Likert scale. The simulated outputs are evaluated against the axioms of dense linear order without endpoints (DLO): irreflexivity, transitivity, total comparability (connectedness), no endpoints (no greatest and no least; A4-A5), and density (A6). At the data level, the basic ordering axioms (A1-A3) are satisfied, whereas no-endpoints (A4-A5) and density (A6) fail as expected. Likert clipping introduces minimum and maximum observed values, and a finite, discretized sample need not contain a value strictly between any two distinct scores. These patterns are read not as methodological defects but as markers of an epistemological boundary. Following Kant and Friedman, the findings suggest that what simulations capture finite, quantized observations cannot instantiate the ideal properties of an unbounded, dense continuum. Such properties belong to constructive intuition rather than to finite sampling alone. A complementary visualization contrasts the empirical histogram with a sine-curve proxy to clarify this divide. The contribution is interpretive rather than data-expansive: it reframes an existing simulation as a probe of the synthetic a priori structure underlying students' perceptions, showing how formal order-theoretic coherence coexists with principled failures of endpoint-freeness and density in finite empirical models.


Memory in Large Language Models: Mechanisms, Evaluation and Evolution

Zhang, Dianxing, Li, Wendong, Song, Kani, Lu, Jiaye, Li, Gang, Yang, Liuchun, Li, Sheng

arXiv.org Artificial Intelligence

Under a unified operational definition, we define LLM memory as a persistent state written during pretraining, finetuning, or inference that can later be addressed and that stably influences outputs. We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). We link mechanism, evaluation, and governance via the chain write -> read -> inhibit/update. To avoid distorted comparisons across heterogeneous setups, we adopt a three-setting protocol (parametric only, offline retrieval, online retrieval) that decouples capability from information availability on the same data and timeline. On this basis we build a layered evaluation: parametric (closed-book recall, edit differential, memorization/privacy), contextual (position curves and the mid-sequence drop), external (answer correctness vs snippet attribution/faithfulness), and procedural/episodic (cross-session consistency and timeline replay, E MARS+). The framework integrates temporal governance and leakage auditing (freshness hits, outdated answers, refusal slices) and uncertainty reporting via inter-rater agreement plus paired tests with multiple-comparison correction. For updating and forgetting, we present DMM Gov: coordinating DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop covering admission thresholds, rollout, monitoring, rollback, and change audits, with specs for timeliness, conflict handling, and long-horizon consistency. Finally, we give four testable propositions: minimum identifiability; a minimal evaluation card; causally constrained editing with verifiable forgetting; and when retrieval with small-window replay outperforms ultra-long-context reading. This yields a reproducible, comparable, and governable coordinate system for research and deployment.


Learning with Algorithmic Supervision via Continuous Relaxations

Neural Information Processing Systems

Felix Petersen Christian Borgelt Hilde Kuehne Oliver Deussen In the supplementary material, we give implementation details, and present the algorithms. For comparability to Grover et al. [6] and Cuturi et al. [7], we use the same network architecture.


Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning

Bleich, Amnon, Linnemann, Antje, Diem, Bjoern H., Conrad, Tim OF

arXiv.org Artificial Intelligence

Recent advances in deep learning and natural language generation have significantly improved image captioning, enabling automated, human-like descriptions for visual content. In this work, we apply these captioning techniques to generate clinician-like interpretations of ECG data. This study leverages existing ECG datasets accompanied by free-text reports authored by healthcare professionals (HCPs) as training data. These reports, while often inconsistent, provide a valuable foundation for automated learning. We introduce an encoder-decoder-based method that uses these reports to train models to generate detailed descriptions of ECG episodes. This represents a significant advancement in ECG analysis automation, with potential applications in zero-shot classification and automated clinical decision support. The model is tested on various datasets, including both 1- and 12-lead ECGs. It significantly outperforms the state-of-the-art reference model by Qiu et al., achieving a METEOR score of 55.53% compared to 24.51% achieved by the reference model. Furthermore, several key design choices are discussed, providing a comprehensive overview of current challenges and innovations in this domain. The source codes for this research are publicly available in our Git repository https://git.zib.de/ableich/ecg-comment-generation-public


JAM: A Comprehensive Model for Age Estimation, Verification, and Comparability

David, François, Novikov, Alexey A., Parkhomenko, Ruslan, Voronin, Artem, Melchy, Alix

arXiv.org Artificial Intelligence

This paper introduces a comprehensive model for age estimation, verification, and comparability, offering a comprehensive solution for a wide range of applications. It employs advanced learning techniques to understand age distribution and uses confidence scores to create probabilistic age ranges, enhancing its ability to handle ambiguous cases. The model has been tested on both proprietary and public datasets and compared against one of the top-performing models in the field. Additionally, it has recently been evaluated by NIST as part of the FATE challenge, achieving top places in many categories.


Prasatul Matrix: A Direct Comparison Approach for Analyzing Evolutionary Optimization Algorithms

Biswas, Anupam

arXiv.org Artificial Intelligence

The performance of individual evolutionary optimization algorithms is mostly measured in terms of statistics such as mean, median and standard deviation etc., computed over the best solutions obtained with few trails of the algorithm. To compare the performance of two algorithms, the values of these statistics are compared instead of comparing the solutions directly. This kind of comparison lacks direct comparison of solutions obtained with different algorithms. For instance, the comparison of best solutions (or worst solution) of two algorithms simply not possible. Moreover, ranking of algorithms is mostly done in terms of solution quality only, despite the fact that the convergence of algorithm is also an important factor. In this paper, a direct comparison approach is proposed to analyze the performance of evolutionary optimization algorithms. A direct comparison matrix called \emph{Prasatul Matrix} is prepared, which accounts direct comparison outcome of best solutions obtained with two algorithms for a specific number of trials. Five different performance measures are designed based on the prasatul matrix to evaluate the performance of algorithms in terms of Optimality and Comparability of solutions. These scores are utilized to develop a score-driven approach for comparing performance of multiple algorithms as well as for ranking both in the grounds of solution quality and convergence analysis. Proposed approach is analyzed with six evolutionary optimization algorithms on 25 benchmark functions. A non-parametric statistical analysis, namely Wilcoxon paired sum-rank test is also performed to verify the outcomes of proposed direct comparison approach.


Probabilistic Forecasting Methods for System-Level Electricity Load Forecasting

Giese, Philipp

arXiv.org Artificial Intelligence

Load forecasts have become an integral part of energy security. Due to the various influencing factors that can be considered in such a forecast, there is also a wide range of models that attempt to integrate these parameters into a system in various ways. Due to the growing importance of probabilistic load forecast models, different approaches are presented in this analysis. The focus is on different models from the short-term sector. After that, another model from the long-term sector is presented. Then, the presented models are put in relation to each other and examined with reference to advantages and disadvantages. Afterwards, the presented papers are analyzed with focus on their comparability to each other. Finally, an outlook on further areas of development in the literature will be discussed.


Exemplar-free Class Incremental Learning via Discriminative and Comparable One-class Classifiers

Sun, Wenju, Li, Qingyong, Zhang, Jing, Wang, Danyu, Wang, Wen, Geng, Yangli-ao

arXiv.org Artificial Intelligence

The exemplar-free class incremental learning requires classification models to learn new class knowledge incrementally without retaining any old samples. Recently, the framework based on parallel one-class classifiers (POC), which trains a one-class classifier (OCC) independently for each category, has attracted extensive attention, since it can naturally avoid catastrophic forgetting. POC, however, suffers from weak discriminability and comparability due to its independent training strategy for different OOCs. To meet this challenge, we propose a new framework, named Discriminative and Comparable One-class classifiers for Incremental Learning (DisCOIL). DisCOIL follows the basic principle of POC, but it adopts variational auto-encoders (VAE) instead of other well-established one-class classifiers (e.g. deep SVDD), because a trained VAE can not only identify the probability of an input sample belonging to a class but also generate pseudo samples of the class to assist in learning new tasks. With this advantage, DisCOIL trains a new-class VAE in contrast with the old-class VAEs, which forces the new-class VAE to reconstruct better for new-class samples but worse for the old-class pseudo samples, thus enhancing the comparability. Furthermore, DisCOIL introduces a hinge reconstruction loss to ensure the discriminability. We evaluate our method extensively on MNIST, CIFAR10, and Tiny-ImageNet. The experimental results show that DisCOIL achieves state-of-the-art performance.