
Automatically Learning Hybrid Digital Twins of Dynamical Systems

Neural Information Processing Systems

Digital Twins (DTs) are computational models that simulate the states and temporal dynamics of real-world systems, playing a crucial role in prediction, understanding, and decision-making across diverse domains. However, existing approaches to DTs often struggle to generalize to unseen conditions in data-scarce settings, a crucial requirement for such models. To address these limitations, our work begins by establishing the essential desiderata for effective DTs.


A Comprehensive Survey on Surgical Digital Twin

Khan, Afsah Sharaf, Fan, Falong, Kim, Doohwan DH, Alshareef, Abdurrahman, Chen, Dong, Kim, Justin, Carter, Ernest, Liu, Bo, Rozenblit, Jerzy W., Zeigler, Bernard

arXiv.org Artificial Intelligence

Such models are integral to the development of context-aware surgical training systems and process monitoring platforms [11], [19], as well as for encoding adaptive robotic control policies in teleoperated environments [13], [20], [78]. However, their limited capacity to capture continuous biophysical dynamics can constrain their utility in applications where physiological fidelity is essential. Recognizing the limitations inherent in purely continuous or discrete approaches, hybrid modeling strategies have emerged as a state-of-the-art solution for surgical digital twins. These frameworks integrate continuous dynamic models with discrete state machines, enabling the simultaneous tracking of physiological changes and procedural events [8], [7], [19], [37]. For example, hybrid automata have been deployed to synchronize real-time updates of tissue deformation with the sequencing of surgical tool actions [7], [19]. This integration allows digital twins to provide context-sensitive support, adapting to abrupt workflow transitions and physiological perturbations alike--a critical requirement in both routine and emergent surgical scenarios [8], [11], [7].

B. Mutual Information and Information-Theoretic Approaches

With the proliferation of multi-modal surgical data, information-theoretic concepts have become indispensable for quantifying uncertainty, relevance, and redundancy across heterogeneous information streams. Mutual information I(X; Y) has been adopted as a rigorous metric for selecting the most informative sensors, imaging modalities, or clinical parameters, thereby enhancing the efficiency and robustness of digital twin-enabled decision support [2], [3], [13], [34], [11], [51], [48], [26], [29]. Formally, I(X; Y) = sum over x, y of p(x, y) log [ p(x, y) / (p(x) p(y)) ].
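As a brief illustration (not taken from the survey itself), the mutual information between two discrete variables can be estimated directly from an empirical joint probability table; this is the quantity used above for ranking informative sensors or modalities:

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) in bits, from a joint probability table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()               # normalize to a distribution
    px = joint.sum(axis=1, keepdims=True)     # marginal p(x), column vector
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y), row vector
    mask = joint > 0                          # skip zero cells to avoid log(0)
    indep = px @ py                           # product distribution p(x) p(y)
    return float((joint[mask] * np.log2(joint[mask] / indep[mask])).sum())

# Perfectly correlated binary variables share exactly 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # → 1.0
# Independent variables share 0 bits.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # → 0.0
```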


How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks

Ruah, Clement, Sifaou, Houssem, Simeone, Osvaldo, Al-Hashimi, Bashir M.

arXiv.org Artificial Intelligence

Abstract--Training effective artificial intelligence models for telecommunications is challenging due to the scarcity of deployment-specific data. Real data collection is expensive, and available datasets often fail to capture the unique operational conditions and contextual variability of the network environment. Digital twinning provides a potential solution to this problem, as simulators tailored to the current network deployment can generate site-specific data to augment the available training datasets. However, there is a need to develop solutions to bridge the inherent simulation-to-reality (sim-to-real) gap between synthetic and real-world data. This paper reviews recent advances in two complementary strategies: 1) the calibration of digital twins (DTs) through real-world measurements, and 2) the use of sim-to-real gap-aware training strategies to robustly handle residual discrepancies between digital twin-generated and real data. For the latter, we evaluate two conceptually distinct methods that model the sim-to-real gap either at the level of the environment via Bayesian learning or at the level of the training loss via prediction-powered inference. Driven by the continued growth of computing resources and training datasets, artificial intelligence (AI) research is widely considered to be in the scaling era, which is focused on the development of general-purpose models that exhibit emergent capabilities. While this trend has yielded impressive results for many tasks, particularly in the domain of language modeling, it poses unique challenges when applied to engineering domains such as telecommunication networks.
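To make the second strategy concrete, here is a minimal sketch of the prediction-powered mean estimator (all data and the bias value are invented for illustration): a large batch of cheap simulator predictions is debiased by a small set of paired real measurements, so a systematic sim-to-real offset cancels out.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppi_mean(sim_predictions, real_labels, real_predictions):
    """Prediction-powered estimate of a mean: the simulator average
    plus a bias-correction term computed from paired real data."""
    rectifier = np.mean(np.asarray(real_labels) - np.asarray(real_predictions))
    return float(np.mean(sim_predictions) + rectifier)

# Toy setup: the digital-twin predictor has a systematic +0.3 bias.
truth = 1.0
sim_preds = truth + 0.3 + 0.05 * rng.standard_normal(10_000)  # many synthetic samples
real_y = truth + 0.05 * rng.standard_normal(50)               # few real measurements
real_preds = real_y + 0.3                                     # same biased predictor on real inputs

print(ppi_mean(sim_preds, real_y, real_preds))  # close to 1.0: the bias is removed
```

The naive simulator-only average would land near 1.3; the correction term from real data shifts it back to the truth.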


Stay Unique, Stay Efficient: Preserving Model Personality in Multi-Task Merging

Guo, Kuangpu, Ding, Yuhe, Liang, Jian, Wang, Zilei, He, Ran

arXiv.org Artificial Intelligence

Model merging has emerged as a promising paradigm for enabling multi-task capabilities without additional training. However, existing methods often experience substantial performance degradation compared with individually fine-tuned models, even on similar tasks, underscoring the need to preserve task-specific information. This paper proposes Decomposition, Thresholding, and Scaling (DTS), an approximation-based personalized merging framework that preserves task-specific information with minimal storage overhead. DTS first applies singular value decomposition to the task-specific information and retains only a small subset of singular values and vectors. It then introduces a novel thresholding strategy that partitions singular vector elements into groups and assigns a scaling factor to each group. To enable generalization to unseen tasks, we further extend DTS with a variant that fuses task-specific information in a data-free manner based on the semantic similarity of task characteristics. Extensive experiments demonstrate that DTS consistently outperforms state-of-the-art baselines while requiring only 1% additional storage per task. Furthermore, experiments on unseen tasks show that the DTS variant achieves significantly better generalization performance. Our code is available at https://github.com/krumpguo/DTS.
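The decompose-threshold-scale pipeline can be sketched as follows. This is a loose illustration only: the truncation-plus-grouping structure follows the abstract, but the specific threshold rule and scaling factors here are invented placeholders, not the paper's procedure.

```python
import numpy as np

def compress_task_vector(delta, rank, scale_small=0.5, scale_large=1.0):
    """Sketch of a DTS-style approximation of a task-specific weight delta:
    (1) Decomposition: truncated SVD keeps only the top singular triplets;
    (2) Thresholding: singular-vector entries are split into small/large
        magnitude groups (median split is an invented choice);
    (3) Scaling: each group gets its own scaling factor."""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]      # low-rank truncation
    thresh = np.median(np.abs(U))                    # group boundary (placeholder)
    scales = np.where(np.abs(U) < thresh, scale_small, scale_large)
    return (U * scales) @ np.diag(s) @ Vt            # rank <= `rank` approximation

delta = np.random.default_rng(1).standard_normal((8, 8))
approx = compress_task_vector(delta, rank=4)
print(approx.shape)  # (8, 8), but with rank at most 4
```

Storing only the truncated factors (plus per-group scales) rather than the full delta is what keeps the per-task overhead small.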


Opponent Modeling with In-context Search

Neural Information Processing Systems

Opponent modeling is a longstanding research topic aimed at enhancing decision-making by modeling information about opponents in multi-agent environments. However, existing approaches often face challenges such as difficulty generalizing to unknown opponent policies and unstable performance.


Dynamic Temperature Scheduler for Knowledge Distillation

Islam, Sibgat Ul, Ahad, Jawad Ibn, Rahman, Fuad, Amin, Mohammad Ruhul, Mohammed, Nabeel, Rahman, Shafin

arXiv.org Artificial Intelligence

Knowledge Distillation (KD) trains a smaller student model using a large, pre-trained teacher model, with temperature as a key hyperparameter controlling the softness of output probabilities. Traditional methods use a fixed temperature throughout training, which is suboptimal. Moreover, architectural differences between teacher and student often result in mismatched logit magnitudes. We demonstrate that students benefit from softer probabilities early in training but require sharper probabilities in later stages. We introduce Dynamic Temperature Scheduler (DTS), which adjusts temperature dynamically based on the cross-entropy loss gap between teacher and student. To our knowledge, this is the first temperature scheduling method that adapts based on the divergence between teacher and student distributions. Our method integrates seamlessly with existing KD frameworks. We validate DTS across multiple KD strategies on vision (CIFAR-100, Tiny-ImageNet) and NLP tasks (GLUE, Dolly, SelfIns, UnNI, S-NI), consistently outperforming static-temperature baselines. Code is available at https://github.com/Sibgat-Ul/DTS.
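A schedule with the qualitative behavior described above can be sketched as follows. The exact functional form here is an invented stand-in, not the paper's rule: the only property taken from the abstract is that a large teacher-student cross-entropy gap (early training) should yield a high temperature (softer probabilities), decaying toward sharper probabilities as the gap closes.

```python
import math

def softened_probs(logits, temperature):
    """Softmax over logits / T; higher T gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dts_temperature(teacher_ce, student_ce, t_min=1.0, t_max=8.0):
    """Illustrative dynamic schedule (placeholder form): map the
    teacher-student cross-entropy gap to a temperature in [t_min, t_max]."""
    gap = max(student_ce - teacher_ce, 0.0)
    return t_min + (t_max - t_min) * (1.0 - math.exp(-gap))

early = dts_temperature(teacher_ce=0.4, student_ce=2.4)  # large gap, high T
late = dts_temperature(teacher_ce=0.4, student_ce=0.5)   # small gap, T near t_min
print(early > late)  # → True
```

Plugging the scheduled temperature into `softened_probs` for both teacher and student logits recovers the usual KD target-softening step.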


We will revise the paper thoroughly and incorporate all the comments

Neural Information Processing Systems

We thank all three reviewers for their careful reading, valuable questions, and constructive suggestions. Thanks for raising the questions on infinite arms and the frequentist regret bound for DTS. Thanks for raising this point; we will add a remark on it in the revision.


RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space

Neural Information Processing Systems

In this paper, we establish an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy. This enables us to recast the DT extraction problem as a novel non-Euclidean clustering problem over each agent's local observation and action-value space, with action values as cluster labels and the upper bound on the return gap as the clustering loss.


DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching

Xu, Zicheng, Wang, Guanchu, Chuang, Yu-Neng, Zheng, Guangyao, Szalay, Alexander S., Liu, Zirui, Braverman, Vladimir

arXiv.org Artificial Intelligence

Large Reasoning Models (LRMs) demonstrate strong performance on complex reasoning tasks, yet they often suffer from overthinking, producing excessively long chain-of-thought (CoT) traces that increase inference cost and may degrade accuracy. Our analysis reveals a clear anti-correlation between reasoning length and accuracy: across multiple stochastic decodes, short reasoning paths consistently achieve the highest correctness, while longer ones accumulate errors and repetitions. Ideally, these short optimal reasoning paths could be found through full enumeration of the reasoning space. However, the tree-structured reasoning space grows exponentially with sequence length, rendering exhaustive exploration infeasible. To address this, we propose DTS, a model-agnostic decoding framework that sketches the reasoning space by selectively branching at high-entropy tokens and applies early stopping to select the shortest completed reasoning path. This approach approximates the optimal solution, enhancing both efficiency and accuracy without requiring additional training or supervision. Experiments on the AIME2024 and AIME2025 datasets with DeepSeek-R1-Distill-Qwen-7B and 1.5B show that DTS improves accuracy by up to 8%, reduces average reasoning length by 23%, and decreases repetition frequency by 12%, demonstrating that DTS enables scalable and efficient LRM reasoning.
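The core branching criterion can be illustrated with a few lines of code. The entropy threshold below is an invented default, and the next-token distributions are toy values; the abstract only specifies that branching happens where the model is uncertain (high entropy) rather than at every token.

```python
import math

def token_entropy(probs):
    """Shannon entropy (bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def should_branch(probs, threshold=1.0):
    """DTS-style branching sketch: expand multiple decoding-tree children
    only at uncertain positions, keeping the sketched tree small."""
    return token_entropy(probs) >= threshold

print(should_branch([0.97, 0.01, 0.01, 0.01]))  # confident token → False
print(should_branch([0.4, 0.3, 0.2, 0.1]))      # high-entropy token → True
```

Combined with early stopping on the first completed path, this keeps the number of explored branches far below full enumeration of the reasoning tree.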


Efficient & Correct Predictive Equivalence for Decision Trees

Marques-Silva, Joao, Ignatiev, Alexey

arXiv.org Artificial Intelligence

The Rashomon set of decision trees (DTs) has important uses. Recent work showed that DTs computing the same classification function, i.e., predictively equivalent DTs, can represent a significant fraction of the Rashomon set. Such redundancy is undesirable. For example, feature importance based on the Rashomon set becomes inaccurate due to the existence of predictively equivalent DTs, i.e., DTs with the same prediction for every possible input. In recent work, McTavish et al. proposed solutions for several computational problems related to DTs, including that of deciding predictive equivalence of DTs. The approach of McTavish et al. consists of applying the well-known Quine-McCluskey (QM) method to obtain minimum-size DNF (disjunctive normal form) representations of DTs, which are then used to compare DTs for predictive equivalence. Furthermore, the minimum-size DNF representation was also applied to computing explanations for the predictions made by DTs, and to finding predictions in the presence of missing data. However, the problem of formula minimization is hard for the second level of the polynomial hierarchy, and the QM method may exhibit worst-case exponential running time and space. This paper first demonstrates that there exist decision trees that trigger the worst-case exponential running time and space of the QM method. Second, the paper shows that the QM method may incorrectly decide predictive equivalence if two key constraints are not respected, one of which may be difficult to formally guarantee. Third, the paper shows that any of the problems to which the smallest DNF representation has been applied can be solved in polynomial time in the size of the DT. The experiments confirm that, for DTs that trigger the worst case of the QM method, the algorithms proposed in this paper are orders of magnitude faster than the ones proposed by McTavish et al.
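For intuition about the property being decided, here is a toy brute-force check of predictive equivalence over boolean features. This enumeration is exponential in the number of features, so it is only an illustration of the definition; the point of the paper above is precisely that the decision can instead be made in time polynomial in the DT size.

```python
from itertools import product

# A decision tree over boolean features, as nested tuples:
# a leaf is 0 or 1; an internal node is (feature_index, if_false, if_true).
def predict(tree, x):
    while isinstance(tree, tuple):
        feature, if_false, if_true = tree
        tree = if_true if x[feature] else if_false
    return tree

def predictively_equivalent(t1, t2, n_features):
    """Two DTs are predictively equivalent iff they agree on every input."""
    return all(predict(t1, x) == predict(t2, x)
               for x in product([0, 1], repeat=n_features))

# Two syntactically different trees, both computing x0 AND x1.
t_a = (0, 0, (1, 0, 1))   # test x0 first
t_b = (1, 0, (0, 0, 1))   # test x1 first
print(predictively_equivalent(t_a, t_b, n_features=2))  # → True
```

The pair above shows why syntactic comparison is insufficient: the trees differ structurally yet define the same classification function.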