train and test
Unsupervised Risk Estimation Using Only Conditional Independence Structure
We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test. We do not need to assume that the optimal predictor is the same between train and test, or that the true distribution lies in any parametric family. We can also efficiently compute gradients of the estimated error and hence perform unsupervised discriminative learning. Our technical tool is the method of moments, which allows us to exploit conditional independencies in the absence of a fully-specified model. Our framework encompasses a large family of losses including the log and exponential loss, and extends to structured output settings such as conditional random fields.
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective - Supplementary Material - Zhengzhuo Xu
One the one hand, the classification performance will be promoted. F ollow the settings of Proof A.2 and Eq.6, we can get the following relationship of the Hence we can generalize Eq.A.9 to: y According to Eq.A.11, it's easy to find a derivative zero point in range [1,C]. F ollow the Basic Setting of Proof.1, suppose It has limited contribution for the tail class' feature LT scenarios, the likelihood is consistent in the train and test set, but the prior is different. Hence, the actual optimization direction is not described as Eq.B.2 because the bias incurred by According to LemmaB.1, we immediately deduce that Bayias-compensated cross-entropy loss ensures All above re-weight methods are proven effective empirically, more or less. They propose the LDAM loss to encourage the tail classes to enjoy larger margins.
Towards Optimal Valve Prescription for Transcatheter Aortic Valve Replacement (TAVR) Surgery: A Machine Learning Approach
Paschalidis, Phevos, Stoumpou, Vasiliki, Everest, Lisa, Ma, Yu, Azemi, Talhat, Haider, Jawad, Zweibel, Steven, Protopapas, Eleftherios M., Mather, Jeff, Tysarowski, Maciej, Sarris, George E., Hagberg, Robert C., Haronian, Howard L., Bertsimas, Dimitris
Transcatheter Aortic Valve Replacement (TAVR) has emerged as a minimally invasive treatment option for patients with severe aortic stenosis, a life-threatening cardiovascular condition. Multiple transcatheter heart valves (THV) have been approved for use in TAVR, but current guidelines regarding valve type prescription remain an active topic of debate. We propose a data-driven clinical support tool to identify the optimal valve type with the objective of minimizing the risk of permanent pacemaker implantation (PPI), a predominant postoperative complication. We synthesize a novel dataset that combines U.S. and Greek patient populations and integrates three distinct data sources (patient demographics, computed tomography scans, echocardiograms) while harmonizing differences in each country's record system. We introduce a leaf-level analysis to leverage population heterogeneity and avoid benchmarking against uncertain counterfactual risk estimates. The final prescriptive model shows a reduction in PPI rates of 26% and 16% compared with the current standard of care in our internal U.S. population and external Greek validation cohort, respectively. To the best of our knowledge, this work represents the first unified, personalized prescription strategy for THV selection in TAVR.
Unsupervised Risk Estimation Using Only Conditional Independence Structure
We show how to estimate a model's test error from unlabeled data, on distributions very different from the training distribution, while assuming only that certain conditional independencies are preserved between train and test. We do not need to assume that the optimal predictor is the same between train and test, or that the true distribution lies in any parametric family. We can also efficiently compute gradients of the estimated error and hence perform unsupervised discriminative learning. Our technical tool is the method of moments, which allows us to exploit conditional independencies in the absence of a fully-specified model. Our framework encompasses a large family of losses including the log and exponential loss, and extends to structured output settings such as conditional random fields.
ShiZhi: A Chinese Lightweight Large Language Model for Court View Generation
Criminal Court View Generation (CVG) is a fundamental task in legal artificial intelligence, aiming to automatically generate the "Court View" section of a legal case document. Generating court views is challenging due to the diversity and complexity of case facts, and directly generating from raw facts may limit performance. In this paper, we present ShiZhi, the first large language model (LLM) specifically designed for court view generation. We construct a Chinese Court View Generation dataset, CCVG, of more than 110K cases, each containing fact descriptions paired with corresponding court views. Based on this dataset, ShiZhi achieving 70.00 ROUGE-1 and 67.85 BLEU-1 on court view generation, as well as 86.48\% accuracy with 92.75\% macro F1 on charge prediction. Experimental results demonstrate that even a small LLM can generate reasonable and legally coherent court views when trained on high-quality domain-specific data. Our model and dataset are available at \href{https://github.com/ZhitianHou/ShiZhi}{https://github.com/ZhitianHou/ShiZhi}.
From Source to Target: Leveraging Transfer Learning for Predictive Process Monitoring in Organizations
Weinzierl, Sven, Zilker, Sandra, Liessmann, Annina, Käppel, Martin, Wang, Weixin, Matzner, Martin
Event logs reflect the behavior of business processes that are mapped in organizational information systems. Predictive process monitoring (PPM) transforms these data into value by creating process-related predictions that provide the insights required for proactive interventions at process runtime. Existing PPM techniques require sufficient amounts of event data or other relevant resources that might not be readily available, which prevents some organizations from utilizing PPM. The transfer learning-based PPM technique presented in this paper allows organizations without suitable event data or other relevant resources to implement PPM for effective decision support. This technique is instantiated in both a real-life intra- and an inter-organizational use case, based on which numerical experiments are performed using event logs for IT service management processes. The results of the experiments suggest that knowledge of one business process can be transferred to a similar business process in the same or a different organization to enable effective PPM in the target context. The proposed technique allows organizations to benefit from transfer learning in intra- and inter-organizational settings by transferring resources such as pre-trained models within and across organizational boundaries.
OwkinZero: Accelerating Biological Discovery with AI
Bigaud, Nathan, Cabeli, Vincent, Gürel, Meltem, Pignet, Arthur, Klein, John, Wainrib, Gilles, Durand, Eric
While large language models (LLMs) are rapidly advancing scientific research, they continue to struggle with core biological reasoning tasks essential for translational and biomedical discovery. To address this limitation, we created and curated eight comprehensive benchmark datasets comprising over 300,000 verifiable question-and-answer pairs, each targeting critical challenges in drug discovery including target druggability, modality suitability, and drug perturbation effects. Using this resource, we developed the OwkinZero models by post-training open-source LLMs through a Reinforcement Learning from Verifiable Rewards strategy. Our results demonstrate that specialized 8-32B OwkinZero models substantially outperform larger, state-of-the-art commercial LLMs on these biological benchmarks. Remarkably, we uncover evidence of a key aspect of generalization: specialist models trained on a single task consistently outperform their base models on previously unseen tasks. This generalization effect is further amplified in our comprehensive OwkinZero models, which were trained on a mixture of datasets and achieve even broader cross-task improvements. This study represents a significant step toward addressing the biological reasoning blind spot in current LLMs, demonstrating that targeted reinforcement learning on carefully curated data can unlock generalizable performance in specialized models, thereby accelerating AI-driven biological discovery.
From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents
Lindenbauer, Tobias, Groh, Georg, Schütze, Hinrich
We introduce CTIM-Rover, an AI agent for Software Engineering (SE) built on top of AutoCodeRover (Zhang et al., 2024) that extends agentic reasoning frameworks with an episodic memory, more specifically, a general and repository-level Cross-Task-Instance Memory (CTIM). While existing open-source SE agents mostly rely on ReAct (Yao et al., 2023b), Reflexion (Shinn et al., 2023), or Code-Act (Wang et al., 2024), all of these reasoning and planning frameworks inefficiently discard their long-term memory after a single task instance. As repository-level understanding is pivotal for identifying all locations requiring a patch for fixing a bug, we hypothesize that SE is particularly well positioned to benefit from CTIM. For this, we build on the Experiential Learning (EL) approach ExpeL (Zhao et al., 2024), proposing a Mixture-Of-Experts (MoEs) inspired approach to create both a general-purpose and repository-level CTIM. We find that CTIM-Rover does not outperform AutoCodeRover in any configuration and thus conclude that neither ExpeL nor DoT-Bank (Lingam et al., 2024) scale to real-world SE problems. Our analysis indicates noise introduced by distracting CTIM items or exemplar trajectories as the likely source of the performance degradation.