Goto

Collaborating Authors

 tir


DPA-Net: A Dual-Path Attention Neural Network for Inferring Glycemic Control Metrics from Self-Monitored Blood Glucose Data

Lei, Canyu, Lobo, Benjamin, Xie, Jianxin

arXiv.org Artificial Intelligence

Abstract--Continuous glucose monitoring (CGM) provides dense and dynamic glucose profiles that enable reliable estimation of Ambulatory Glucose Profile (AGP) metrics, such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (T AR). However, the high cost and limited accessibility of CGM restrict its widespread adoption, particularly in low-and middle-income regions. In contrast, self-monitoring of blood glucose (SMBG) is inexpensive and widely available, but yields sparse and irregular data that are challenging to translate into clinically meaningful glycemic metrics. In this work, we propose a Dual-Path Attention Neural Network (DPA-Net), to estimate AGP metrics directly from SMBG data. DPA-Net integrates two complementary paths: (1) a spatial-channel attention path that reconstructs a CGM-like trajectory from sparse SMBG observations, and (2) a multi-scale ResNet path that directly predicts AGP metrics. An alignment mechanism between the two paths is introduced to reduce bias and mitigate overfitting. In addition, we develop an active point selector to identify realistic and informative SMBG sampling points that reflect patient behavioral patterns. Experimental results on a large, real-world dataset demonstrate that DPA-Net achieves robust accuracy with low errors, while reducing systematic bias. T o the best of our knowledge, this is the first supervised machine learning framework for estimating AGP metrics from SMBG data, offering a practical and clinically relevant decision-support tool in settings where CGM is not accessible. With the steadily increasing prevalence of diabetes, it has become one of the most common and challenging chronic diseases worldwide, imposing a substantial burden on public health [1].


Understanding Tool-Integrated Reasoning

Lin, Heng, Xu, Zhongwen

arXiv.org Machine Learning

We study why Tool-Integrated Reasoning (TIR) makes Large Language Models (LLMs) more capable. While LLMs integrated with tools like Python code interpreters show great promise, a principled theory explaining why this paradigm is effective has been missing. This work provides the first formal proof that TIR fundamentally expands an LLM's capabilities. We demonstrate that tools enable a strict expansion of the model's empirical and feasible support, breaking the capability ceiling of pure-text models by unlocking problem-solving strategies that are otherwise impossible or intractably verbose. To guide model behavior without compromising training stability and performance, we also introduce Advantage Shaping Policy Optimization (ASPO), a novel algorithm that directly modifies the advantage function to guide the policy behavior. We conduct comprehensive experiments on challenging mathematical benchmarks, leveraging a Python interpreter as the external tool. Our results show that the TIR model decisively outperforms its pure-text counterpart on the pass@k metric. Crucially, this advantage is not confined to computationally-intensive problems but extends to those requiring significant abstract insight. We further identify the emergent cognitive patterns that illustrate how models learn to think with tools. Finally, we report improved tool usage behavior with early code invocation and much more interactive turns with ASPO. Overall, our work provides the first principled explanation for TIR's success, shifting the focus from the mere fact that tools work to why and how they enable more powerful reasoning.


Personalised Insulin Adjustment with Reinforcement Learning: An In-Silico Validation for People with Diabetes on Intensive Insulin Treatment

Panagiotou, Maria, Brigato, Lorenzo, Streit, Vivien, Hayoz, Amanda, Proennecke, Stephan, Athanasopoulos, Stavros, Olsen, Mikkel T., Brok, Elizabeth J. den, Svensson, Cecilie H., Makrilakis, Konstantinos, Xatzipsalti, Maria, Vazeou, Andriani, Mertens, Peter R., Pedersen-Bjergaard, Ulrik, de Galan, Bastiaan E., Mougiakakou, Stavroula

arXiv.org Artificial Intelligence

Despite recent advances in insulin preparations and technology, adjusting insulin remains an ongoing challenge for the majority of people with type 1 diabetes (T1D) and longstanding type 2 diabetes (T2D). In this study, we propose the Adaptive Basal-Bolus Advisor (ABBA), a personalised insulin treatment recommendation approach based on reinforcement learning for individuals with T1D and T2D, performing self-monitoring blood glucose measurements and multiple daily insulin injection therapy. We developed and evaluated the ability of ABBA to achieve better time-in-range (TIR) for individuals with T1D and T2D, compared to a standard basal-bolus advisor (BBA). The in-silico test was performed using an FDA-accepted population, including 101 simulated adults with T1D and 101 with T2D. An in-silico evaluation shows that ABBA significantly improved TIR and significantly reduced both times below- and above-range, compared to BBA. ABBA's performance continued to improve over two months, whereas BBA exhibited only modest changes. This personalised method for adjusting insulin has the potential to further optimise glycaemic control and support people with T1D and T2D in their daily self-management. Our results warrant ABBA to be trialed for the first time in humans.


Dissecting Tool-Integrated Reasoning: An Empirical Study and Analysis

Zhao, Yufeng, Liu, Junnan, Liu, Hongwei, Zhu, Dongsheng, Shen, Yuan, Zhang, Songyang, Chen, Kai

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have made significant strides in reasoning tasks through methods like chain-of-thought (CoT) reasoning. However, they often fall short in tasks requiring precise computations. Tool-Integrated Reasoning (TIR) has emerged as a solution by incorporating external tools into the reasoning process. Nevertheless, the generalization of TIR in improving the reasoning ability of LLM is still unclear. Additionally, whether TIR has improved the model's reasoning behavior and helped the model think remains to be studied. We introduce ReasonZoo, a comprehensive benchmark encompassing nine diverse reasoning categories, to evaluate the effectiveness of TIR across various domains. Additionally, we propose two novel metrics, Performance-Aware Cost (PAC) and Area Under the Performance-Cost Curve (AUC-PCC), to assess reasoning efficiency. Our empirical evaluation demonstrates that TIR-enabled models consistently outperform their non-TIR counterparts in both mathematical and non-mathematical tasks. Furthermore, TIR enhances reasoning efficiency, as evidenced by improved PAC and AUC-PCC, indicating reduced overthinking and more streamlined reasoning. These findings underscore the domain-general benefits of TIR and its potential to advance LLM capabilities in complex reasoning tasks.


Test-time Prompt Refinement for Text-to-Image Models

Khan, Mohammad Abdul Hafeez, Jain, Yash, Bhattacharyya, Siddhartha, Vineet, Vibhav

arXiv.org Artificial Intelligence

T ext-to-image (T2I) generation models have made significant strides but still struggle with prompt sensitivity: even minor changes in prompt wording can yield inconsistent or inaccurate outputs. T o address this challenge, we introduce a closed-loop, test-time prompt refinement framework that requires no additional training of the underlying T2I model, termed TIR . In our approach, each generation step is followed by a refinement step, where a pretrained multimodal large language model (MLLM) analyzes the output image and the user's prompt. The MLLM detects misalignments (e.g., missing objects, incorrect attributes) and produces a refined and physically grounded prompt for the next round of image generation. By iteratively refining the prompt and verifying alignment between the prompt and the image, TIR corrects errors, mirroring the iterative refinement process of human artists. W e demonstrate that this closed-loop strategy improves alignment and visual coherence across multiple benchmark datasets, all while maintaining plug-and-play integration with black-box T2I models.


Categorical Classification of Book Summaries Using Word Embedding Techniques

Keskin, Kerem, Keleş, Mümine Kaya

arXiv.org Artificial Intelligence

In this study, book summaries and categories taken from book sites were classified using word embedding methods, natural language processing techniques and machine learning algorithms. In addition, one hot encoding, Word2Vec and Term Frequency - Inverse Document Frequency (TF - IDF) methods, which are frequently used word embedding methods were used in this study and their success was compared. Additionally, the combination table of the pre - processing methods used is shown and added to the table. Looking at the results, it was observed that Support Vector Machine, Naive Bayes and Logistic Regression Models and TF - IDF and One - Hot Encoder word embedding techniques gave more successful results for Turkish texts. Using word2vec to process big text data.


Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Xu, Xin, Xu, Yan, Chen, Tianhao, Yan, Yuchen, Liu, Chengwu, Chen, Zaoyu, Wang, Yufei, Yin, Yichun, Wang, Yasheng, Shang, Lifeng, Liu, Qun

arXiv.org Artificial Intelligence

Existing approaches to mathematical reasoning with large language models (LLMs) rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. While efforts have been made to combine these methods, they primarily rely on post-selection or predefined strategies, leaving an open question: whether LLMs can autonomously adapt their reasoning strategy based on their inherent capabilities. In this work, we propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously, aligning it with their intrinsic aptitude. TATA incorporates base-LLM-aware data selection during supervised fine-tuning (SFT) to tailor training data to the model's unique abilities. This approach equips LLMs to autonomously determine and apply the appropriate reasoning strategy at test time. We evaluate TATA through extensive experiments on six mathematical reasoning benchmarks, using both general-purpose and math-specialized LLMs. Empirical results demonstrate that TATA effectively combines the complementary strengths of CoT and TIR, achieving superior or comparable performance with improved inference efficiency compared to TIR alone. Further analysis underscores the critical role of aptitude-aware data selection in enabling LLMs to make effective and adaptive reasoning decisions and align reasoning strategies with model capabilities.


Cosmos-LLaVA: Chatting with the Visual Cosmos-LLaVA: G\"orselle Sohbet Etmek

Zeer, Ahmed, Dogan, Eren, Erdem, Yusuf, Ince, Elif, Shbib, Osama, Uzun, M. Egemen, Uz, Atahan, Yuce, M. Kaan, Kesgin, H. Toprak, Amasyali, M. Fatih

arXiv.org Artificial Intelligence

In this study, a Turkish visual instruction model was developed and various model architectures and dataset combinations were analysed to improve the performance of this model. The Cosmos-LLaVA model, which is built by combining different large language models and image coders, is designed to overcome the deficiencies in the Turkish language. In the experiments, the effects of fine-tuning with various datasets on the model performance are analysed in detail. The results show that model architecture and dataset selection have a significant impact on performance. Bu \c{c}al{\i}\c{s}mada bir T\"urk\c{c}e g\"orsel talimat modeli geli\c{s}tirilerek bu modelin performans{\i}n{\i} art{\i}rmaya y\"onelik \c{c}e\c{s}itli model mimarileri ve veri k\"umesi kombinasyonlar{\i} derinlemesine incelenmi\c{s}tir. Farkl{\i} b\"uy\"uk dil modelleri ve g\"or\"unt\"u kodlay{\i}c{\i}lar{\i}n{\i}n bir araya getirilmesiyle olu\c{s}turulan Cosmos-LLaVA modeli, T\"urk\c{c}e dilindeki eksiklikleri gidermeye y\"onelik olarak tasarlanm{\i}\c{s}t{\i}r. Yap{\i}lan deneylerde, \c{c}e\c{s}itli veri k\"umeleri ile yap{\i}lan ince ayarlar{\i}n model performans{\i}n{\i} nas{\i}l etkiledi\u{g}i detayl{\i} olarak ele al{\i}nm{\i}\c{s}t{\i}r. Sonu\c{c}lar, model mimarisi ve veri k\"umesi se\c{c}iminin performans \"uzerinde \"onemli bir etkiye sahip oldu\u{g}unu g\"ostermektedir.


Qwen2.5-32B: Leveraging Self-Consistent Tool-Integrated Reasoning for Bengali Mathematical Olympiad Problem Solving

Tahmid, Saad, Sarker, Sourav

arXiv.org Artificial Intelligence

We present an innovative approach for solving mathematical problems in Bengali, developed for the DL Sprint 3.0 BUET CSE Fest 2024 Competition. Our method uses advanced deep learning models, notably the Qwen 2.5 series, with improvements made through prompt engineering, model quantization, and Tool Integrated Reasoning (TIR) to handle complex calculations. Initially, we explored various model architectures, including fine-tuned Mistral and quantized Qwen models, refining them with translation techniques, Retrieval-Augmented Generation (RAG), and custom dataset curation. Manual hyperparameter tuning optimized parameters like temperature and top-p to enhance model adaptability and accuracy. Removal of RAG and parameter adjustments further improved robustness. Our approach highlights the potential of advanced NLP techniques in solving Bengali mathematical problems.


Tipta uzmanlik sinavinda (tus) buyuk dil modelleri insanlardan daha mi basarili?

Aygul, Yesim, Olucoglu, Muge, Alpkocak, Adil

arXiv.org Artificial Intelligence

The potential of artificial intelligence in medical education and assessment has been made evident by recent developments in natural language processing and artificial intelligence. Medical questions can now be successfully answered by artificial intelligence algorithms. It can help medical practitioners. This study evaluates the performance of three different artificial intelligence models in answering Turkish medical questions in the 2021 1st Term Medical Specialization Examination (MSE). MSE consists of a total of 240 questions across clinical (CMST) and basic (BMST) medical sciences. According to the results in CMST, it was concluded that Gemini correctly answered 82 questions, ChatGPT-4 answered 105 questions and ChatGPT-4o answered 117 questions. In BMST, Gemini and ChatGPT-4 answered 93 questions and ChatGPT-4o answered 107 questions correctly according to the answer key. ChatGPT-4o outperformed the candidate with the highest scores of 113 and 106 according to CMST and BMST respectively. This study highlights the importance of the potential of artificial intelligence in medical education and assessment. It demonstrates that advanced models can achieve high accuracy and contextual understanding, demonstrating their potential role in medical education and evaluation.