detailed analysis
Teaching Language Models to Reason with Tools
Li, Chengpeng, Tang, Zhengyang, Li, Ziniu, Xue, Mingfeng, Bao, Keqin, Ding, Tian, Sun, Ruoyu, Wang, Benyou, Wang, Xiang, Lin, Junyang, Liu, Dayiheng
Large reasoning models (LRMs) like OpenAI-o1 have shown impressive capabilities in natural language reasoning. However, these models frequently demonstrate inefficiencies or inaccuracies when tackling complex mathematical operations. While integrating computational tools such as Code Interpreters (CIs) offers a promising solution, it introduces a critical challenge: a conflict between the model's internal, probabilistic reasoning and the external, deterministic knowledge provided by the CI, which often leads models to unproductive deliberation. To overcome this, we introduce CoRT (Code-Optimized Reasoning Training), a post-training framework designed to teach LRMs to effectively utilize CIs. We propose Hint-Engineering, a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths. This approach generates high-quality, code-integrated reasoning data specifically tailored to optimize LRM-CI interaction. Using this method, we have synthesized 30 high-quality samples to post-train models ranging from 1.5B to 32B parameters through supervised fine-tuning. CoRT further refines the multi-round interleaving of external CI usage and internal thinking by employing rejection sampling and reinforcement learning. Our experimental evaluations demonstrate CoRT's effectiveness, yielding absolute improvements of 4% and 8% on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B, respectively, across five challenging mathematical reasoning datasets. Moreover, CoRT significantly enhances efficiency, reducing token usage by approximately 30% for the 32B model and 50% for the 1.5B model compared to pure natural language reasoning baselines. The models and code are available at: https://github.com/ChengpengLi1003/CoRT.
CoRT: Code-integrated Reasoning within Thinking
Li, Chengpeng, Tang, Zhengyang, Li, Ziniu, Xue, Mingfeng, Bao, Keqin, Ding, Tian, Sun, Ruoyu, Wang, Benyou, Wang, Xiang, Lin, Junyang, Liu, Dayiheng
Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations. Addressing these limitations through computational tools (e.g., computation libraries and symbolic solvers) is promising, but it introduces a technical challenge: Code Interpreter (CI) brings external knowledge beyond the model's internal text representations, thus the direct combination is not efficient. This paper introduces CoRT, a post-training framework for teaching LRMs to leverage CI effectively and efficiently. As a first step, we address the data scarcity issue by synthesizing code-integrated reasoning data through Hint-Engineering, which strategically inserts different hints at appropriate positions to optimize LRM-CI interaction. We manually create 30 high-quality samples, upon which we post-train models ranging from 1.5B to 32B parameters, with supervised fine-tuning, rejection fine-tuning and reinforcement learning. Our experimental results demonstrate that Hint-Engineering models achieve 4% and 8% absolute improvements on DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-1.5B respectively, across five challenging mathematical reasoning datasets. Furthermore, Hint-Engineering models use about 30% fewer tokens for the 32B model and 50% fewer tokens for the 1.5B model compared with the natural language models. The models and code are available at https://github.com/ChengpengLi1003/CoRT.
Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection
Thorat, Shantanu, Yang, Tianbao
As LLMs increase in accessibility, LLM-generated texts have proliferated across several fields, such as scientific, academic, and creative writing. However, LLMs are not created equally; they may have different architectures and training datasets. Thus, some LLMs may be more challenging to detect than others. Using two datasets spanning four total writing domains, we train AI-generated (AIG) text classifiers using the LibAUC library - a deep learning library for training classifiers with imbalanced datasets. Our results in the Deepfake Text dataset show that AIG-text detection varies across domains, with scientific writing being relatively challenging. In the Rewritten Ivy Panda (RIP) dataset focusing on student essays, we find that the OpenAI family of LLMs was substantially difficult for our classifiers to distinguish from human texts. Additionally, we explore possible factors that could explain the difficulties in detecting OpenAI-generated texts.
Artificial Intelligence in Diabetes Management Market Overview with Detailed Analysis, Competitive landscape, Forecast to 2026
The 2019 Global Artificial Intelligence in Diabetes Management Market report provides a detailed market overview and an industry analysis of companies, manufacturers and distributors, covering data on gross margin, cost structure, value, sale price and more. It also compiles the challenges prevalent in this industry vertical and identifies opportunities that will further aid business expansion. Further, the report revisits all areas of the business to cover the impact of the COVID-19 pandemic so as to assist stakeholders in devising new strategies and reinforcing their position in the market.
Global Artificial Intelligence Software Market: What it got next? Find out here. - Sound On Sound Fest
For instance, a mixture of primary and secondary research has been used to define Artificial Intelligence Software market estimates and forecasts. Sources used for secondary research include (but are not limited to) paid data sources, technology journals from 2013-2018, SEC filings, company websites, annual reports, and various other Artificial Intelligence Software industry publications. Specific details on the methodology used for the Artificial Intelligence Software market report can be provided on demand. In addition, it highlights the ability to increase possibilities in the coming years through 2023, and reviews the marketplace drivers, constraints and restraints, growth signs, challenges, and market dynamics. "Global Artificial Intelligence Software Market" gives a region-wise analysis covering growth aspects, revenue, and past, present and future forecast trends. Analysis of emerging market sectors and development opportunities in Artificial Intelligence Software informs the market growth forecast. Regional scope: the Artificial Intelligence Software market is divided into regions such as North America, Middle East and Africa, Asia-Pacific, South America, and Europe. Country scope: the Artificial Intelligence Software market is divided into United States, Mexico, Canada, Germany, Singapore, U.K., Italy, Russia, France, Spain, China, India, Japan, South Korea, Australia, Brazil, Colombia, Paraguay, Saudi Arabia, South Africa, Egypt, UAE, and ASEAN countries.
Global Wireless Electronic Health Records Market – Detailed Analysis of Current Figures with Forecasts Growth By 2024 – Analytics News
The 'Wireless Electronic Health Records Industry market' study recently added by Market Study Report, LLC, offers an in-depth analysis of the current market trends influencing this business vertical. The study also includes market valuation, market size, revenue forecasts, geographical spectrum and SWOT analysis of the industry. In addition, the report depicts key challenges and growth opportunities faced by the industry bigwigs, along with their product offerings and business strategies. The Wireless Electronic Health Records Industry market research report delivers an exhaustive analysis of the market landscape, assessed from the perspective of two primary parameters: production and consumption. On the production side, the report provides information about product manufacturing, revenue, and gross margins of the manufacturers known across the field for production of the same.
BREAKING: NO MORE SECRETS - Google achieves "quantum supremacy" that will soon render all cryptocurrency breakable, all military secrets revealed
According to a report published at Fortune.com, Google has achieved "quantum supremacy" with a 53-qubit quantum computer. From reading the report, it is obvious that Fortune.com Here's what this means for cryptocurrency, military secrets and all secrets which are protected by cryptography. Notably, NASA published the scientific paper at this link, then promptly removed it as soon as the implications of this technology started to become apparent to a few observers. (The cover-up begins…) However, the Financial Times reported on the paper before it was removed.
Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis
Álvez, Javier, Gonzalez-Dios, Itziar, Rigau, German
We describe a detailed analysis of a sample of a large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect some knowledge misalignments, mapping errors and lack of knowledge and resources. Our final objective is the extraction of some guidelines towards a better exploitation of this commonsense knowledge framework by the improvement of the included resources.
AI and Machine Learning - Detailed Analysis, Facts and Figures An Infographic
Machine learning is a key technology behind the use of artificial intelligence applications. AI applications are growing tremendously, and businesses are focusing on efficient use of such applications, which is becoming a mandate for every organization. Here we highlight some viewpoints, facts, and figures as findings on AI and machine learning in the form of an infographic.