AITopics | cft


cft

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

46489c17893dfdcf028883202cefd6d1-Supplemental.pdf

Neural Information Processing Systems, Feb-8-2026, 06:56:56 GMT

In this paper, we study stochastic structured bandits for minimizing regret.

artificial intelligence, exp, machine learning, (19 more...)


Country:

North America > United States > Arizona (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)


Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning

Ruan, Zhiwen, Li, Yixia, Zhu, He, Chen, Yun, Li, Peng, Liu, Yang, Chen, Guanhua

arXiv.org Artificial Intelligence, Oct-14-2025

Large language models (LLMs) primarily rely on supervised fine-tuning (SFT) as a key method to adapt pre-trained models to domain-specific tasks such as mathematical reasoning. However, standard SFT uniformly penalizes all tokens, neglecting that only a small subset of critical tokens determines reasoning correctness. This uniform supervision often causes reduced output diversity and limited generalization. We propose Critical Token Fine-tuning (CFT), a simple yet effective approach that updates only tokens identified as functionally indispensable via counterfactual perturbations. By focusing gradient signals on these decisive reasoning steps while preserving the diversity of non-critical tokens, CFT can enhance both generation and diversity. Extensive experiments on five models across three families (Qwen, OLMo, LLaMA) and eleven mathematical reasoning benchmarks show that CFT, despite fine-tuning on less than 12% of tokens, consistently outperforms standard SFT. Moreover, CFT enables test-time scaling through improved sampling diversity and provides a stronger initialization for reinforcement learning, sustaining performance gains in later training stages while maintaining higher entropy for better exploration. Large language models (LLMs) have achieved remarkable progress across a wide range of complex tasks, driven by the rapid scaling of both model parameters and training data (Fedus et al., 2022; Achiam et al., 2023; AI@Meta, 2024; Team, 2024; Brown et al., 2020). To adapt these general-purpose models to specialized downstream tasks (Yu et al., 2024), the prevailing paradigm is supervised fine-tuning (SFT) (Sanh et al.; Ruan et al., 2025), which optimizes on labeled prompt-response pairs using a maximum likelihood objective (Ouyang et al., 2022). SFT can also serve as an initialization for reinforcement learning (RL), providing a strong starting point that aids further RL optimization (Chu et al., 2025; Li et al., 2025).
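The contrast between uniform SFT and a loss restricted to critical tokens can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `masked_nll`, the per-token log-probabilities, and the mask are all hypothetical, and the mask is assumed to be given (the abstract says critical tokens are found via counterfactual perturbations, which this sketch does not attempt).

```python
import math


def masked_nll(token_log_probs, critical_mask):
    """Average negative log-likelihood over the tokens flagged in the mask.

    Tokens whose mask entry is False contribute nothing to the loss, so in a
    real training loop they would receive no gradient from this objective.
    """
    if len(token_log_probs) != len(critical_mask):
        raise ValueError("log-probs and mask must have the same length")
    selected = [-lp for lp, keep in zip(token_log_probs, critical_mask) if keep]
    if not selected:
        return 0.0  # nothing selected: define the loss as zero
    return sum(selected) / len(selected)


# Hypothetical log-probabilities a model assigns to each reference token.
log_probs = [math.log(0.9), math.log(0.5), math.log(0.8), math.log(0.25)]
# Hypothetical mask: True marks a token assumed to be critical.
mask = [False, True, False, True]

selective_loss = masked_nll(log_probs, mask)             # critical tokens only
uniform_loss = masked_nll(log_probs, [True] * len(mask))  # standard SFT-style
```

In a deep learning framework the same idea is usually expressed by excluding unselected positions from the token-level cross-entropy, rather than by filtering a Python list.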

large language model, machine learning, qwen2, (18 more...)


2510.10974

Country: North America > Mexico (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.74)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


A Hybrid Co-Finetuning Approach for Visual Bug Detection in Video Games

Yi, Faliu, Abdelfattah, Sherif, Huang, Wei, Brown, Adrian

arXiv.org Artificial Intelligence, Oct-7-2025

Manual identification of visual bugs in video games is a resource-intensive and costly process, often demanding specialized domain knowledge. While supervised visual bug detection models offer a promising solution, their reliance on extensive labeled datasets presents a significant challenge due to the infrequent occurrence of such bugs. To overcome this limitation, we propose a hybrid Co-FineTuning (CFT) method that effectively integrates both labeled and unlabeled data. Our approach leverages labeled samples from the target game and diverse co-domain games, additionally incorporating unlabeled data to enhance feature representation learning. This strategy maximizes the utility of all available data, substantially reducing the dependency on labeled examples from the specific target game. The developed framework demonstrates enhanced scalability and adaptability, facilitating efficient visual bug detection across various game titles. Our experimental results show the robustness of the proposed method for game visual bug detection, exhibiting superior performance compared to conventional baselines across multiple gaming environments. Furthermore, CFT maintains competitive performance even when trained with only 50% of the labeled data from the target game.

artificial intelligence, hybrid co-finetuning approach, machine learning, (11 more...)


2510.03591

Country:

North America > Canada (0.46)
Europe > Switzerland (0.28)

Genre:

Research Report > Experimental Study (0.95)
Research Report > New Finding (0.88)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


Critique-Guided Distillation for Efficient and Robust Language Model Reasoning

Kapusuzoglu, Berkcan, Chakraborty, Supriyo, Lee, Chia-Hsuan, Sahu, Sambit

arXiv.org Artificial Intelligence, Sep-30-2025

Supervised fine-tuning (SFT) with expert demonstrations often suffers from the imitation problem, where models reproduce correct responses without internalizing the underlying reasoning. We propose Critique-Guided Distillation (CGD), a multi-stage training framework that augments SFT with teacher-generated explanatory critiques and refined responses. Instead of directly imitating teacher outputs, a student learns to map the triplet of prompt, its own initial response, and teacher critique into the refined teacher response, thereby capturing both what to output and why. Our analyses show that CGD consistently reduces refinement uncertainty, improves alignment between critiques and responses, and enhances sample efficiency. On reasoning benchmarks, CGD achieves substantial gains across LLaMA and Qwen families, including +15.0% on AMC23 and +12.2% on MATH-500, while avoiding the format drift issues observed in prior critique-based fine-tuning. Importantly, on LLaMA-3.1-8B CGD approaches or exceeds the performance of SimpleRL-Zero, which is a DeepSeek-R1 replication, while requiring 60x less compute. Beyond reasoning, CGD maintains or improves general instruction-following and factual accuracy, matching baseline performance on IFEval, MUSR, TruthfulQA, and BBH. In contrast, prior critique-based methods degrade these capabilities (e.g., -21% on IFEval). Taken together, these results establish CGD as a robust and generalizable alternative to both conventional SFT and RL-based methods, offering a more efficient path toward advancing the reasoning and safety of large language models.

large language model, machine learning, natural language, (19 more...)


2505.11628

Country:

Asia (0.45)
North America > Mexico (0.28)
Europe > Austria (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)


EDAPT: Towards Calibration-Free BCIs with Continual Online Adaptation

Haxel, Lisa, Kapoor, Jaivardhan, Ziemann, Ulf, Macke, Jakob H.

arXiv.org Artificial Intelligence, Aug-15-2025

Brain-computer interfaces (BCIs) suffer from accuracy degradation as neural signals drift over time and vary across users, requiring frequent recalibration that limits practical deployment. We introduce EDAPT, a task- and model-agnostic framework that eliminates calibration through continual model adaptation. EDAPT first trains a baseline decoder using data from multiple users, then continually personalizes this model via supervised finetuning as the neural patterns evolve during use. We tested EDAPT across nine datasets covering three BCI tasks, and found that it consistently improved accuracy over conventional, static methods. These improvements primarily stem from combining population-level pretraining and online continual finetuning, with unsupervised domain adaptation providing further gains on some datasets. EDAPT runs efficiently, updating models within 200 milliseconds on consumer-grade hardware. Finally, decoding accuracy scales with total data budget rather than its allocation between subjects and trials. EDAPT provides a practical pathway toward calibration-free BCIs, reducing a major barrier to BCI deployment.
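The two-phase recipe described above (population-level pretraining, then continual supervised finetuning during use) can be sketched with a toy decoder. This is only an illustration under strong assumptions and not EDAPT itself: the logistic-regression decoder, the names `predict` and `sgd_step`, the learning rate, and all data values are hypothetical.

```python
import math


def predict(weights, features):
    """Probability of class 1 under a toy logistic-regression decoder."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(weights, features, label, lr=0.5):
    """One supervised gradient step on the logistic loss for a single trial."""
    error = predict(weights, features) - label
    return [w - lr * error * x for w, x in zip(weights, features)]


# Phase 1 (assumed): pretrain on trials pooled from several users.
# Each trial is ([bias, signal_feature], label); values are made up.
pooled_trials = [([1.0, 2.0], 1), ([1.0, -2.0], 0), ([1.0, 1.5], 1), ([1.0, -1.0], 0)]
weights = [0.0, 0.0]
for _ in range(50):
    for features, label in pooled_trials:
        weights = sgd_step(weights, features, label)
pretrained = list(weights)

# Phase 2 (assumed): a new user's signal is shifted; decode each trial first,
# then update the model once the true label for that trial is available.
user_stream = [([1.0, 3.0], 1), ([1.0, 0.5], 0), ([1.0, 2.5], 1), ([1.0, 0.2], 0)]
online_predictions = []
for features, label in user_stream:
    online_predictions.append(predict(weights, features))
    weights = sgd_step(weights, features, label)
```

The decode-then-update order matters in this sketch: each prediction is made before the model has seen that trial's label, which mirrors how an online system would be evaluated.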

artificial intelligence, deep learning, machine learning, (19 more...)


2508.10474

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.15)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)


Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Wang, Yubo, Nie, Ping, Zou, Kai, Wu, Lijun, Chen, Wenhu

arXiv.org Artificial Intelligence, Jun-6-2025

We have witnessed that strong LLMs like Qwen-Math, MiMo, and Phi-4 possess immense reasoning potential inherited from the pre-training stage. With reinforcement learning (RL), these models can improve dramatically on reasoning tasks. Recent studies have shown that even RL on a single problem can unleash these models' reasoning capabilities. However, RL is not only expensive but also unstable. Even one-shot RL requires hundreds of GPU hours. This raises a critical question: Is there a more efficient way to unleash the reasoning potential of these powerful base LLMs? In this work, we demonstrate that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. Our method constructs critique data by collecting diverse model-generated solutions to a single problem and using teacher LLMs to provide detailed critiques. We fine-tune Qwen and Llama family models, ranging from 1.5B to 14B parameters, on the CFT data and observe significant performance gains across diverse reasoning tasks. For example, with just 5 GPU hours of training, Qwen-Math-7B-CFT shows an average improvement of 15% on six math benchmarks and 16% on three logic reasoning benchmarks. These results are comparable to or even surpass the results from RL with 20x less compute. Ablation studies reveal the robustness of one-shot CFT across different prompt problems. These results highlight one-shot CFT as a simple, general, and compute-efficient approach to unleashing the reasoning capabilities of modern LLMs.

arxiv preprint arxiv, large language model, machine learning, (18 more...)


2506.03295

Genre: Research Report > New Finding (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Teaching LLMs How to Learn with Contextual Fine-Tuning

Choi, Younwoo, Asif, Muhammad Adil, Han, Ziwen, Willes, John, Krishnan, Rahul G.

arXiv.org Artificial Intelligence, Mar-11-2025

Prompting Large Language Models (LLMs), or providing context on the expected model of operation, is an effective way to steer the outputs of such models to satisfy human desiderata after they have been trained. But in rapidly evolving domains, there is often a need to fine-tune LLMs to improve either the kind of knowledge in their memory or their abilities to perform open-ended reasoning in new domains. When humans learn new concepts, we often do so by linking the new material that we are studying to concepts we have already learned before. To that end, we ask, "can prompting help us teach LLMs how to learn?" In this work, we study a novel generalization of instruction tuning, called contextual fine-tuning, to fine-tune LLMs. Our method leverages instructional prompts designed to mimic human cognitive strategies in learning and problem-solving to guide the learning process during training, aiming to improve the model's interpretation and understanding of domain-specific knowledge. We empirically demonstrate that this simple yet effective modification improves the ability of LLMs to be fine-tuned rapidly on new datasets both within the medical and financial domains.

contextual prompt, fine-tuning, information, (13 more...)


2503.09032

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Instructional Material (0.92)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Harnessing PDF Data for Improving Japanese Large Multimodal Models

Baek, Jeonghun, Aizawa, Akiko, Aizawa, Kiyoharu

arXiv.org Artificial Intelligence, Feb-20-2025

Large Multimodal Models (LMMs) have demonstrated strong performance in English, but their effectiveness in Japanese remains limited due to the lack of high-quality training data. Current Japanese LMMs often rely on translated English datasets, restricting their ability to capture Japan-specific cultural knowledge. To address this, we explore the potential of Japanese PDF data as a training resource, an area that remains largely underutilized. We introduce a fully automated pipeline that leverages pretrained models to extract image-text pairs from PDFs through layout analysis, OCR, and vision-language pairing, removing the need for manual annotation. Additionally, we construct instruction data from extracted image-text pairs to enrich the training data. To evaluate the effectiveness of PDF-derived data, we train Japanese LMMs and assess their performance on the Japanese LMM Benchmark. Our results demonstrate substantial improvements, with performance gains ranging from 3.9% to 13.8% on Heron-Bench. Further analysis highlights the impact of PDF-derived data on various factors, such as model size and language models, reinforcing its value as a multimodal resource for Japanese LMMs. We plan to make the source code and data publicly available upon acceptance.

image-text pair, instruction data, pdf data, (15 more...)


2502.14778

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Wang, Yubo, Yue, Xiang, Chen, Wenhu

arXiv.org Artificial Intelligence, Feb-5-2025

Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we challenge this paradigm and propose Critique Fine-Tuning (CFT), a strategy where models learn to critique noisy responses rather than simply imitate correct ones. Inspired by human learning processes that emphasize critical thinking, CFT encourages deeper analysis and nuanced understanding, traits often overlooked by standard SFT. To validate the effectiveness of CFT, we construct a 50K-sample dataset from WebInstruct, using GPT-4o as the teacher to generate critiques in the form of ([query; noisy response], critique). CFT on this dataset yields a consistent 4-10% improvement over SFT on six math benchmarks with different base models like Qwen2.5, Qwen2.5-Math and DeepSeek-Math. We further expand to MetaMath and NuminaMath datasets and observe similar gains over SFT. Notably, our model Qwen2.5-Math-CFT only requires 1 hour of training on 8xH100 over the 50K examples. It can match or outperform strong competitors like Qwen2.5-Math-Instruct on most benchmarks, which use over 2M samples. Moreover, it can match the performance of SimpleRL, which is a DeepSeek-R1 replication trained with 140x more compute. Ablation studies show that CFT is robust to the source of noisy response and teacher critique model. Through these findings, we argue that CFT offers a more effective alternative to advance the reasoning of language models.
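The data format ([query; noisy response], critique) can be made concrete with a small sketch of how one training pair might be assembled. The class name `CritiqueExample`, the function `build_critique_example`, the prompt template, and the sample strings are all hypothetical; the abstract specifies only that the query and noisy response form the input and the critique is the target.

```python
from dataclasses import dataclass


@dataclass
class CritiqueExample:
    model_input: str  # what the model conditions on
    target: str       # what the model is trained to produce


def build_critique_example(query: str, noisy_response: str, critique: str) -> CritiqueExample:
    """Concatenate query and noisy response as input; the critique is the target.

    The template below is an assumption made for illustration only.
    """
    model_input = f"Query:\n{query}\n\nResponse:\n{noisy_response}"
    return CritiqueExample(model_input=model_input, target=critique)


# Made-up sample: the response contains an arithmetic slip for the critique to catch.
example = build_critique_example(
    query="What is 12 * 12?",
    noisy_response="12 * 12 = 124",
    critique="Incorrect: 12 * 12 = 144, not 124.",
)
```

Under standard SFT the target would instead be a correct response to the query; here the loss would be computed on the critique text only.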

large language model, machine learning, natural language, (16 more...)


2501.17703

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


From Correlation to Causation: Understanding Climate Change through Causal Analysis and LLM Interpretations

Shan, Shan

arXiv.org Machine Learning, Dec-21-2024

This research presents a three-step causal inference framework that integrates correlation analysis, machine learning-based causality discovery, and LLM-driven interpretations to identify socioeconomic factors influencing carbon emissions and contributing to climate change. The approach begins with identifying correlations, progresses to causal analysis, and enhances decision making through LLM-generated inquiries about the context of climate change. The proposed framework offers adaptable solutions that support data-driven policy-making and strategic decision-making in climate-related contexts, uncovering causal relationships within the climate change domain.

large language model, machine learning, natural language, (18 more...)


2412.16691

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > North Carolina (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Energy > Renewable (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
