Zhang, Kaifu
New Trends for Modern Machine Translation with Large Reasoning Models
Liu, Sinuo, Lyu, Chenyang, Wu, Minghao, Wang, Longyue, Luo, Weihua, Zhang, Kaifu, Shang, Zifu
Recent advances in Large Reasoning Models (LRMs), particularly those leveraging Chain-of-Thought (CoT) reasoning, have opened up new possibilities for Machine Translation (MT). This position paper argues that LRMs substantially transform both traditional neural MT and LLM-based MT paradigms by reframing translation as a dynamic reasoning task that requires contextual, cultural, and linguistic understanding and reasoning. We identify three foundational shifts: 1) contextual coherence, where LRMs resolve ambiguities and preserve discourse structure through explicit reasoning over cross-sentence and complex context, or even the absence of context; 2) cultural intentionality, enabling models to adapt outputs by inferring speaker intent, audience expectations, and socio-linguistic norms; 3) self-reflection, where LRMs perform self-reflection at inference time to correct potential translation errors, especially in extremely noisy cases, showing better robustness than simple X->Y mapping. We explore various translation scenarios, including stylized translation, document-level translation, and multimodal translation, showcasing empirical examples that demonstrate the superiority of LRMs in translation. We also identify several interesting phenomena of LRMs for MT, including auto-pivot translation, as well as critical challenges such as over-localisation and inference efficiency. In conclusion, we argue that LRMs redefine translation systems not merely as text converters but as multilingual cognitive agents capable of reasoning about meaning beyond the text. This paradigm shift encourages us to think about translation problems beyond traditional scenarios, in the much broader context of what can be achieved on top of LRMs.
Towards Widening The Distillation Bottleneck for Reasoning Models
Yin, Huifeng, Zhao, Yu, Wu, Minghao, Ni, Xuanfan, Zeng, Bo, Wang, Hao, Shi, Tianqi, Shao, Liangying, Lyu, Chenyang, Wang, Longyue, Luo, Weihua, Zhang, Kaifu
Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown remarkable reasoning capabilities by scaling test-time compute and generating long Chain-of-Thought (CoT). Distillation, i.e., post-training on LRM-generated data, is a straightforward yet effective method to enhance the reasoning abilities of smaller models, but it faces a critical bottleneck: we find that distilled long CoT data is difficult for small models to learn and leads them to inherit biases (i.e., over-thinking) under both Supervised Fine-tuning (SFT) and Reinforcement Learning (RL). To alleviate this bottleneck, we propose constructing tree-based CoT data from scratch via Monte Carlo Tree Search (MCTS). We then exploit a set of CoT-aware approaches, including Thoughts Length Balance, Fine-grained DPO, and a Joint Post-training Objective, to enhance SFT and RL on the constructed data.
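To make the tree-based construction concrete, the sketch below shows a generic MCTS loop over partial reasoning chains. It is a minimal illustration only; `generate_candidate_steps` and `is_correct` are hypothetical stand-ins for an LRM step generator and an answer verifier, and none of the paper's CoT-aware training components are shown.

```python
import math
import random

# Minimal MCTS sketch for building tree-structured CoT data.
# The step generator and verifier below are hypothetical placeholders.

class Node:
    def __init__(self, steps, parent=None):
        self.steps = steps          # list of reasoning steps so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0            # accumulated reward

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def generate_candidate_steps(steps, k=3):
    # Hypothetical: ask a teacher model for k possible next reasoning steps.
    return [steps + [f"step-{len(steps)}-{i}"] for i in range(k)]

def is_correct(steps):
    # Hypothetical verifier: reward 1.0 if the final answer checks out.
    return random.random() < 0.5

def mcts(question, iterations=100, max_depth=6):
    root = Node(steps=[question])
    for _ in range(iterations):
        # 1) selection: descend by UCB until reaching a leaf
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # 2) expansion
        if len(node.steps) < max_depth:
            node.children = [Node(s, parent=node)
                             for s in generate_candidate_steps(node.steps)]
            node = random.choice(node.children)
        # 3) evaluation
        reward = 1.0 if is_correct(node.steps) else 0.0
        # 4) backpropagation
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root  # high-value paths can be exported as tree-based CoT data

tree = mcts("What is 17 * 24?")
```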
Towards Lightweight, Adaptive and Attribute-Aware Multi-Aspect Controllable Text Generation with Large Language Models
Zhu, Chenyu, Liu, Yefeng, Lyu, Chenyang, Yang, Xue, Chen, Guanhua, Wang, Longyue, Luo, Weihua, Zhang, Kaifu
Multi-aspect controllable text generation aims to control attributes of generated text from multiple aspects, making it a complex but powerful task in natural language processing. Supervised fine-tuning methods are often employed for this task due to their simplicity and effectiveness. However, they still have limitations: low-rank adaptation (LoRA) only fine-tunes a few parameters and yields suboptimal control, while full fine-tuning (FFT) requires significant computational resources and is susceptible to overfitting, particularly when data is limited. Moreover, existing works typically train multi-aspect controllable text generation models using only single-aspect annotated data, which results in discrepancies in data distribution; at the same time, accurately generating text with specific attributes requires strong attribute-aware capabilities. To address these limitations, we propose a lightweight, adaptive, and attribute-aware framework for multi-aspect controllable text generation. Our framework dynamically adjusts model parameters according to different aspects of the data to achieve controllable text generation, optimizing performance across multiple aspects. Experimental results show that our framework outperforms other strong baselines, achieves state-of-the-art performance, adapts well to data discrepancies, and is more accurate in attribute perception.
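For context on the "lightweight" fine-tuning this abstract contrasts with, the sketch below shows a generic LoRA-style linear layer: only two small low-rank matrices are trained while the base weight stays frozen. It is a textbook illustration under assumed dimensions, not the paper's framework.

```python
import torch
import torch.nn as nn

# Generic LoRA layer sketch: the frozen base weight is augmented by a trainable
# low-rank update B @ A, so only a small fraction of parameters is fine-tuned.

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)    # frozen pretrained weight
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # frozen path + scaled low-rank update
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T

layer = LoRALinear(1024, 1024)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # a small fraction of the layer
```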
LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy
Ruan, Zhiwen, Li, Yixia, Zhu, He, Wang, Longyue, Luo, Weihua, Zhang, Kaifu, Chen, Yun, Chen, Guanhua
Despite being pretrained on multilingual corpora, large language models (LLMs) exhibit suboptimal performance on low-resource languages. Recent approaches have leveraged multilingual encoders alongside LLMs by introducing trainable parameters connecting the two models. However, these methods typically focus only on the encoder's output, overlooking valuable information from other layers. We propose LayAlign, a framework that integrates representations from all encoder layers, coupled with an adaptive attention mechanism that enables layer-wise interaction between the LLM and the multilingual encoder. Extensive experiments on multilingual reasoning tasks, along with analyses of learned representations, show that our approach consistently outperforms existing baselines.
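The sketch below illustrates the general idea of layer-wise fusion: encoder hidden states from all layers are mixed with learnable weights and injected into the LLM through cross-attention. The module layout, dimensions, and names are assumptions for illustration, not the released LayAlign implementation.

```python
import torch
import torch.nn as nn

# Illustrative layer-wise adaptive fusion adapter (not the LayAlign release).

class LayerwiseFusionAdapter(nn.Module):
    def __init__(self, num_enc_layers, enc_dim, llm_dim, num_heads=8):
        super().__init__()
        # one learnable mixing weight per encoder layer
        self.layer_logits = nn.Parameter(torch.zeros(num_enc_layers))
        self.proj = nn.Linear(enc_dim, llm_dim)
        # cross-attention: LLM states attend to the fused encoder states
        self.cross_attn = nn.MultiheadAttention(llm_dim, num_heads, batch_first=True)

    def forward(self, enc_layer_states, llm_states):
        # enc_layer_states: (num_layers, batch, src_len, enc_dim)
        w = torch.softmax(self.layer_logits, dim=0)              # (num_layers,)
        fused = torch.einsum("l,lbtd->btd", w, enc_layer_states)  # weighted layer mix
        fused = self.proj(fused)                                  # (batch, src_len, llm_dim)
        attn_out, _ = self.cross_attn(llm_states, fused, fused)
        return llm_states + attn_out                               # residual injection

adapter = LayerwiseFusionAdapter(num_enc_layers=12, enc_dim=768, llm_dim=4096)
enc_states = torch.randn(12, 2, 32, 768)
llm_states = torch.randn(2, 16, 4096)
out = adapter(enc_states, llm_states)   # (2, 16, 4096)
```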
MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs
Sun, Hui, Lu, Shiyin, Wang, Huanyu, Chen, Qing-Guo, Xu, Zhao, Luo, Weihua, Zhang, Kaifu, Li, Ming
Video large language models (Video-LLMs) have made significant progress in understanding videos. However, processing multiple frames leads to lengthy visual token sequences, presenting two challenges: the limited context length cannot accommodate the entire video, and the inclusion of irrelevant frames hinders visual perception. Hence, effective frame selection is crucial. This paper emphasizes that frame selection should follow three key principles: query relevance, list-wise diversity, and sequentiality. Existing methods, such as uniform frame sampling and query-frame matching, do not capture all of these principles. Thus, we propose the Markov decision determinantal point process with dynamic programming (MDP3) for frame selection, a training-free and model-agnostic method that can be seamlessly integrated into existing Video-LLMs. Our method first estimates frame similarities conditioned on the query using a conditional Gaussian kernel within the reproducing kernel Hilbert space (RKHS). We then apply a determinantal point process (DPP) to the similarity matrix to capture both query relevance and list-wise diversity. To incorporate sequentiality, we segment the video and apply the DPP within each segment, conditioned on the preceding segment's selection, modeled as a Markov decision process (MDP) for allocating selection sizes across segments. Theoretically, MDP3 provides a \((1 - 1/e)\)-approximate solution to the NP-hard list-wise frame selection problem with pseudo-polynomial time complexity, demonstrating its efficiency. Empirically, MDP3 significantly outperforms existing methods, verifying its effectiveness and robustness.
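The sketch below shows the core relevance-plus-diversity step in isolation: a query-conditioned Gaussian kernel and a greedy DPP selection over frame features. The feature shapes, kernel bandwidth, and greedy routine are illustrative assumptions; the segment-level MDP and dynamic programming parts of MDP3 are omitted.

```python
import numpy as np

# Illustrative query-conditioned DPP frame selection (not the MDP3 release).

def gaussian_kernel(X, Y, gamma=0.5):
    # pairwise Gaussian (RBF) similarities between rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def build_dpp_kernel(frame_feats, query_feat, gamma=0.5):
    S = gaussian_kernel(frame_feats, frame_feats, gamma)                 # diversity term
    r = gaussian_kernel(frame_feats, query_feat[None, :], gamma)[:, 0]   # query relevance
    return r[:, None] * S * r[None, :]                                   # L = diag(r) S diag(r)

def greedy_dpp(L, k):
    # Greedy MAP: repeatedly add the frame giving the largest log det(L_S).
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_score:
                best, best_score = i, logdet
        if best is None:
            break
        selected.append(best)
    return sorted(selected)   # keep temporal order for the Video-LLM

frames = np.random.randn(64, 128)   # 64 candidate frames, 128-dim features
query = np.random.randn(128)
L = build_dpp_kernel(frames, query)
print(greedy_dpp(L, k=8))
```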
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Duan, Lunhao, Zhao, Shanshan, Yan, Wenjun, Li, Yinglun, Chen, Qing-Guo, Xu, Zhao, Luo, Weihua, Zhang, Kaifu, Gong, Mingming, Xia, Gui-Song
Recently, text-to-image generation models have achieved remarkable advancements, particularly with diffusion models facilitating high-quality image synthesis from textual descriptions. However, these models often struggle with achieving precise control over pixel-level layouts, object appearances, and global styles when using text prompts alone. To mitigate this issue, previous works introduce conditional images as auxiliary inputs for image generation, enhancing control but typically necessitating specialized models tailored to different types of reference inputs. In this paper, we explore a new approach to unify controllable generation within a single framework. Specifically, we propose the unified image-instruction adapter (UNIC-Adapter) built on the Multi-Modal-Diffusion Transformer architecture, to enable flexible and controllable generation across diverse conditions without the need for multiple specialized models. Our UNIC-Adapter effectively extracts multi-modal instruction information by incorporating both conditional images and task instructions, injecting this information into the image generation process through a cross-attention mechanism enhanced by Rotary Position Embedding. Experimental results across a variety of tasks, including pixel-level spatial control, subject-driven image generation, and style-image-based image synthesis, demonstrate the effectiveness of our UNIC-Adapter in unified controllable image generation.
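To illustrate the injection mechanism the abstract describes, the sketch below shows a cross-attention block in which image tokens attend to instruction tokens, with a simple rotary position embedding applied to both streams. Shapes, dimensions, and the module layout are assumptions, not the UNIC-Adapter release; for brevity, RoPE is applied to the full embeddings rather than per-head to the projected queries and keys, as production implementations do.

```python
import torch
import torch.nn as nn

# Illustrative cross-attention adapter with rotary position embedding (RoPE).

def apply_rope(x, positions, base=10000.0):
    # x: (batch, seq, dim) with even dim; rotate (first-half, second-half) channel pairs.
    b, s, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)   # (half,)
    angles = positions[:, None].float() * freqs[None, :]                # (seq, half)
    cos, sin = angles.cos()[None], angles.sin()[None]                   # (1, seq, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class CrossAttnAdapter(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens, instruction_tokens):
        # position-aware queries (image stream) and keys/values (instruction stream)
        q = apply_rope(image_tokens, torch.arange(image_tokens.size(1)))
        kv = apply_rope(instruction_tokens, torch.arange(instruction_tokens.size(1)))
        out, _ = self.attn(q, kv, kv)
        return self.norm(image_tokens + out)   # residual injection into the generation stream

adapter = CrossAttnAdapter(dim=512)
image_tokens = torch.randn(2, 256, 512)        # latent image tokens
instruction_tokens = torch.randn(2, 77, 512)   # conditional image + task instruction features
out = adapter(image_tokens, instruction_tokens)  # (2, 256, 512)
```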
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
Tan, Yingshui, Zheng, Boren, Zheng, Baihui, Cao, Kerui, Jing, Huiyun, Wei, Jincheng, Liu, Jiaheng, He, Yancheng, Su, Wenbo, Zhu, Xiangyong, Zheng, Bo, Zhang, Kaifu
With the rapid advancement of Large Language Models (LLMs), significant safety concerns have emerged. Fundamentally, the safety of large language models is closely linked to the accuracy, comprehensiveness, and clarity of their understanding of safety knowledge, particularly in domains such as law, policy and ethics. This factuality ability is crucial in determining whether these models can be deployed and applied safely and compliantly within specific regions. To address these challenges and better evaluate the factuality ability of LLMs to answer short questions, we introduce the Chinese SafetyQA benchmark. Chinese SafetyQA has several properties (i.e., Chinese, Diverse, High-quality, Static, Easy-to-evaluate, Safety-related, Harmless). Based on Chinese SafetyQA, we perform a comprehensive evaluation on the factuality abilities of existing LLMs and analyze how these capabilities relate to LLM abilities, e.g., RAG ability and robustness against attacks.
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement
Ming, Lingfeng, Zeng, Bo, Lyu, Chenyang, Shi, Tianqi, Zhao, Yu, Yang, Xue, Liu, Yefeng, Wang, Yiyu, Xu, Linlong, Liu, Yangyang, Zhao, Xiaohu, Wang, Hao, Liu, Heng, Zhou, Hao, Yin, Huifeng, Shang, Zifu, Li, Haijun, Wang, Longyue, Luo, Weihua, Zhang, Kaifu
Large Language Models (LLMs) have achieved remarkable progress in recent years; however, their excellent performance is still largely limited to major world languages, primarily English. Many LLMs continue to face challenges with multilingual tasks, especially when it comes to low-resource languages. To address this issue, we introduce Marco-LLM: Massive multilingual training for cross-lingual enhancement LLM. We collected a substantial amount of multilingual data for several low-resource languages and conducted extensive continual pre-training on the Qwen2 models, resulting in the multilingual Marco-LLM. Through comprehensive evaluations on various multilingual benchmarks, including MMMLU, AGIEval, Belebele, Flores-200, XCOPA, and many others, Marco-LLM demonstrates substantial improvements over state-of-the-art LLMs. Furthermore, Marco-LLM achieves substantial gains on any-to-any machine translation tasks, showing the effectiveness of our multilingual LLM. Marco-LLM is a pioneering multilingual LLM designed not only to perform exceptionally well on multilingual tasks, including low-resource languages, but also to maintain strong performance in English and other major languages, closing the performance gap between high- and low-resource language capabilities. By bridging languages, this effort demonstrates our dedication to ensuring LLMs work accurately across various languages.
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Zhao, Yu, Yin, Huifeng, Zeng, Bo, Wang, Hao, Shi, Tianqi, Lyu, Chenyang, Wang, Longyue, Luo, Weihua, Zhang, Kaifu
OpenAI recently introduced the groundbreaking o1 model [OpenAI, 2024, Zhong et al., 2024], renowned for its exceptional reasoning capabilities. The model has demonstrated outstanding performance on platforms such as AIME and CodeForces, surpassing other leading models. Inspired by this success, we aim to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges. In the spirit of OpenAI's o1, we explore potential approaches to shed light on the currently unclear technical roadmap for large reasoning models (LRMs). Marco-o1 leverages advanced techniques such as CoT fine-tuning [Wei et al., 2022], MCTS [Wei et al., 2022, Feng et al., 2023, Silver et al., 2017], and Reasoning Action Strategies to enhance its reasoning power. As shown in Figure 2, by fine-tuning Qwen2-7B-Instruct [Yang et al., 2024] with a combination of the filtered Open-O1 CoT dataset [OpenO1 Team, 2024], the Marco-o1 CoT dataset, and the Marco-o1 Instruction dataset, Marco-o1 improves its handling of complex tasks.
PMMT: Preference Alignment in Multilingual Machine Translation via LLM Distillation
Sun, Shuqiao, Yao, Yutong, Wu, Peiwen, Jiang, Feijun, Zhang, Kaifu
Translation is important for cross-language communication, and many efforts have been made to improve its accuracy. However, far less effort has been devoted to aligning translations with human preferences, such as translation tone or style. In this paper, we propose a new method to effectively generate large-scale multilingual parallel corpora with specific translation preferences using Large Language Models (LLMs). We also design an automatic pipeline that distills human preferences into smaller Machine Translation (MT) models, efficiently and economically supporting large-scale calls in online services. Experiments indicate that the proposed method leads translation tasks with aligned human preferences by a large margin. Meanwhile, on popular public benchmarks such as WMT and Flores, on which our models were not trained, the proposed method also shows competitive performance compared to SOTA works.