AITopics | Lu, Yiting

Collaborating Authors

Lu, Yiting

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model

Guan, Fengbin, Yu, Zihao, Lu, Yiting, Li, Xin, Chen, Zhibo

arXiv.org Artificial IntelligenceFeb-26-2025

--Video quality assessment tasks rely heavily on the rich features required for video understanding, such as semantic information, texture, and temporal motion. The existing video foundational model, InternVideo2, has demonstrated strong potential in video understanding tasks due to its large parameter size and large-scale multimodal data pertaining. T o design a lightweight model suitable for this task, we proposed a distillation method to equip the smaller model with rich compression quality priors. Additionally, we examined the performance of different backbones during the distillation process. The results showed that, compared to other methods, our lightweight model distilled from InternVideo2 achieved excellent performance in compression video quality assessment.

artificial intelligence, machine learning, quality assessment, (15 more...)

arXiv.org Artificial Intelligence

2502.19026

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

Lei, Jiayi, Zhang, Renrui, Hu, Xiangfei, Lin, Weifeng, Li, Zhen, Sun, Wenjian, Du, Ruoyi, Zhuo, Le, Li, Zhongyu, Li, Xinyue, Zhao, Shitian, Guo, Ziyu, Lu, Yiting, Gao, Peng, Li, Hongsheng

arXiv.org Artificial IntelligenceJan-23-2025

With the rapid development of diffusion models, text-to-image(T2I) models have made significant progress, showcasing impressive abilities in prompt following and image generation. Recently launched models such as FLUX.1 and Ideogram2.0, along with others like Dall-E3 and Stable Diffusion 3, have demonstrated exceptional performance across various complex tasks, raising questions about whether T2I models are moving towards general-purpose applicability. Beyond traditional image generation, these models exhibit capabilities across a range of fields, including controllable generation, image editing, video, audio, 3D, and motion generation, as well as computer vision tasks like semantic segmentation and depth estimation. However, current evaluation frameworks are insufficient to comprehensively assess these models' performance across expanding domains. To thoroughly evaluate these models, we developed the IMAGINE-E and tested six prominent models: FLUX.1, Ideogram2.0, Midjourney, Dall-E3, Stable Diffusion 3, and Jimeng. Our evaluation is divided into five key domains: structured output generation, realism, and physical consistency, specific domain generation, challenging scenario generation, and multi-style creation tasks. This comprehensive assessment highlights each model's strengths and limitations, particularly the outstanding performance of FLUX.1 and Ideogram2.0 in structured and specific domain tasks, underscoring the expanding applications and potential of T2I models as foundational AI tools. This study provides valuable insights into the current state and future trajectory of T2I models as they evolve towards general-purpose usability. Evaluation scripts will be released at https://github.com/jylei16/Imagine-e.

artificial intelligence, ideogram2, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2501.1392

Country: Asia > China (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Photography (1.00)
Health & Medicine (1.00)
Information Technology (0.92)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation

Shen, Maohao, Zhang, Shun, Wu, Jilong, Xiu, Zhiping, AlBadawy, Ehab, Lu, Yiting, Seltzer, Mike, He, Qing

arXiv.org Artificial IntelligenceOct-27-2024

Large language models (LLMs) have revolutionized natural language processing (NLP) with impressive performance across various text-based tasks. However, the extension of text-dominant LLMs to with speech generation tasks remains under-explored. In this work, we introduce a text-to-speech (TTS) system powered by a fine-tuned Llama model, named TTS-Llama, that achieves state-of-the-art speech synthesis performance. Building on TTS-Llama, we further propose MoLE-Llama, a text-and-speech multimodal LLM developed through purely late-fusion parameter-efficient fine-tuning (PEFT) and a mixture-of-expert architecture. Extensive empirical results demonstrate MoLE-Llama's competitive performance on both text-only question-answering (QA) and TTS tasks, mitigating catastrophic forgetting issue in either modality. Finally, we further explore MoLE-Llama in text-in-speech-out QA tasks, demonstrating its great potential as a multimodal dialog system capable of speech generation.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.20336

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Li, Xin, Yuan, Kun, Pei, Yajing, Lu, Yiting, Sun, Ming, Zhou, Chao, Chen, Zhibo, Timofte, Radu, Sun, Wei, Wu, Haoning, Zhang, Zicheng, Jia, Jun, Zhang, Zhichao, Cao, Linhan, Chen, Qiubo, Min, Xiongkuo, Lin, Weisi, Zhai, Guangtao, Sun, Jianhui, Wang, Tianyi, Li, Lei, Kong, Han, Wang, Wenxuan, Li, Bing, Luo, Cheng, Wang, Haiqiang, Chen, Xiangguang, Meng, Wenhui, Pan, Xiang, Shi, Huiying, Zhu, Han, Xu, Xiaozhong, Sun, Lei, Chen, Zhenzhong, Liu, Shan, Kong, Fangyuan, Fan, Haotian, Xu, Yifang, Xu, Haoran, Yang, Mengduo, Zhou, Jie, Li, Jiaze, Wen, Shijie, Xu, Mai, Li, Da, Yao, Shunyu, Du, Jiazhi, Zuo, Wangmeng, Li, Zhibo, He, Shuai, Ming, Anlong, Fu, Huiyuan, Ma, Huadong, Wu, Yong, Xue, Fie, Zhao, Guozhi, Du, Lina, Guo, Jie, Zhang, Yu, Zheng, Huimin, Chen, Junhao, Liu, Yue, Zhou, Dulan, Xu, Kele, Xu, Qisheng, Sun, Tao, Ding, Zhixiang, Hu, Yuhang

arXiv.org Artificial IntelligenceApr-17-2024

This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

artificial intelligence, machine learning, video, (15 more...)

arXiv.org Artificial Intelligence

2404.11313

Country: Asia > China (1.00)

Genre:

Research Report (1.00)
Overview (0.74)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy

Yu, Zihao, Guan, Fengbin, Lu, Yiting, Li, Xin, Chen, Zhibo

arXiv.org Artificial IntelligenceJan-16-2024

The objective of non-reference video quality assessment is to evaluate the quality of distorted video without access to reference high-definition references. In this study, we introduce an enhanced spatial perception module, pre-trained on multiple image quality assessment datasets, and a lightweight temporal fusion module to address the no-reference visual quality assessment (NR-VQA) task. This model implements Swin Transformer V2 as a local-level spatial feature extractor and fuses these multi-stage representations through a series of transformer layers. Furthermore, a temporal transformer is utilized for spatiotemporal feature fusion across the video. To accommodate compressed videos of varying bitrates, we incorporate a coarse-to-fine contrastive strategy to enrich the model's capability to discriminate features from videos of different bitrates. This is an expanded version of the one-page abstract.

artificial intelligence, machine learning, quality assessment, (11 more...)

arXiv.org Artificial Intelligence

2401.08522

Country: Asia > China > Anhui Province (0.15)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.74)
Information Technology > Artificial Intelligence > Vision (0.52)
Information Technology > Artificial Intelligence > Machine Learning (0.48)

Add feedback

Attention Forcing for Machine Translation

Dou, Qingyun, Lu, Yiting, Manakul, Potsawee, Wu, Xixin, Gales, Mark J. F.

arXiv.org Artificial IntelligenceApr-2-2021

Auto-regressive sequence-to-sequence models with attention mechanisms have achieved state-of-the-art performance in various tasks including Text-To-Speech (TTS) and Neural Machine Translation (NMT). The standard training approach, teacher forcing, guides a model with the reference output history. At inference stage, the generated output history must be used. This mismatch can impact performance. However, it is highly challenging to train the model using the generated output. Several approaches have been proposed to address this problem, normally by selectively using the generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing for NMT. This approach guides the model with the generated output history and reference attention, and can reduce the training-inference mismatch without a schedule or a classifier. Attention forcing has been successful in TTS, but its application to NMT is more challenging, due to the discrete and multi-modal nature of the output space. To tackle this problem, this paper adds a selection scheme to vanilla attention forcing, which automatically selects a suitable training approach for each pair of training data. Experiments show that attention forcing can improve the overall translation quality and the diversity of the translations.

deep learning, machine translation, neural network, (17 more...)

arXiv.org Artificial Intelligence

2104.01264

Country: Asia > Japan (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Attention Forcing for Sequence-to-sequence Model Training

Dou, Qingyun, Lu, Yiting, Efiong, Joshua, Gales, Mark J. F.

arXiv.org Machine LearningSep-26-2019

A BSTRACT Auto-regressive sequence-to-sequence models with attention mechanism have achieved state-of-the-art performance in many tasks such as machine translation and speech synthesis. These models can be difficult to train. The problem is that the model is unlikely to recover from its mistakes during inference, where the reference output is replaced by generated output. Several approaches deal with this problem, largely by guiding the model with generated output history. To make training stable, these approaches often require a heuristic schedule or an auxiliary classifier. This paper introduces attention forcing, which guides the model with generated output history and reference attention. This approach can train the model to recover from its mistakes, in a stable fashion, without the need for a schedule or a classifier. In addition, it allows the model to generate output sequences aligned with the references, which can be important for cascaded systems like many speech synthesis systems. Experiments on machine translation show that for tasks where various reorderings of the output are valid, guiding the model with generated output history is challenging, while guiding the model with reference attention is beneficial. 1 I NTRODUCTION Auto-regressive sequence-to-sequence (seq2seq) models with attention mechanism are widely used in a variety of areas including Neural Machine Translation (NMT) (Neubig, 2017; Huang et al., 2016) and speech synthesis (Shen et al., 2018; Wang et al., 2018), also known as Text-To-Speech (TTS). These models excel at connecting sequences of different length, but can be difficult to train. A standard approach is teacher forcing, which guides a model with reference output history during training.

deep learning, neural network, sequence, (18 more...)

arXiv.org Machine Learning

1909.12289

Country: Europe > United Kingdom (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback