AITopics | Sun, Wei

Collaborating Authors

Sun, Wei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Subjective and Objective Quality-of-Experience Evaluation Study for Live Video Streaming

Zhu, Zehao, Sun, Wei, Jia, Jun, Wu, Wei, Deng, Sibin, Li, Kai, Chen, Ying, Min, Xiongkuo, Wang, Jia, Zhai, Guangtao

arXiv.org Artificial IntelligenceSep-26-2024

In recent years, live video streaming has gained widespread popularity across various social media platforms. Quality of experience (QoE), which reflects end-users' satisfaction and overall experience, plays a critical role for media service providers to optimize large-scale live compression and transmission strategies to achieve perceptually optimal rate-distortion trade-off. Although many QoE metrics for video-on-demand (VoD) have been proposed, there remain significant challenges in developing QoE metrics for live video streaming. To bridge this gap, we conduct a comprehensive study of subjective and objective QoE evaluations for live video streaming. For the subjective QoE study, we introduce the first live video streaming QoE dataset, TaoLive QoE, which consists of $42$ source videos collected from real live broadcasts and $1,155$ corresponding distorted ones degraded due to a variety of streaming distortions, including conventional streaming distortions such as compression, stalling, as well as live streaming-specific distortions like frame skipping, variable frame rate, etc. Subsequently, a human study was conducted to derive subjective QoE scores of videos in the TaoLive QoE dataset. For the objective QoE study, we benchmark existing QoE models on the TaoLive QoE dataset as well as publicly available QoE datasets for VoD scenarios, highlighting that current models struggle to accurately assess video QoE, particularly for live content. Hence, we propose an end-to-end QoE evaluation model, Tao-QoE, which integrates multi-scale semantic features and optical flow-based motion features to predicting a retrospective QoE score, eliminating reliance on statistical quality of service (QoS) features.

artificial intelligence, machine learning, video, (17 more...)

arXiv.org Artificial Intelligence

2409.17596

Country:

Asia > China (1.00)
Europe (0.67)
North America > Canada > Ontario > Hamilton (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Sensing and Signal Processing > Image Processing (0.70)

Add feedback

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

Jiang, Yicong, Wang, Tianzi, Xie, Xurong, Liu, Juan, Sun, Wei, Yan, Nan, Chen, Hui, Wang, Lan, Liu, Xunying, Tian, Feng

arXiv.org Artificial IntelligenceJun-14-2024

Disordered speech recognition profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model. We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs, to improve model recognition of Chinese dysarthric speech. Experimental results from our Chinese dysarthric speech dataset demonstrate consistent improvements in recognition performance with Perceiver-Prompt. Relative reduction up to 13.04% in CER is obtained over the fine-tuned Whisper.

artificial intelligence, perceiver-prompt, speech recognition, (13 more...)

arXiv.org Artificial Intelligence

2406.09873

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.75)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Add feedback

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

Zhang, Zicheng, Wu, Haoning, Li, Chunyi, Zhou, Yingjie, Sun, Wei, Min, Xiongkuo, Chen, Zijian, Liu, Xiaohong, Lin, Weisi, Zhai, Guangtao

arXiv.org Artificial IntelligenceJun-5-2024

How to accurately and efficiently assess AI-generated images (AIGIs) remains a critical challenge for generative models. Given the high costs and extensive time commitments required for user studies, many researchers have turned towards employing large multi-modal models (LMMs) as AIGI evaluators, the precision and validity of which are still questionable. Furthermore, traditional benchmarks often utilize mostly natural-captured content rather than AIGIs to test the abilities of LMMs, leading to a noticeable gap for AIGIs. Therefore, we introduce A-Bench in this paper, a benchmark designed to diagnose whether LMMs are masters at evaluating AIGIs. Specifically, A-Bench is organized under two key principles: 1) Emphasizing both high-level semantic understanding and low-level visual quality perception to address the intricate demands of AIGIs. 2) Various generative models are utilized for AIGI creation, and various LMMs are employed for evaluation, which ensures a comprehensive validation scope. Ultimately, 2,864 AIGIs from 16 text-to-image models are sampled, each paired with question-answers annotated by human experts, and tested across 18 leading LMMs. We hope that A-Bench will significantly enhance the evaluation process and promote the generation quality for AIGIs. The benchmark is available at https://github.com/Q-Future/A-Bench.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.0307

Genre:

Research Report (0.82)
Questionnaire & Opinion Survey (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)

Add feedback

Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian

Sun, Wei, Zhang, Qi, Zhou, Yanzhao, Ye, Qixiang, Jiao, Jianbin, Li, Yuan

arXiv.org Artificial IntelligenceMay-29-2024

3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis. However, achieving successful reconstruction from RGB images generally requires multiple input views captured under static conditions. To address the challenge of sparse input views, previous approaches have incorporated depth supervision into the training of 3D Gaussians to mitigate overfitting, using dense predictions from pretrained depth networks as pseudo-ground truth. Nevertheless, depth predictions from monocular depth estimation models inherently exhibit significant uncertainty in specific areas. Relying solely on pixel-wise L2 loss may inadvertently incorporate detrimental noise from these uncertain areas. In this work, we introduce a novel method to supervise the depth distribution of 3D Gaussians, utilizing depth priors with integrated uncertainty estimates. To address these localized errors in depth predictions, we integrate a patch-wise optimal transport strategy to complement traditional L2 loss in depth supervision. Extensive experiments conducted on the LLFF, DTU, and Blender datasets demonstrate that our approach, UGOT, achieves superior novel view synthesis and consistently outperforms state-of-the-art methods.

artificial intelligence, machine learning, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2405.19657

Country:

Asia > Japan > Honshū > Chūbu (0.14)
North America > United States (0.14)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

DMON: A Simple yet Effective Approach for Argument Structure Learning

Sun, Wei, Li, Mingxiao, Sun, Jingyuan, Davis, Jesse, Moens, Marie-Francine

arXiv.org Artificial IntelligenceMay-2-2024

Argument structure learning~(ASL) entails predicting relations between arguments. Because it can structure a document to facilitate its understanding, it has been widely applied in many fields~(medical, commercial, and scientific domains). Despite its broad utilization, ASL remains a challenging task because it involves examining the complex relationships between the sentences in a potentially unstructured discourse. To resolve this problem, we have developed a simple yet effective approach called Dual-tower Multi-scale cOnvolution neural Network~(DMON) for the ASL task. Specifically, we organize arguments into a relationship matrix that together with the argument embeddings forms a relationship tensor and design a mechanism to capture relations with contextual arguments. Experimental results on three different-domain argument mining datasets demonstrate that our framework outperforms state-of-the-art models. The code is available at https://github.com/VRCMF/DMON.git .

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.01216

Country:

North America > United States > Minnesota (0.14)
Asia > Middle East > Israel (0.14)
Asia > India > NCT (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

Zhang, Zicheng, Wu, Haoning, Zhou, Yingjie, Li, Chunyi, Sun, Wei, Chen, Chaofeng, Min, Xiongkuo, Liu, Xiaohong, Lin, Weisi, Zhai, Guangtao

arXiv.org Artificial IntelligenceApr-28-2024

Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs through text supervision. To achieve this, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. To compensate for the loss of perception in the 3D domain, structural features are extracted as well. These quality logits and structural features are then combined and regressed into quality scores. Our experimental results affirm the effectiveness of our approach, showcasing a novel integration of LMMs into PCQA that enhances model understanding and assessment accuracy. We hope our contributions can inspire subsequent investigations into the fusion of LMMs with PCQA, fostering advancements in 3D visual quality analysis and beyond.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2404.18203

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
(2 more...)

Add feedback

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Li, Xin, Yuan, Kun, Pei, Yajing, Lu, Yiting, Sun, Ming, Zhou, Chao, Chen, Zhibo, Timofte, Radu, Sun, Wei, Wu, Haoning, Zhang, Zicheng, Jia, Jun, Zhang, Zhichao, Cao, Linhan, Chen, Qiubo, Min, Xiongkuo, Lin, Weisi, Zhai, Guangtao, Sun, Jianhui, Wang, Tianyi, Li, Lei, Kong, Han, Wang, Wenxuan, Li, Bing, Luo, Cheng, Wang, Haiqiang, Chen, Xiangguang, Meng, Wenhui, Pan, Xiang, Shi, Huiying, Zhu, Han, Xu, Xiaozhong, Sun, Lei, Chen, Zhenzhong, Liu, Shan, Kong, Fangyuan, Fan, Haotian, Xu, Yifang, Xu, Haoran, Yang, Mengduo, Zhou, Jie, Li, Jiaze, Wen, Shijie, Xu, Mai, Li, Da, Yao, Shunyu, Du, Jiazhi, Zuo, Wangmeng, Li, Zhibo, He, Shuai, Ming, Anlong, Fu, Huiyuan, Ma, Huadong, Wu, Yong, Xue, Fie, Zhao, Guozhi, Du, Lina, Guo, Jie, Zhang, Yu, Zheng, Huimin, Chen, Junhao, Liu, Yue, Zhou, Dulan, Xu, Kele, Xu, Qisheng, Sun, Tao, Ding, Zhixiang, Hu, Yuhang

arXiv.org Artificial IntelligenceApr-17-2024

This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

artificial intelligence, machine learning, video, (15 more...)

arXiv.org Artificial Intelligence

2404.11313

Country: Asia > China (1.00)

Genre:

Research Report (1.00)
Overview (0.74)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

A resource-constrained stochastic scheduling algorithm for homeless street outreach and gleaning edible food

Artman, Conor M., Mate, Aditya, Nwankwo, Ezinne, Heching, Aliza, Idé, Tsuyoshi, Jiří\, null, Navrátil, null, Shanmugam, Karthikeyan, Sun, Wei, Varshney, Kush R., Goldkind, Lauri, Kroch, Gidi, Sawyer, Jaclyn, Watson, Ian

arXiv.org Machine LearningMar-15-2024

We developed a common algorithmic solution addressing the problem of resource-constrained outreach encountered by social change organizations with different missions and operations: Breaking Ground -- an organization that helps individuals experiencing homelessness in New York transition to permanent housing and Leket -- the national food bank of Israel that rescues food from farms and elsewhere to feed the hungry. Specifically, we developed an estimation and optimization approach for partially-observed episodic restless bandits under $k$-step transitions. The results show that our Thompson sampling with Markov chain recovery (via Stein variational gradient descent) algorithm significantly outperforms baselines for the problems of both organizations. We carried out this work in a prospective manner with the express goal of devising a flexible-enough but also useful-enough solution that can help overcome a lack of sustainable impact in data science for social good.

data mining, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2403.10638

Country:

North America > United States > New York (0.48)
Asia > Middle East > Israel (0.34)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Social Sector (0.69)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
(3 more...)

Add feedback

API Pack: A Massive Multilingual Dataset for API Call Generation

Guo, Zhen, Soria, Adriana Meza, Sun, Wei, Shen, Yikang, Panda, Rameswar

arXiv.org Artificial IntelligenceFeb-16-2024

We introduce API Pack, a multilingual dataset featuring over one million instruction-API call pairs aimed at advancing large language models' API call generation capabilities. Through experiments, we demonstrate API Pack's efficacy in enhancing models for this specialized task while maintaining their overall proficiency at general coding. Fine-tuning CodeLlama-13B on just 20,000 Python instances yields over 10% and 5% higher accuracy than GPT-3.5 and GPT-4 respectively in generating unseen API calls. Scaling to 100k examples improves generalization to new APIs not seen during training. In addition, cross-lingual API call generation is achieved without needing extensive data per language. The dataset, fine-tuned models, and overall code base are publicly available at https://github.com/zguo0525/API-Pack.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2402.09615

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PresAIse, An Enterprises Prescriptive AI Solution

Sun, Wei, McFaddin, Scott, Tran, Linh Ha, Subramanian, Shivaram, Greenewald, Kristjan, Tenzin, Yeshi, Xue, Zack, Drissi, Youssef, Ettl, Markus

arXiv.org Artificial IntelligenceFeb-2-2024

Prescriptive AI represents a transformative shift in decision-making, offering causal insights and actionable recommendations. Despite its huge potential, enterprise adoption often faces several challenges. The first challenge is caused by the limitations of observational data for accurate causal inference which is typically a prerequisite for good decision-making. The second pertains to the interpretability of recommendations, which is crucial for enterprise decision-making settings. The third challenge is the silos between data scientists and business users, hindering effective collaboration. This paper outlines an initiative from IBM Research, aiming to address some of these challenges by offering a suite of prescriptive AI solutions. Leveraging insights from various research papers, the solution suite includes scalable causal inference methods, interpretable decision-making approaches, and the integration of large language models (LLMs) to bridge communication gaps via a conversation agent. A proof-of-concept, PresAIse, demonstrates the solutions' potential by enabling non-ML experts to interact with prescriptive AI models via a natural language interface, democratizing advanced analytics for strategic decision-making.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.02006

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback