Chen, Jun-Cheng
Bridging The Multi-Modality Gaps of Audio, Visual and Linguistic for Speech Enhancement
Lin, Meng-Ping, Hou, Jen-Cheng, Chen, Chia-Wei, Chien, Shao-Yi, Chen, Jun-Cheng, Lu, Xugang, Tsao, Yu
Speech Enhancement (SE) aims to improve the quality of noisy speech. It has been shown that additional visual cues can further improve performance. Given that speech communication involves audio, visual, and linguistic modalities, it is natural to expect a further performance boost from incorporating linguistic information. However, bridging the modality gaps to efficiently incorporate linguistic information, along with the audio and visual modalities, during knowledge transfer is a challenging task. In this paper, we propose a novel multi-modality learning framework for SE. In this framework, a state-of-the-art diffusion model backbone is used for Audio-Visual Speech Enhancement (AVSE), where both audio and visual information are captured directly by microphones and video cameras. On top of this AVSE model, the linguistic modality employs a pre-trained language model (PLM) to transfer linguistic knowledge to the audio-visual modalities through a process termed Cross-Modal Knowledge Transfer (CMKT) during AVSE model training. After training, the linguistic knowledge is assumed to be encoded into the feature processing of the AVSE model by CMKT, so the PLM is not involved at inference time. We carry out SE experiments to evaluate the proposed framework. Experimental results demonstrate that our AVSE system significantly enhances speech quality and reduces generative artifacts, such as phonetic confusion, compared with the state of the art. Moreover, our visualization results demonstrate that the CMKT method further improves the quality of the speech generated by our AVSE system. These findings not only suggest that diffusion-model-based techniques hold promise for advancing the state of the art in AVSE but also justify the effectiveness of incorporating linguistic information to improve the performance of diffusion-based AVSE systems.
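The CMKT idea described above, distilling PLM knowledge into the AVSE features during training so that the PLM can be dropped at inference, can be sketched as a feature-alignment loss. The projection matrix, shapes, and cosine-distance objective below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def cmkt_loss(av_feats, plm_feats, proj):
    """Illustrative Cross-Modal Knowledge Transfer loss.

    av_feats:  (T, d_av) audio-visual features from the AVSE backbone
    plm_feats: (T, d_lm) frame-aligned embeddings from a pre-trained
               language model (available at training time only)
    proj:      (d_av, d_lm) learnable projection into the PLM space
    Returns the mean cosine distance between projected AV features
    and PLM embeddings; minimizing it pulls the two spaces together.
    """
    z = av_feats @ proj
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    t = plm_feats / np.linalg.norm(plm_feats, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(z * t, axis=1)))
```

At inference the PLM branch and this loss are simply absent; only the AVSE backbone, whose features were shaped by the loss during training, is run.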
ACCEPT: Adaptive Codebook for Composite and Efficient Prompt Tuning
Lin, Yu-Chen, Li, Wei-Hua, Chen, Jun-Cheng, Chen, Chu-Song
Prompt Tuning has been a popular Parameter-Efficient Fine-Tuning method owing to its remarkable performance with few updated parameters on various large-scale pre-trained Language Models (PLMs). Traditionally, each prompt has been considered indivisible and updated independently, causing the number of parameters to grow proportionally with prompt length. To address this issue, we propose Adaptive Codebook for Composite and Efficient Prompt Tuning (ACCEPT). In our method, we draw on the concept of product quantization (PQ), allowing all soft prompts to share a set of learnable codebook vectors in each subspace, with each prompt differentiated by a set of adaptive weights. We achieve superior performance on 17 diverse natural language tasks, including natural language understanding (NLU) and question answering (QA) tasks, by tuning only 0.3% of the parameters of the PLMs. Our approach also excels in few-shot and large-model settings, highlighting its significant potential.
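The PQ-style sharing can be sketched as follows: each prompt token is a concatenation of per-subspace mixtures of shared codebook vectors, so only the codebooks and the per-token mixing weights are learned. The shapes and the softmax mixing are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def compose_prompts(codebooks, weights):
    """Compose soft prompts from shared sub-space codebooks.

    codebooks: (S, K, d_s)  K learnable vectors for each of S subspaces
    weights:   (L, S, K)    adaptive mixing weights for L prompt tokens
    returns:   (L, S * d_s) composed prompt embeddings
    """
    # Normalize weights per subspace so each token is a convex mixture
    # of that subspace's codebook entries.
    w = np.exp(weights - weights.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    # (L, S, K) x (S, K, d_s) -> (L, S, d_s), then concatenate subspaces.
    mixed = np.einsum('lsk,skd->lsd', w, codebooks)
    return mixed.reshape(mixed.shape[0], -1)
```

With S*K*d_s codebook parameters shared across all prompts, adding a token costs only S*K mixing weights rather than a full S*d_s embedding, which is how the parameter count stays decoupled from prompt length.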
Training-free Diffusion Model Alignment with Sampling Demons
Yeh, Po-Hung, Lee, Kuang-Huei, Chen, Jun-Cheng
Aligning diffusion models with user preferences has been a key challenge. Existing methods for aligning diffusion models either require retraining or are limited to differentiable reward functions. To address these limitations, we propose a stochastic optimization approach, dubbed Demon, to guide the denoising process at inference time without backpropagation through reward functions or model retraining. Our approach works by controlling noise distribution in denoising steps to concentrate density on regions corresponding to high rewards through stochastic optimization. We provide comprehensive theoretical and empirical evidence to support and validate our approach, including experiments that use non-differentiable sources of rewards such as Visual-Language Model (VLM) APIs and human judgements. To the best of our knowledge, the proposed approach is the first inference-time, backpropagation-free preference alignment method for diffusion models. Our method can be easily integrated with existing diffusion models without further training. Our experiments show that the proposed approach significantly improves the average aesthetics scores for text-to-image generation.
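The inference-time, backpropagation-free idea can be sketched as follows: at each denoising step, propose several noise candidates, score the resulting states with a black-box reward, and keep (or reweight toward) the high-reward candidates. The `denoise` and `reward` callables below are placeholder assumptions standing in for a diffusion model's one-step update and a non-differentiable scorer such as a VLM API:

```python
import numpy as np

def demon_step(x, denoise, reward, sigma, n_candidates=8, rng=None):
    """One reward-guided denoising step (greedy variant, illustrative).

    x:       current noisy sample
    denoise: one-step denoising update of the diffusion model
    reward:  black-box scorer; no gradients are ever taken through it
    sigma:   noise scale injected at this step
    """
    if rng is None:
        rng = np.random.default_rng()
    # Propose several noise candidates for this step.
    eps = rng.standard_normal((n_candidates,) + x.shape)
    candidates = [denoise(x) + sigma * e for e in eps]
    # Score each candidate; the reward may be non-differentiable.
    scores = np.array([reward(c) for c in candidates])
    # Concentrate density on the high-reward region by keeping the best.
    return candidates[int(np.argmax(scores))]
```

Because only forward evaluations of `reward` are needed, the same loop works with human judgements or API-based scorers, and the underlying diffusion model is never retrained.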
Improving Visual Object Tracking through Visual Prompting
Chen, Shih-Fang, Chen, Jun-Cheng, Jhuo, I-Hong, Lin, Yen-Yu
Learning a discriminative model to distinguish a target from its surrounding distractors is essential to generic visual object tracking. Dynamic target representation adaptation against distractors is challenging due to the limited discriminative capabilities of prevailing trackers. We present a new visual Prompting mechanism for generic Visual Object Tracking (PiVOT) to address this issue. PiVOT proposes a prompt generation network with the pre-trained foundation model CLIP to automatically generate and refine visual prompts, enabling the transfer of foundation-model knowledge for tracking. While CLIP offers broad category-level knowledge, the tracker, trained on instance-specific data, excels at recognizing unique object instances. Thus, PiVOT first compiles a visual prompt highlighting potential target locations. To transfer the knowledge of CLIP to the tracker, PiVOT leverages CLIP to refine the visual prompt based on the similarities between candidate objects and the reference templates across potential targets. Once the visual prompt is refined, it can better highlight potential target locations, thereby reducing irrelevant prompt information. With the proposed prompting mechanism, the tracker can generate improved instance-aware feature maps under the guidance of the visual prompt, thus effectively reducing distractors. The proposed method does not involve CLIP during training, thereby keeping the training complexity unchanged and preserving the generalization capability of the pretrained foundation model. Extensive experiments across multiple benchmarks indicate that PiVOT, using the proposed prompting method, can suppress distracting objects and enhance the tracker. Generic visual object tracking (GOT) estimates the state of a target object in every frame of a video sequence, given its initial state in the first frame. Learning a discriminative representation for this target object is essential to alleviate interference from other distracting objects.
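The refinement step, reweighting an initial prompt by CLIP-style similarity between candidate regions and the reference template, can be sketched as below. The feature extractors, the cosine-similarity measure, and the multiplicative fusion are illustrative assumptions rather than PiVOT's exact design:

```python
import numpy as np

def refine_prompt(candidate_feats, template_feat, prompt_scores):
    """Refine a visual prompt using template similarity (illustrative).

    candidate_feats: (N, d) embeddings of N candidate target regions
    template_feat:   (d,)   embedding of the reference template
    prompt_scores:   (N,)   initial prompt scores per candidate
    returns:         (N,)   refined scores with distractors suppressed
    """
    c = candidate_feats / np.linalg.norm(candidate_feats, axis=1, keepdims=True)
    t = template_feat / np.linalg.norm(template_feat)
    sim = c @ t                    # cosine similarity in [-1, 1]
    sim = (sim + 1.0) / 2.0        # map to [0, 1]
    return prompt_scores * sim     # down-weight dissimilar candidates
```

Candidates that look unlike the template keep little of their initial score, so the refined prompt concentrates on likely target locations before being handed to the tracker.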
DDPG based on multi-scale strokes for financial time series trading strategy
Chen, Jun-Cheng, Chen, Cong-Xiao, Duan, Li-Juan, Cai, Zhi
With the development of artificial intelligence, more and more financial practitioners apply deep reinforcement learning to financial trading strategies. However, it is difficult to extract accurate features due to the considerable noise, high non-stationarity, and non-linearity of single-scale time series, which makes it hard to obtain high returns. In this paper, we extract a multi-scale feature matrix over multiple time scales of financial time series, according to the classic financial theory known as Chan Theory, and propose a multi-scale stroke deep deterministic policy gradient reinforcement learning model (MSSDDPG) to search for the optimal trading strategy. We carried out experiments on the Dow Jones and S&P 500 indices of U.S. stocks, as well as China's CSI 300 and SSE Composite, evaluating the performance of our approach against the turtle trading strategy, a Deep Q-learning (DQN) reinforcement learning strategy, and a deep deterministic policy gradient (DDPG) reinforcement learning strategy. The results show that our approach achieves the best performance on China's CSI 300 and SSE Composite, and obtains outstanding results on the Dow Jones and S&P 500.
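The notion of a multi-scale feature matrix can be sketched as stacking the same return series smoothed at several time scales, one column per scale. This is a simplified stand-in: the paper derives its strokes from Chan Theory rather than the plain causal moving averages assumed here:

```python
import numpy as np

def multiscale_features(prices, scales=(1, 5, 20)):
    """Build a simple multi-scale feature matrix from a price series.

    prices: (T,) closing prices
    returns: (T, len(scales)) log-returns smoothed at each scale
    """
    prices = np.asarray(prices, dtype=float)
    # Log-returns; the first entry is padded to zero.
    logret = np.diff(np.log(prices), prepend=np.log(prices[0]))
    feats = []
    for s in scales:
        # Causal moving average over the last s returns.
        kernel = np.ones(s) / s
        smoothed = np.convolve(logret, kernel, mode='full')[:len(logret)]
        feats.append(smoothed)
    return np.stack(feats, axis=1)
```

Each row of the matrix then describes the market at one timestep across coarse and fine horizons, which is the kind of state representation a DDPG agent can consume.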