Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

Lam, Tsz Kin, Gaido, Marco, Papi, Sara, Bentivogli, Luisa, Haddow, Barry

arXiv.org Artificial Intelligence

Following the remarkable success of Large Language Models (LLMs) in NLP tasks, there is increasing interest in extending their capabilities to speech -- the most common form of communication. One promising approach to integrating speech into LLMs is dense feature prepending (DFP), which prepends the projected speech representations to the textual representations, allowing end-to-end training with the speech encoder. However, DFP typically requires connecting a text decoder to a speech encoder. This raises questions about the importance of having a sophisticated speech encoder for DFP, and how its performance compares with a standard encoder-decoder (i.e. cross-attention) architecture. To perform a controlled architectural comparison, we train all models from scratch rather than using large pretrained models, use comparable data and parameter settings, and test speech-to-text recognition (ASR) and translation (ST) on the MuST-C v1.0 and CoVoST2 datasets. We study the influence of the speech encoder in DFP. More importantly, we compare DFP and cross-attention under a variety of configurations, such as CTC compression and sequence-level knowledge distillation, and measure generation speed and GPU memory footprint on monolingual, bilingual and multilingual models. Despite the prevalence of DFP over cross-attention, our overall results do not indicate a clear advantage of DFP.
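The core mechanics of dense feature prepending can be illustrated in a few lines: projected speech frames are concatenated in front of the text token embeddings along the sequence axis, so a single decoder processes both. The sketch below is a minimal numpy illustration of this idea; the dimensions, projection, and variable names are assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical dimensions: speech-encoder output size and text embedding size.
SPEECH_DIM, TEXT_DIM = 80, 64

rng = np.random.default_rng(0)

def prepend(speech_feats, text_embeds, W, b):
    """Dense feature prepending (DFP): project speech frames into the text
    embedding space, then concatenate them in front of the text tokens."""
    projected = speech_feats @ W + b                      # (T_speech, TEXT_DIM)
    return np.concatenate([projected, text_embeds], axis=0)

speech = rng.normal(size=(25, SPEECH_DIM))   # 25 speech frames
text = rng.normal(size=(10, TEXT_DIM))       # 10 text token embeddings
W = rng.normal(size=(SPEECH_DIM, TEXT_DIM))  # toy linear projection
b = np.zeros(TEXT_DIM)

seq = prepend(speech, text, W, b)            # (35, TEXT_DIM) joint sequence
```

In contrast, a cross-attention (encoder-decoder) model would keep the two sequences separate and let the decoder attend to the speech representations through dedicated attention layers.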


Near-Field Spot Beamfocusing: A Correlation-Aware Transfer Learning Approach

Fallah, Mohammad Amir, Monemi, Mehdi, Rasti, Mehdi, Latva-Aho, Matti

arXiv.org Artificial Intelligence

3D spot beamfocusing (SBF), in contrast to conventional angular-domain beamforming, concentrates radiating power within a very small volume in both the radial and angular domains of the near-field zone. Recently, channel-state-information (CSI)-independent machine learning (ML) approaches have been developed for effective SBF using extremely large-scale programmable metasurfaces (ELPMs). These methods divide the ELPMs into subarrays and train them independently with Deep Reinforcement Learning to jointly focus the beam at the Desired Focal Point (DFP). This paper explores near-field SBF using ELPMs, addressing the lengthy training times that result from the independent training of subarrays. To achieve a faster CSI-independent solution, inspired by the correlation between the beamfocusing matrices of the subarrays, we leverage transfer learning techniques. First, we introduce a novel similarity criterion based on the Phase Distribution Image of subarray apertures. Then, we devise a subarray policy propagation scheme that transfers knowledge from trained to untrained subarrays. We further enhance learning by introducing Quasi-Liquid Layers as a revised version of the adaptive policy reuse technique. We show through simulations that the proposed scheme improves the training speed by about 5 times. Furthermore, for dynamic DFP management, we devise a DFP policy blending process, which improves the convergence rate by up to 8-fold.
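The policy-propagation idea can be sketched concretely: measure how similar an untrained subarray's Phase Distribution Image (PDI) is to those of trained subarrays, and initialise it from the most similar one. The snippet below is a toy illustration under assumed representations (cosine similarity on flattened PDIs, policies as plain weight arrays); the paper's actual criterion and RL policies are more elaborate.

```python
import numpy as np

def pdi_similarity(pdi_a, pdi_b):
    """Cosine similarity between flattened Phase Distribution Images --
    a simplified stand-in for the paper's similarity criterion."""
    a, b = pdi_a.ravel(), pdi_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def propagate_policy(trained, untrained_pdi):
    """Subarray policy propagation: initialise an untrained subarray with the
    policy of the most similar trained subarray."""
    best = max(trained, key=lambda t: pdi_similarity(t["pdi"], untrained_pdi))
    return best["policy"].copy()

# Two trained subarrays with toy 4x4 PDIs and toy policy weights.
trained = [
    {"pdi": np.ones((4, 4)), "policy": np.full((3, 3), 0.5)},
    {"pdi": -np.ones((4, 4)), "policy": np.full((3, 3), -0.5)},
]
new_pdi = 0.9 * np.ones((4, 4))          # closest to the first subarray
init_policy = propagate_policy(trained, new_pdi)
```

Starting the untrained subarray from `init_policy` rather than from scratch is what cuts the training time in schemes of this kind.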


Dynamic Prompt Optimizing for Text-to-Image Generation

Mo, Wenyi, Zhang, Tianyu, Bai, Yalong, Su, Bing, Wen, Ji-Rong, Yang, Qing

arXiv.org Artificial Intelligence

Text-to-image generative models, specifically those based on diffusion models like Imagen and Stable Diffusion, have made substantial advancements. Recently, there has been a surge of interest in the delicate refinement of text prompts: users assign weights or alter the injection time steps of certain words in the text prompts to improve the quality of generated images. However, the success of fine-control prompts depends on the accuracy of the text prompts and the careful selection of weights and time steps, which requires significant manual intervention. To address this, we introduce the Prompt Auto-Editing (PAE) method. Besides refining the original prompts for image generation, we further employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to dynamic fine-control prompts. The reward function during training encourages the model to consider aesthetic score, semantic consistency, and user preferences. Experimental results demonstrate that our proposed method effectively improves the original prompts, generating visually more appealing images while maintaining semantic alignment. Code is available at https://github.com/Mowenyii/PAE.
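Two pieces of this pipeline lend themselves to a quick sketch: a dynamic fine-control prompt as (word, weight, injection step) triples, and a reward that mixes the three signals the abstract names. Everything below is illustrative: the weights `w_a`/`w_s`/`w_p` and the triple encoding are assumptions for exposition, not PAE's actual format.

```python
import math

def pae_reward(aesthetic, semantic, preference,
               w_a=0.4, w_s=0.4, w_p=0.2):
    """Toy scalar reward combining aesthetic score, semantic consistency,
    and user preference; the mixing weights are illustrative."""
    return w_a * aesthetic + w_s * semantic + w_p * preference

def format_prompt(tokens):
    """Render (word, weight, injection_step) triples as a fine-control
    prompt string -- a hypothetical textual encoding."""
    return " ".join(f"({word}:{weight:.1f}:{step})"
                    for word, weight, step in tokens)

prompt = format_prompt([("sunset", 1.2, 5), ("beach", 0.8, 0)])
reward = pae_reward(aesthetic=1.0, semantic=0.5, preference=0.0)
```

An RL policy in this setting would propose the weight and step for each word, then be updated toward triples that score higher under such a reward.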


Deep Fusion Prior for Plenoptic Super-Resolution All-in-Focus Imaging

Gu, Yuanjie, Guan, Yinghan, Xiao, Zhibo, Dai, Haoran, Liu, Cheng, Wang, Shouyu

arXiv.org Artificial Intelligence

Plenoptic imaging offers not only 2-D projections but also captures light ray directions, thus supporting single-shot all-in-focus imaging. However, its poor spatial resolution is an obstacle to high-quality all-in-focus imaging. Although various super-resolution (SR) methods have been combined with multi-focus image fusion (MFIF) to reconstruct high-quality multi-focus fused super-resolution images for various applications, almost all of them treat MFIF and SR separately. To the best of our knowledge, we are the first to unify the MFIF and SR problems as multi-focus image super-resolution fusion (MFISRF) from an optical perspective, and we propose a novel dataset-free unsupervised framework named deep fusion prior (DFP) to address MFISRF, particularly for plenoptic super-resolution all-in-focus imaging. Both numerical and practical experiments show that our proposed DFP matches or even outperforms state-of-the-art MFIF and SR method combinations. Therefore, we believe DFP can potentially be used in various computational photography applications.
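To make the MFIF half of the problem concrete, a classical baseline fuses a focal stack by picking, per pixel, the input that is locally sharper. The sketch below uses a discrete-Laplacian focus measure; it is a conventional illustrative baseline, not the paper's unsupervised DFP network, and the wrap-around boundary handling via `np.roll` is a toy simplification.

```python
import numpy as np

def sharpness(img):
    """Focus measure: magnitude of a discrete Laplacian, a common
    per-pixel sharpness indicator in multi-focus image fusion."""
    lap = (-4.0 * img
           + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
           + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))
    return np.abs(lap)

def fuse(img_a, img_b):
    """Per-pixel selection of the locally sharper input (all-in-focus)."""
    mask = sharpness(img_a) >= sharpness(img_b)
    return np.where(mask, img_a, img_b)

img_a = np.zeros((8, 8)); img_a[2, 2] = 1.0   # sharp detail at (2, 2)
img_b = np.full((8, 8), 0.5)                  # defocused: flat, zero Laplacian
fused = fuse(img_a, img_b)
```

A method like DFP instead learns the fusion and the upsampling jointly, so that sharpness selection and resolution enhancement inform each other rather than being cascaded.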