AITopics | transformer 0

Transformer-Based Model for Cold Start Mitigation in FaaS Architecture

Mouen, Alexandre Savi Fayam Mbala, Zeutouo, Jerry Lacmou, Tchendji, Vianney Kengne

arXiv.org Artificial Intelligence | Apr-16-2025

Abstract: Serverless architectures, particularly the Function as a Service (FaaS) model, have become a cornerstone of modern cloud computing due to their ability to simplify resource management and enhance application deployment agility. However, a significant challenge remains: the cold start problem. This phenomenon occurs when an idle FaaS function is invoked, requiring a full initialization process, which increases latency and degrades user experience. Existing solutions for cold start mitigation are limited in terms of invocation pattern generalization and implementation complexity. In this study, we propose an innovative approach leveraging Transformer models to mitigate the impact of cold starts in FaaS architectures. Experimental evaluation using a public dataset provided by Azure demonstrates a significant reduction in cold start times, reaching up to 79% compared to conventional methods.

Introduction: The rapid emergence of cloud computing has transformed the digital industry by enabling immediate and flexible access to virtualized computing resources. Services such as storage, computing, and networking allow organizations to reduce costs [1] while benefiting from scalable, on-demand infrastructure managed by external providers. At the core of this evolution is the Function as a Service (FaaS) model, an innovative solution that allows functions to be deployed in response to specific events without requiring the management of underlying resources. This model, foundational to serverless architectures, provides increased flexibility and usage-based billing, reducing both costs and operational complexity for developers.
Serverless architectures, built upon the FaaS model, abstract infrastructure management, allowing developers to focus solely on the code and features of their applications [2]. This approach optimizes application scalability and simplifies infrastructure management, which is a notable advantage in an era of massive automation.

cloud computing, machine learning, natural language, (20 more...)

arXiv:2504.11338

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.86)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Temporal-contextual Event Learning for Pedestrian Crossing Intent Prediction

Liang, Hongbin, Qiao, Hezhe, Huang, Wei, Wang, Qizhou, Shang, Mingsheng, Chen, Lin

arXiv.org Artificial Intelligence | Apr-10-2025

Ensuring the safety of vulnerable road users through accurate prediction of pedestrian crossing intention (PCI) plays a crucial role in autonomous and assisted driving. Most PCI prediction methods analyze a set of observed ego-view video frames to forecast crossing intent. However, they struggle to capture the critical events related to pedestrian behaviour along the temporal dimension because the video frames are highly redundant, which results in sub-optimal PCI prediction performance. Our research addresses this challenge by introducing a novel approach called Temporal-contextual Event Learning (TCL). TCL is composed of the Temporal Merging Module (TMM), which manages the redundancy by clustering the observed video frames into multiple key temporal events. Then, the Contextual Attention Block (CAB) is employed to adaptively aggregate multiple event features along with visual and non-visual data. By combining temporal feature extraction with contextual attention on the key information across the critical events, TCL can learn an expressive representation for PCI prediction. Extensive experiments are carried out on three widely adopted datasets: PIE, JAAD-beh, and JAAD-all. The results show that TCL substantially surpasses the state-of-the-art methods. Our code can be accessed at https://github.com/dadaguailhb/TCL.

machine learning, natural language, prediction, (20 more...)

arXiv:2504.06292

Country: Asia > China (0.29)

Genre: Research Report > Promising Solution (0.68)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Infrastructure & Services (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Training Frozen Feature Pyramid DINOv2 for Eyelid Measurements with Infinite Encoding and Orthogonal Regularization

Chen, Chun-Hung

arXiv.org Artificial Intelligence | Apr-1-2025

Accurate measurement of eyelid parameters such as Margin Reflex Distances (MRD1, MRD2) and Levator Function (LF) is critical in oculoplastic diagnostics but remains limited by manual, inconsistent methods. This study evaluates three deep learning models (SE-ResNet, EfficientNet, and the vision transformer-based DINOv2) for automating these measurements using smartphone-acquired images. We assess performance across frozen and fine-tuned settings, using MSE, MAE, and R² metrics. DINOv2, pretrained through self-supervised learning, demonstrates superior scalability and robustness, especially under frozen conditions ideal for mobile deployment. Lightweight regressors such as MLP and Deep Ensemble offer high precision with minimal computational overhead. To address class imbalance and improve generalization, we integrate focal loss, orthogonal regularization, and binary encoding strategies. Our results show that DINOv2 combined with these enhancements delivers consistent, accurate predictions across all tasks, making it a strong candidate for real-world, mobile-friendly clinical applications. This work highlights the potential of foundation models in advancing AI-powered ophthalmic care.
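
For readers unfamiliar with the three regression metrics named in this abstract, the following is a minimal sketch of their standard textbook definitions in plain Python. It is not the authors' evaluation code; the function names and the sample values are illustrative assumptions only.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # R^2 is undefined when every target value is identical.
        raise ValueError("R^2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


# Made-up measurements (not data from the paper), for illustration only.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mse(measured, predicted), mae(measured, predicted), r2(measured, predicted))
```

MSE penalizes large errors more heavily than MAE, while R² reports the share of target variance explained by the predictions, which is why the three are often reported together.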

artificial intelligence, machine learning, tabtransformer 1, (18 more...)

arXiv:2504.00515

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.48)
Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Amortized In-Context Bayesian Posterior Estimation

Mittal, Sarthak, Bracher, Niels Leif, Lajoie, Guillaume, Jaini, Priyank, Brubaker, Marcus

arXiv.org Machine Learning | Feb-10-2025

Bayesian inference provides a natural way of incorporating prior beliefs and assigning a probability measure to the space of hypotheses. Current solutions rely on iterative routines like Markov Chain Monte Carlo (MCMC) sampling and Variational Inference (VI), which need to be re-run whenever new observations are available. Amortization, through conditional estimation, is a viable strategy to alleviate such difficulties and has been the guiding principle behind simulation-based inference, neural processes, and in-context methods using pre-trained models. In this work, we conduct a thorough comparative analysis of amortized in-context Bayesian posterior estimation methods through the lens of different optimization objectives and architectural choices. Such methods train an amortized estimator to perform posterior parameter inference by conditioning on a set of data examples passed as context to a sequence model such as a transformer. In contrast to language models, we leverage permutation-invariant architectures, as the true posterior is invariant to the ordering of context examples. Our empirical study includes generalization to out-of-distribution tasks, cases where the assumed underlying model is misspecified, and transfer from simulated to real problems. It highlights the superiority of the reverse KL estimator for predictive problems, especially when combined with the transformer architecture and normalizing flows.

machine learning, natural language, probabilistic model, (21 more...)

arXiv:2502.06601

Country:

North America > United States (0.27)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

Wang, Xiao, Wang, Fuling, Li, Yuehang, Ma, Qingchuan, Wang, Shiao, Jiang, Bo, Li, Chuanfu, Tang, Jin

arXiv.org Artificial Intelligence | Oct-1-2024

X-ray image-based medical report generation (MRG) is a pivotal area in artificial intelligence which can significantly reduce diagnostic burdens and patient wait times. Despite significant progress, we believe that the task has reached a bottleneck due to the limited benchmark datasets and the insufficient capability gains of existing large models in this specialized domain. Specifically, the recently released CheXpert Plus dataset provides only the dataset itself, without comparative evaluation algorithms or their results. This situation makes the training, evaluation, and comparison of subsequent algorithms challenging. Thus, we conduct a comprehensive benchmarking of existing mainstream X-ray report generation models and large language models (LLMs) on the CheXpert Plus dataset. We believe that the proposed benchmark can provide a solid comparative basis for subsequent algorithms and serve as a guide for researchers to quickly grasp the state-of-the-art models in this field. More importantly, we propose a large model for X-ray image report generation using a multi-stage pre-training strategy, including self-supervised autoregressive generation, X-ray-report contrastive learning, and supervised fine-tuning. Extensive experimental results indicate that the autoregressive pre-training based on Mamba effectively encodes X-ray images, and the image-text contrastive pre-training further aligns the feature spaces, achieving better experimental results. Source code can be found at https://github.com/Event-AHU/Medical_Image_Analysis.

dataset, report generation, x-ray image, (14 more...)

arXiv:2410.00379

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland (0.04)
Asia > China > Anhui Province > Hefei (0.04)
(6 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)

Flash STU: Fast Spectral Transform Units

Liu, Y. Isabel, Nguyen, Windsor, Devre, Yagiz, Dogariu, Evan, Majumdar, Anirudha, Hazan, Elad

arXiv.org Artificial Intelligence | Sep-17-2024

This paper describes an efficient, open source PyTorch implementation of the Spectral Transform Unit. We investigate sequence prediction tasks over several modalities including language, robotics, and simulated dynamical systems. We find that for the same parameter count, the STU and its variants outperform the Transformer as well as other leading state space models across various modalities.

architecture, transformer, transformer 0, (15 more...)

arXiv:2409.10489

Country:

South America (0.04)
North America > United States > Washington (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mamba for Scalable and Efficient Personalized Recommendations

Starnes, Andrew, Webster, Clayton

arXiv.org Artificial Intelligence | Sep-11-2024

In this effort, we propose using Mamba for handling tabular data in personalized recommendation systems. We present FT-Mamba (Feature Tokenizer + Mamba), a novel hybrid model that replaces Transformer layers with Mamba layers within the FT-Transformer architecture. The Mamba model offers an efficient alternative to Transformers, reducing computational complexity from quadratic to linear by enhancing the capabilities of State Space Models (SSMs). FT-Mamba is designed to improve the scalability and efficiency of recommendation systems while maintaining performance. We evaluate FT-Mamba in comparison to a traditional Transformer-based model within a Two-Tower architecture on three datasets: Spotify music recommendation, H&M fashion recommendation, and vaccine messaging recommendation. Each model is trained on 160,000 user-action pairs, and performance is measured using precision (P), recall (R), Mean Reciprocal Rank (MRR), and Hit Ratio (HR) at several truncation values. Our results demonstrate that FT-Mamba outperforms the Transformer-based model in terms of computational efficiency while maintaining or exceeding performance across key recommendation metrics. By leveraging Mamba layers, FT-Mamba provides a scalable and effective solution for large-scale personalized recommendation systems, showcasing the potential of the Mamba architecture to enhance both efficiency and accuracy.
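
The four ranking metrics named in this abstract can be illustrated with a short sketch. The code below uses common textbook definitions at a truncation value k; it is not taken from the paper, and the function names and the toy ranking are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(ranked, relevant, k):
    """1 / rank of the first relevant item in the top-k, else 0."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def hit_ratio_at_k(ranked, relevant, k):
    """1.0 if at least one relevant item is in the top-k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean_over_users(metric, rankings, relevants, k):
    """Average a per-user metric over all users (e.g. MRR from reciprocal ranks)."""
    scores = [metric(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


# Toy example with made-up item ids (not data from the paper).
ranked_items = ["a", "b", "c", "d"]
relevant_items = {"c", "x"}
print(precision_at_k(ranked_items, relevant_items, 3))
print(reciprocal_rank_at_k(ranked_items, relevant_items, 3))
```

Averaging `reciprocal_rank_at_k` over users with `mean_over_users` gives MRR, and averaging `hit_ratio_at_k` gives HR at the chosen truncation value.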

mamba, recommendation, transformer, (14 more...)

arXiv:2409.17165

Country: North America > United States > Tennessee > Knox County > Knoxville (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (0.71)
Leisure & Entertainment (0.71)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Feature Representations for Automatic Meerkat Vocalization Classification

Mahmoud, Imen Ben, Sarkar, Eklavya, Manser, Marta, Magimai.-Doss, Mathew

arXiv.org Artificial Intelligence | Aug-27-2024

Understanding the evolution of vocal communication in social animals is an important research problem. In that context, beyond humans, there is an interest in analyzing vocalizations of other social animals such as meerkats, marmosets, and apes. While existing approaches address vocalizations of certain species, a reliable method tailored for meerkat calls is lacking. To that end, this paper investigates feature representations for automatic meerkat vocalization analysis. Both traditional signal processing-based representations and data-driven representations facilitated by advances in deep learning are explored. Call type classification studies conducted on two data sets reveal that feature extraction methods developed for human speech processing can be effectively employed for automatic meerkat call analysis.

feature representation, representation, vocalization, (15 more...)

arXiv:2408.15296

Country: