AITopics | positional embedding

Collaborating Authors

positional embedding

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GeoPE:A Unified Geometric Positional Embedding for Structured Tensors

Yao, Yupu, Yang, Bowen

arXiv.org Artificial IntelligenceDec-5-2025

Standard Vision Transformers flatten 2D images into 1D sequences, disrupting the natural spatial topology. While Rotary Positional Embedding (RoPE) excels in 1D, it inherits this limitation, often treating spatially distant patches (e.g., at row edges) as sequence neighbors. Existing 2D approaches typically treat spatial axes independently, failing to decouple this false sequential proximity from true spatial distance. To restore the 2D spatial manifold, we introduce Geometric Positional Embedding (GeoPE), a framework that extends rotations to 3D Euclidean space using quaternions. To overcome non-commutativity and ensure symmetry, GeoPE constructs a unified rotational operator by computing the geometric mean in the Lie algebra. This creates a geometrically coupled encoding that effectively separates spatial dimensions. Extensive experiments on image classification, object detection, and 3D semantic segmentation demonstrate that GeoPE consistently outperforms existing 2D RoPE variants and significantly enhances shape bias, confirming its ability to capture true geometric structure.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.04963

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers

Chowdhury, Md Abtahi Majeed, Rahman, Md Rifat Ur, Taki, Akil Ahmad

arXiv.org Artificial IntelligenceApr-22-2025

Positional embeddings (PE) play a crucial role in Vision Transformers (ViTs) by providing spatial information otherwise lost due to the permutation invariant nature of self attention. While absolute positional embeddings (APE) have shown theoretical advantages over relative positional embeddings (RPE), particularly due to the ability of sinusoidal functions to preserve spatial inductive biases like monotonicity and shift invariance, a fundamental challenge arises when mapping a 2D grid to a 1D sequence. Existing methods have mostly overlooked or never explored the impact of patch ordering in positional embeddings. To address this, we propose LOOPE, a learnable patch-ordering method that optimizes spatial representation for a given set of frequencies, providing a principled approach to patch order optimization. Empirical results show that our PE significantly improves classification accuracy across various ViT architectures. To rigorously evaluate the effectiveness of positional embeddings, we introduce the "Three Cell Experiment", a novel benchmarking framework that assesses the ability of PEs to retain relative and absolute positional information across different ViT architectures. Unlike standard evaluations, which typically report a performance gap of 4 to 6% between models with and without PE, our method reveals a striking 30 to 35% difference, offering a more sensitive diagnostic tool to measure the efficacy of PEs. Our experimental analysis confirms that the proposed LOOPE demonstrates enhanced effectiveness in retaining both relative and absolute positional information.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.14386

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Using Machine Learning for move sequence visualization and generation in climbing

Rimbot, Thomas, Jaggi, Martin, Barba, Luis

arXiv.org Artificial IntelligenceMar-1-2025

Using Machine Learning for move sequence visualization and generation in climbing Thomas Rimbot, Martin Jaggi, Luis Barba - EPFL Abstract --In this work, we investigate the application of Machine Learning techniques to sport climbing. Expanding upon previous projects, we develop a visualization tool for move sequence evaluation on a given boulder . Then, we look into move sequence prediction from simple holds sequence information using three different Transformer models. While the results are not conclusive, they are a first step in this kind of approach and lay the ground for future work. I NTRODUCTION Applying Machine Learning techniques to competitive sport has been an increasing trend in the past few years. We can for example cite the case of car racing or hockey. In this project, we focus on bouldering, a form of rock climbing where athletes are tasked with overcoming a small natural or artificial feature (about 4m high), requiring both physical strengths and problem-solving skills.

information, move sequence, sequence, (12 more...)

arXiv.org Artificial Intelligence

2503.00458

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Sports (0.54)
Health & Medicine (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation

Agrawal, Atharva Mangeshkumar, Shinde, Rutika Pandurang, Bhukya, Vasanth Kumar, Chakraborty, Ashmita, Shah, Sagar Bharat, Shukla, Tanmay, Relangi, Sree Pradeep Kumar, Mutyam, Nilesh

arXiv.org Artificial IntelligenceFeb-4-2025

Large language models (LLMs) have shown impressive capabilities in natural language processing tasks, including dialogue generation. This research aims to conduct a novel comparative analysis of two prominent techniques, fine-tuning with LoRA (Low-Rank Adaptation) and the Retrieval-Augmented Generation (RAG) framework, in the context of doctor-patient chat conversations with multiple datasets of mixed medical domains. The analysis involves three state-of-the-art models: Llama-2, GPT, and the LSTM model. Employing real-world doctor-patient dialogues, we comprehensively evaluate the performance of models, assessing key metrics such as language quality (perplexity, BLEU score), factual accuracy (fact-checking against medical knowledge bases), adherence to medical guidelines, and overall human judgments (coherence, empathy, safety). The findings provide insights into the strengths and limitations of each approach, shedding light on their suitability for healthcare applications. Furthermore, the research investigates the robustness of the models in handling diverse patient queries, ranging from general health inquiries to specific medical conditions. The impact of domain-specific knowledge integration is also explored, highlighting the potential for enhancing LLM performance through targeted data augmentation and retrieval strategies.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2502.02249

Country:

North America > United States > Arizona (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area (0.69)
Health & Medicine > Health Care Providers & Services > Reimbursement (0.40)
Government > Regional Government > North America Government > United States Government (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On the Interchangeability of Positional Embeddings in Multilingual Neural Machine Translation Models

Gumma, Varun, Chitale, Pranjal A., Bali, Kalika

arXiv.org Artificial IntelligenceAug-21-2024

Standard Neural Machine Translation (NMT) models have traditionally been trained with Sinusoidal Positional Embeddings (PEs), which are inadequate for capturing long-range dependencies and are inefficient for long-context or document-level translation. In contrast, state-of-the-art large language models (LLMs) employ relative PEs, demonstrating superior length generalization. This work explores the potential for efficiently switching the Positional Embeddings of pre-trained NMT models from absolute sinusoidal PEs to relative approaches such as RoPE and ALiBi. Our findings reveal that sinusoidal PEs can be effectively replaced with RoPE and ALiBi with negligible or no performance loss, achieved by fine-tuning on a small fraction of high-quality data. Additionally, models trained without Positional Embeddings (NoPE) are not a viable solution for Encoder-Decoder architectures, as they consistently under-perform compared to models utilizing any form of Positional Embedding. Furthermore, even a model trained from scratch with these relative PEs slightly under-performs a fine-tuned model, underscoring the efficiency and validity of our hypothesis.

computational linguistic, positional embedding, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2408.11382

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > California (0.04)
(12 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback