AITopics | position prediction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions

Neural Information Processing Systems | Feb-15-2026, 21:03:20 GMT

To answer this question, we begin by revisiting the forward procedure of ViTs. A sequence of positional embeddings (PEs) [51] is added to patch embeddings to preserve position information. Intuitively, simply discarding these PEs and requesting the model to reconstruct the position for each patch naturally becomes a qualified location-aware pretext task.

artificial intelligence, computer vision, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia > China > Heilongjiang Province > Daqing (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Learning the relative composition of EEG signals using pairwise relative shift pretraining

Sandino, Christopher, Lala, Sayeri, Chau, Geeling, Ayoughi, Melika, Mahasseni, Behrooz, Zippi, Ellen, Moin, Ali, Azemi, Erdrin, Goh, Hanlin

arXiv.org Artificial Intelligence | Nov-18-2025

Self-supervised learning (SSL) offers a promising approach for learning electroencephalography (EEG) representations from unlabeled data, reducing the need for expensive annotations for clinical applications like sleep staging and seizure detection. While current EEG SSL methods predominantly use masked reconstruction strategies like masked autoencoders (MAE) that capture local temporal patterns, position prediction pretraining remains underexplored despite its potential to learn long-range dependencies in neural signals. We introduce PAirwise Relative Shift or PARS pretraining, a novel pretext task that predicts relative temporal shifts between randomly sampled EEG window pairs. Unlike reconstruction-based methods that focus on local pattern recovery, PARS encourages encoders to capture relative temporal composition and long-range dependencies inherent in neural signals. Through comprehensive evaluation on various EEG decoding tasks, we demonstrate that PARS-pretrained transformers consistently outperform existing pretraining strategies in label-efficient and transfer learning settings, establishing a new paradigm for self-supervised EEG representation learning.
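
The pretext task described in this abstract, predicting the relative temporal shift between two randomly sampled windows, can be illustrated with a small sampling routine. This is only a sketch under stated assumptions: the function name `sample_window_pair`, the fixed window length, and the choice to express the shift as a signed offset in samples are all hypothetical and are not taken from the paper.

```python
import random

def sample_window_pair(signal, window_len, rng):
    """Draw two windows from one recording and return them with their relative shift.

    Assumption: the shift is the signed offset, in samples, from window A to window B.
    """
    max_start = len(signal) - window_len
    if max_start < 0:
        raise ValueError("signal is shorter than one window")
    start_a = rng.randint(0, max_start)  # randint is inclusive on both ends
    start_b = rng.randint(0, max_start)
    window_a = signal[start_a:start_a + window_len]
    window_b = signal[start_b:start_b + window_len]
    return window_a, window_b, start_b - start_a

rng = random.Random(0)
signal = list(range(100))  # stand-in for a single EEG channel
window_a, window_b, shift = sample_window_pair(signal, 10, rng)
```

An encoder would receive the two windows without their start indices and be trained to regress or classify `shift`; how the paper frames that output is not stated in the abstract.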

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.1194

Country: North America > United States (0.93)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning Velocity and Acceleration: Self-Supervised Motion Consistency for Pedestrian Trajectory Prediction

Huang, Yizhou, Cheng, Yihua, Wang, Kezhi

arXiv.org Artificial Intelligence | Mar-31-2025

Understanding human motion is crucial for accurate pedestrian trajectory prediction. Conventional methods typically rely on supervised learning, where ground-truth labels are directly optimized against predicted trajectories. This amplifies the limitations caused by long-tailed data distributions, making it difficult for the model to capture abnormal behaviors. In this work, we propose a self-supervised pedestrian trajectory prediction framework that explicitly models position, velocity, and acceleration. We leverage velocity and acceleration information to enhance position prediction through feature injection and a self-supervised motion consistency mechanism. Our model hierarchically injects velocity features into the position stream. Acceleration features are injected into the velocity stream. This enables the model to predict position, velocity, and acceleration jointly. From the predicted position, we compute corresponding pseudo velocity and acceleration, allowing the model to learn from data-generated pseudo labels and thus achieve self-supervised learning. We further design a motion consistency evaluation strategy grounded in physical principles; it selects the most reasonable predicted motion trend by comparing it with historical dynamics and uses this trend to guide and constrain trajectory generation. We conduct experiments on the ETH-UCY and Stanford Drone datasets, demonstrating that our method achieves state-of-the-art performance on both datasets.
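
The step of computing pseudo velocity and acceleration from predicted positions can be sketched with finite differences. The abstract does not give the differencing scheme or time step, so the forward difference, the unit `dt`, and the names `finite_difference` and `pseudo_motion_labels` below are assumptions made for illustration.

```python
def finite_difference(points, dt=1.0):
    """Forward difference between consecutive 2-D points."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def pseudo_motion_labels(positions, dt=1.0):
    """Derive pseudo velocity and acceleration labels from a position sequence."""
    velocities = finite_difference(positions, dt)
    accelerations = finite_difference(velocities, dt)
    return velocities, accelerations

# Hypothetical predicted trajectory of four (x, y) positions.
predicted = [(0.0, 0.0), (1.0, 0.5), (3.0, 1.0), (6.0, 1.5)]
velocities, accelerations = pseudo_motion_labels(predicted)
```

Each differencing pass shortens the sequence by one step, so four positions yield three velocities and two accelerations. These derived values could then be compared against the model's directly predicted velocity and acceleration streams.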

artificial intelligence, machine learning, prediction, (16 more...)

arXiv.org Artificial Intelligence

2503.24272

Country:

Asia > China > Guangxi Province > Nanning (0.04)
North America > United States (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning

Xu, Zhu, Zhao, Zhiqiang, Zhang, Zihan, Liu, Yuchi, Shen, Quanwei, Liu, Fei, Kuang, Yu, He, Jian, Liu, Conglin

arXiv.org Artificial Intelligence | Dec-17-2024

Tokenization methods like Byte-Pair Encoding (BPE) enhance computational efficiency in large language models (LLMs) but often obscure internal character structures within tokens. This limitation hinders LLMs' ability to predict precise character positions, which is crucial in tasks like Chinese Spelling Correction (CSC) where identifying the positions of misspelled characters accelerates correction processes. We propose Token Internal Position Awareness (TIPA), a method that significantly improves models' ability to capture character positions within tokens by training them on reverse character prediction tasks using the tokenizer's vocabulary. Experiments demonstrate that TIPA enhances position prediction accuracy in LLMs, enabling more precise identification of target characters in original text. Furthermore, when applied to downstream tasks that do not require exact position prediction, TIPA still boosts performance in tasks needing character-level information, validating its versatility and effectiveness.
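
A reverse character prediction target built from a tokenizer's vocabulary might look like the following. The exact target format is not given in the abstract, so the position-to-character dictionary, the 1-based indexing, and the names `reverse_char_positions` and `build_training_pairs` are assumptions for illustration only.

```python
def reverse_char_positions(token):
    """Map 1-based character positions to characters, listed from last to first."""
    return {pos: token[pos - 1] for pos in range(len(token), 0, -1)}

def build_training_pairs(vocabulary):
    """Pair every vocabulary token with its reverse character-position target."""
    return [(tok, reverse_char_positions(tok)) for tok in vocabulary]

vocab = ["a", "ing", "小说"]  # hypothetical vocabulary entries
pairs = build_training_pairs(vocab)
```

Listing positions from last to first means the first key of each target equals the token's character length, which is one plausible reason for using the reverse order.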

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.17679

Country:

Asia > Southeast Asia (0.04)
Asia > China > Liaoning Province > Dalian (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Label-Efficient Sleep Staging Using Transformers Pre-trained with Position Prediction

Lala, Sayeri, Goh, Hanlin, Sandino, Christopher

arXiv.org Artificial Intelligence | Mar-29-2024

Sleep staging is a clinically important task for diagnosing various sleep disorders, but remains challenging to deploy at scale because it is both labor-intensive and time-consuming. Supervised deep learning-based approaches can automate sleep staging but at the expense of large labeled datasets, which can be unfeasible to procure for various settings, e.g., uncommon sleep disorders. While self-supervised learning (SSL) can mitigate this need, recent studies on SSL for sleep staging have shown that performance gains saturate after training with labeled data from only tens of subjects, and hence are unable to match peak performance attained with larger datasets. We hypothesize that the rapid saturation stems from applying a sub-optimal pretraining scheme that pretrains only a portion of the architecture, i.e., the feature encoder, but not the temporal encoder; therefore, we propose adopting an architecture that seamlessly couples the feature and temporal encoding and a suitable pretraining scheme that pretrains the entire model. On a sample sleep staging dataset, we find that the proposed scheme offers performance gains that do not saturate with the amount of labeled training data (e.g., 3-5% improvement in balanced sleep staging accuracy across low- to high-labeled data settings), reducing the amount of labeled training data needed for high performance (e.g., by 800 subjects). Based on our findings, we recommend adopting this SSL paradigm for subsequent work on SSL for sleep staging.

architecture, dataset, sleep staging, (15 more...)

arXiv.org Artificial Intelligence

2404.15308

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Illinois > DuPage County > Darien (0.04)

Genre: Research Report > New Finding (0.89)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Yang, Dong, Koriyama, Tomoki, Saito, Yuki, Saeki, Takaaki, Xin, Detai, Saruwatari, Hiroshi

arXiv.org Artificial Intelligence | Feb-27-2023

Pause insertion, also known as phrase break prediction and phrasing, is an essential part of TTS systems because proper pauses with natural duration significantly enhance the rhythm and intelligibility of synthetic speech. However, conventional phrasing models ignore various speakers' different styles of inserting silent pauses, which can degrade the performance of the model trained on a multi-speaker speech corpus. To this end, we propose more powerful pause insertion frameworks based on a pre-trained language model. Our approach uses bidirectional encoder representations from transformers (BERT) pre-trained on a large-scale text corpus, injecting speaker embedding to capture various speaker characteristics. We also leverage duration-aware pause insertion for more natural multi-speaker TTS. We develop and evaluate two types of models. The first improves conventional phrasing models on the position prediction of respiratory pauses (RPs), i.e., silent pauses at word transitions without punctuation. It performs speaker-conditioned RP prediction considering contextual information and is used to demonstrate the effect of speaker information on the prediction. The second model is further designed for phoneme-based TTS models and performs duration-aware pause insertion, predicting both RPs and punctuation-indicated pauses (PIPs) that are categorized by duration. The evaluation results show that our models improve the precision and recall of pause insertion and the rhythm of synthetic speech.

machine learning, natural language, prediction, (19 more...)

arXiv.org Artificial Intelligence

2302.13652

Country:

Europe > Austria > Vienna (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Quebec > Montreal (0.05)
(17 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)

Position prediction using disturbance observer for planar pushing

Kim, Jongrae

arXiv.org Artificial Intelligence | Jan-17-2023

The position and the orientation of a rigid body object pushed by a robot on a planar surface are extremely difficult to predict. In this paper, the prediction problem is formulated as a disturbance observer design problem. The disturbance observer provides accurate estimation of the total sum of model errors and external disturbances acting on the object. From the estimation results, it is revealed that there is a strong linear relationship between the applied force or torque and the estimated disturbances. The proposed prediction algorithm has two phases: the identification & the prediction. During the identification phase, the linear relationship is identified from the observer output using a recursive least-square algorithm. In the prediction phase, the identified linear relationship is used with a force plan, which would be provided by a mission planner, to predict the position and the orientation of an object. The algorithm is tested for six different push experimental data available from the MIT MCube Lab. The proposed algorithm shows improved performance in reducing the prediction error compared to a simple correction algorithm.
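
The identification phase, fitting a linear relationship from observer output with recursive least squares, can be sketched for a single scalar channel. The model form `disturbance ≈ slope * force + offset`, the initial covariance `p0`, the forgetting factor, and the function name `rls_fit` are assumptions for illustration; the abstract does not specify the paper's regressor or tuning.

```python
def rls_fit(forces, disturbances, forgetting=1.0, p0=1e6):
    """Recursively fit disturbance ≈ slope * force + offset (assumed model form)."""
    theta = [0.0, 0.0]            # parameter estimate: [slope, offset]
    P = [[p0, 0.0], [0.0, p0]]    # inverse-correlation matrix, large = low prior confidence
    for f, d in zip(forces, disturbances):
        phi = [f, 1.0]            # regressor for the affine model
        # P @ phi
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = forgetting + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        gain = [Pphi[0] / denom, Pphi[1] / denom]
        error = d - (theta[0] * phi[0] + theta[1] * phi[1])
        theta = [theta[0] + gain[0] * error, theta[1] + gain[1] * error]
        # P <- (P - gain * phi^T P) / forgetting; P is symmetric, so phi^T P equals Pphi^T.
        P = [[(P[i][j] - gain[i] * Pphi[j]) / forgetting for j in range(2)]
             for i in range(2)]
    return theta

# Synthetic, noise-free samples generated from d = 2 f + 1.
forces = [0.0, 1.0, 2.0, 3.0, 4.0]
disturbances = [2.0 * f + 1.0 for f in forces]
slope, offset = rls_fit(forces, disturbances)
```

In a prediction phase, the fitted `slope` and `offset` would be applied to a planned force sequence to estimate the disturbance ahead of time.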

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2301.06896

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (0.70)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Position Prediction as an Effective Pretraining Strategy

Zhai, Shuangfei, Jaitly, Navdeep, Ramapuram, Jason, Busbridge, Dan, Likhomanenko, Tatiana, Cheng, Joseph Yitan, Talbott, Walter, Huang, Chen, Goh, Hanlin, Susskind, Joshua

arXiv.org Artificial Intelligence | Jul-15-2022

Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Transformer has been unlocked by self-supervised pretraining strategies based on masked autoencoders which rely on reconstructing masked inputs, directly, or contrastively from unmasked content. This pretraining strategy, which has been used in BERT models in NLP, Wav2Vec models in Speech and, recently, in MAE models in Vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives. In this paper, we propose a novel but surprisingly simple alternative to content reconstruction: predicting locations from content, without providing positional information for it. Doing so requires the Transformer to understand the positional relationships between different parts of the input, from their content alone. This amounts to an efficient implementation where the pretext task is a classification problem among all possible positions for each input token. We experiment on both Vision and Speech benchmarks, where our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods. Our method also enables Transformers trained without position embeddings to outperform ones trained with full position information.
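
The pretext task framed here, classification among all possible positions for each input token, reduces to a per-token cross-entropy loss. The sketch below assumes a square score matrix in which row `i` belongs to the token whose true position is `i`; the function name `position_prediction_loss` and that layout are illustrative assumptions, not the paper's implementation.

```python
import math

def position_prediction_loss(logits):
    """Mean cross-entropy where token i's target class is its true position i.

    logits[i][j] is the model's score that token i sits at position j.
    """
    total = 0.0
    for true_pos, row in enumerate(logits):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in row))
        total += log_norm - row[true_pos]
    return total / len(logits)

# Four tokens, four candidate positions.
uniform = [[0.0] * 4 for _ in range(4)]  # model has no idea where tokens go
confident = [[10.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
uniform_loss = position_prediction_loss(uniform)
confident_loss = position_prediction_loss(confident)
```

With uniform scores the loss equals the log of the number of positions, which is the chance-level baseline; a model that recovers positions from content alone drives the loss toward zero.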

position prediction, representation, transformer, (14 more...)

arXiv.org Artificial Intelligence

2207.07611

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Hierarchical Bayesian Model for the Transfer of Knowledge on Spatial Concepts based on Multimodal Information

Hagiwara, Yoshinobu, Taguchi, Keishiro, Ishibushi, Satoshi, Taniguchi, Akira, Taniguchi, Tadahiro

arXiv.org Artificial Intelligence | Mar-10-2021

This paper proposes a hierarchical Bayesian model based on spatial concepts that enables a robot to transfer the knowledge of places from experienced environments to a new environment. The transfer of knowledge based on spatial concepts is modeled as the calculation process of the posterior distribution based on the observations obtained in each environment with the parameters of spatial concepts generalized to environments as prior knowledge. We conducted experiments to evaluate the generalization performance of spatial knowledge for general places such as kitchens and the adaptive performance of spatial knowledge for unique places such as "Emma's room" in a new environment. In the experiments, the accuracies of the proposed method and conventional methods were compared in the prediction task of location names from an image and a position, and the prediction task of positions from a location name. The experimental results demonstrated that the proposed method has a higher prediction accuracy of location names and positions than the conventional method owing to the transfer of knowledge.

information, location name, spatial concept, (16 more...)

arXiv.org Artificial Intelligence

2103.06442

Country:

Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
(2 more...)
