AITopics | bidirectional context

Collaborating Authors

bidirectional context

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, Quoc V. Le

Neural Information Processing SystemsFeb-14-2026, 14:43:33 GMT

Neural Information Processing Systems http://nips.cc/

arxiv preprint arxiv, bert, objective, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)

Add feedback

486ff0b164cf92b0255fe39863bcf99e-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 17:34:45 GMT

bidirectional context, slm, transcormer, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

Transcormer: TransformerforSentenceScoringwith SlidingLanguageModeling

Neural Information Processing SystemsFeb-8-2026, 17:34:41 GMT

Sentence scoring aims at measuring the likelihood score of a sentence and is widely usedinnatural language processing scenarios, likereranking, which isto select the best sentence from multiple candidates.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Neural Information Processing SystemsDec-26-2025, 01:27:49 GMT

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation.

generalized autoregressive pretraining, name change, xlnet, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)

Add feedback

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

Neural Information Processing SystemsDec-24-2025, 03:42:13 GMT

Sentence scoring aims at measuring the likelihood score of a sentence and is widely used in many natural language processing scenarios, like reranking, which is to select the best sentence from multiple candidates. Previous works on sentence scoring mainly adopted either causal language modeling (CLM) like GPT or masked language modeling (MLM) like BERT, which have some limitations: 1) CLM only utilizes unidirectional information for the probability estimation of a sentence without considering bidirectional context, which affects the scoring quality; 2) MLM can only estimate the probability of partial tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs large computation and time cost. In this paper, we propose \textit{Transcormer} -- a Transformer model with a novel \textit{sliding language modeling} (SLM) for sentence scoring. Specifically, our SLM adopts a triple-stream self-attention mechanism to estimate the probability of all tokens in a sentence with bidirectional context and only requires a single forward pass. SLM can avoid the limitations of CLM (only unidirectional context) and MLM (multiple forward passes) and inherit their advantages, and thus achieve high effectiveness and efficiency in scoring. Experimental results on multiple tasks demonstrate that our method achieves better performance than other language modelings.

language modeling, sentence scoring, transcormer, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

Neural Information Processing SystemsDec-23-2025, 20:22:49 GMT

Autoregressive models and their sequential factorization of the data likelihood have recently demonstrated great potential for image representation and synthesis. Nevertheless, they incorporate image context in a linear 1D order by attending only to previously synthesized image patches above or to the left. Not only is this unidirectional, sequential bias of attention unnatural for images as it disregards large parts of a scene until synthesis is almost complete. It also processes the entire image on a single scale, thus ignoring more global contextual information up to the gist of the entire scene. As a remedy we incorporate a coarse-to-fine hierarchy of context by combining the autoregressive formulation with a multinomial diffusion process: Whereas a multistage diffusion process successively compresses and removes information to coarsen an image, we train a Markov chain to invert this process. In each stage, the resulting autoregressive ImageBART model progressively incorporates context from previous stages in a coarse-to-fine manner. Experiments demonstrate the gain over current autoregressive models, continuous diffusion probabilistic models, and latent variable models. Moreover, the approach enables to control the synthesis process and to trade compression rate against reconstruction accuracy, while still guaranteeing visually plausible results.

autoregressive image synthesis, bidirectional context, multinomial diffusion, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.60)

Add feedback

Spanning Tree Autoregressive Visual Generation

Lee, Sangkyu, Lee, Changho, Han, Janghoon, Song, Hosung, You, Tackgeun, Lim, Hwasup, Choi, Stanley Jungkyu, Lee, Honglak, Yu, Youngjae

arXiv.org Artificial IntelligenceNov-24-2025

W e present Spanning Tree Autoregressive (STAR) modeling, which can incorporate prior knowledge of images, such as center bias and locality, to maintain sampling performance while also providing sufficiently flexible sequence orders to accommodate image editing at inference. Approaches that expose randomly permuted sequence orders to conventional autoregressive (AR) models in visual generation for bidirectional context either suffer from a decline in performance or compromise the flexibility in sequence order choice at inference. Instead, STAR utilizes traversal orders of uniform spanning trees sampled in a lattice defined by the positions of image patches. Traversal orders are obtained through breadth-first search, allowing us to efficiently construct a spanning tree whose traversal order ensures that the connected partial observation of the image appears as a prefix in the sequence through rejection sampling. Through the tailored yet structured randomized strategy compared to random permutation, STAR preserves the capability of postfix completion while maintaining sampling performance without any significant changes to the model architecture widely adopted in the language AR modeling.

machine learning, natural language, sequence order, (17 more...)

arXiv.org Artificial Intelligence

2511.17089

Genre: Research Report (0.64)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Add feedback

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, Quoc V. Le

Neural Information Processing SystemsAug-20-2025, 05:38:26 GMT

arxiv preprint arxiv, bert, objective, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)

Add feedback

A Appendix

Neural Information Processing SystemsAug-14-2025, 15:12:19 GMT

Hyper-parameter Setup The pre-training hyper-parameters of Transcormer are described in Table 8. As mentioned in Section 2.1, some works [ MLM model caused by N-passes. K tokens via masked prediction as the final sentence probability. To fulfill this target, DLM only feeds word embeddings as the key/value for each Transformer layer, rather than the previous layer. Just as discussed in Section 3.3, this model learns forward and backward A.3 Results A.3.1 Comparison with other works As aforementioned, previous works [35, 34] have tried some strategies to calculate the probabilities MLM adopts one bidirectional context and SLM adopts forward and backward contexts.

bidirectional context, slm, transcormer, (16 more...)

Neural Information Processing Systems

Technology: