AITopics | table 2

table 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Unified Framework for Critical Scaling of Inverse Temperature in Self-Attention

Hayase, Tomohiro, Karakida, Ryo

arXiv.org Machine Learning, May-14-2026

Length-dependent logit rescaling is widely used to stabilize long-context self-attention, but existing analyses and methods suggest conflicting inverse-temperature laws for the context length $n$, ranging from $(\log n)^{1/2}$ to $\log n$ and $(\log n)^2$. We provide a general theory showing that the desirable scale is determined by the gap-counting function $N_n$ of each attention row. Counting how many competitors lie within each gap from the maximum, we define an upper-tail accumulation scale and prove that it gives the critical inverse-temperature scale for softmax concentration: below this scale, the top competitors remain unseparated, whereas above it, the attention entropy collapses. This framework unifies prior scaling laws as different $N_n$ and yields a direct diagnostic for attention-score families, from idealized theoretical models to more practical transformers.
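To make the abstract's quantities concrete, the following minimal sketch computes the entropy of a softmax row at inverse temperature `beta` and counts competitors within a gap of the row maximum. The function names and the exact form of the counting function are assumptions made for illustration; the paper's own definition of $N_n$ and of the critical scale may differ.

```python
import math


def softmax_entropy(scores, beta):
    """Shannon entropy (in nats) of softmax(beta * scores) for one attention row."""
    m = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (s - m)) for s in scores]
    z = sum(weights)
    probs = [w / z for w in weights]
    # Terms with p == 0 (underflow at large beta) contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def gap_count(scores, gap):
    """Hypothetical gap-counting helper: number of competitors whose score
    lies within `gap` of the row maximum, excluding one maximizer."""
    m = max(scores)
    return sum(1 for s in scores if m - s <= gap) - 1
```

Evaluating `softmax_entropy` on a fixed row while increasing `beta` shows the entropy shrinking toward zero, which is the concentration behavior the abstract describes.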

artificial intelligence, logn, machine learning, (18 more...)

arXiv.org Machine Learning

2605.12697

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification: Appendix

Neural Information Processing Systems, May-1-2026, 01:42:12 GMT

For easy and reliable comparison, we report the average of all Rank-1 and mAP results on all test datasets over several random runs for the ablation study and parameter analysis. This average is denoted by mAcc. There are three reasons that we use mAcc. It is a unified measure, which is convenient for algorithm comparison. Both Rank-1 and mAP are accuracy measures ranging from 0% to 100%, so averaging them is meaningful. Besides, if one method's mAcc is 1% higher than another method's, it means that, on average, every single measure on each dataset has increased by 1%, which is a perceptible improvement.
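As a minimal sketch of the averaging described above, assuming results are stored as (Rank-1, mAP) pairs in percent, one pair per run and per test dataset. The function name `macc` and the data layout are hypothetical, not taken from the paper.

```python
def macc(results):
    """Average every Rank-1 and mAP value (in percent) over all datasets and runs.

    `results` maps a dataset name to a list of (rank1, m_ap) pairs,
    one pair per random run. This layout is an assumption.
    """
    values = []
    for dataset_runs in results.values():
        for rank1, m_ap in dataset_runs:
            values.extend([rank1, m_ap])
    return sum(values) / len(values)
```

Because every measure enters the average with equal weight, raising each individual Rank-1 and mAP value by 1 point raises mAcc by exactly 1 point.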

artificial intelligence, machine learning, transformer, (14 more...)

Neural Information Processing Systems

Country: Asia > Middle East > UAE (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

e6c2e85db1f1039177c4495ccd399ac4-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 03:23:00 GMT

A.1 Preliminary Study

The basic GPT-2 model, which has 12 transformer blocks and 12 attention heads with 768 hidden dimensions, is trained from scratch on each corpus. The Huggingface transformers [4] and PyTorch [2] toolkits are used to train the GPT-2 model in a distributed manner on an A100 GPU server. The hyper-parameters used during training are shown in Table 1.

Hyper-parameter            Value
Optimization steps         100K
Test interval              10K
Dropout rate               0.1
Grad clipping              1.0
Learning rate              5e-5
Batch size                 128
Maximum sequence length    256
Warmup steps               10K
Learning scheduler         Linear decay
Random seed                0
Number of GPUs             4
Learning objective         Cross-Entropy Loss

Table 1: The hyper-parameters used during the GPT-2 training procedure.

Most of the hyper-parameters for our proposed method are the same as those in Table 1, to better control variables. The hyper-parameters specific to our proposed method are the length of the repetitive n-gram and its repetition dropout rate p, which are set to 2 and 0.6, respectively.
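The schedule-related entries of Table 1 can be sketched as a small configuration plus a learning-rate function. This is an illustration under stated assumptions: the learning rate is read as 5e-5, the warmup is assumed to rise linearly from zero, and the decay is assumed to reach zero at the final optimization step. The names `CONFIG` and `lr_at` are hypothetical.

```python
# Values copied from Table 1; the key names are illustrative.
CONFIG = {
    "optimization_steps": 100_000,
    "warmup_steps": 10_000,
    "learning_rate": 5e-5,  # assumed reading of the table entry
    "batch_size": 128,
    "max_sequence_length": 256,
    "dropout_rate": 0.1,
    "grad_clipping": 1.0,
    "random_seed": 0,
}


def lr_at(step, cfg=CONFIG):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    The exact shape of the warmup and the decay endpoint are assumptions.
    """
    peak = cfg["learning_rate"]
    warmup = cfg["warmup_steps"]
    total = cfg["optimization_steps"]
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (total - step) / (total - warmup))
```

Under these assumptions the rate peaks at step 10K and falls to half of its peak at step 55K, the midpoint of the decay phase.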

category, large language model, machine learning, (11 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Our 8B MoE model achieves stronger pre-training perplexity than its dense counterpart. However, a better perplexity does not always translate directly into downstream performance, as demonstrated in Section 4.4. To this end, we compare the fine-tuning performance of the 8B dense model and the MoE model in Table 1. As shown in the table, our MoE model using expert choice routing consistently outperforms the dense model across the 11 tasks in GLUE and SuperGLUE. We also evaluate downstream fine-tuning performance while varying the capacity factor.

artificial intelligence, expert choice, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

259a5df46308d60f8454bd4adcc3b462-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 03:28:03 GMT

As action decoder their mentioned architectures of is multimodal adopted in the in to paper Figure information generate, the 1. visual-gr natural with languages cross-attention ounded alignment conditioned blocks, decoder on while the is visual applied the visual-grounded input. Based on these deeply fused representations, we finally generate the predicted answers with the visual-grounded generation decoder. In this section, we describe the settings used when fine-tuning the pretrained models on various downstream tasks. We use RandomAugment [1] for data augmentation. The default settings for finetuning on each dataset are shown in Table 1.

artificial intelligence, downstream task, video, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.73)