AITopics | better

better

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Tanh Works Better with Asymmetry

Neural Information Processing Systems, Dec-24-2025, 07:34:40 GMT

Batch Normalization is commonly located in front of activation functions, as proposed by the original paper. Swapping the order, i.e., using Batch Normalization after activation functions, has also been attempted, but its performance is generally not much different from the conventional order when ReLU or a similar activation function is used. However, in the case of bounded activation functions like Tanh, we discovered that the swapped order achieves considerably better performance than the conventional order on various benchmarks and architectures. This paper reports this remarkable phenomenon and closely examines what contributes to this performance improvement. By looking at the output distributions of individual activation functions, not the whole layers, we found that many of them are asymmetrically saturated.
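
The two orderings and the per-unit view of saturation can be made concrete with a small pure-Python sketch. It is an illustration under stated assumptions, not the authors' code: Batch Normalization is reduced to per-unit batch standardization (learned scale and shift omitted), the "saturated" threshold of 0.9 is a hypothetical choice, and the positively shifted pre-activations stand in for one asymmetrically saturated unit.

```python
import math
import random

def batch_norm(column, eps=1e-5):
    """Normalize one unit's values over the batch (learned scale/shift omitted)."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    return [(v - mean) / math.sqrt(var + eps) for v in column]

def conventional_order(pre_acts):
    """Conventional order: Batch Normalization, then Tanh."""
    return [math.tanh(v) for v in batch_norm(pre_acts)]

def swapped_order(pre_acts):
    """Swapped order: Tanh, then Batch Normalization."""
    return batch_norm([math.tanh(v) for v in pre_acts])

def saturation_sides(outputs, threshold=0.9):
    """Fraction of one unit's Tanh outputs saturated near +1 and near -1."""
    n = len(outputs)
    pos = sum(1 for v in outputs if v > threshold) / n
    neg = sum(1 for v in outputs if v < -threshold) / n
    return pos, neg

# Hypothetical unit whose pre-activations are shifted to the positive side.
random.seed(0)
pre = [random.gauss(3.0, 1.0) for _ in range(1000)]
print(saturation_sides([math.tanh(v) for v in pre]))  # mostly near +1, almost none near -1
print(sum(swapped_order(pre)) / len(pre))             # ~0: BN re-centres the Tanh output
```

Measuring the positive and negative saturation fractions separately for each unit is what exposes one-sided saturation. An average taken over a whole layer can look symmetric even when individual units are not.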

activation function, batch normalization, name change

Neural Information Processing Systems

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

What Makes Multi-Modal Learning Better than Single (Provably)

Neural Information Processing Systems, Dec-24-2025, 04:17:05 GMT

The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, following the success of deep learning, an influential line of work on deep multi-modal learning has achieved remarkable empirical results in various applications. However, theoretical justifications in this field are notably lacking. Can multi-modal learning provably perform better than uni-modal learning? In this paper, we answer this question under a widely used multi-modal fusion framework, which first encodes features from different modalities into a common latent space and then maps the latent representations into the task space. We prove that learning with multiple modalities achieves a smaller population risk than using only a subset of those modalities. The main intuition is that the former has a more accurate estimate of the latent space representation. To the best of our knowledge, this is the first theoretical treatment to capture important qualitative phenomena observed in real multi-modal applications from the generalization perspective. Combined with experimental results, we show that multi-modal learning does possess an appealing formal guarantee.
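
The intuition that more modalities give a more accurate estimate of the latent representation can be shown with a toy linear-Gaussian simulation. This is not the paper's analysis. It assumes two modalities that each observe the same scalar latent `z` plus independent unit-variance noise, with fusion done by simple averaging. All names and parameters are hypothetical.

```python
import random

def latent_estimation_error(n=20000, noise=1.0, seed=0):
    """Toy linear-Gaussian model: two modalities observe the same latent z with
    independent noise. Returns the mean squared error of estimating z from
    modality 1 alone and from both modalities fused by averaging."""
    rng = random.Random(seed)
    err_single = 0.0
    err_multi = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)           # shared latent representation
        x1 = z + rng.gauss(0.0, noise)    # modality 1 "encoded" into the latent space
        x2 = z + rng.gauss(0.0, noise)    # modality 2 "encoded" into the latent space
        err_single += (x1 - z) ** 2
        err_multi += ((x1 + x2) / 2 - z) ** 2
    return err_single / n, err_multi / n

single, multi = latent_estimation_error()
print(round(single, 2), round(multi, 2))  # close to noise**2 and noise**2 / 2
```

Averaging two independent noisy views halves the noise variance. In this toy setting, that is why the fused latent estimate is more accurate than the single-modality one.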

modality, name change, provably

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

Neural Information Processing Systems, Dec-23-2025, 21:18:50 GMT

Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism for distributed training of supervised machine learning models that use contractive communication compression mechanisms, such as Top-$k$. First proposed by Seide et al. [2014] as a heuristic, EF resisted any theoretical understanding until recently [Stich et al., 2018, Alistarh et al., 2018]. While these early breakthroughs were followed by a steady stream of works offering various improvements and generalizations, the current theoretical understanding of EF is still very limited. Indeed, to the best of our knowledge, all existing analyses either i) apply to the single-node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients, or iterate-dependent assumptions that cannot be checked a priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost. In this work, we fix all these deficiencies by proposing and analyzing a new EF mechanism, EF21, which consistently and substantially outperforms EF in practice. Moreover, our theoretical analysis relies on standard assumptions only, works in the distributed heterogeneous data setting, and leads to better and more meaningful rates. In particular, we prove that EF21 enjoys a fast $\mathcal{O}(1/T)$ convergence rate for smooth nonconvex problems, beating the previous bound of $\mathcal{O}(1/T^{2/3})$, which was shown under a strong bounded gradients assumption. We further improve this to a fast linear rate for Polyak-Łojasiewicz functions, which is the first linear convergence result for an error feedback method not relying on unbiased compressors.
Since EF has a large number of applications where it reigns supreme, we believe that our 2021 variant, EF21, will have a large impact on the practice of communication-efficient distributed learning.
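
The following is a minimal pure-Python sketch of an EF21-style loop. It assumes the update rule as we understand the mechanism: each worker compresses only the difference between its fresh local gradient and a locally stored gradient estimate, and both the worker and the server add that compressed message to their estimates. Check it against the paper's pseudocode before relying on it. The heterogeneous least-squares data, Top-k with `k = 2`, step size `gamma = 0.015` and iteration count are hypothetical choices for a toy problem, not settings from the paper.

```python
import random

def top_k(v, k):
    """Top-k compressor: keep the k largest-magnitude entries and zero the rest."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out

def make_worker(rng, m, d):
    """Hypothetical local data for one worker: f_i(x) = ||A x - b||^2 / (2m)."""
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    b = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return A, b

def residual(x, A, b):
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def local_loss(x, A, b):
    return sum(r * r for r in residual(x, A, b)) / (2 * len(A))

def local_grad(x, A, b):
    r = residual(x, A, b)
    return [sum(row[j] * ri for row, ri in zip(A, r)) / len(A) for j in range(len(x))]

def global_loss(x, workers):
    return sum(local_loss(x, A, b) for A, b in workers) / len(workers)

def ef21(workers, d, k, gamma, steps):
    n = len(workers)
    x = [0.0] * d
    # Each worker keeps a state g_i, initialised to its full local gradient at x^0.
    g_local = [local_grad(x, A, b) for A, b in workers]
    g = [sum(col) / n for col in zip(*g_local)]  # server copy: mean of the g_i
    for _ in range(steps):
        x = [xj - gamma * gj for xj, gj in zip(x, g)]  # x^{t+1} = x^t - gamma * g^t
        c_mean = [0.0] * d
        for i, (A, b) in enumerate(workers):
            grad = local_grad(x, A, b)
            # Compress only the difference between the fresh gradient and the state g_i.
            c = top_k([gj - hj for gj, hj in zip(grad, g_local[i])], k)
            g_local[i] = [hj + cj for hj, cj in zip(g_local[i], c)]
            c_mean = [s + cj / n for s, cj in zip(c_mean, c)]
        # Only the compressed messages c_i are communicated; the server adds their mean to g.
        g = [gj + s for gj, s in zip(g, c_mean)]
    return x

rng = random.Random(0)
workers = [make_worker(rng, m=20, d=10) for _ in range(4)]  # heterogeneous local data
x = ef21(workers, d=10, k=2, gamma=0.015, steps=3000)
print(global_loss([0.0] * 10, workers), global_loss(x, workers))
```

Because the stored estimates `g_i` track the local gradients, the compressor only has to transmit a shrinking correction, not the full gradient. This design choice is what distinguishes EF21 from compressing the gradient itself.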

ef21, faster error feedback, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.74)

Do Adversarially Robust ImageNet Models Transfer Better?

Neural Information Processing Systems, Dec-23-2025, 20:58:37 GMT

Transfer learning is a widely used paradigm in deep learning, where models pre-trained on standard datasets can be efficiently adapted to downstream tasks. Typically, better pre-trained models yield better transfer results, suggesting that initial accuracy is a key aspect of transfer learning performance. In this work, we identify another such aspect: we find that adversarially robust models, while less accurate, often perform better than their standard-trained counterparts when used for transfer learning. Specifically, we focus on adversarially robust ImageNet classifiers and show that they yield improved accuracy on a standard suite of downstream classification tasks.
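
To make "adversarially robust" concrete, a linear classifier is the simplest case because the worst-case perturbation has a closed form. This is an illustration only. The paper concerns deep ImageNet models, for which robust training usually approximates the inner maximization with an iterative attack such as projected gradient descent. The weights, input, label convention `y in {-1, +1}` and budget `eps` below are hypothetical.

```python
import math

def margin(w, x, y):
    """Signed margin y * (w . x) of a linear classifier, labels y in {-1, +1}."""
    return y * sum(wi * xi for wi, xi in zip(w, x))

def worst_case_l2(w, x, y, eps):
    """Worst-case L2 perturbation of size eps: move x against the direction y * w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [xi - eps * y * wi / norm for xi, wi in zip(x, w)]

def logistic_loss(m):
    """Numerically stable log(1 + exp(-m))."""
    return -m + math.log1p(math.exp(m)) if m < 0 else math.log1p(math.exp(-m))

def robust_logistic_loss(w, x, y, eps):
    """Robust loss: the adversary lowers the margin by exactly eps * ||w||_2."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return logistic_loss(margin(w, x, y) - eps * norm)

w, x, y = [3.0, 4.0], [1.0, 1.0], 1
print(margin(w, x, y), margin(w, worst_case_l2(w, x, y, eps=0.5), y))  # 7.0 and about 4.5
```

Training on the robust loss rather than the standard one is what makes a model adversarially robust. The usual price is the lower clean accuracy that the abstract mentions.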

better, name change, proceedings

Neural Information Processing Systems

Genre: Research Report > New Finding (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

Neural Information Processing Systems, Dec-23-2025, 20:01:42 GMT

Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown.

deep feedforward network, name change, neural tangent kernel perspective

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Okapi: Generalising Better by Making Statistical Matches Match

Neural Information Processing Systems, Dec-23-2025, 17:52:17 GMT

We propose Okapi, a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching. Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss, while eliminating statistical outliers. In order to perform the online matching in a runtime- and memory-efficient way, we draw upon the self-supervised literature and combine a memory bank with a slow-moving momentum encoder. The consistency loss is applied within the feature space, rather than on the predictive distribution, making the method agnostic to both the modality and the task in question. We experiment on the WILDS 2.0 datasets (Sagawa et al.), which significantly expand the range of modalities, applications, and shifts available for studying and benchmarking real-world unsupervised adaptation. Contrary to Sagawa et al., we show that, with the right method, it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation (ERM) results. Our method outperforms the baseline methods in terms of out-of-distribution (OOD) generalisation on the iWildCam (multi-class classification) and PovertyMap (regression) image datasets, as well as on the CivilComments (binary classification) text dataset. Furthermore, from a qualitative perspective, we show that the matches obtained from the learned encoder are strongly semantically related. Code for our paper is publicly available at https://github.com/wearepal/okapi/.
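
The ingredients named above (a momentum encoder, a memory bank, nearest-neighbour matching with outlier removal, and a feature-space consistency loss) can be sketched in pure Python. This is not Okapi's implementation; see the linked repository for that. The squared-Euclidean distance, the momentum value, and the rule of dropping the most distant fraction of matches as "outliers" are all hypothetical stand-ins.

```python
from collections import deque

def ema_update(target_params, online_params, momentum=0.99):
    """Slow-moving momentum encoder: exponential moving average of the online weights."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target_params, online_params)]

def make_memory_bank(size):
    """FIFO memory bank of momentum-encoder features; the oldest entries drop out."""
    return deque(maxlen=size)

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def match_with_bank(queries, bank, keep_fraction=0.5):
    """Match each query (one domain) to its nearest bank feature (another domain),
    then keep only the closest keep_fraction of pairs (hypothetical outlier rule)."""
    pairs = []
    for qi, q in enumerate(queries):
        j = min(range(len(bank)), key=lambda idx: sq_dist(q, bank[idx]))
        pairs.append((sq_dist(q, bank[j]), qi, j))
    pairs.sort()
    n_keep = max(1, int(len(pairs) * keep_fraction))
    return [(qi, j) for _, qi, j in pairs[:n_keep]]

def consistency_loss(queries, bank, matches):
    """Feature-space consistency: mean squared distance between matched features."""
    return sum(sq_dist(queries[qi], bank[j]) for qi, j in matches) / len(matches)

bank = make_memory_bank(8)
bank.extend([[0.1, 0.0], [5.0, 5.0]])    # hypothetical target-domain features
queries = [[0.0, 0.0], [10.0, 10.0]]     # hypothetical source-domain features
matches = match_with_bank(queries, bank)
print(matches, consistency_loss(queries, bank, matches))  # the distant pair is dropped
```

Because the loss is computed on feature vectors rather than on predicted class probabilities, nothing in it depends on the task or modality. That property is what the abstract emphasises.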

name change, okapi, statistical match match

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Fans Call on Taylor Swift to 'Do Better' After Accusations of Using AI for Promo Videos

WIRED, Oct-8-2025, 20:53:21 GMT

A scavenger hunt campaign to promote Taylor Swift's new album resulted in a viral #SwiftiesAgainstAI campaign. These were just some of the alleged clues that fans spotted in promo videos for Taylor Swift's new album this weekend. They were, to their eyes, telltale indicators that the videos were made with generative AI. "The first sign that it was AI was that it didn't look great," claims Marcela Lobo, a graphic designer in Brazil who has been a Swift fan since she was 12. "It was wonky, the shadows didn't match, the windows and the painted piano, it looked like shit, basically."

swift, taylor swift, video

WIRED

Country:

South America > Brazil (0.25)
North America > United States > California > Los Angeles County > Los Angeles (0.25)
North America > United States > California > San Francisco County > San Francisco (0.05)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Applied AI (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.39)

Multimodal Large Language Models Make Text-to-Image Generative Models Align Better

Neural Information Processing Systems, May-27-2025, 09:18:14 GMT

Recent studies have demonstrated the exceptional potential of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts. Despite these advances, current human preference datasets are either prohibitively expensive to construct or lack diversity in preference dimensions, which limits their applicability to instruction tuning of open-source text-to-image generative models and hinders further exploration. To address these challenges and promote the alignment of generative models through instruction tuning, we leverage multimodal large language models to create VisionPrefer, a high-quality, fine-grained preference dataset that captures multiple preference aspects. We construct VisionPrefer by aggregating feedback from AI annotators across four aspects: prompt-following, aesthetics, fidelity, and harmlessness. To validate the effectiveness of VisionPrefer, we train a reward model, VP-Score, on VisionPrefer to guide the training of text-to-image generative models; VP-Score's preference prediction accuracy is comparable to that of human annotators.
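
The abstract does not specify VP-Score's architecture or training loss. The sketch below therefore shows one common way to fit a reward model on preference pairs: a linear reward trained with the Bradley-Terry loss. Treat that choice as an assumption, not as the paper's method. The averaging of the four aspect ratings, the aspect keys, and the two-dimensional feature vectors (standing in for image-text embeddings) are also hypothetical.

```python
import math

ASPECTS = ("prompt_following", "aesthetic", "fidelity", "harmlessness")

def aggregate(scores):
    """Hypothetical aggregation: average the per-aspect ratings from AI annotators."""
    return sum(scores[a] for a in ASPECTS) / len(ASPECTS)

def reward(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit a linear reward with the Bradley-Terry loss -log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            p = sigmoid(reward(w, chosen) - reward(w, rejected))
            # The loss gradient w.r.t. w is -(1 - p) * (chosen - rejected); step against it.
            w = [wi + lr * (1.0 - p) * (c - r) for wi, c, r in zip(w, chosen, rejected)]
    return w

def accuracy(w, pairs):
    """Fraction of pairs where the chosen item receives the higher reward."""
    return sum(reward(w, c) > reward(w, r) for c, r in pairs) / len(pairs)

pairs = [([1.0, 0.5], [0.0, 0.5]), ([0.8, 0.1], [0.2, 0.9]), ([0.9, 0.9], [0.1, 0.2])]
w = train_reward_model(pairs, dim=2)
print(accuracy(w, pairs))
```

Preference prediction accuracy, as computed by `accuracy`, is the kind of metric the abstract compares against human annotators. In practice it would be measured on held-out pairs, not on the training pairs used here.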

alignment, generative model, visionprefer

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

Early Convolutions Help Transformers See Better

Neural Information Processing Systems, May-27-2025, 07:27:17 GMT

Vision transformer (ViT) models are difficult to optimize. In particular, they are sensitive to the choice of optimizer (AdamW vs. SGD), optimizer hyperparameters, and training schedule length. In comparison, modern convolutional neural networks are easier to optimize. Why is this the case? In this work, we conjecture that the issue lies with the patchify stem of ViT models, which is implemented by a stride-$p$ $p \times p$ convolution ($p = 16$ by default) applied to the input image. This large-kernel, large-stride convolution runs counter to typical design choices for convolutional layers in neural networks.
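
Standard convolution output-size arithmetic shows how a stack of small convolutions can replace the patchify stem while producing the same token grid. The sketch assumes the common 224×224 input resolution. The convolutional stem configuration (four 3×3 stride-2 convolutions with padding 1, then a 1×1 convolution) is a hypothetical example, not necessarily the paper's exact design.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a 2-D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def patchify_stem(size, p=16):
    # ViT patchify stem: one p x p convolution with stride p (non-overlapping patches).
    return conv_out(size, kernel=p, stride=p)

def conv_stem(size, n_stride2=4):
    # Hypothetical convolutional stem: n 3x3 stride-2 convolutions (padding 1),
    # followed by a 1x1 stride-1 convolution that keeps the resolution unchanged.
    for _ in range(n_stride2):
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return conv_out(size, kernel=1, stride=1)

print(patchify_stem(224), conv_stem(224))  # 14 14 -> both give 196 tokens
```

Four stride-2 stages downsample by 2^4 = 16, the same factor as the stride-16 patchify stem. The transformer's sequence length therefore does not change when the stem is swapped.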

better, convolutional stem, vit model

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Reviews: Are ResNets Provably Better than Linear Predictors?

Neural Information Processing Systems, Oct-7-2024, 07:00:50 GMT

I also tend to agree with the authors that the obtained results, relatively speaking, are significant and do shed new light on ResNets. As such, I voted for acceptance (without a strong opinion), although the outcome could go either way. [end] The main goal of this work is to understand the effect of skip connections in ResNets through the lens of optimization. Although a ResNet is strictly more powerful than simple linear regression (in the sense that linear regression is a special case of a ResNet whose weights follow a trivial pattern), optimizing it may be more challenging than solving the linear regression special case. The authors formally ruled out this possibility by proving that any local minimum of a particular ResNet architecture, or more generally any approximate stationary point, has an objective value no larger than that of linear regression. However, as the authors showed through a simple example, finding such a local minimum may still be challenging.
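
The sense in which linear regression is a special case of a ResNet can be checked directly. Consider a one-block residual predictor of the form w · (x + V relu(W x)). Setting the residual weights `V` to zero recovers the linear predictor w · x. The architecture, sizes and values below are hypothetical illustrations of that "trivial pattern", not the exact model analysed in the paper.

```python
def relu(v):
    return [max(0.0, vi) for vi in v]

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def resnet_predict(x, w, V, W):
    """One residual block followed by a linear output: w . (x + V relu(W x))."""
    h = matvec(V, relu(matvec(W, x)))
    return sum(wi * (xi + hi) for wi, xi, hi in zip(w, x, h))

def linear_predict(x, w):
    return sum(wi * xi for wi, xi in zip(w, x))

x, w = [1.0, -2.0], [0.5, 1.5]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # 3 hidden units
V_zero = [[0.0] * 3 for _ in range(2)]      # residual branch switched off
V = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(resnet_predict(x, w, V_zero, W), linear_predict(x, w), resnet_predict(x, w, V, W))
```

This embedding only shows that the best ResNet is at least as good as the best linear predictor. The result described in the review is stronger: it says that every local minimum, and more generally every approximate stationary point, attains this guarantee.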

linear regression, local minima, resnet

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
