Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models

Han, Yankun

arXiv.org Artificial Intelligence

Weight initialization governs signal propagation and gradient flow at the start of training. This paper offers a theory-grounded and empirically validated study across two regimes: compact ReLU multilayer perceptrons and GPT-2-style transformers. First, a logarithmic sweep of the initial standard deviation maps the vanishing and exploding regimes and identifies a broad stability band with standard deviations between 1e-2 and 1e-1. Second, a controlled comparison shows that Kaiming (fan-in) initialization converges faster and more stably than Xavier under ReLU, consistent with variance-preserving theory. Third, in a from-scratch 12-layer GPT-2-style model, this paper tracks layerwise Q/K/V weight variance through pretraining and observes depth-dependent equilibration into narrow bands: shallow layers expand rapidly while deeper layers change more gradually. Together, these results connect classic initialization principles with modern transformer behavior and yield simple, practical recipes for robust training.
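The variance-preserving argument behind the Kaiming-vs-Xavier result can be checked numerically: under ReLU, a fan-in standard deviation of sqrt(2/fan_in) keeps the activation scale roughly constant with depth, while a Xavier-style sqrt(1/fan_in) halves the second moment at every layer. A minimal NumPy sketch (the width, depth, and sample count below are illustrative, not the paper's settings):

```python
import numpy as np

def forward_variance(init_std_fn, depth=20, width=256, n_samples=512, seed=0):
    """Propagate random inputs through a random ReLU MLP and return the
    per-layer standard deviation of the activations."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, width))
    stds = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * init_std_fn(width)
        x = np.maximum(x @ W.T, 0.0)  # linear layer followed by ReLU
        stds.append(x.std())
    return stds

xavier = lambda fan_in: np.sqrt(1.0 / fan_in)   # Glorot, fan-in form
kaiming = lambda fan_in: np.sqrt(2.0 / fan_in)  # He: corrects for ReLU halving

stds_x = forward_variance(xavier)
stds_k = forward_variance(kaiming)
print(f"layer-20 activation std: Xavier={stds_x[-1]:.4f}, Kaiming={stds_k[-1]:.4f}")
```

With Xavier the activation scale decays by roughly half per layer (vanishing signal after 20 layers), while Kaiming holds it near a constant, matching the stability-band picture in the abstract.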



THOR: A Generic Energy Estimation Approach for On-Device Training

Zhang, Jiaru, Wang, Zesong, Wang, Hao, Song, Tao, Su, Huai-an, Chen, Rui, Hua, Yang, Zhou, Xiangwei, Ma, Ruhui, Pan, Miao, Guan, Haibing

arXiv.org Artificial Intelligence

Battery-powered mobile devices (e.g., smartphones, AR/VR glasses, and various IoT devices) are increasingly being used for AI training due to their growing computational power and easy access to valuable, diverse, and real-time data. On-device training is highly energy-intensive, making accurate energy consumption estimation crucial for effective job scheduling and sustainable AI. However, the heterogeneity of devices and the complexity of models challenge the accuracy and generalizability of existing estimation methods. This paper proposes THOR, a generic approach for energy consumption estimation in deep neural network (DNN) training. First, we examine the layer-wise energy additivity property of DNNs and strategically partition the entire model into layers for fine-grained energy consumption profiling. Then, we fit Gaussian Process (GP) models to learn from layer-wise energy consumption measurements and estimate a DNN's overall energy consumption based on its layer-wise energy additivity property. We conduct extensive experiments with various types of models across different real-world platforms. The results demonstrate that THOR reduces the Mean Absolute Percentage Error (MAPE) by up to 30%. Moreover, THOR has been applied to guide energy-aware pruning, successfully reducing energy consumption by 50% and further demonstrating its generality and potential.
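THOR's central assumption, layer-wise energy additivity, means a new model's energy can be estimated by summing per-layer predictions learned from profiled measurements. The sketch below is illustrative only: the energy numbers are made up, and a least-squares line stands in for THOR's Gaussian Process models.

```python
import numpy as np

# Hypothetical per-layer energy measurements (joules per training step),
# keyed by (layer_type, parameter_count). Stand-ins for real profiling data.
measurements = {
    ("conv",   100_000): 0.8,
    ("conv",   400_000): 3.1,
    ("linear",  50_000): 0.3,
    ("linear", 200_000): 1.1,
}

def fit_linear_profile(layer_type):
    """Least-squares fit of energy vs. parameter count for one layer type
    (a crude stand-in for THOR's Gaussian Process regressor)."""
    pts = [(n, e) for (t, n), e in measurements.items() if t == layer_type]
    n = np.array([p[0] for p in pts], dtype=float)
    e = np.array([p[1] for p in pts])
    slope, intercept = np.polyfit(n, e, 1)
    return lambda params: slope * params + intercept

profiles = {t: fit_linear_profile(t) for t in ("conv", "linear")}

# Additivity: whole-model energy = sum of per-layer estimates.
model_layers = [("conv", 200_000), ("conv", 300_000), ("linear", 100_000)]
total = sum(profiles[t](n) for t, n in model_layers)
print(f"estimated energy per step: {total:.2f} J")
```

The additive decomposition is what makes the approach generic: any architecture built from profiled layer types can be estimated without measuring the whole model end to end.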


Democratizing MLLMs in Healthcare: TinyLLaVA-Med for Efficient Healthcare Diagnostics in Resource-Constrained Settings

Mir, Aya El, Luoga, Lukelo Thadei, Chen, Boyuan, Hanif, Muhammad Abdullah, Shafique, Muhammad

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) integrate Large Language Models (LLMs) with Vision Encoders, thus possessing capabilities that extend beyond textual understanding and analysis to include image processing. This enables them to simultaneously interpret both textual data and medical images, facilitating more accurate and comprehensive diagnostics and decision-making in healthcare. By rapidly processing and synthesizing diverse data types, these models can significantly advance patient care, enabling quicker, more precise diagnoses and personalized treatment plans, thus transforming healthcare into a more efficient, effective, and patient-centered service [5] [6]. The deployment of these MLLMs in healthcare is, however, hindered by their high computational demands and significant memory requirements, which are particularly challenging for resource-constrained devices like the Nvidia Jetson Xavier. This problem is particularly evident in remote medical settings where advanced diagnostics are needed but resources are limited. In this paper, we introduce an optimization method for the general-purpose MLLM TinyLLaVA, which we have adapted and renamed TinyLLaVA-Med. This adaptation involves instruction-tuning and fine-tuning TinyLLaVA on a medical dataset by drawing inspiration from the LLaVA-Med training.


On Initializing Transformers with Pre-trained Embeddings

Kim, Ha Young, Balasubramanian, Niranjan, Kang, Byungkon

arXiv.org Artificial Intelligence

It has become common practice now to use random initialization schemes, rather than pre-trained embeddings, when training transformer-based models from scratch. Indeed, we find that pre-trained word embeddings from GloVe, and some sub-word embeddings extracted from language models such as T5 and mT5, fare much worse compared to random initialization. This is counter-intuitive given the well-known representational and transfer-learning advantages of pre-training. Interestingly, we also find that BERT and mBERT embeddings fare better than random initialization, showing the advantages of pre-trained representations. In this work, we posit two potential factors that contribute to these mixed results: the model sensitivity to parameter distribution and the embedding interactions with position encodings. We observe that pre-trained GloVe, T5, and mT5 embeddings have a wider distribution of values. As argued in the initialization studies, such large-value initializations can lead to poor training because of saturated outputs. Further, the larger embedding values can, in effect, absorb the smaller position encoding values when added together, thus losing position information. Standardizing the pre-trained embeddings to a narrow range (e.g., as prescribed by Xavier) leads to substantial gains for GloVe, T5, and mT5 embeddings. On the other hand, BERT pre-trained embeddings, while larger, are still relatively close to the Xavier initialization range, which may allow them to transfer the pre-trained knowledge effectively.
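The remedy the abstract describes, rescaling pre-trained embeddings into a Xavier-style range, can be sketched as follows. The choice of sqrt(2/(fan_in + fan_out)) as the target standard deviation, and treating the embedding matrix shape (vocab, dim) as (fan_in, fan_out), are assumptions of this illustration, not the authors' exact procedure.

```python
import numpy as np

def standardize_to_xavier(emb):
    """Zero-center an embedding matrix and rescale it to the standard
    deviation that Xavier initialization would prescribe for its shape."""
    fan_in, fan_out = emb.shape                      # (vocab_size, embed_dim)
    target_std = np.sqrt(2.0 / (fan_in + fan_out))   # Xavier std for this shape
    centered = emb - emb.mean()
    return centered * (target_std / centered.std())

rng = np.random.default_rng(0)
# Hypothetical "wide" pre-trained embeddings (GloVe-like spread of values).
pretrained = rng.normal(loc=0.1, scale=0.4, size=(5000, 128))
rescaled = standardize_to_xavier(pretrained)
print(f"std before: {pretrained.std():.3f}   std after: {rescaled.std():.4f}")
```

After rescaling, the embedding values are small enough that the additive position encodings are no longer swamped, which is the mechanism the abstract proposes for the observed gains.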


First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Aoki, Yoichi, Kudo, Keito, Kuribayashi, Tatsuki, Sone, Shusaku, Taniguchi, Masaya, Sakaguchi, Keisuke, Inui, Kentaro

arXiv.org Artificial Intelligence

Multi-step reasoning is widely adopted in the community to improve the performance of language models (LMs). We report on the systematic strategy that LMs use in this process. Our controlled experiments reveal that LMs rely more heavily on heuristics, such as lexical overlap, in the earlier stages of reasoning, when more steps are required to reach an answer. Conversely, as LMs progress closer to the final answer, their reliance on heuristics decreases. This suggests that LMs track only a limited number of future steps and dynamically combine heuristic strategies with logical ones in tasks involving multi-step reasoning.

Figure 1: Illustration of the systematic strategy we discovered in language models (LMs). When the goal is distant from the current state in a multi-step reasoning process, the models tend to rely on heuristics, such as superficial overlap, which can lead them in the wrong direction. In contrast, when the goal is within a limited distance, the models are more likely to take rational actions.
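One simple way to quantify the "lexical overlap" heuristic signal mentioned above is Jaccard overlap between token sets; this particular metric is an illustration, not necessarily the paper's exact measure.

```python
def lexical_overlap(a, b):
    """Jaccard overlap between the token sets of two strings: a simple
    proxy for the superficial-overlap signal a model might exploit."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Hypothetical reasoning state and goal statement.
state = "the red key opens the red door"
goal = "open the blue chest with the blue key"
print(f"overlap: {lexical_overlap(state, goal):.2f}")
```

A model leaning on this heuristic would prefer next steps that score high against the goal, even when that overlap points in the wrong direction, which is exactly the early-stage behavior the paper reports.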


On Limitations of the Transformer Architecture

Peng, Binghui, Narayanan, Srini, Papadimitriou, Christos

arXiv.org Machine Learning

What are the root causes of hallucinations in large language models (LLMs)? We use Communication Complexity to prove that the Transformer layer is incapable of composing functions (e.g., identify a grandparent of a person in a genealogy) if the domains of the functions are large enough; we show through examples that this inability is already empirically present when the domains are quite small. We also point out that several mathematical tasks that are at the core of the so-called compositional tasks thought to be hard for LLMs are unlikely to be solvable by Transformers, for large enough instances and assuming that certain well accepted conjectures in the field of Computational Complexity are true.
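For contrast, the grandparent-composition task the paper uses as its running example is trivial to state programmatically, which is what makes the impossibility result striking: the hard part for a Transformer layer is the two-hop lookup over a large domain, not the function itself. A hypothetical genealogy:

```python
# Two-hop function composition: the "grandparent" task cited as an example
# of what a single Transformer layer provably cannot do over large domains.
# The genealogy below is hypothetical.
parent = {
    "alice": "bob",
    "bob": "carol",
    "carol": "dana",
}

def grandparent(person):
    """Compose parent-of with itself: g(x) = parent(parent(x))."""
    return parent.get(parent.get(person))

print(grandparent("alice"))  # carol
```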


Principled Weight Initialization for Hypernetworks

Chang, Oscar, Flokas, Lampros, Lipson, Hod

arXiv.org Artificial Intelligence

Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to Bayesian deep learning, the problem of optimizing hypernetworks has not been studied to date. We observe that classical weight initialization methods like Glorot & Bengio (2010) and He et al. (2015), when applied directly on a hypernet, fail to produce weights for the mainnet in the correct scale. We develop principled techniques for weight initialization in hypernets, and show that they lead to more stable mainnet weights, lower training loss, and faster convergence. Meta-learning describes a broad family of techniques in machine learning that deals with the problem of learning to learn. An emerging branch of meta-learning involves the use of hypernetworks, which are meta neural networks that generate the weights of a main neural network to solve a given task in an end-to-end differentiable manner. Hypernetworks were originally introduced by Ha et al. (2016) as a way to induce weight-sharing and achieve model compression by training the same meta network to learn the weights belonging to different layers in the main network.
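The failure mode can be reproduced with a one-layer hypernet: if the hypernet's output layer gets a standard He initialization, the variance of the generated mainnet weights depends on the hypernet embedding size rather than on the mainnet's fan-in. The corrected scale below, chosen so the generated weights come out He-scaled for the mainnet, follows the spirit of the paper's fix but is an illustrative sketch, not its exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 64            # hypernet embedding size
fan_in = 1024     # mainnet layer fan-in
n_weights = 4096  # number of mainnet weights the hypernet generates

e = rng.standard_normal(k)  # layer embedding, unit variance

# Naive: He init applied to the hypernet's own output layer (fan-in = k).
# Generated weight variance is then ~2 * Var(e), independent of the mainnet.
H_naive = rng.standard_normal((n_weights, k)) * np.sqrt(2.0 / k)
w_naive = H_naive @ e

# Principled: pick the hypernet output scale so the *generated* mainnet
# weights come out with variance 2 / fan_in (He scale for the mainnet).
H_fixed = rng.standard_normal((n_weights, k)) * np.sqrt(2.0 / (fan_in * k))
w_fixed = H_fixed @ e

target_std = np.sqrt(2.0 / fan_in)
print(f"target {target_std:.4f} | naive {w_naive.std():.4f} | fixed {w_fixed.std():.4f}")
```

The naive weights come out orders of magnitude too large for a 1024-fan-in mainnet layer, which is exactly the wrong-scale failure the abstract describes.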


Spanish hospital carries out lung transplant using 4-armed robot dubbed 'Da Vinci'

FOX News

A Spanish hospital carried out a lung transplant using a pioneering technique with a robot and a new access route that no longer requires separating the ribs and opening up the chest, experts said on Monday. Surgeons at Vall d'Hebron hospital in Barcelona used a four-arm robot dubbed "Da Vinci" to cut a small section of the patient's skin, fat and muscle to remove the damaged lung and insert a new one through an eight-centimetre incision in the lower part of the sternum, just above the diaphragm. The new procedure is less painful for the patient, they said, as the wound closes easily, and is safer than the traditional method which requires a 30-centimetre incision and a very delicate post-operative period.