AITopics | layer number

Collaborating Authors

layer number

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

50abc3e730e36b387ca8e02c26dc0a22-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 21:27:31 GMT

accuracy, machine learning, node, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

Add feedback

Appendix of Joint Data-T ask Generation for Auxiliary Learning Hong Chen

Neural Information Processing SystemsFeb-9-2026, 09:36:32 GMT

We provide the derivation of the upper implicit gradient in eq. We summarize the whole DTG-AuxL algorithm in Algorithm 1, where the lower and upper optimization updates are conducted alternatingly. We use the batch stochastic gradient optimization for both the lower and upper update. STL: It is a natural baseline where we only train on the primary task. Equal: It is a multi-task learning method, where we assign an equal weight of 1.0 to the loss of each MAXL can be only applied to the classification problem.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Maryland > Baltimore (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

Appendix of Joint Data-T ask Generation for Auxiliary Learning Hong Chen

Neural Information Processing SystemsOct-8-2025, 08:35:50 GMT

auxiliary task, dataset, primary task, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Maryland > Baltimore (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash

Jia, Fucheng, Wu, Zewen, Jiang, Shiqi, Jiang, Huiqiang, Zhang, Qianxi, Yang, Yuqing, Liu, Yunxin, Ren, Ju, Zhang, Deyu, Cao, Ting

arXiv.org Artificial IntelligenceSep-24-2025

Large language models (LLMs) are increasingly being deployed on mobile devices, but the limited DRAM capacity constrains the deployable model size. This paper introduces ActiveFlow, the first LLM inference framework that can achieve adaptive DRAM usage for modern LLMs (not ReLU-based), enabling the scaling up of deployable model sizes. The framework is based on the novel concept of active weight DRAM-flash swapping and incorporates three novel techniques: (1) Cross-layer active weights preloading. It uses the activations from the current layer to predict the active weights of several subsequent layers, enabling computation and data loading to overlap, as well as facilitating large I/O transfers. (2) Sparsity-aware self-distillation. It adjusts the active weights to align with the dense-model output distribution, compensating for approximations introduced by contextual sparsity. (3) Active weight DRAM-flash swapping pipeline. It orchestrates the DRAM space allocation among the hot weight cache, preloaded active weights, and computation-involved weights based on available memory. Results show ActiveFlow achieves the performance-cost Pareto frontier compared to existing efficiency optimization methods.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.08378

Country:

North America > United States (0.93)
Europe (0.68)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.34)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Interpretable inverse design of optical multilayer thin films based on extended neural adjoint and regression activation mapping

Kim, Sungjun, Kim, Jungho

arXiv.org Artificial IntelligenceJul-28-2025

We propose an extended neural adjoint (ENA) framework, which meets six key criteria for artificial intelligence-assisted inverse design of optical multilayer thin films (OMTs): accuracy, efficiency, diversity, scalability, flexibility, and interpretability. To enhance the scalability of the existing neural adjoint method, we present a novel forward neural network architecture for OMTs and introduce a material loss function into the existing neural adjoint loss function, facilitating the exploration of material configurations of OMTs. Furthermore, we present the detailed formulation of the regression activation mapping for the presented forward neural network architecture (F-RAM), a feature visualization method aimed at improving interpretability. We validated the efficacy of the material loss by conducting an ablation study, where each component of the loss function is systematically removed and evaluated. The results indicated that the inclusion of the material loss significantly improves accuracy and diversity. To substantiate the performance of the ENA-based inverse design, we compared it against the residual network-based global optimization network (Res-GLOnet). The ENA yielded the OMT solutions of an inverse design with higher accuracy and better diversity compared to the Res-GLOnet. To demonstrate the interpretability, we applied F-RAM to diverse OMT structures with similar optical properties, obtained by the proposed ENA method. We showed that distributions of feature importance for various OMT structures exhibiting analogous optical properties are consistent, despite variations in material configurations, layer number, and thicknesses. Furthermore, we demonstrate the flexibility of the ENA method by restricting the initial layer of OMTs to SiO2 and 100 nm.

artificial intelligence, machine learning, omt, (18 more...)

arXiv.org Artificial Intelligence

2507.18644

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Industry: Energy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploiting Edited Large Language Models as General Scientific Optimizers

Lv, Qitan, Liu, Tianyu, Wang, Hong

arXiv.org Artificial IntelligenceMar-17-2025

Large language models (LLMs) have been widely adopted in mathematical optimization in scientific scenarios for their extensive knowledge and advanced reasoning capabilities. Existing methods mainly focus on utilizing LLMs to solve optimization problems in a prompt-based manner, which takes observational feedback as additional textual descriptions. However, due to LLM's \textbf{high sensitivity to the prompts} and \textbf{tendency to get lost in lengthy prompts}, these methods struggle to effectively utilize the {observational} feedback from each optimization step, which severely hinders the applications for real-world scenarios. To address these challenges, we propose a conceptually simple and general {bi-level} optimization method, namely \textbf{G}eneral \textbf{S}cientific \textbf{O}ptimizers (GSO). Specifically, GSO first utilizes inner-level simulators as experimental platforms to evaluate the current solution and provide observational feedback. Then, LLMs serve as knowledgeable and versatile scientists, generating new solutions by refining potential errors from the feedback as the outer-level optimization. Finally, simulations together with the expert knowledge in LLMs are jointly updated with bi-level interactions via model editing. Extensive experiments show that GSO consistently outperforms existing state-of-the-art methods using \textit{six} different LLM backbones on \textit{seven} different tasks, demonstrating the effectiveness and a wide range of applications.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.0962

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
(4 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (0.68)

Industry: Materials > Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion

Saynova, Denitsa, Hagström, Lovisa, Johansson, Moa, Johansson, Richard, Kuhlmann, Marco

arXiv.org Artificial IntelligenceOct-31-2024

Previous interpretations of language models (LMs) miss important distinctions in how these models process factual information. For example, given the query "Astrid Lindgren was born in" with the corresponding completion "Sweden", no difference is made between whether the prediction was based on having the exact knowledge of the birthplace of the Swedish author or assuming that a person with a Swedish-sounding name was born in Sweden. In this paper, we investigate four different prediction scenarios for which the LM can be expected to show distinct behaviors. These scenarios correspond to different levels of model reliability and types of information being processed - some being less desirable for factual predictions. To facilitate precise interpretations of LMs for fact completion, we propose a model-specific recipe called PrISM for constructing datasets with examples of each scenario based on a set of diagnostic criteria. We apply a popular interpretability method, causal tracing (CT), to the four prediction scenarios and find that while CT produces different results for each scenario, aggregations over a set of mixed examples may only represent the results from the scenario with the strongest measured signal. In summary, we contribute tools for a more granular study of fact completion in language models and analyses that provide a more nuanced understanding of how LMs process fact-related queries.

layer number, prediction, scenario, (14 more...)

arXiv.org Artificial Intelligence

2410.14405

Country:

Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Ohio (0.04)
(17 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Beyond Individual Facts: Investigating Categorical Knowledge Locality of Taxonomy and Meronomy Concepts in GPT Models

Burger, Christopher, Hu, Yifan, Le, Thai

arXiv.org Artificial IntelligenceJun-22-2024

The location of knowledge within Generative Pre-trained Transformer (GPT)-like models has seen extensive recent investigation. However, much of the work is focused towards determining locations of individual facts, with the end goal being the editing of facts that are outdated, erroneous, or otherwise harmful, without the time and expense of retraining the entire model. In this work, we investigate a broader view of knowledge location, that of concepts or clusters of related information, instead of disparate individual facts. To do this, we first curate a novel dataset, called DARC, that includes a total of 34 concepts of ~120K factual statements divided into two types of hierarchical categories, namely taxonomy and meronomy. Next, we utilize existing causal mediation analysis methods developed for determining regions of importance for individual facts and apply them to a series of related categories to provide detailed investigation into whether concepts are associated with distinct regions within these models. We find that related categories exhibit similar areas of importance in contrast to less similar categories. However, fine-grained localization of individual category subsets to specific regions is not apparent.

layer number, subject token, token, (13 more...)

arXiv.org Artificial Intelligence

2406.1594

Country:

North America > United States > Mississippi (0.04)
North America > United States > Indiana (0.04)
North America > Dominican Republic (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Industry: Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Do Llamas Work in English? On the Latent Language of Multilingual Transformers

Wendler, Chris, Veselovsky, Veniamin, Monea, Giovanni, West, Robert

arXiv.org Artificial IntelligenceJun-8-2024

We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language models function and the origins of linguistic bias. Focusing on the Llama-2 family of transformer models, our study uses carefully constructed non-English prompts with a unique correct single-token continuation. From layer to layer, transformers gradually map an input embedding of the final prompt token to an output embedding from which next-token probabilities are computed. Tracking intermediate embeddings through their high-dimensional space reveals three distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already allow for decoding a semantically correct next token in the middle layers, but give higher probability to its version in English than in the input language; (3) finally move into an input-language-specific region of the embedding space. We cast these results into a conceptual model where the three phases operate in "input space", "concept space", and "output space", respectively. Crucially, our evidence suggests that the abstract "concept space" lies closer to English than to other languages, which may have important consequences regarding the biases held by multilingual language models.

layer layer layer, probability, translation, (13 more...)

arXiv.org Artificial Intelligence

2402.10588

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Finland (0.04)
(2 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

On the Role of Attention Masks and LayerNorm in Transformers

Wu, Xinyi, Ajorlou, Amir, Wang, Yifei, Jegelka, Stefanie, Jadbabaie, Ali

arXiv.org Machine LearningMay-29-2024

Self-attention is the key mechanism of transformers, which are the essential building blocks of modern foundation models. Recent studies have shown that pure self-attention suffers from an increasing degree of rank collapse as depth increases, limiting model expressivity and further utilization of model depth. The existing literature on rank collapse, however, has mostly overlooked other critical components in transformers that may alleviate the rank collapse issue. In this paper, we provide a general analysis of rank collapse under self-attention, taking into account the effects of attention masks and layer normalization (LayerNorm). In particular, we find that although pure masked attention still suffers from exponential collapse to a rank one subspace, local masked attention can provably slow down the collapse rate. In the case of self-attention with LayerNorm, we first show that for certain classes of value matrices, collapse to a rank one subspace still happens exponentially. However, through construction of nontrivial counterexamples, we then establish that with proper choice of value matrices, a general class of sequences may not converge to a rank one subspace, and the self-attention dynamics with LayerNorm can simultaneously possess a rich set of equilibria with any possible rank between one and full. Our result refutes the previous hypothesis that LayerNorm plays no role in the rank collapse of self-attention and suggests that self-attention with LayerNorm constitutes a much more expressive, versatile nonlinear dynamical system than what was originally thought.

layernorm, rank collapse, transformer, (16 more...)

arXiv.org Machine Learning

2405.18781

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback