Polyhedron Attention Module: Learning Adaptive-order Interactions

Anonymous Author(s)

Neural Information Processing Systems

Before providing the proof of Theorem 4, we establish Lemma 1 as its foundation, following the principle of Yan et al.'s work. As shown in Figure 1, we consider two kinds of value functions. In PAM-Net, we set the number of levels to 2; a grid search is performed over different configurations, including dropout rates over {0, 0.1, 0.2}.


XiCAD: Camera Activation Detection in the Da Vinci Xi User Interface

Jenke, Alexander C., Just, Gregor, de Boer, Claas, Wagner, Martin, Bodenstedt, Sebastian, Speidel, Stefanie

arXiv.org Artificial Intelligence

Purpose: Robot-assisted minimally invasive surgery relies on endoscopic video as the sole intraoperative visual feedback. The DaVinci Xi system overlays a graphical user interface (UI) that indicates the state of each robotic arm, including the activation of the endoscope arm. Detecting this activation provides valuable metadata such as camera movement information, which can support downstream surgical data science tasks including tool tracking, skill assessment, or camera control automation. Methods: We developed a lightweight pipeline based on a ResNet18 convolutional neural network to automatically identify the position of the camera tile and its activation state within the DaVinci Xi UI. The model was fine-tuned on manually annotated data from the SurgToolLoc dataset and evaluated across three public datasets comprising over 70,000 frames. Results: The model achieved F1-scores between 0.993 and 1.000 for the binary detection of active cameras and correctly localized the camera tile in all cases without false multiple-camera detections. Conclusion: The proposed pipeline enables reliable, real-time extraction of camera activation metadata from surgical videos, facilitating automated preprocessing and analysis for diverse downstream applications. All code, trained models, and annotations are publicly available.


NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning

Zhang, Zhi, Shen, Yixian, Cao, Congfeng, Shutova, Ekaterina

arXiv.org Artificial Intelligence

Existing parameter-efficient fine-tuning (PEFT) methods primarily fall into two categories: addition-based and selective in-situ adaptation. The former, such as LoRA, introduce additional modules to adapt the model to downstream tasks, offering strong memory efficiency. However, their representational capacity is often limited, making them less suitable for fine-grained adaptation. In contrast, the latter directly fine-tunes a carefully chosen subset of the original model parameters, allowing for more precise and effective adaptation, but at the cost of significantly increased memory consumption. To reconcile this trade-off, we propose NeuroAda, a novel PEFT method that enables fine-grained model fine-tuning while maintaining high memory efficiency. Our approach first identifies important parameters (i.e., connections within the network) as in selective adaptation, and then introduces bypass connections for these selected parameters. During fine-tuning, only the bypass connections are updated, leaving the original model parameters frozen. Empirical results on 23+ tasks spanning both natural language generation and understanding demonstrate that NeuroAda achieves state-of-the-art performance with as little as $\leq \textbf{0.02}\%$ trainable parameters, while reducing CUDA memory usage by up to 60%. We release our code here: https://github.com/FightingFighting/NeuroAda.git.
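The update scheme described above (freeze all original weights, train bypass deltas only on selected important connections) can be sketched in a few lines. This is an illustrative stand-in only: the magnitude-based selection rule and the flat weight lists are assumptions for the sketch, not the paper's actual importance criterion or tensor layout.

```python
def top_k_mask(weights, k):
    """Mark the k largest-magnitude weights as 'important'.

    Magnitude ranking is a stand-in importance criterion for this sketch;
    the paper's actual selection rule may differ.
    """
    ranked = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    mask = [0.0] * len(weights)
    for i in ranked[:k]:
        mask[i] = 1.0
    return mask

def effective_weights(frozen, delta, mask):
    """Frozen base weights plus trainable bypass deltas on selected entries."""
    return [w + m * d for w, m, d in zip(frozen, mask, delta)]

# Only the selected connections receive trainable bypass updates.
frozen = [0.5, -2.0, 0.1, 1.5]
mask = top_k_mask(frozen, 2)
delta = [0.0, 0.3, 0.0, -0.2]   # gradient steps would update these only
adapted = effective_weights(frozen, delta, mask)
```

Because gradients flow only into the masked deltas, optimizer state is kept for a tiny fraction of parameters, which is where the claimed memory savings would come from.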



Partial Resilient Leader-Follower Consensus in Time-Varying Graphs

Lee, Haejoon, Panagou, Dimitra

arXiv.org Artificial Intelligence

Existing approaches typically require robustness conditions on the entire network to guarantee resilient consensus. However, the behavior of such systems when these conditions are not fully met remains unexplored. To address this gap, we introduce the notion of partial leader-follower consensus, in which a subset of non-adversarial followers successfully tracks the leader's reference state despite insufficient robustness. We propose a novel distributed algorithm, the Bootstrap Percolation and Mean Subsequence Reduced (BP-MSR) algorithm, and establish sufficient conditions for individual followers to achieve consensus via the BP-MSR algorithm in arbitrary time-varying graphs. We validate our findings through simulations, demonstrating that our method guarantees partial leader-follower consensus even when standard resilient consensus algorithms fail.
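The Mean Subsequence Reduced step at the core of MSR-style consensus can be illustrated as follows. This is a generic W-MSR-style trimmed average under an assumed adversary bound f, assuming distinct neighbor values for simplicity; it is not the full BP-MSR algorithm, whose bootstrap-percolation component is omitted here.

```python
def msr_update(own, neighbor_values, f):
    """One Mean-Subsequence-Reduced step for a single follower.

    Discard up to f neighbor values strictly above the node's own state and
    up to f strictly below it, then average the survivors together with the
    node's own state. Assumes distinct neighbor values for simplicity.
    """
    higher = sorted((v for v in neighbor_values if v > own), reverse=True)
    lower = sorted(v for v in neighbor_values if v < own)
    dropped = set(higher[:f]) | set(lower[:f])
    kept = [v for v in neighbor_values if v not in dropped]
    vals = kept + [own]
    return sum(vals) / len(vals)
```

The trimming is what confers resilience: an adversarial neighbor broadcasting an extreme value (e.g., 100 among states near 5) is discarded before averaging rather than dragging the update.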


CLUE: Conflict-guided Localization for LLM Unlearning Framework

Chen, Hang, Zhu, Jiaying, Yang, Xinyu, Wang, Wenya

arXiv.org Artificial Intelligence

LLM unlearning aims to eliminate the influence of undesirable data without affecting causally unrelated information. This process typically involves using a forget set to remove target information, alongside a retain set to maintain non-target capabilities. While recent localization-based methods demonstrate promise in identifying important neurons to be unlearned, they fail to disentangle the neurons responsible for forgetting undesirable knowledge from those retaining essential skills, often treating them as a single entangled group. As a result, these methods apply uniform interventions, risking catastrophic over-forgetting or incomplete erasure of the target knowledge. To address this, we turn to circuit discovery, a mechanistic interpretability technique, and propose the Conflict-guided Localization for LLM Unlearning framEwork (CLUE). This framework identifies the forget and retain circuits, each composed of important neurons, and then transforms the circuits into conjunctive normal form (CNF). The assignment of each neuron in the CNF satisfiability solution reveals whether it should be forgotten or retained. We then provide targeted fine-tuning strategies for the different categories of neurons. Extensive experiments demonstrate that, compared to existing localization methods, CLUE achieves superior forget efficacy and retain utility through precise neural localization.
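The entanglement problem described above can be made concrete with a toy partition of neurons by circuit membership; the overlap set is exactly where a conflict-resolution step (in CLUE, the CNF satisfiability formulation, omitted here) must decide between forgetting and retaining. The integer neuron IDs are hypothetical.

```python
def categorize_neurons(forget_circuit, retain_circuit):
    """Partition neuron IDs by circuit membership.

    Neurons appearing in both circuits are the conflicting cases that CLUE
    resolves via its CNF satisfiability solution (not shown here);
    unconflicted neurons can be assigned an intervention directly.
    """
    forget_only = forget_circuit - retain_circuit
    retain_only = retain_circuit - forget_circuit
    conflict = forget_circuit & retain_circuit
    return forget_only, retain_only, conflict
```

For example, with a forget circuit {1, 2, 3} and a retain circuit {3, 4}, neuron 3 is the contested one that a uniform intervention would mishandle and a solver must adjudicate.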


OUI Need to Talk About Weight Decay: A New Perspective on Overfitting Detection

Fernández-Hernández, Alberto, Mestre, Jose I., Dolz, Manuel F., Duato, Jose, Quintana-Ortí, Enrique S.

arXiv.org Machine Learning

We introduce the Overfitting-Underfitting Indicator (OUI), a novel tool for monitoring the training dynamics of Deep Neural Networks (DNNs) and identifying optimal regularization hyperparameters. Specifically, we validate that OUI can effectively guide the selection of the Weight Decay (WD) hyperparameter by indicating whether a model is overfitting or underfitting during training, without requiring validation data. Through experiments on DenseNet-BC-100 with CIFAR-100, EfficientNet-B0 with TinyImageNet, and ResNet-34 with ImageNet-1K, we show that maintaining OUI within a prescribed interval correlates strongly with improved generalization and validation scores. Notably, OUI converges significantly faster than traditional metrics such as loss or accuracy, enabling practitioners to identify optimal WD values within the early stages of training. By leveraging OUI as a reliable indicator, we can determine early in training whether the chosen WD value leads the model to underfit the training data, overfit, or strike a well-balanced trade-off that maximizes validation scores. This enables more precise WD tuning for optimal performance on the tested datasets and DNNs. The challenge of overfitting in training DNNs has become increasingly pronounced, fueled by the overparameterization characteristic of many state-of-the-art architectures. Although DNNs with strong expressive power [1]-[3], i.e., the ability to approximate arbitrarily complex functions with increasing precision, hold the promise of exceptional performance in terms of validation scores, they often exploit this capacity by memorizing specific details of the training set that are not relevant for generalization. This misdirection undermines the DNN's ability to generalize, resulting in a significant gap between training and validation scores. (Manuel F. Dolz was supported by the Plan Gen-T grant CIDEXG/2022/013 of the Generalitat Valenciana.)
To address this problem, regularization techniques have emerged as essential tools in modern Deep Learning (DL) [4], [5]. Indeed, understanding and enhancing generalization has become a central focus of contemporary research, as highlighted by works such as [6] and [7].


Breaking the Loop: Detecting and Mitigating Denial-of-Service Vulnerabilities in Large Language Models

Yu, Junzhe, Liu, Yi, Sun, Huijia, Shi, Ling, Chen, Yuqi

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have significantly advanced text understanding and generation, becoming integral to applications across education, software development, healthcare, entertainment, and legal services. Despite considerable progress in improving model reliability, latency remains under-explored, particularly through recurrent generation, where models repeatedly produce similar or identical outputs, causing increased latency and potential Denial-of-Service (DoS) vulnerabilities. We propose RecurrentGenerator, a black-box evolutionary algorithm that efficiently identifies recurrent generation scenarios in prominent LLMs like Llama-3 and GPT-4o. Additionally, we introduce RecurrentDetector, a lightweight real-time classifier trained on activation patterns, achieving 95.24% accuracy and an F1 score of 0.87 in detecting recurrent loops. Our methods provide practical solutions to mitigate latency-related vulnerabilities, and we publicly share our tools and data to support further research.
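A minimal, text-level illustration of the recurrent-generation failure mode: flag a token sequence whose tail repeats a short cycle several times. This is not RecurrentDetector, which classifies internal activation patterns; the period and repeat thresholds below are arbitrary choices for the sketch.

```python
def has_recurrent_tail(tokens, max_period=8, min_repeats=3):
    """Return True if the sequence ends in min_repeats copies of a cycle
    of length <= max_period (a crude proxy for recurrent generation)."""
    for p in range(1, max_period + 1):
        span = p * min_repeats
        if len(tokens) < span:
            continue
        tail = tokens[-span:]
        cycle = tail[:p]
        if all(tail[i] == cycle[i % p] for i in range(span)):
            return True
    return False
```

A surface check like this only fires after the loop has already inflated latency, which is one motivation for detecting recurrence early from activations instead.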


A Timeline and Analysis for Representation Plasticity in Large Language Models

Kannan, Akshat

arXiv.org Artificial Intelligence

The ability to steer AI behavior is crucial to preventing its long-term dangerous and catastrophic potential. Representation Engineering (RepE) has emerged as a novel, powerful method to steer internal model behaviors, such as "honesty", at a top-down level. Understanding the steering of representations should thus be placed at the forefront of alignment initiatives. Unfortunately, plasticity at this level remains largely understudied. This paper aims to bridge that knowledge gap and understand how LLM representation stability, specifically for the concept of "honesty", and model plasticity evolve, by applying steering vectors extracted at different fine-tuning stages and revealing differing magnitudes of shifts in model behavior. The findings are pivotal, showing that while early steering exhibits high plasticity, later stages retain a surprisingly responsive critical window. This pattern is observed across different model architectures, signaling a general pattern of model plasticity that can be exploited for effective intervention. These insights contribute to AI transparency, addressing a pressing efficiency gap that limits our ability to effectively steer model behavior.
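Applying a steering vector, as the paper does at different fine-tuning stages, amounts to shifting a hidden state along an extracted direction. A minimal sketch, assuming the direction vector has already been extracted (e.g., from contrastive prompt pairs); `alpha` is a hypothetical steering strength, and real interventions operate on model activations rather than plain lists.

```python
def steer(hidden, direction, alpha):
    """Shift a hidden-state vector along a steering direction:
    h' = h + alpha * v."""
    return [h + alpha * v for h, v in zip(hidden, direction)]
```

Comparing how far the same `alpha` moves model behavior at different checkpoints is the kind of plasticity probe the paper describes.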


Mitigate Negative Transfer with Similarity Heuristic Lifelong Prompt Tuning

Wu, Chenyuan, Jiang, Gangwei, Lian, Defu

arXiv.org Artificial Intelligence

Lifelong prompt tuning has significantly advanced parameter-efficient lifelong learning with its efficiency and minimal storage demands across various tasks. Our empirical studies, however, highlight certain transferability constraints in current methodologies: a universal algorithm that guarantees consistent positive transfer across all tasks is currently unattainable, especially when dealing with dissimilar tasks that may engender negative transfer. Identifying the misalignment between algorithm selection and task specificity as the primary cause of negative transfer, we present the Similarity Heuristic Lifelong Prompt Tuning (SHLPT) framework. This innovative strategy partitions tasks into two distinct subsets by harnessing a learnable similarity metric, thereby facilitating fruitful transfer from tasks regardless of their similarity or dissimilarity. Additionally, SHLPT incorporates a parameter pool to combat catastrophic forgetting effectively. Our experiments show that SHLPT outperforms state-of-the-art techniques on lifelong learning benchmarks and demonstrates robustness against negative transfer in diverse task sequences.
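The task-partition step can be sketched with a fixed cosine-similarity threshold over task embeddings. SHLPT instead learns its similarity metric, so the threshold, task names, and embeddings here are illustrative assumptions only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def partition_tasks(new_task_emb, past_task_embs, threshold=0.5):
    """Split past tasks into similar / dissimilar subsets for a new task.

    A fixed cosine threshold stands in for SHLPT's learnable similarity
    metric; each subset would then get its own transfer strategy.
    """
    similar = {name for name, emb in past_task_embs.items()
               if cosine(emb, new_task_emb) >= threshold}
    dissimilar = set(past_task_embs) - similar
    return similar, dissimilar
```

Routing each subset to a different transfer algorithm, rather than applying one algorithm to all past tasks, is the mechanism by which a framework like this can avoid negative transfer from dissimilar tasks.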