AITopics | skip layer

Collaborating Authors

skip layer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

Chen, Qian, Wang, Wen, Zhang, Qinglin, Zheng, Siqi, Zhang, Shiliang, Deng, Chong, Yu, Hai, Liu, Jiaqing, Ma, Yukun, Zhang, Chong

arXiv.org Artificial IntelligenceJun-17-2024

The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies. However, as the demand for understanding complex relationships grows, refining the Transformer's architecture becomes critical. This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models by enabling direct attention between non-adjacent layers. This method improves the model's ability to capture dependencies between high-level abstract features and low-level details. By facilitating direct attention between these diverse feature levels, our approach overcomes the limitations of current Transformers, which often rely on suboptimal intra-layer attention. Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer, thus enhancing the diversity of multi-head attention without additional computational burden. Extensive experiments demonstrate that our enhanced Transformer model achieves superior performance in language modeling tasks, highlighting the effectiveness of our skip-layer attention mechanism.

architecture, config, transformer, (12 more...)

arXiv.org Artificial Intelligence

2406.11274

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.05)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(5 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

Add feedback

PEMMA: Parameter-Efficient Multi-Modal Adaptation for Medical Image Segmentation

Saadi, Nada, Saeed, Numan, Yaqub, Mohammad, Nandakumar, Karthik

arXiv.org Artificial IntelligenceApr-21-2024

Imaging modalities such as Computed Tomography (CT) and Positron Emission Tomography (PET) are key in cancer detection, inspiring Deep Neural Networks (DNN) models that merge these scans for tumor segmentation. When both CT and PET scans are available, it is common to combine them as two channels of the input to the segmentation model. However, this method requires both scan types during training and inference, posing a challenge due to the limited availability of PET scans, thereby sometimes limiting the process to CT scans only. Hence, there is a need to develop a flexible DNN architecture that can be trained/updated using only CT scans but can effectively utilize PET scans when they become available. In this work, we propose a parameter-efficient multi-modal adaptation (PEMMA) framework for lightweight upgrading of a transformer-based segmentation model trained only on CT scans to also incorporate PET scans. The benefits of the proposed approach are two-fold. Firstly, we leverage the inherent modularity of the transformer architecture and perform low-rank adaptation (LoRA) of the attention weights to achieve parameter-efficient adaptation. Secondly, since the PEMMA framework attempts to minimize cross modal entanglement, it is possible to subsequently update the combined model using only one modality, without causing catastrophic forgetting of the other modality. Our proposed method achieves comparable results with the performance of early fusion techniques with just 8% of the trainable parameters, especially with a remarkable +28% improvement on the average dice score on PET scans when trained on a single modality.

adaptation, modality, pet scan, (15 more...)

arXiv.org Artificial Intelligence

2404.13704

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Switzerland (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Two Novel Performance Improvements for Evolving CNN Topologies

Strauch, Yaron, Grundy, Jo

arXiv.org Artificial IntelligenceFeb-10-2021

Convolutional Neural Networks (CNNs) are the state-of-the-art algorithms for the processing of images. However the configuration and training of these networks is a complex task requiring deep domain knowledge, experience and much trial and error. Using genetic algorithms, competitive CNN topologies for image recognition can be produced for any specific purpose, however in previous work this has come at high computational cost. In this work two novel approaches are presented to the utilisation of these algorithms, effective in reducing complexity and training time by nearly 20%. This is accomplished via regularisation directly on training time, and the use of partial training to enable early ranking of individual architectures. Both approaches are validated on the benchmark CIFAR10 data set, and maintain accuracy.

accuracy, algorithm, base experiment, (14 more...)

arXiv.org Artificial Intelligence

2102.05451

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Brain Signal Classification via Learning Connectivity Structure

Jang, Soobeom, Moon, Seong-Eun, Lee, Jong-Seok

arXiv.org Machine LearningMay-31-2019

Connectivity between different brain regions is one of the most important properties for classification of brain signals including electroencephalography (EEG). However, how to define the connectivity structure for a given task is still an open problem, because there is no ground truth about how the connectivity structure should be in order to maximize the performance. In this paper, we propose an end-to-end neural network model for EEG classification, which can extract an appropriate multi-layer graph structure and signal features directly from a set of raw EEG signals and perform classification. Experimental results demonstrate that our method yields improved performance in comparison to the existing approaches where manually defined connectivity structures and signal features are used. Furthermore, we show that the graph structure extraction process is reliable in terms of consistency, and the learned graph structures make much sense in the neuroscientific viewpoint.

artificial intelligence, graph structure, machine learning, (18 more...)

arXiv.org Machine Learning

1905.11678

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

The idea is ridiculously simple (perhaps why it is effective?): randomly skip layers while training • /r/MachineLearning

#artificialintelligenceApr-3-2016, 16:15:28 GMT

The idea is ridiculously simple (perhaps why it is effective?): I don't understand the claim "Remember all the narratives we told about how depth learns hierarchical representations, and higher level representations -- those higher level representations don't seem to matter so much after all.". The net has over 100 layers!?! I imagine that this also works reasonably well in the RNN encoder in an encoder/decoder framework. I wonder if it also applies to generative RNNs.

artificial intelligence, skip layer, social media, (4 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.91)

Add feedback