Wozniak, Stanislaw
RWKV: Reinventing RNNs for the Transformer Era
Peng, Bo, Alcaide, Eric, Anthony, Quentin, Albalak, Alon, Arcadinho, Samuel, Biderman, Stella, Cao, Huanqi, Cheng, Xin, Chung, Michael, Grella, Matteo, GV, Kranthi Kiran, He, Xuzheng, Hou, Haowen, Lin, Jiaju, Kazienko, Przemyslaw, Kocon, Jan, Kong, Jiaming, Koptyra, Bartlomiej, Lau, Hayden, Mantri, Krishna Sri Ipsit, Mom, Ferdinand, Saito, Atsushi, Song, Guangyu, Tang, Xiangru, Wang, Bolun, Wind, Johan S., Wozniak, Stanislaw, Zhang, Ruichong, Zhang, Zhenyuan, Zhao, Qihang, Zhou, Peng, Zhou, Qinghua, Zhu, Jian, Zhu, Rui-Jie
Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the performance of Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training while maintaining constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.
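The constant-memory inference claimed in the abstract comes from expressing the linear attention (the "WKV" weighted average) as a recurrence over a fixed-size numerator/denominator state. The following is a minimal NumPy sketch of that recurrent form as described in the paper; the function and variable names are illustrative, and the numerical-stability rescaling used in the real RWKV implementation is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rwkv_time_mixing_step(r_t, k_t, v_t, state, w, u):
    """One recurrent step of the WKV weighted average (illustrative sketch).

    r_t, k_t, v_t : receptance, key, value vectors for the current token (shape [d]).
    state         : (a, b) running numerator/denominator accumulated over past tokens.
    w             : per-channel decay (>= 0); past keys are discounted by exp(-w) per step.
    u             : per-channel bonus applied to the current token's key.
    """
    a, b = state
    bonus = np.exp(u + k_t)
    # Weighted average over past tokens plus the bonus-weighted current token.
    wkv = (a + bonus * v_t) / (b + bonus)
    # Decay the running sums and fold in the current token for the next step.
    a = np.exp(-w) * a + np.exp(k_t) * v_t
    b = np.exp(-w) * b + np.exp(k_t)
    # The receptance gate controls how much of the aggregate reaches the output.
    return sigmoid(r_t) * wkv, (a, b)

# Usage: the state stays the same size no matter how many tokens are processed.
d = 8
rng = np.random.default_rng(0)
w, u = np.abs(rng.normal(size=d)), rng.normal(size=d)
state = (np.zeros(d), np.zeros(d))
for _ in range(16):                      # stream tokens one at a time
    r, k, v = rng.normal(size=(3, d))    # stand-ins for learned projections of the input
    out, state = rwkv_time_mixing_step(r, k, v, state, w, u)
```

During training the same quantity can instead be computed for all positions in parallel, which is what gives the architecture its Transformer-like training efficiency.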
Learning in Deep Neural Networks Using a Biologically Inspired Optimizer
Dellaferrera, Giorgia, Wozniak, Stanislaw, Indiveri, Giacomo, Pantazi, Angeliki, Eleftheriou, Evangelos
Plasticity circuits in the brain are known to be influenced by the distribution of the synaptic weights through the mechanisms of synaptic integration and local regulation of synaptic strength. However, the complex interplay of stimulation-dependent plasticity with local learning signals is disregarded by most of the artificial neural network training algorithms devised so far. Here, we propose a novel biologically inspired optimizer for artificial neural networks (ANNs) and spiking neural networks (SNNs) that incorporates key principles of synaptic integration observed in dendrites of cortical neurons: GRAPES (Group Responsibility for Adjusting the Propagation of Error Signals). GRAPES implements a weight-distribution-dependent modulation of the error signal at each node of the neural network. We show that this biologically inspired mechanism leads to a systematic improvement of the convergence rate of the network and substantially improves the classification accuracy of ANNs and SNNs with both feedforward and recurrent architectures. Furthermore, we demonstrate that GRAPES supports performance scalability for models of increasing complexity and mitigates catastrophic forgetting by enabling networks to generalize to unseen tasks based on previously acquired knowledge. The local characteristics of GRAPES minimize the required memory resources, making it optimally suited for dedicated hardware implementations. Overall, our work indicates that reconciling neurophysiology insights with machine intelligence is key to boosting the performance of neural networks.
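To make the core idea of a "weight-distribution-dependent modulation of the error signal at each node" concrete, here is a short Python sketch of one plausible reading of that mechanism. The specific normalization (sum of absolute incoming weights divided by the layer mean) and the blending parameter `strength` are assumptions chosen for illustration, not the paper's exact formulation.

```python
import numpy as np

def grapes_modulation(W, strength=1.0):
    """Per-node importance factors derived from the weight distribution (assumed form).

    Importance of an output node is taken as the sum of absolute incoming weights,
    normalized by the layer mean and blended with 1.0 via `strength`.
    """
    importance = np.abs(W).sum(axis=1)           # one value per output node
    m = importance / importance.mean()
    return 1.0 + strength * (m - 1.0)

def modulated_backward(delta_out, W, prev_activation_grad):
    """Scale the per-node error signal before propagating it to the previous layer."""
    delta_mod = delta_out * grapes_modulation(W)  # weight-distribution-dependent modulation
    return (W.T @ delta_mod) * prev_activation_grad

# Toy usage on a single hidden layer (shapes only; not a full training loop).
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64))                     # 64 inputs -> 32 output nodes
delta_out = rng.normal(size=32)                   # error arriving at the layer's outputs
prev_grad = rng.uniform(size=64)                  # derivative of the previous layer's activation
delta_prev = modulated_backward(delta_out, W, prev_grad)
```

Because the modulation factors depend only on the layer's own weights, such a scheme needs no extra global state, which is consistent with the abstract's point about low memory requirements for hardware implementations.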