AITopics

2406.09062

Country:

Europe (1.00)
North America > United States > Oregon (0.13)
North America > United States > Hawaii (0.13)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)
Research Report > Promising Solution (0.67)

Industry:

Energy (1.00)
Health & Medicine > Therapeutic Area (0.92)
Information Technology (0.87)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)

arXiv.org Artificial IntelligenceFeb-14-2024

On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era

Tiezzi, Matteo, Casoni, Michele, Betti, Alessandro, Guidi, Tommaso, Gori, Marco, Melacci, Stefano

A longstanding challenge for the Machine Learning community is the one of developing models that are capable of processing and learning from very long sequences of data. The outstanding results of Transformers-based networks (e.g., Large Language Models) promotes the idea of parallel attention as the key to succeed in such a challenge, obfuscating the role of classic sequential processing of Recurrent Models. However, in the last few years, researchers who were concerned by the quadratic complexity of self-attention have been proposing a novel wave of neural models, which gets the best from the two worlds, i.e., Transformers and Recurrent Nets. Meanwhile, Deep Space-State Models emerged as robust approaches to function approximation over time, thus opening a new perspective in learning from sequential data, followed by many people in the field and exploited to implement a special class of (linear) Recurrent Neural Networks. This survey is aimed at providing an overview of these trends framed under the unifying umbrella of Recurrence. Moreover, it emphasizes novel research opportunities that become prominent when abandoning the idea of processing long sequences whose length is known-in-advance for the more realistic setting of potentially infinite-length sequences, thus intersecting the field of lifelong-online learning from streamed data.

artificial intelligence, machine learning, natural language, (18 more...)

2402.08132

Country: Europe > France > Provence-Alpes-Côte d'Azur (0.14)

Genre:

Overview (0.48)
Research Report (0.40)

Industry: Education > Educational Setting > Online (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceFeb-4-2024

Nature-Inspired Local Propagation

Betti, Alessandro, Gori, Marco

The spectacular results achieved in machine learning, including the recent advances in generative AI, rely on large data collections. On the opposite, intelligent processes in nature arises without the need for such collections, but simply by online processing of the environmental information. In particular, natural learning processes rely on mechanisms where data representation and learning are intertwined in such a way to respect spatiotemporal locality. This paper shows that such a feature arises from a pre-algorithmic view of learning that is inspired by related studies in Theoretical Physics. We show that the algorithmic interpretation of the derived "laws of learning", which takes the structure of Hamiltonian equations, reduces to Backpropagation when the speed of propagation goes to infinity. This opens the doors to machine learning studies based on full on-line information processing that are based the replacement of Backpropagation with the proposed spatiotemporal local algorithm.

artificial intelligence, deep learning, machine learning, (20 more...)

2402.05959

Country: Europe > France > Provence-Alpes-Côte d'Azur (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial IntelligenceDec-14-2023

Neural Time-Reversed Generalized Riccati Equation

Betti, Alessandro, Casoni, Michele, Gori, Marco, Marullo, Simone, Melacci, Stefano, Tiezzi, Matteo

Optimal control deals with optimization problems in which variables steer a dynamical system, and its outcome contributes to the objective function. Two classical approaches to solving these problems are Dynamic Programming and the Pontryagin Maximum Principle. In both approaches, Hamiltonian equations offer an interpretation of optimality through auxiliary variables known as costates. However, Hamiltonian equations are rarely used due to their reliance on forward-backward algorithms across the entire temporal domain. This paper introduces a novel neural-based approach to optimal control, with the aim of working forward-in-time. Neural networks are employed not only for implementing state dynamics but also for estimating costate variables. The parameters of the latter network are determined at each time step using a newly introduced local policy referred to as the time-reversed generalized Riccati equation. This policy is inspired by a result discussed in the Linear Quadratic (LQ) problem, which we conjecture stabilizes state dynamics. We support this conjecture by discussing experimental results from a range of optimal control case studies.

artificial intelligence, machine learning, optimization problem, (17 more...)

2312.0931

Country: Europe > France > Provence-Alpes-Côte d'Azur (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

arXiv.org Artificial IntelligenceDec-2-2022

PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks

Meloni, Enrico, Faggi, Lapo, Marullo, Simone, Betti, Alessandro, Tiezzi, Matteo, Gori, Marco, Melacci, Stefano

In this paper, we present PARTIME, a software library written in Python and based on PyTorch, designed specifically to speed up neural networks whenever data is continuously streamed over time, for both learning and inference. Existing libraries are designed to exploit data-level parallelism, assuming that samples are batched, a condition that is not naturally met in applications that are based on streamed data. Differently, PARTIME starts processing each data sample at the time in which it becomes available from the stream. PARTIME wraps the code that implements a feed-forward multi-layer network and it distributes the layer-wise processing among multiple devices, such as Graphics Processing Units (GPUs). Thanks to its pipeline-based computational scheme, PARTIME allows the devices to perform computations in parallel. At inference time this results in scaling capabilities that are theoretically linear with respect to the number of devices. During the learning stage, PARTIME can leverage the non-i.i.d. nature of the streamed data with samples that are smoothly evolving over time for efficient gradient computations. Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning, distributing operations on up to 8 NVIDIA GPUs, showing significant speedups that are almost linear in the number of devices, mitigating the impact of the data transfer overhead.

artificial intelligence, machine learning, partime, (18 more...)

2210.09147

Country:

Europe > Italy (0.46)
Europe > France (0.28)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningSep-1-2020

Developing Constrained Neural Units Over Time

Betti, Alessandro, Gori, Marco, Marullo, Simone, Melacci, Stefano

In this paper we present a foundational study on a constrained method that defines learning problems with Neural Networks in the context of the principle of least cognitive action, which very much resembles the principle of least action in mechanics. Starting from a general approach to enforce constraints into the dynamical laws of learning, this work focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches. In particular, the structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data, leading to "architectural" and "input-related" constraints, respectively. The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner, that makes this study an important step toward alternative ways of processing continuous streams of data with Neural Networks. The connection with the classic Backpropagation-based update rule of the weights of networks is discussed, showing that there are conditions under which our approach degenerates to Backpropagation. Moreover, the theory is experimentally evaluated on a simple problem that allows us to deeply study several aspects of the theory itself and to show the soundness of the model.

constraint, deep learning, neural network, (17 more...)

2009.00296

Country:

North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report (0.40)

Industry: Education (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceJun-19-2020

Wave Propagation of Visual Stimuli in Focus of Attention

Faggi, Lapo, Betti, Alessandro, Zanca, Dario, Melacci, Stefano, Gori, Marco

Fast reactions to changes in the surrounding visual environment require efficient attention mechanisms to reallocate computational resources to most relevant locations in the visual field. While current computational models keep improving their predictive ability thanks to the increasing availability of data, they still struggle approximating the effectiveness and efficiency exhibited by foveated animals. In this paper, we present a biologically-plausible computational model of focus of attention that exhibits spatiotemporal locality and that is very well-suited for parallel and distributed implementations. Attention emerges as a wave propagation process originated by visual stimuli corresponding to details and motion information. The resulting field obeys the principle of "inhibition of return" so as not to get stuck in potential holes. An accurate experimentation of the model shows that it achieves top level performance in scanpath prediction tasks. This can easily be understood at the light of a theoretical result that we establish in the paper, where we prove that as the velocity of wave propagation goes to infinity, the proposed model reduces to recently proposed state of the art gravitational models of focus of attention.

artificial intelligence, equation, neural network, (16 more...)

2006.11035

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > France > Provence-Alpes-Côte d'Azur (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Machine LearningJun-16-2020

Focus of Attention Improves Information Transfer in Visual Features

Tiezzi, Matteo, Melacci, Stefano, Betti, Alessandro, Maggini, Marco, Gori, Marco

Unsupervised learning from continuous visual streams is a challenging problem that cannot be naturally and efficiently managed in the classic batch-mode setting of computation. The information stream must be carefully processed accordingly to an appropriate spatio-temporal distribution of the visual data, while most approaches of learning commonly assume uniform probability density. In this paper we focus on unsupervised learning for transferring visual information in a truly online setting by using a computational model that is inspired to the principle of least action in physics. The maximization of the mutual information is carried out by a temporal process which yields online estimation of the entropy terms. The model, which is based on second-order differential equations, maximizes the information transfer from the input to a discrete space of symbols related to the visual features of the input, whose computation is supported by hidden neurons. In order to better structure the input probability distribution, we use a human-like focus of attention model that, coherently with the information maximization model, is also based on second-order differential equations. We provide experimental results to support the theory by showing that the spatio-temporal filtering induced by the focus of attention allows the system to globally transfer more information from the input stream over the focused areas and, in some contexts, over the whole frames with respect to the unfiltered case that yields uniform probability distributions.

deep learning, neural network, trajectory, (20 more...)

2006.09229

Country: Europe > France > Provence-Alpes-Côte d'Azur (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningOct-8-2019

A Machine Learning Model for Long-Term Power Generation Forecasting at Bidding Zone Level

Moschella, Michela, Tucci, Mauro, Crisostomi, Emanuele, Betti, Alessandro

--The increasing penetration level of energy generation from renewable sources is demanding for more accurate and reliable forecasting tools to support classic power grid operations (e.g., unit commitment, electricity market clearing or maintenance planning). For this purpose, many physical models have been employed, and more recently many statistical or machine learning algorithms, and data-driven methods in general, are becoming subject of intense research. While generally the power research community focuses on power forecasting at the level of single plants, in a short future horizon of time, in this time we are interested in aggregated macro-area power generation (i.e., in a territory of size greater than 100000 km Real data are used to validate the proposed forecasting methodology on a test set of several months. A. Motivations As the penetration level of Renewable Energy (RE) sources is growing worldwide to meet ever tightening sustainability goals [1], the intermittent and uncertain nature of RE is posing increasing challenges to efficiently manage a power grid, eventually endangering its own stability. In this context, the availability of accurate forecasts of power generation from RE may mitigate the impact of the increasing penetration level and improve the operation of power systems [2].

energy conservation, power generation, renewable energy, (21 more...)

1910.03276

Country: Europe > Italy (0.30)

Genre: Research Report (0.40)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Solar (0.50)
Energy > Renewable > Wind (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

arXiv.org Machine LearningJul-14-2019

On the Role of Time in Learning

Betti, Alessandro, Gori, Marco

By and large the process of learning concepts that are embedded in time is regarded as quite a mature research topic. Hidden Markov models, recurrent neural networks are, amongst others, successful approaches to learning from temporal data. In this paper, we claim that the dominant approach minimizing appropriate risk functions defined over time by classic stochastic gradient might miss the deep interpretation of time given in other fields like physics. We show that a recent reformulation of learning according to the principle of Least Cognitive Action is better suited whenever time is involved in learning. The principle gives rise to a learning process that is driven by differential equations, that can somehow descrive the process within the same framework as other laws of nature.

alessandro betti, artificial intelligence, neural network, (16 more...)

1907.06198

Country: Europe > Italy (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.54)