AITopics | Cuzzolin, Fabio

Collaborating Authors

Cuzzolin, Fabio



An End-to-End Baseline for Video Captioning

Olivastri, Silvio, Singh, Gurkirt, Cuzzolin, Fabio

arXiv.org Artificial Intelligence, Apr-4-2019

Building correspondences across different modalities, such as video and language, has recently become critical in many visual recognition applications, such as video captioning. Inspired by machine translation, recent models tackle this task using an encoder-decoder strategy. The (video) encoder is traditionally a Convolutional Neural Network (CNN), while the decoding (for language generation) is done using a Recurrent Neural Network (RNN). Current state-of-the-art methods, however, train encoder and decoder separately: CNNs are pretrained on object and/or action recognition tasks and used to encode video-level features, and the decoder is then optimised on such static features to generate the video's description. This disjoint setup is arguably sub-optimal for input (video) to output (description) mapping. In this work, we propose to optimise both encoder and decoder simultaneously in an end-to-end fashion. In a two-stage training setting, we first initialise our architecture using pre-trained encoders and decoders; then the entire network is trained end-to-end in a fine-tuning stage to learn the most relevant features for video caption generation. In our experiments, we use GoogLeNet and Inception-ResNet-v2 as encoders and an original Soft-Attention (SA-) LSTM as a decoder. Analogously to gains observed in other computer vision problems, we show that end-to-end training significantly improves over the traditional, disjoint training process. We evaluate our End-to-End (EtENet) Networks on the Microsoft Research Video Description (MSVD) and the MSR Video to Text (MSR-VTT) benchmark datasets, showing that EtENet achieves state-of-the-art performance across the board.
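The abstract names a soft-attention LSTM decoder without detailing it. As a rough illustration of the soft-attention step such decoders typically use, the sketch below weights per-frame encoder features by their relevance to the current decoder hidden state and returns their weighted average as a context vector. Dot-product scoring, the function names and the toy vectors are assumptions made for this sketch, not the SA-LSTM's actual scoring function. Because the weights are a differentiable function of the features, end-to-end training lets the caption loss reach the encoder through this step.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query: Sequence[float],
                   frame_features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step.

    Scores are dot products between the decoder state `query` and each
    frame feature (an assumption; additive/MLP scoring is another common choice).
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Context = convex combination of the frame features.
    context = [sum(w * feat[d] for w, feat in zip(weights, frame_features))
               for d in range(dim)]
    return weights, context


# Hypothetical toy data: 3 frames with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]  # stand-in for the LSTM hidden state
weights, context = soft_attention(hidden, frames)
print(weights, context)
```

Frames whose features align with the hidden state (here the first and third) receive equal, larger weights; the weights always sum to 1.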

deep learning, neural network, video

arXiv:1904.02628

Country: Europe > Italy (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Visions of a generalized probability theory

Cuzzolin, Fabio

arXiv.org Artificial Intelligence, Oct-18-2018

In this book we argue that the fruitful interaction of computer vision and belief calculus is capable of stimulating significant advances in both fields. From a methodological point of view, novel theoretical results concerning the geometric and algebraic properties of belief functions as mathematical objects are illustrated and discussed in Part II, with a focus on both a prospective 'geometric approach' to uncertainty and an algebraic solution to the issue of conflicting evidence. In Part III we show how these theoretical developments arise from important computer vision problems (such as articulated object tracking, data association and object pose estimation), to which, in turn, the evidential formalism is able to provide interesting new solutions. Finally, some initial steps are taken towards a generalization of the notion of total probability to belief functions, with a view to endowing the theory of evidence with a complete battery of estimation and inference tools for the benefit of all scientists and practitioners.
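As background for the belief-function vocabulary above, here is a minimal sketch of belief, plausibility and the classical Dempster's rule of combination, whose normalisation step is where conflicting evidence shows up. It is not the book's algebraic solution to conflict, only the standard baseline that such work starts from; the class labels and masses are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function maps focal elements (subsets of the frame) to masses summing to 1.
Mass = Dict[FrozenSet[str], float]


def belief(m: Mass, a: FrozenSet[str]) -> float:
    """Bel(A): total mass of the non-empty focal elements contained in A."""
    return sum(v for b, v in m.items() if b and b <= a)


def plausibility(m: Mass, a: FrozenSet[str]) -> float:
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


def dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: intersect focal elements pairwise, multiply their
    masses, then renormalise by 1 - K, where K is the conflicting mass
    (the mass that would otherwise fall on the empty set)."""
    combined: Mass = {}
    conflict = 0.0
    for (b, x), (c, y) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    if 1.0 - conflict <= 1e-12:
        raise ValueError("totally conflicting evidence: Dempster's rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Hypothetical example: two sources partly disagree about an object's class.
frame = frozenset({"car", "van", "truck"})
m_a = {frozenset({"car"}): 0.6, frame: 0.4}
m_b = {frozenset({"van"}): 0.5, frame: 0.5}
m_ab = dempster(m_a, m_b)
print(m_ab)
print(belief(m_ab, frozenset({"car"})), plausibility(m_ab, frozenset({"car"})))
```

Here K = 0.6 × 0.5 = 0.3, because {car} and {van} are disjoint, so the remaining mass is rescaled by 1/0.7 and Bel({car}) = 3/7 while Pl({car}) = 5/7.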

categorical belief function, logic programming, probabilistic approximation

arXiv:1810.10341

Country:

Asia (1.00)
Europe > France (0.67)
North America > United States > Kansas > Douglas County > Lawrence (0.13)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (0.92)

Industry:

Energy (0.92)
Health & Medicine > Therapeutic Area (0.65)
Health & Medicine > Diagnostic Medicine (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)


Predicting Action Tubes

Singh, Gurkirt, Saha, Suman, Cuzzolin, Fabio

arXiv.org Artificial Intelligence, Aug-23-2018

In this work, we present a method to predict an entire 'action tube' (a set of temporally linked bounding boxes) in a trimmed video just by observing a smaller subset of it. Predicting where an action is going to take place in the near future is essential to many computer-vision-based applications, such as autonomous driving or surgical robotics. Importantly, such prediction has to be done in real time and in an online fashion. We propose a Tube Prediction network (TPnet), which jointly predicts the past, present and future bounding boxes along with their action classification scores. At test time, TPnet is used in a (temporal) sliding-window setting, and its predictions are fed into a tube estimation framework to construct/predict video-long action tubes, not only for the observed part of the video but also for the unobserved part. Additionally, the proposed action tube predictor helps in completing action tubes for unobserved segments of the video. We quantitatively demonstrate the latter ability, and the fact that TPnet improves state-of-the-art detection performance, on one of the standard action detection benchmarks, the J-HMDB-21 dataset.
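TPnet's tube estimation framework is not described in the abstract. For readers new to action tubes, the sketch below shows one generic, online way to link per-frame detections of a single action class into tubes, greedily matching each active tube to the best-scoring unused detection by intersection-over-union (IoU). It is a hypothetical baseline, not TPnet's method: it does no future prediction, and the 0.3 threshold and all names are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _area(r: Box) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def link_tubes(frames: List[List[Tuple[Box, float]]],
               iou_thresh: float = 0.3) -> List[List[Box]]:
    """Greedily grow tubes frame by frame from (box, score) detections.

    An active tube is extended with the highest-scoring unused detection
    whose IoU with the tube's last box exceeds `iou_thresh`; a tube with
    no match is closed. Unmatched detections start new tubes.
    """
    finished: List[List[Box]] = []
    active: List[List[Box]] = []
    for dets in frames:
        used = set()
        still_active: List[List[Box]] = []
        for tube in active:
            best, best_score = None, float("-inf")
            for j, (box, score) in enumerate(dets):
                if j in used:
                    continue
                if iou(tube[-1], box) > iou_thresh and score > best_score:
                    best, best_score = j, score
            if best is None:
                finished.append(tube)  # no match: the tube ends here
            else:
                used.add(best)
                tube.append(dets[best][0])
                still_active.append(tube)
        still_active += [[box] for j, (box, _) in enumerate(dets) if j not in used]
        active = still_active
    return finished + active


# Hypothetical detections over three frames.
frames = [
    [((0, 0, 10, 10), 0.9)],
    [((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)],
    [((2, 2, 12, 12), 0.85)],
]
for tube in link_tubes(frames):
    print(tube)
```

The slowly drifting box forms a three-frame tube, while the isolated detection in the second frame becomes a one-frame tube that is closed when nothing overlaps it in the third frame.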

deep learning, neural network, prediction

arXiv:1808.07712

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.40)

Industry:

Transportation > Ground > Road (0.48)
Information Technology > Robotics & Automation (0.48)
Automobiles & Trucks (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.66)


Belief likelihood function for generalised logistic regression

Cuzzolin, Fabio

arXiv.org Artificial Intelligence, Aug-20-2018

The notion of a belief likelihood function for repeated trials is introduced for the case in which the uncertainty about individual trials is encoded by a belief measure (a finite random set). This generalises the traditional likelihood function and provides a natural setting for belief inference from statistical data. Factorisation results are proven for the case in which conjunctive or disjunctive combination is employed, leading to analytical expressions for the lower and upper likelihoods of 'sharp' samples in the case of Bernoulli trials, and to the formulation of a generalised logistic regression framework.
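The abstract does not spell out the formulas. As a hedged sketch, assume each trial's belief function on {0, 1} is given by masses m({1}), m({0}) and m({0, 1}), and that the trials are combined conjunctively as independent sources, so that the focal elements on {0, 1}^n are Cartesian products of per-trial focal elements with product masses. Under that assumption the lower likelihood Bel({x}) and upper likelihood Pl({x}) of a sharp sample x reduce to products of per-trial values, and the code checks this against brute-force enumeration. Function names and example masses are hypothetical; the disjunctive case and the logistic regression model are not covered here, so refer to the paper for its exact results.

```python
from itertools import product
from math import prod
from typing import Dict, FrozenSet, Sequence, Tuple

# Per-trial evidence on the frame {0, 1}: (m({1}), m({0})).
# Whatever mass is left goes to the whole frame {0, 1} (ignorance).
TrialMass = Tuple[float, float]


def trial_focal(m: TrialMass) -> Dict[FrozenSet[int], float]:
    """Focal elements and masses of one Bernoulli trial's belief function."""
    p, q = m
    return {frozenset({1}): p, frozenset({0}): q, frozenset({0, 1}): 1.0 - p - q}


def bounds_factorised(sample: Sequence[int], masses: Sequence[TrialMass]) -> Tuple[float, float]:
    """Lower/upper likelihood of a sharp sample as products of the
    per-trial Bel({x_i}) = m_i({x_i}) and Pl({x_i}) = 1 - m_i({1 - x_i})."""
    lower = prod(trial_focal(m)[frozenset({x})] for x, m in zip(sample, masses))
    upper = prod(1.0 - trial_focal(m)[frozenset({1 - x})] for x, m in zip(sample, masses))
    return lower, upper


def bounds_bruteforce(sample: Sequence[int], masses: Sequence[TrialMass]) -> Tuple[float, float]:
    """Same quantities by enumerating the focal elements A_1 x ... x A_n
    of the assumed conjunctive combination on {0, 1}^n."""
    x = tuple(sample)
    lower = upper = 0.0
    for combo in product(*(trial_focal(m).items() for m in masses)):
        mass = prod(w for _, w in combo)
        points = set(product(*(a for a, _ in combo)))
        if points == {x}:
            lower += mass  # focal element contained in {x}: counts towards Bel
        if x in points:
            upper += mass  # focal element meets {x}: counts towards Pl
    return lower, upper


sample = [1, 0, 1]                             # hypothetical observed outcomes
masses = [(0.6, 0.1), (0.2, 0.5), (0.7, 0.2)]  # hypothetical per-trial evidence
print(bounds_factorised(sample, masses))
print(bounds_bruteforce(sample, masses))
```

For this sample the lower likelihood is 0.6 × 0.5 × 0.7 = 0.21 and the upper likelihood is 0.9 × 0.8 × 0.8 = 0.576; an ordinary Bernoulli likelihood is recovered when no mass is assigned to {0, 1}, since the two bounds then coincide.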

artificial intelligence, belief function, machine learning

arXiv:1808.0256

Genre:

Research Report > Experimental Study (0.87)
Research Report > New Finding (0.73)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
