Goto

Collaborating Authors

 Horvat, Marko


A Survey of Deep Learning Audio Generation Methods

arXiv.org Artificial Intelligence

This article presents a review of typical techniques used in three distinct aspects of deep learning model development for audio generation. In the first part of the article, we provide an explanation of audio representations, beginning with the fundamental audio waveform. We then progress to the frequency domain, with an emphasis on the attributes of human hearing, and finally introduce a relatively recent development. The main part of the article focuses on explaining basic and extended deep learning architecture variants, along with their practical applications in the field of audio generation. The following architectures are addressed: 1) Autoencoders 2) Generative adversarial networks 3) Normalizing flows 4) Transformer networks 5) Diffusion models. Lastly, we will examine four distinct evaluation metrics that are commonly employed in audio generation. This article aims to offer novice readers and beginners in the field a comprehensive understanding of the current state of the art in audio generation methods as well as relevant studies that can be explored for future research.


WNtags: A Web-Based Tool For Image Labeling And Retrieval With Lexical Ontologies

arXiv.org Artificial Intelligence

Ever growing number of image documents available on the Internet continuously motivates research in better annotation models and more efficient retrieval methods. Formal knowledge representation of objects and events in pictures, their interaction as well as context complexity becomes no longer an option for a quality image repository, but a necessity. We present an ontology-based online image annotation tool WNtags and demonstrate its usefulness in several typical multimedia retrieval tasks using International Affective Picture System emotionally annotated image database. WNtags is built around WordNet lexical ontology but considers Suggested Upper Merged Ontology as the preferred labeling formalism. WNtags uses sets of weighted WordNet synsets as high-level image semantic descriptors and query matching is performed with word stemming and node distance metrics. We also elaborate our near future plans to expand image content description with induced affect as in stimuli for research of human emotion and attention.


Elementary epistemological features of machine intelligence

arXiv.org Artificial Intelligence

Theoretical analysis of machine intelligence (MI) is useful for defining a common platform in both theoretical and applied artificial intelligence (AI). The goal of this paper is to set canonical definitions that can assist pragmatic research in both strong and weak AI. Described epistemological features of machine intelligence include relationship between intelligent behavior, intelligent and unintelligent machine characteristics, observable and unobservable entities and classification of intelligence. The paper also establishes algebraic definitions of efficiency and accuracy of MI tests as their quality measure. The last part of the paper addresses the learning process with respect to the traditional epistemology and the epistemology of MI described here. The proposed views on MI positively correlate to the Hegelian monistic epistemology and contribute towards amalgamating idealistic deliberations with the AI theory, particularly in a local frame of reference.


Tagging multimedia stimuli with ontologies

arXiv.org Artificial Intelligence

Successful management of emotional stimuli is a pivotal issue concerning Affective Computing (AC) and the related research. As a subfield of Artificial Intelligence, AC is concerned not only with the design of computer systems and the accompanying hardware that can recognize, interpret, and process human emotions, but also with the development of systems that can trigger human emotional response in an ordered and controlled manner. This requires the maximum attainable precision and efficiency in the extraction of data from emotionally annotated databases While these databases do use keywords or tags for description of the semantic content, they do not provide either the necessary flexibility or leverage needed to efficiently extract the pertinent emotional content. Therefore, to this extent we propose an introduction of ontologies as a new paradigm for description of emotionally annotated data. The ability to select and sequence data based on their semantic attributes is vital for any study involving metadata, semantics and ontological sorting like the Semantic Web or the Social Semantic Desktop, and the approach described in the paper facilitates reuse in these areas as well.