Goto

Collaborating Authors

 Overview


SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts

arXiv.org Artificial Intelligence

This paper addresses the problem of set-to-set matching, which involves matching two different sets of items based on some criteria, especially in the case of high-dimensional items like images. Although neural networks have been applied to solve this problem, most machine learning-based approaches assume that the training and test data follow the same distribution, which is not always true in real-world scenarios. To address this limitation, we introduce SHIFT15M, a dataset that can be used to evaluate set-to-set matching models when the distribution of data changes between training and testing. We conduct benchmark experiments that demonstrate the performance drop of naive methods due to distribution shift. Additionally, we provide software to handle the SHIFT15M dataset in a simple manner, with the URL for the software to be made available after publication of this manuscript. We believe proposed SHIFT15M dataset provide a valuable resource for evaluating set-to-set matching models under the distribution shift.


Automated Cyber Defence: A Review

arXiv.org Artificial Intelligence

Within recent times, cybercriminals have curated a variety of organised and resolute cyber attacks within a range of cyber systems, leading to consequential ramifications to private and governmental institutions. Current security-based automation and orchestrations focus on automating fixed purpose and hard-coded solutions, which are easily surpassed by modern-day cyber attacks. Research within Automated Cyber Defence will allow the development and enabling intelligence response by autonomously defending networked systems through sequential decision-making agents. This article comprehensively elaborates the developments within Automated Cyber Defence through a requirement analysis divided into two sub-areas, namely, automated defence and attack agents and Autonomous Cyber Operation (ACO) Gyms. The requirement analysis allows the comparison of automated agents and highlights the importance of ACO Gyms for their continual development. The requirement analysis is also used to critique ACO Gyms with an overall aim to develop them for deploying automated agents within real-world networked systems. Relevant future challenges were addressed from the overall analysis to accelerate development within the area of Automated Cyber Defence.


Models of symbol emergence in communication: a conceptual review and a guide for avoiding local minima

arXiv.org Artificial Intelligence

Computational simulations are a popular method for testing hypotheses about the emergence of communication. This kind of research is performed in a variety of traditions including language evolution, developmental psychology, cognitive science, machine learning, robotics, etc. The motivations for the models are different, but the operationalizations and methods used are often similar. We identify the assumptions and explanatory targets of several most representative models and summarise the known results. We claim that some of the assumptions -- such as portraying meaning in terms of mapping, focusing on the descriptive function of communication, modelling signals with amodal tokens -- may hinder the success of modelling. Relaxing these assumptions and foregrounding the interactions of embodied and situated agents allows one to systematise the multiplicity of pressures under which symbolic systems evolve. In line with this perspective, we sketch the road towards modelling the emergence of meaningful symbolic communication, where symbols are simultaneously grounded in action and perception and form an abstract system.


A primer on getting neologisms from foreign languages to under-resourced languages

arXiv.org Artificial Intelligence

Neologisms are certain uses, expressions, and words that did not traditionally exist in a language, but are incorporated into it due to the need of speakers to adapt to a new reality [1]. That is, neologisms are those new words and expressions that speakers incorporate into a language, as new things and new ways of doing to name arise. They are the exact opposite of archaisms. The appearance of neologisms is a common and ordinary process in all languages, forced as they are to adapt and update or die. However, a word can be considered a neologism only for a certain time, since once it has been incorporated and normalized as part of the language, it simply ceases to be a novelty. The simplest way to classify neologisms would be from the method used to create them, thus we have: 1. morphological neologisms: they are built using words that already exist in the language, through the processes of composition or derivation. For example, the word "aircraft" was once a neologism, made up of the prefix "air" and the suffix "craft". This also happens with "teleoperators" or with "biosecurity".


Survey of Machine Learning Based Intrusion Detection Methods for Internet of Medical Things

arXiv.org Artificial Intelligence

The Internet of Medical Things (IoMT) has revolutionized the healthcare industry by enabling physiological data collection using sensors, which are transmitted to remote servers for continuous analysis by physicians and healthcare professionals. This technology offers numerous benefits, including early disease detection and automatic medication for patients with chronic illnesses. However, IoMT technology also presents significant security risks, such as violating patient privacy or exposing sensitive data to interception attacks due to wireless communication, which could be fatal for the patient. Additionally, traditional security measures, such as cryptography, are challenging to implement in medical equipment due to the heterogeneous communication and their limited computation, storage, and energy capacity. These protection methods are also ineffective against new and zero-day attacks. It is essential to adopt robust security measures to ensure data integrity, confidentiality, and availability during data collection, transmission, storage, and processing. In this context, using Intrusion Detection Systems (IDS) based on Machine Learning (ML) can bring a complementary security solution adapted to the unique characteristics of IoMT systems. Therefore, this paper investigates how IDS based on ML can address security and privacy issues in IoMT systems. First, the generic three-layer architecture of IoMT is provided, and the security requirements of IoMT systems are outlined. Then, the various threats that can affect IoMT security are identified, and the advantages, disadvantages, methods, and datasets used in each solution based on ML at the three layers that make up IoMT are presented. Finally, the paper discusses the challenges and limitations of applying IDS based on ML at each layer of IoMT, which can serve as a future research direction.


Graph Neural Networks in Vision-Language Image Understanding: A Survey

arXiv.org Artificial Intelligence

2D image understanding is a complex problem within Computer Vision, but it holds the key to providing human level scene comprehension. It goes further than identifying the objects in an image, and instead it attempts to understand the scene. Solutions to this problem form the underpinning of a range of tasks, including image captioning, Visual Question Answering (VQA), and image retrieval. Graphs provide a natural way to represent the relational arrangement between objects in an image, and thus in recent years Graph Neural Networks (GNNs) have become a standard component of many 2D image understanding pipelines, becoming a core architectural component especially in the VQA group of tasks. In this survey, we review this rapidly evolving field and we provide a taxonomy of graph types used in 2D image understanding approaches, a comprehensive list of the GNN models used in this domain, and a roadmap of future potential developments. To the best of our knowledge, this is the first comprehensive survey that covers image captioning, visual question answering, and image retrieval techniques that focus on using GNNs as the main part of their architecture.


A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design

arXiv.org Artificial Intelligence

Semantic image and video segmentation stand among the most important tasks in computer vision nowadays, since they provide a complete and meaningful representation of the environment by means of a dense classification of the pixels in a given scene. Recently, Deep Learning, and more precisely Convolutional Neural Networks, have boosted semantic segmentation to a new level in terms of performance and generalization capabilities. However, designing Deep Semantic Segmentation models is a complex task, as it may involve application-dependent aspects. Particularly, when considering autonomous driving applications, the robustness-efficiency trade-off, as well as intrinsic limitations - computational/memory bounds and data-scarcity - and constraints - real-time inference - should be taken into consideration. In this respect, the use of additional data modalities, such as depth perception for reasoning on the geometry of a scene, and temporal cues from videos to explore redundancy and consistency, are promising directions yet not explored to their full potential in the literature. In this paper, we conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles, from three different perspectives: efficiency-oriented model development for real-time operation, RGB-Depth data integration (RGB-D semantic segmentation), and the use of temporal information from videos in temporally-aware models. Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective, so that the reader can not only get started, but also be up to date in respect to recent advances in this exciting and challenging research field.


Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics

arXiv.org Artificial Intelligence

One of the main paths towards the reduction of traffic accidents is the increase in vehicle safety through driver assistance systems or even systems with a complete level of autonomy. In these types of systems, tasks such as obstacle detection and segmentation, especially the Deep Learning-based ones, play a fundamental role in scene understanding for correct and safe navigation. Besides that, the wide variety of sensors in vehicles nowadays provides a rich set of alternatives for improvement in the robustness of perception in challenging situations, such as navigation under lighting and weather adverse conditions. Despite the current focus given to the subject, the literature lacks studies on radar-based and radar-camera fusion-based perception. Hence, this work aims to carry out a study on the current scenario of camera and radar-based perception for ADAS and autonomous vehicles. Concepts and characteristics related to both sensors, as well as to their fusion, are presented. Additionally, we give an overview of the Deep Learning-based detection and segmentation tasks, and the main datasets, metrics, challenges, and open questions in vehicle perception.


Foundation Models for Decision Making: Problems, Methods, and Opportunities

arXiv.org Artificial Intelligence

Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks. When such models are deployed in real world environments, they inevitably interface with other entities and agents. For example, language models are often used to interact with human beings through dialogue, and visual perception models are used to autonomously navigate neighborhood streets. In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning. These paradigms leverage the existence of ever-larger datasets curated for multimodal, multitask, and generalist interaction. Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems that can interact effectively across a diverse range of applications such as dialogue, autonomous driving, healthcare, education, and robotics. In this manuscript, we examine the scope of foundation models for decision making, and provide conceptual tools and technical background for understanding the problem space and exploring new research directions. We review recent approaches that ground foundation models in practical decision making applications through a variety of methods such as prompting, conditional generative modeling, planning, optimal control, and reinforcement learning, and discuss common challenges and open problems in the field.


Causal Representation Learning for Instantaneous and Temporal Effects in Interactive Systems

arXiv.org Artificial Intelligence

Causal representation learning is the task of identifying the underlying causal variables and their relations from high-dimensional observations, such as images. Recent work has shown that one can reconstruct the causal variables from temporal sequences of observations under the assumption that there are no instantaneous causal relations between them. In practical applications, however, our measurement or frame rate might be slower than many of the causal effects. This effectively creates "instantaneous" effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that allows for instantaneous effects in intervened temporal sequences when intervention targets can be observed, e.g., as actions of an agent. iCITRIS identifies the potentially multidimensional causal variables from temporal observations, while simultaneously using a differentiable causal discovery method to learn their causal graph. In experiments on three datasets of interactive systems, iCITRIS accurately identifies the causal variables and their causal graph.