
Collaborating Authors

 Sakaridis, Christos


The BRAVO Semantic Segmentation Challenge Results in UNCV2024

arXiv.org Artificial Intelligence

We propose the unified BRAVO challenge to benchmark the reliability of semantic segmentation models under realistic perturbations and unknown out-of-distribution (OOD) scenarios. We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to detect object classes that are unknown during training. The challenge attracted nearly 100 submissions from international teams representing notable research institutions. The results reveal interesting insights into the importance of large-scale pre-training and minimal architectural design in developing robust and reliable semantic segmentation models.
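
As a rough illustration of the two reliability notions, the sketch below (an assumption-level example, not the official BRAVO evaluation protocol) scores per-pixel confidences with a binned expected calibration error and scores OOD detection with an AUROC computed from negated maximum-softmax confidences; all data here is synthetic.

```python
# Illustrative reliability metrics on synthetic per-pixel predictions.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE over flattened per-pixel predictions."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

def ood_auroc(scores_id, scores_ood):
    """AUROC when a lower max-softmax score should indicate an OOD pixel."""
    labels = np.concatenate([np.zeros_like(scores_id), np.ones_like(scores_ood)])
    scores = np.concatenate([-scores_id, -scores_ood])      # negate: low confidence = OOD
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# toy usage with random per-pixel confidences
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 10000)
correct = rng.uniform(0, 1, 10000) < conf                   # roughly calibrated toy model
print(expected_calibration_error(conf, correct))
print(ood_auroc(rng.uniform(0.6, 1.0, 5000), rng.uniform(0.2, 0.8, 5000)))
```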


OVeNet: Offset Vector Network for Semantic Segmentation

arXiv.org Artificial Intelligence

Semantic segmentation is a fundamental task in visual scene understanding. We focus on the supervised setting, where ground-truth semantic annotations are available. Based on knowledge about the high regularity of real-world scenes, we propose a method for improving class predictions by learning to selectively exploit information from neighboring pixels. In particular, our method is based on the prior that, for each pixel, there is a seed pixel in its close neighborhood that shares the same prediction with it. Motivated by this prior, we design a novel two-head network, named Offset Vector Network (OVeNet), which generates both standard semantic predictions and a dense 2D offset vector field indicating the offset from each pixel to its respective seed pixel; the offsets are used to compute an alternative, seed-based semantic prediction. The two predictions are adaptively fused at each pixel using a learnt dense confidence map for the predicted offset vector field. We supervise offset vectors indirectly, via optimizing the seed-based prediction and via a novel loss on the confidence map. Compared to the baseline state-of-the-art architectures HRNet and HRNet+OCR on which it is built, OVeNet achieves significant performance gains on three prominent benchmarks for semantic segmentation, namely Cityscapes, ACDC and ADE20K. Code is available at https://github.com/stamatisalex/OVeNet.
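
The seed-based fusion can be pictured with the following minimal PyTorch sketch, which assumes a backbone that already outputs per-pixel logits, a 2D offset field in pixels, and a confidence map; the function name, shapes, and the bilinear sampling via grid_sample are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fuse_with_offsets(logits, offsets, confidence):
    """logits: (B, C, H, W); offsets: (B, 2, H, W) in pixels; confidence: (B, 1, H, W) in [0, 1]."""
    B, C, H, W = logits.shape
    # base sampling grid in pixel coordinates, later normalized to [-1, 1] for grid_sample
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).float().to(logits.device)         # (H, W, 2) as (x, y)
    grid = base.unsqueeze(0) + offsets.permute(0, 2, 3, 1)                 # seed positions in pixels
    grid[..., 0] = 2.0 * grid[..., 0] / (W - 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / (H - 1) - 1.0
    seed_logits = F.grid_sample(logits, grid, align_corners=True)          # prediction copied from seeds
    return confidence * seed_logits + (1.0 - confidence) * logits          # adaptive per-pixel fusion

# toy usage
logits = torch.randn(1, 19, 64, 128)
offsets = torch.zeros(1, 2, 64, 128)          # zero offsets reproduce the standard prediction
confidence = torch.sigmoid(torch.randn(1, 1, 64, 128))
print(fuse_with_offsets(logits, offsets, confidence).shape)
```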


Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding

arXiv.org Artificial Intelligence

The real-world deployment of an autonomous driving system requires its components to run on-board and in real-time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants. Existing agent-centric methods have demonstrated outstanding performance on public benchmarks. However, they suffer from high computational overhead and poor scalability as the number of agents to be predicted increases. To address this problem, we introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers. Then, based on KNARPE, we present the Heterogeneous Polyline Transformer with Relative pose encoding (HPTR), a hierarchical framework enabling asynchronous token updates during online inference. By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods. Experiments on the Waymo and Argoverse-2 datasets show that HPTR achieves superior performance among end-to-end methods that do not apply expensive post-processing or model ensembling. The code is available at https://github.com/zhejz/HPTR.
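
The sketch below illustrates the general idea of K-nearest-neighbor attention with a relative pose encoding: each token attends only to its K spatially nearest tokens, and the attention sees an encoding of the pairwise-relative pose. The module layout, the pose-encoding MLP, and all dimensions are assumptions for illustration, not the exact KNARPE design.

```python
import torch
import torch.nn as nn

class KNNRelPoseAttention(nn.Module):
    def __init__(self, dim, k=8):
        super().__init__()
        self.k = k
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)
        self.rel_pose_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(dim, dim)

    def forward(self, tokens, poses):
        """tokens: (N, D) polyline tokens; poses: (N, 3) as (x, y, heading)."""
        N, D = tokens.shape
        dists = torch.cdist(poses[:, :2], poses[:, :2])            # (N, N) pairwise distances
        knn = dists.topk(self.k, largest=False).indices            # (N, k) nearest neighbours (incl. self)
        rel = poses[knn] - poses.unsqueeze(1)                      # (N, k, 3) pairwise-relative pose
        rel[..., 2] = torch.atan2(torch.sin(rel[..., 2]), torch.cos(rel[..., 2]))  # wrap heading
        q = self.to_q(tokens).unsqueeze(1)                         # (N, 1, D) query per token
        keys, vals = self.to_kv(tokens[knn]).chunk(2, dim=-1)      # (N, k, D) neighbour keys/values
        pe = self.rel_pose_mlp(rel)                                # relative pose encoding
        attn = ((q * (keys + pe)).sum(-1) / D ** 0.5).softmax(-1)  # (N, k) attention over neighbours
        return self.out((attn.unsqueeze(-1) * (vals + pe)).sum(1)) # (N, D) updated tokens

# toy usage: 32 map/agent tokens with 2D positions and headings
tokens, poses = torch.randn(32, 64), torch.randn(32, 3)
print(KNNRelPoseAttention(64)(tokens, poses).shape)
```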


Three Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding

arXiv.org Artificial Intelligence

3D visual grounding is the task of localizing the object in a 3D scene that is referred to by a natural-language description. With applications ranging from autonomous indoor robotics to AR/VR, the task has recently risen in popularity. A common formulation for 3D visual grounding is grounding-by-detection, where localization is done via bounding boxes. However, for real-life applications that require physical interactions, a bounding box insufficiently describes the geometry of an object. We therefore tackle the problem of dense 3D visual grounding, i.e. referral-based 3D instance segmentation. We propose a dense 3D grounding network, ConcreteNet, featuring three novel stand-alone modules which aim to improve grounding performance for challenging repetitive instances, i.e. instances with distractors of the same semantic class. First, we introduce a bottom-up attentive fusion module that aims to disambiguate inter-instance relational cues; next, we construct a contrastive training scheme to induce separation in the latent space; and finally, we resolve view-dependent utterances via a learned global camera token. ConcreteNet ranks 1st on the challenging ScanRefer online benchmark, leading by a considerable +9.43% accuracy at 50% IoU, and has won the ICCV 3rd Workshop on Language for 3D Scenes "3D Object Localization" challenge.
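
A contrastive scheme of this kind can be sketched as follows: candidate instance embeddings of the referred class compete for similarity with the utterance embedding, which pushes apart repetitive instances in the latent space. The loss below is an assumption-level InfoNCE-style illustration, not ConcreteNet's actual objective.

```python
import torch
import torch.nn.functional as F

def contrastive_separation_loss(instance_emb, text_emb, target_idx, temperature=0.07):
    """instance_emb: (M, D) candidate instances of the referred class;
    text_emb: (D,) utterance embedding; target_idx: index of the referred instance."""
    inst = F.normalize(instance_emb, dim=-1)
    text = F.normalize(text_emb, dim=-1)
    logits = inst @ text / temperature                    # similarity of each instance to the utterance
    target = torch.tensor(target_idx)
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

# toy usage: 5 chairs in the scene, the 3rd one is referred to
inst = torch.randn(5, 128, requires_grad=True)
text = torch.randn(128)
loss = contrastive_separation_loss(inst, text, target_idx=2)
loss.backward()
print(float(loss))
```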


HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection

arXiv.org Artificial Intelligence

Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors, such as camera with lidar or camera with radar, by using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we propose HRFuser, a modular architecture for multi-modal 2D object detection. It fuses multiple sensors in a multi-resolution fashion and scales to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means to perform fusion of multiple modalities at multiple resolutions. We demonstrate via extensive experiments on nuScenes and on the adverse-conditions DENSE dataset that our model effectively leverages complementary features from additional modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art 3D and 2D fusion methods evaluated on 2D object detection metrics. The source code is publicly available.
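
The sketch below shows one plausible form of window-based cross-attention between a camera feature map and one additional modality at the same resolution, with camera features as queries; the window size, head count, and residual combination are assumptions, not HRFuser's exact multi-window cross-attention block.

```python
import torch
import torch.nn as nn

class WindowCrossAttention(nn.Module):
    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def _to_windows(self, x):
        B, C, H, W = x.shape
        w = self.window
        x = x.view(B, C, H // w, w, W // w, w).permute(0, 2, 4, 3, 5, 1)
        return x.reshape(-1, w * w, C)                            # (num_windows, tokens, C)

    def forward(self, cam, other):
        """cam, other: (B, C, H, W) feature maps with H, W divisible by the window size."""
        B, C, H, W = cam.shape
        q, kv = self._to_windows(cam), self._to_windows(other)
        fused, _ = self.attn(self.norm(q), kv, kv)                # camera queries, other-modality keys/values
        fused = q + fused                                         # residual keeps the camera-only path intact
        w = self.window
        fused = fused.reshape(B, H // w, W // w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        return fused.reshape(B, C, H, W)

# toy usage: fuse camera features with a projected lidar feature map
cam, lidar = torch.randn(2, 64, 32, 64), torch.randn(2, 64, 32, 64)
print(WindowCrossAttention(64)(cam, lidar).shape)
```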


Lidar Line Selection with Spatially-Aware Shapley Value for Cost-Efficient Depth Completion

arXiv.org Artificial Intelligence

Lidar is a vital sensor for estimating the depth of a scene. Typical spinning lidars emit pulses arranged in several horizontal lines, and the monetary cost of the sensor increases with the number of these lines. In this work, we present the new problem of optimizing the positioning of lidar lines to find the most effective configuration for the depth completion task. We propose a solution that reduces the number of lines while retaining depth completion quality on par with the full sensor. Our method consists of two components: (1) line selection based on the marginal contribution of a line, computed via the Shapley value, and (2) incorporation of the spatial spread of line positions, to account for the image-wide coverage that depth completion requires. The resulting spatially-aware Shapley values (SaS) succeed in selecting line subsets that yield a depth accuracy comparable to the full lidar input while using just half of the lines.
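
The selection logic can be sketched as follows: Shapley values of individual lines are estimated by Monte Carlo sampling of random permutations, and lines are then picked greedily with a bonus for vertical spread. The depth_quality function below is a placeholder for a real depth-completion evaluation, and the spread weighting is an assumption rather than the paper's exact rule.

```python
import random

LINES = list(range(64))                      # candidate horizontal lidar lines

def depth_quality(subset):
    """Placeholder: return a depth-completion quality score for a line subset."""
    return len(subset) / len(LINES)          # stand-in; replace with a real evaluation

def shapley_values(n_permutations=200, seed=0):
    rng = random.Random(seed)
    values = {line: 0.0 for line in LINES}
    for _ in range(n_permutations):
        order = LINES[:]
        rng.shuffle(order)
        chosen, prev = [], depth_quality([])
        for line in order:
            chosen.append(line)
            score = depth_quality(chosen)
            values[line] += (score - prev) / n_permutations   # marginal contribution of this line
            prev = score
    return values

def select_lines(budget=32, spread_weight=0.01):
    values, selected = shapley_values(), []
    for _ in range(budget):
        def gain(line):
            spread = min((abs(line - s) for s in selected), default=len(LINES))
            return values[line] + spread_weight * spread      # spatially-aware bonus
        best = max((l for l in LINES if l not in selected), key=gain)
        selected.append(best)
    return sorted(selected)

print(select_lines())
```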


L2E: Lasers to Events for 6-DoF Extrinsic Calibration of Lidars and Event Cameras

arXiv.org Artificial Intelligence

As neuromorphic technology is maturing, its application to robotics and autonomous vehicle systems has become an area of active research. In particular, event cameras have emerged as a compelling alternative to frame-based cameras in low-power and latency-demanding applications. To enable event cameras to operate alongside staple sensors like lidar in perception tasks, we propose a direct, temporally-decoupled extrinsic calibration method between event cameras and lidars. The high dynamic range, high temporal resolution, and low-latency operation of event cameras are exploited to directly register lidar laser returns, allowing information-based correlation methods to optimize for the 6-DoF extrinsic calibration between the two sensors. This paper presents the first direct calibration method between event cameras and lidars, removing dependencies on frame-based camera intermediaries and/or highly-accurate hand measurements.
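
A toy version of such an information-based objective is sketched below: lidar returns are projected into the event camera under a candidate rotation, and the candidate is scored by the mutual information between projected lidar intensity and accumulated event activity. The intrinsics, the synthetic data, and the 1-D rotation search stand in for the full 6-DoF optimization and are not the L2E pipeline.

```python
import numpy as np

K = np.array([[300.0, 0, 160], [0, 300.0, 120], [0, 0, 1]])   # assumed pinhole intrinsics
H, W = 240, 320

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def project(points, R, t):
    """Project lidar points into the camera; return integer pixels and kept point indices."""
    cam = points @ R.T + t
    front = cam[:, 2] > 0.1
    uvw = cam[front] @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    return uv[inside].astype(int), front.nonzero()[0][inside]

def mutual_information(a, b, bins=32):
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

# synthetic scene: lidar points with intensities; the event image is rendered at the
# "true" extrinsic so that event activity correlates with lidar intensity
rng = np.random.default_rng(0)
points = rng.uniform([-5, -5, 2], [5, 5, 20], size=(5000, 3))
intensity = rng.uniform(0, 1, 5000)
true_angle = 0.05
uv, idx = project(points, rot_z(true_angle), np.zeros(3))
events = np.zeros((H, W))
events[uv[:, 1], uv[:, 0]] = intensity[idx]

def score(angle):
    uv, idx = project(points, rot_z(angle), np.zeros(3))
    return mutual_information(events[uv[:, 1], uv[:, 0]], intensity[idx])

best = max(np.linspace(-0.2, 0.2, 81), key=score)
print(f"estimated rotation: {best:.3f} rad (true value {true_angle})")
```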


Masked Vision-Language Transformer in Fashion

arXiv.org Artificial Intelligence

We present a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation. Technically, we simply replace the BERT backbone of the pre-training model with a vision transformer architecture, making MVLT the first end-to-end framework for the fashion domain. In addition, we design masked image reconstruction (MIR) for a fine-grained understanding of fashion. MVLT is an extensible and convenient architecture that admits raw multi-modal inputs without extra pre-processing models (e.g., ResNet), implicitly modeling the vision-language alignments. More importantly, MVLT can easily generalize to various matching and generative tasks. Experimental results show clear improvements in retrieval (rank@5: 17%) and recognition (accuracy: 3%) tasks over the Fashion-Gen 2018 winner Kaleido-BERT. Code is made available at https://github.com/GewelsJI/MVLT.
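
A masked image reconstruction objective on image patches can be sketched as below: random patches are replaced by a learnable mask token and the network regresses their raw pixels, with the loss taken only on masked positions. The tiny transformer, patch size, and masking ratio are illustrative assumptions, not the MVLT architecture.

```python
import torch
import torch.nn as nn

patch, ratio = 16, 0.5
img = torch.randn(2, 3, 224, 224)

# patchify the image into (B, N, patch*patch*3) flattened patches
patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(2, -1, 3 * patch * patch)
B, N, D = patches.shape

embed = nn.Linear(D, 256)
mask_token = nn.Parameter(torch.zeros(256))
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(256, 4, batch_first=True), 2)
decoder = nn.Linear(256, D)

mask = torch.rand(B, N) < ratio                       # True = masked patch
tokens = embed(patches)
tokens[mask] = mask_token                             # replace masked patches by a learnable token
recon = decoder(encoder(tokens))
loss = ((recon - patches)[mask] ** 2).mean()          # reconstruct only the masked patches
loss.backward()
print(float(loss))
```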


Semantic Understanding of Foggy Scenes with Purely Synthetic Data

arXiv.org Artificial Intelligence

This work addresses the problem of semantic scene understanding under foggy road conditions. Although marked progress has been made in semantic scene understanding in recent years, it has mainly concentrated on clear-weather outdoor scenes. Extending semantic segmentation methods to adverse weather conditions like fog is crucially important for outdoor applications such as self-driving cars. In this paper, we propose a novel method which uses purely synthetic data to improve the performance on unseen real-world foggy scenes captured in the streets of Zurich and its surroundings. Our results highlight the potential and power of photo-realistic synthetic images for training and especially fine-tuning deep neural nets. Our contributions are threefold: (1) we create a purely synthetic, high-quality foggy dataset of 25,000 unique outdoor scenes, which we call Foggy Synscapes and plan to release publicly; (2) we show that with this data we outperform previous approaches on real-world foggy test data; (3) we show that a combination of our data and previously used data can further improve performance on real-world foggy data. Recent years have seen tremendous progress in tasks relevant to autonomous driving [1].
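
Synthetic fog is commonly rendered from a clear-weather image and its depth map with the standard optical model I(x) = J(x) * t(x) + A * (1 - t(x)), where the transmittance is t(x) = exp(-beta * d(x)); whether Foggy Synscapes uses exactly this formulation is not stated in the abstract. The sketch below applies this model with illustrative parameter values.

```python
import numpy as np

def add_synthetic_fog(image, depth, beta=0.06, airlight=0.92):
    """image: (H, W, 3) in [0, 1]; depth: (H, W) metric depth; beta: attenuation coefficient [1/m]."""
    t = np.exp(-beta * depth)[..., None]           # per-pixel transmittance
    return image * t + airlight * (1.0 - t)        # attenuated scene radiance plus scattered airlight

# toy usage: a random "clear" image with depth growing towards the top of the frame
rng = np.random.default_rng(0)
clear = rng.uniform(0, 1, (240, 320, 3))
depth = np.linspace(80, 5, 240)[:, None] * np.ones((1, 320))
foggy = add_synthetic_fog(clear, depth)
print(foggy.min(), foggy.max())
```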