Gevers, Theo
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
Xing, Xiaoyan, Groh, Konrad, Karaoglu, Sezer, Gevers, Theo, Bhattad, Anand
We introduce LumiNet, a novel architecture that leverages generative models and latent intrinsic representations for effective lighting transfer. Given a source image and a target lighting image, LumiNet synthesizes a relit version of the source scene that captures the target's lighting. Our approach makes two key contributions: a data curation strategy based on a StyleGAN relighting model for our training, and a modified diffusion-based ControlNet that processes both latent intrinsic properties from the source image and latent extrinsic properties from the target image. We further improve lighting transfer through a learned adaptor (MLP) that injects the target's latent extrinsic properties via cross-attention and fine-tuning. Unlike a traditional ControlNet, which generates images from conditional maps of a single scene, LumiNet processes latent representations from two different images: it preserves geometry and albedo from the source while transferring lighting characteristics from the target. Experiments demonstrate that our method successfully transfers complex lighting phenomena, including specular highlights and indirect illumination, across scenes with varying spatial layouts and materials, outperforming existing approaches on challenging indoor scenes using only images as input.
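The adaptor-based conditioning lends itself to a compact sketch. The PyTorch fragment below shows one way the described pieces could fit together: an MLP adaptor that maps the target's latent extrinsic (lighting) code to cross-attention tokens, and a ControlNet-style branch that fuses that code with the source's spatial intrinsic features. All module names, dimensions, and the fusion scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of the conditioning path described above.
# Module names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ExtrinsicAdaptor(nn.Module):
    """MLP that maps a target latent extrinsic code to cross-attention tokens."""
    def __init__(self, extrinsic_dim=512, context_dim=768, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.mlp = nn.Sequential(
            nn.Linear(extrinsic_dim, context_dim * num_tokens),
            nn.GELU(),
            nn.Linear(context_dim * num_tokens, context_dim * num_tokens),
        )

    def forward(self, z_extrinsic):              # (B, extrinsic_dim)
        tokens = self.mlp(z_extrinsic)           # (B, context_dim * num_tokens)
        return tokens.view(z_extrinsic.size(0), self.num_tokens, -1)

class RelightControl(nn.Module):
    """Fuses source intrinsics and target extrinsics into a control signal."""
    def __init__(self, intrinsic_ch=320, extrinsic_dim=512, control_ch=320):
        super().__init__()
        self.adaptor = ExtrinsicAdaptor(extrinsic_dim)
        self.fuse = nn.Conv2d(intrinsic_ch + extrinsic_dim, control_ch, 1)

    def forward(self, z_intrinsic, z_extrinsic):
        # Broadcast the global lighting code over the spatial intrinsic map.
        b, _, h, w = z_intrinsic.shape
        lighting = z_extrinsic[:, :, None, None].expand(b, -1, h, w)
        control = self.fuse(torch.cat([z_intrinsic, lighting], dim=1))
        context = self.adaptor(z_extrinsic)      # cross-attention tokens
        return control, context
```

Under this reading, the control tensor would be added to the diffusion U-Net's encoder features in the usual ControlNet fashion, while the context tokens would feed its cross-attention layers.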
MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM
Yugay, Vladimir, Gevers, Theo, Oswald, Martin R.
Simultaneous localization and mapping (SLAM) systems with novel view synthesis capabilities are widely used in computer vision, with applications in augmented reality, robotics, and autonomous driving. However, existing approaches are limited to single-agent operation. Recent work has addressed this problem using a distributed neural scene representation. Unfortunately, existing methods are slow, cannot accurately render real-world data, are restricted to two agents, and have limited tracking accuracy. In contrast, we propose a rigidly deformable 3D Gaussian-based scene representation that dramatically speeds up the system. However, improving tracking accuracy and reconstructing a globally consistent map from multiple agents remains challenging due to trajectory drift and discrepancies across agents' observations. Therefore, we propose new tracking and map-merging mechanisms and integrate loop closure in the Gaussian-based SLAM pipeline. We evaluate MAGiC-SLAM on synthetic and real-world datasets and find it more accurate and faster than the state of the art.
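The "rigidly deformable" representation suggests a simple mental model for map merging: once loop closure yields a corrective rigid pose per sub-map, each agent's Gaussians are transported as a rigid body and concatenated into one globally consistent map. Below is a minimal PyTorch sketch under that reading; the per-sub-map data layout and the merge-by-concatenation step are assumptions, not the paper's exact pipeline.

```python
# A minimal sketch of rigid sub-map correction and merging.
# Data layout (per-sub-map means and covariances) is an assumption.
import torch

def transform_gaussians(means, covs, R, t):
    """Apply a rigid transform (R, t) to 3D Gaussian means and covariances."""
    new_means = means @ R.T + t          # (N, 3)
    new_covs = R @ covs @ R.T            # (N, 3, 3): Sigma' = R Sigma R^T
    return new_means, new_covs

def merge_submaps(submaps, corrected_poses):
    """Bring every agent's sub-map into a common frame and concatenate."""
    all_means, all_covs = [], []
    for (means, covs), (R, t) in zip(submaps, corrected_poses):
        m, c = transform_gaussians(means, covs, R, t)
        all_means.append(m)
        all_covs.append(c)
    return torch.cat(all_means), torch.cat(all_covs)
```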
Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting
Yugay, Vladimir, Li, Yue, Gevers, Theo, Oswald, Martin R.
We present a dense SLAM method that uses 3D Gaussian splats as a scene representation. The new representation enables interactive-time reconstruction and photo-realistic rendering of real-world and synthetic scenes. We propose novel strategies for seeding and optimizing Gaussian splats to extend their use from multiview offline scenarios to sequential monocular RGBD input data setups. In addition, we extend Gaussian splats to encode geometry and experiment with tracking against this scene representation.

Earlier works focus on tracking using various scene representations like feature point clouds [15, 26, 40], surfels [53, 71], depth maps [43, 58], or implicit representations [14, 42, 44]. Later works focused more on map quality and density. With the advent of powerful neural scene representations like neural radiance fields [38] that allow for high-fidelity view synthesis, a rapidly growing body of dense neural SLAM methods [19, 34, 51, 60, 62, 64, 81, 84] has been developed.
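A minimal sketch of the seeding idea, assuming a pinhole camera model: pixels of each incoming RGBD frame are back-projected into world space, and new Gaussians are spawned at those points with splat scales tied to depth. The stride-based sub-sampling and the scale heuristic are simplifications for illustration, not the paper's densification criteria.

```python
# Back-project an RGBD frame and seed Gaussians at the resulting 3D points.
# Pinhole model, sub-sampling, and scale heuristic are assumptions.
import torch

def seed_gaussians(rgb, depth, K, cam_to_world, stride=4):
    """rgb: (H, W, 3), depth: (H, W), K: (3, 3) intrinsics, cam_to_world: (4, 4)."""
    H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(0, H, stride), torch.arange(0, W, stride), indexing="ij")
    z = depth[ys, xs]
    valid = z > 0                                             # skip invalid depth
    x = (xs[valid] - K[0, 2]) * z[valid] / K[0, 0]
    y = (ys[valid] - K[1, 2]) * z[valid] / K[1, 1]
    pts_cam = torch.stack([x, y, z[valid]], dim=-1)           # (N, 3)
    pts_world = pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]
    colors = rgb[ys, xs][valid]                               # (N, 3)
    scales = (z[valid] / K[0, 0])[:, None].expand(-1, 3)      # roughly pixel-sized
    return pts_world, colors, scales
```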
HaarNet: Large-scale Linear-Morphological Hybrid Network for RGB-D Semantic Segmentation
Groenendijk, Rick, Dorst, Leo, Gevers, Theo
Signals from different modalities each have their own combination algebra, which affects their sampling and processing. RGB is mostly linear; depth is a geometric signal following the operations of mathematical morphology. If a network that receives RGB-D input has both kinds of operators available in its layers, it should be able to produce effective output with fewer parameters. In this paper, morphological elements are used in conjunction with more familiar linear modules to construct a mixed linear-morphological network called HaarNet. This is the first large-scale linear-morphological hybrid, evaluated on a set of sizeable real-world datasets. In the network, morphological Haar sampling is applied to both feature channels in several layers, which separates extreme values from high-frequency information such that both can be processed to the benefit of the two modalities. Moreover, a morphologically parameterised ReLU is used, and morphologically sound up-sampling is applied to obtain a full-resolution output. Experiments show that HaarNet is competitive with a state-of-the-art CNN, implying that morphological networks are a promising research direction for geometry-based learning tasks.
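The morphological Haar sampling admits a compact illustration: a flat 2x2 max pool (a dilation) keeps extreme values, while its gap to the corresponding min pool (an erosion) isolates high-frequency detail, and both half-resolution channels can then be processed further. The flat structuring element below is a simplifying assumption, not the paper's learned parameterisation.

```python
# Morphological Haar-style down-sampling with a flat 2x2 structuring element.
import torch
import torch.nn.functional as F

def morphological_haar_sample(x):
    """x: (B, C, H, W) -> extreme-value and detail channels at half resolution."""
    dilated = F.max_pool2d(x, kernel_size=2)       # morphological dilation
    eroded = -F.max_pool2d(-x, kernel_size=2)      # erosion via negated max
    approximation = dilated                        # extreme-value channel
    detail = dilated - eroded                      # high-frequency channel
    return approximation, detail
```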
MorphPool: Efficient Non-linear Pooling & Unpooling in CNNs
Groenendijk, Rick, Dorst, Leo, Gevers, Theo
Contemporary deep learning architectures exploit pooling operations for two reasons: to filter impactful activation values from feature maps, and to reduce spatial feature size [28]. The most widely used pooling operation is the max pool, which appears in nearly all common network architectures such as ResNet [14], VGGNet [32], and DenseNet [16]. These architectures can be applied to pixel-level prediction tasks, such as semantic segmentation. To do so, inputs are down-sampled to a set of latent features of small spatial size, after which they are up-sampled to full resolution again. Up-sampling from pooled feature sets most often happens through a combination of unpooling and deconvolution [41, 42], as in seminal works such as [3, 22, 26]. As will be shown in this paper, down-sampling using max pooling can be formalised and improved using mathematical morphology, the mathematics of contact. Ever since the works of Serra [29], the underlying algebraic structure of data acquired through probing contact (e.g. LiDAR and radar) has been known to the computer vision community [5, 11, 25, 33]. It is different from the algebra of linear diffusion that underlies convolutional neural networks (CNNs).
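The connection to morphology is direct: max pooling is a dilation with a flat structuring element, so making that element a learnable per-channel tensor yields a parameterised morphological pool. The unfold-based PyTorch sketch below illustrates the idea; it is written for readability, not the optimised kernels one would use in practice.

```python
# Parameterised morphological pooling: max over a window of (x + w), where
# w is a learnable structuring element. Zero-initialising w recovers max pool.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MorphPool2d(nn.Module):
    def __init__(self, channels, kernel_size=2, stride=2):
        super().__init__()
        self.k, self.s = kernel_size, stride
        # One flat-by-default structuring element per channel.
        self.w = nn.Parameter(torch.zeros(channels, kernel_size * kernel_size))

    def forward(self, x):                                     # x: (B, C, H, W)
        patches = F.unfold(x, self.k, stride=self.s)          # (B, C*k*k, L)
        B, _, L = patches.shape
        patches = patches.view(B, x.size(1), self.k * self.k, L)
        out, _ = (patches + self.w[None, :, :, None]).max(dim=2)
        H_out = (x.size(2) - self.k) // self.s + 1
        return out.view(B, x.size(1), H_out, -1)
```

Because zeros for the structuring element reproduce ordinary max pooling exactly, such a layer can act as a drop-in replacement that only deviates where training finds it useful.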
Multi-Loss Weighting with Coefficient of Variations
Groenendijk, Rick, Karaoglu, Sezer, Gevers, Theo, Mensink, Thomas
Many interesting tasks in machine learning and computer vision are learned by optimising an objective function defined as a weighted linear combination of multiple losses. The final performance is sensitive to choosing the correct (relative) weights for these losses. Finding a good set of weights is often done by treating them as hyper-parameters and setting them via an extensive grid search, which is computationally expensive. In this paper, the weights are instead defined based on properties observed while training the model, including the specific batch loss, the average loss, and the variance of each of the losses. An additional advantage is that these weights evolve during training, rather than remaining static. In the literature, loss weighting is mostly used in a multi-task learning setting, where the different tasks obtain different weights. However, there is a plethora of single-task multi-loss problems that could benefit from automatic loss weighting. In this paper, it is shown that these multi-task approaches do not work on single tasks. Instead, a method is proposed that automatically and dynamically tunes loss weights throughout training, specifically for single-task multi-loss problems. The method incorporates a measure of uncertainty to balance the losses. The validity of the approach is shown empirically for different tasks on multiple datasets.
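The idea can be sketched in a few lines: maintain running statistics per loss and weight each loss by its coefficient of variation (standard deviation over mean), renormalised every step, so that losses with higher relative variability receive more weight. The version below tracks statistics of the raw losses rather than the loss ratios used in the paper, so treat it as a simplified illustration.

```python
# Simplified coefficient-of-variation loss weighting with running statistics.
import torch

class CoVWeighting:
    def __init__(self, num_losses, momentum=0.99, eps=1e-8):
        self.mean = torch.zeros(num_losses)
        self.var = torch.zeros(num_losses)
        self.m, self.eps = momentum, eps

    def __call__(self, losses):                    # losses: (num_losses,) tensor
        detached = losses.detach()
        # Exponential moving averages of each loss and its squared deviation.
        self.mean = self.m * self.mean + (1 - self.m) * detached
        self.var = self.m * self.var + (1 - self.m) * (detached - self.mean) ** 2
        cov = self.var.sqrt() / (self.mean + self.eps)   # coefficient of variation
        weights = cov / (cov.sum() + self.eps)           # normalise to sum to 1
        return (weights * losses).sum()                  # weighted total loss
```

Usage would look like `total = weighter(torch.stack([loss_a, loss_b]))` followed by `total.backward()`, with the weights recomputed from the running statistics at every step.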
Towards Personalised Gaming via Facial Expression Recognition
Blom, Paris Mavromoustakos (University of Amsterdam) | Bakkes, Sander (University of Amsterdam) | Tan, Chek Tien (University of Technology Sydney) | Whiteson, Shimon (University of Amsterdam) | Roijers, Diederik (University of Amsterdam) | Valenti, Roberto (University of Amsterdam) | Gevers, Theo (University of Amsterdam)
In this paper we propose an approach for personalising the space in which a game is played (i.e., its levels) based on classifications of the user's facial expressions, with the aim of tailoring the affective game experience to the individual user. Our approach targets online game personalisation, i.e., the game experience is personalised during actual play. A key insight of this paper is that game personalisation techniques can leverage novel computer vision-based techniques to unobtrusively and automatically infer player experiences from facial expression analysis. Specifically, we (1) leverage the proven InSight facial expression recognition SDK as a model of the user's affective state, and (2) employ this model to guide the online game personalisation process. User studies that validate the game personalisation approach in the actual video game Infinite Mario Bros. reveal that it provides an effective basis for converging to an appropriate affective state for the individual human player.
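The online personalisation process described above amounts to a feedback loop: periodically infer the player's affective state from webcam frames and nudge level-generation parameters toward a target state. The sketch below makes this concrete; the affect-model interface, the score range, and the single difficulty parameter are hypothetical placeholders for the paper's InSight-based setup.

```python
# Hypothetical online personalisation loop: all interfaces are placeholders.
def personalise(game, affect_model, target_arousal=0.6, gain=0.1):
    while game.running():
        frame = game.capture_webcam_frame()
        arousal = affect_model.predict(frame)      # assumed score in [0, 1]
        error = target_arousal - arousal
        # Too calm -> harder levels; too stressed -> easier levels.
        game.difficulty = min(1.0, max(0.0, game.difficulty + gain * error))
        game.generate_next_segment(difficulty=game.difficulty)
```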