AITopics | Stricker, Didier

Collaborating Authors

Stricker, Didier

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation

Sinha, Sankalp, Khan, Mohammad Sadil, Usama, Muhammad, Sam, Shino, Stricker, Didier, Ali, Sk Aziz, Afzal, Muhammad Zeshan

arXiv.org Artificial IntelligenceNov-26-2024

Generating high-fidelity 3D content from text prompts remains a significant challenge in computer vision due to the limited size, diversity, and annotation depth of the existing datasets. To address this, we introduce MARVEL-40M+, an extensive dataset with 40 million text annotations for over 8.9 million 3D assets aggregated from seven major 3D datasets. Our contribution is a novel multi-stage annotation pipeline that integrates open-source pretrained multi-view VLMs and LLMs to automatically produce multi-level descriptions, ranging from detailed (150-200 words) to concise semantic tags (10-20 words). This structure supports both fine-grained 3D reconstruction and rapid prototyping. Furthermore, we incorporate human metadata from source datasets into our annotation pipeline to add domain-specific information in our annotation and reduce VLM hallucinations. Additionally, we develop MARVEL-FX3D, a two-stage text-to-3D pipeline. We fine-tune Stable Diffusion with our annotations and use a pretrained image-to-3D network to generate 3D textured meshes within 15s. Extensive evaluations show that MARVEL-40M+ significantly outperforms existing datasets in annotation quality and linguistic diversity, achieving win rates of 72.41% by GPT-4 and 73.40% by human evaluators.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.17945

Country:

North America > United States (0.67)
Asia (0.67)
Europe (0.67)

Genre: Research Report (1.00)

Industry:

Materials (1.00)
Leisure & Entertainment > Sports (1.00)
Automobiles & Trucks (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BRep Boundary and Junction Detection for CAD Reverse Engineering

Ali, Sk Aziz, Khan, Mohammad Sadil, Stricker, Didier

arXiv.org Artificial IntelligenceSep-21-2024

In machining process, 3D reverse engineering of the mechanical system is an integral, highly important, and yet time consuming step to obtain parametric CAD models from 3D scans. Therefore, deep learning-based Scan-to-CAD modeling can offer designers enormous editability to quickly modify CAD model, being able to parse all its structural compositions and design steps. In this paper, we propose a supervised boundary representation (BRep) detection network BRepDetNet from 3D scans of CC3D and ABC dataset. We have carefully annotated the 50K and 45K scans of both the datasets with appropriate topological relations (e.g., next, mate, previous) between the geometrical primitives (i.e., boundaries, junctions, loops, faces) of their BRep data structures. The proposed solution decomposes the Scan-to-CAD problem in Scan-to-BRep ensuring the right step towards feature-based modeling, and therefore, leveraging other existing BRep-to-CAD modeling methods. Our proposed Scan-to-BRep neural network learns to detect BRep boundaries and junctions by minimizing focal-loss and non-maximal suppression (NMS) during training time. Experimental results show that our BRepDetNet with NMS-Loss achieves impressive results.

artificial intelligence, boundary point, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICMI60790.2024.10585950

2409.14087

Country: Europe (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Add feedback

SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks

Xie, Yaxu, Pagani, Alain, Stricker, Didier

arXiv.org Artificial IntelligenceMar-28-2024

Scene graphs have been recently introduced into 3D spatial understanding as a comprehensive representation of the scene. The alignment between 3D scene graphs is the first step of many downstream tasks such as scene graph aided point cloud registration, mosaicking, overlap checking, and robot navigation. In this work, we treat 3D scene graph alignment as a partial graph-matching problem and propose to solve it with a graph neural network. We reuse the geometric features learned by a point cloud registration method and associate the clustered point-level geometric features with the node-level semantic feature via our designed feature fusion module. Partial matching is enabled by using a learnable method to select the top-k similar node pairs. Subsequent downstream tasks such as point cloud registration are achieved by running a pre-trained registration network within the matched regions. We further propose a point-matching rescoring method, that uses the node-wise alignment of the 3D scene graph to reweight the matching candidates from a pre-trained point cloud registration method. It reduces the false point correspondences estimated especially in low-overlapping cases. Experiments show that our method improves the alignment accuracy by 10~20% in low-overlap and random transformation scenarios and outperforms the existing work in multiple downstream tasks.

machine learning, natural language, registration, (16 more...)

arXiv.org Artificial Intelligence

2403.19474

Country:

North America (0.14)
Europe (0.14)

Genre: Research Report (0.82)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
(2 more...)

Add feedback

Achieving RGB-D level Segmentation Performance from a Single ToF Camera

Sharma, Pranav, Katrolia, Jigyasa Singh, Rambach, Jason, Mirbach, Bruno, Stricker, Didier, Seiler, Juergen

arXiv.org Artificial IntelligenceJun-30-2023

Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF camera, we introduce a method utilizing depth-specific convolutions in a multi-task learning framework. In our evaluation on an in-car segmentation dataset, we demonstrate the competitiveness of our method against the more costly RGB-D approaches.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2306.17636

Country: Europe > Germany (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Structure PLP-SLAM: Efficient Sparse Mapping and Localization using Point, Line and Plane for Monocular, RGB-D and Stereo Cameras

Shu, Fangwen, Wang, Jiaxuan, Pagani, Alain, Stricker, Didier

arXiv.org Artificial IntelligenceJan-28-2023

This paper presents a visual SLAM system that uses both points and lines for robust camera localization, and simultaneously performs a piece-wise planar reconstruction (PPR) of the environment to provide a structural map in real-time. One of the biggest challenges in parallel tracking and mapping with a monocular camera is to keep the scale consistent when reconstructing the geometric primitives. This further introduces difficulties in graph optimization of the bundle adjustment (BA) step. We solve these problems by proposing several run-time optimizations on the reconstructed lines and planes. Our system is able to run with depth and stereo sensors in addition to the monocular setting. Our proposed SLAM tightly incorporates the semantic and geometric features to boost both frontend pose tracking and backend map optimization. We evaluate our system exhaustively on various datasets, and show that we outperform state-of-the-art methods in terms of trajectory precision. The code of PLP-SLAM has been made available in open-source for the research community (https://github.com/PeterFWS/Structure-PLP-SLAM).

artificial intelligence, optimization problem, plane, (16 more...)

arXiv.org Artificial Intelligence

2207.06058

Genre: Research Report (0.84)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Vision (0.69)

Add feedback

Surface Defect Classification in Real-Time Using Convolutional Neural Networks

Arikan, Selim, Varanasi, Kiran, Stricker, Didier

arXiv.org Machine LearningApr-7-2019

Surface inspection systems are an important application domain for computer vision, as they are used for defect detection and classification in the manufacturing industry. Existing systems use hand-crafted features which require extensive domain knowledge to create. Even though Convolutional neural networks (CNNs) have proven successful in many large-scale challenges, industrial inspection systems have yet barely realized their potential due to two significant challenges: real-time processing speed requirements and specialized narrow domain-specific datasets which are sometimes limited in size. In this paper, we propose CNN models that are specifically designed to handle capacity and real-time speed requirements of surface inspection systems. To train and evaluate our network models, we created a surface image dataset containing more than 22000 labeled images with many types of surface materials and achieved 98.0% accuracy in binary defect classification. To solve the class imbalance problem in our datasets, we introduce neural data augmentation methods which are also applicable to similar domains that suffer from the same problem. Our results show that deep learning based methods are feasible to be used in surface inspection systems and outperform traditional methods in accuracy and inference time by considerable margins.

dataset, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

1904.04671

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fast Feature Extraction with CNNs with Pooling Layers

Bailer, Christian, Habtegebrial, Tewodros, varanasi, Kiran, Stricker, Didier

arXiv.org Machine LearningMay-8-2018

In recent years, many publications showed that convolutional neural network based features can have a superior performance to engineered features. However, not much effort was taken so far to extract local features efficiently for a whole image. In this paper, we present an approach to compute patch-based local feature descriptors efficiently in presence of pooling and striding layers for whole images at once. Our approach is generic and can be applied to nearly all existing network architectures. This includes networks for all local feature extraction tasks like camera calibration, Patchmatching, optical flow estimation and stereo matching. In addition, our approach can be applied to other patch-based approaches like sliding window object detection and recognition. We complete our paper with a speed benchmark of popular CNN based feature extraction approaches applied on a whole image, with and without our speedup, and example code (for Torch) that shows how an arbitrary CNN architecture can be easily converted by our approach.

deep learning, feature extraction, neural network, (20 more...)

arXiv.org Machine Learning

1805.03096

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)
Information Technology > Data Science > Data Mining > Feature Extraction (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Fast View Synthesis with Deep Stereo Vision

Habtegebrial, Tewodros, Varanasi, Kiran, Bailer, Christian, Stricker, Didier

arXiv.org Artificial IntelligenceApr-25-2018

Novel view synthesis is an important problem in computer vision and graphics. Over the years a large number of solutions have been put forward to solve the problem. However, the large-baseline novel view synthesis problem is far from being "solved". Recent works have attempted to use Convolutional Neural Networks (CNNs) to solve view synthesis tasks. Due to the difficulty of learning scene geometry and interpreting camera motion, CNNs are often unable to generate realistic novel views. In this paper, we present a novel view synthesis approach based on stereo-vision and CNNs that decomposes the problem into two sub-tasks: view dependent geometry estimation and texture inpainting. Both tasks are structured prediction problems that could be effectively learned with CNNs. Experiments on the KITTI Odometry dataset show that our approach is more accurate and significantly faster than the current state-of-the-art. The code and supplementary material will be publicly available. Results could be found here https://youtu.be/5pzS9jc-5t0.

deep learning, neural network, view synthesis, (15 more...)

arXiv.org Artificial Intelligence

1804.0969

Country: Europe (0.14)

Genre: Research Report (0.64)

Industry:

Media > Television (0.34)
Media > Photography (0.34)
Media > Film (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback