Navab, Nassir
FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos
Stilz, Florian Philipp, Karaoglu, Mert Asim, Tristram, Felix, Navab, Nassir, Busam, Benjamin, Ladikos, Alexander
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. Neural rendering has recently shown promising results in endoscopic reconstruction with deforming tissue. However, previous setups have been restricted to a static endoscope, limited deformation, or required an external tracking device to retrieve camera pose information. With FLex we address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme that jointly optimizes reconstruction and camera poses from scratch. This improves ease of use and allows reconstruction to scale in time, processing surgical videos of 5,000 frames and more; an improvement of more than ten times over the state of the art, while remaining agnostic to external tracking information. Extensive evaluations on the StereoMIS dataset show that FLex significantly improves the quality of novel view synthesis while maintaining competitive pose accuracy.
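As a rough illustration of such joint optimization, the minimal PyTorch sketch below treats per-frame camera poses as learnable parameters and updates them together with a toy stand-in for a 4D radiance field. The axis-angle pose parametrization, network sizes, and random data are illustrative assumptions, not the FLex implementation.

```python
import torch
import torch.nn as nn

def skew(k):
    # 3-vector -> 3x3 skew-symmetric matrix (built with stack to keep autograd).
    z = torch.zeros((), dtype=k.dtype)
    return torch.stack([
        torch.stack([z, -k[2], k[1]]),
        torch.stack([k[2], z, -k[0]]),
        torch.stack([-k[1], k[0], z]),
    ])

def axis_angle_to_matrix(r):
    # Rodrigues' formula: axis-angle (3,) -> rotation matrix (3, 3).
    theta = r.norm() + 1e-8
    K = skew(r / theta)
    return torch.eye(3, dtype=r.dtype) + torch.sin(theta) * K \
        + (1 - torch.cos(theta)) * (K @ K)

class PoseField(nn.Module):
    """Learnable per-frame camera poses (axis-angle rotation + translation)."""
    def __init__(self, n_frames):
        super().__init__()
        # Small random init avoids the zero-norm gradient singularity.
        self.rot = nn.Parameter(0.01 * torch.randn(n_frames, 3))
        self.trans = nn.Parameter(torch.zeros(n_frames, 3))

class TinyField(nn.Module):
    """Toy stand-in for a 4D radiance field: (x, y, z, t) -> (r, g, b, sigma)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 4))
    def forward(self, x):
        return self.mlp(x)

poses, field = PoseField(n_frames=8), TinyField()
opt = torch.optim.Adam(list(poses.parameters()) + list(field.parameters()), lr=1e-3)

# One illustrative step on frame 0: transform camera-space samples into world
# space with the current pose estimate, query the field, minimize a photometric loss.
pts_cam = torch.randn(1024, 3)   # stand-in ray samples
t = torch.zeros(1024, 1)         # normalized frame time
target = torch.rand(1024, 3)     # stand-in pixel colors
R = axis_angle_to_matrix(poses.rot[0])
pts_world = pts_cam @ R.T + poses.trans[0]
pred = field(torch.cat([pts_world, t], dim=-1))
loss = ((pred[:, :3].sigmoid() - target) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()  # pose and field updated jointly
```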
Physics-Encoded Graph Neural Networks for Deformation Prediction under Contact
Saleh, Mahdi, Sommersperger, Michael, Navab, Nassir, Tombari, Federico
In robotics, it is crucial to understand object deformation during tactile interactions. A precise understanding of deformation can elevate robotic simulations and have broad implications across different industries. We introduce a method using Physics-Encoded Graph Neural Networks (GNNs) for such predictions. As in robotic grasping and manipulation scenarios, we focus on modeling the dynamics between a rigid mesh contacting a deformable mesh under external forces. Our approach represents both the soft body and the rigid body within graph structures, where nodes hold the physical states of the meshes. We also incorporate cross-attention mechanisms to capture the interplay between the objects. By jointly learning geometry and physics, our model reconstructs consistent and detailed deformations. We have made our code and dataset public to advance research in robotic simulation and grasping.
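The cross-attention idea can be illustrated with a small PyTorch sketch in which soft-body node features attend to rigid-body node features to pick up contact cues. The module name, feature sizes, and node counts are invented for illustration and do not reflect the authors' architecture.

```python
import torch
import torch.nn as nn

class ContactCrossAttention(nn.Module):
    """Soft-body nodes (queries) attend to rigid-body nodes (keys/values)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, soft_nodes, rigid_nodes):
        ctx, _ = self.attn(query=soft_nodes, key=rigid_nodes, value=rigid_nodes)
        return self.norm(soft_nodes + ctx)  # residual update of soft-body states

soft = torch.randn(1, 500, 64)    # (batch, soft-mesh nodes, features)
rigid = torch.randn(1, 120, 64)   # (batch, rigid-mesh nodes, features)
updated_soft = ContactCrossAttention()(soft, rigid)  # -> (1, 500, 64)
```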
Advancing Surgical VQA with Scene Graph Knowledge
Yuan, Kun, Kattel, Manasi, Lavanchy, Joel L., Navab, Nassir, Srivastav, Vinkle, Padoy, Nicolas
The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with language capabilities is emerging as a necessity. Our work aims to advance Visual Question Answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in current surgical VQA systems: removing question-condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design. First, we propose a Surgical Scene Graph-based dataset, SSG-QA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating diverse QA pairs. Our SSG-QA dataset provides a more complex, diverse, geometrically grounded, unbiased, and surgical action-oriented dataset compared to existing surgical VQA datasets. We then propose SSG-QA-Net, a novel surgical VQA model incorporating a lightweight Scene-embedded Interaction Module (SIM), which integrates geometric scene knowledge in the VQA model design by employing cross-attention between the textual and the scene features. Our comprehensive analysis of the SSG-QA dataset shows that SSG-QA-Net outperforms existing methods across different question types and complexities. We highlight that the primary limitation in current surgical VQA systems is the lack of scene knowledge to answer complex queries. We present a novel surgical VQA dataset and model and show that results can be significantly improved by incorporating geometric scene features in the VQA model design. The source code and the dataset will be made publicly available at: https://github.com/CAMMA-public/SSG-QA
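To make the question-engine idea concrete, here is a toy sketch of template-based QA generation from scene-graph triples. The triples and templates are invented examples for illustration and are not taken from SSG-QA.

```python
# Each scene-graph triple is (instrument, action, anatomy).
triples = [
    ("grasper", "retracting", "gallbladder"),
    ("hook", "dissecting", "liver"),
]

# Question/answer templates; str.format ignores unused keys.
templates = [
    ("What is the {subj} doing to the {obj}?", "{pred}"),
    ("Which instrument is {pred} the {obj}?", "{subj}"),
]

qa_pairs = []
for subj, pred, obj in triples:
    for q_tmpl, a_tmpl in templates:
        q = q_tmpl.format(subj=subj, pred=pred, obj=obj)
        a = a_tmpl.format(subj=subj, pred=pred, obj=obj)
        qa_pairs.append((q, a))

for q, a in qa_pairs:
    print(q, "->", a)
# e.g. "Which instrument is dissecting the liver? -> hook"
```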
Machine Learning in Robotic Ultrasound Imaging: Challenges and Perspectives
Bi, Yuan, Jiang, Zhongliang, Duelmer, Felix, Huang, Dianye, Navab, Nassir
This article reviews recent advances in intelligent robotic ultrasound (US) imaging systems. We begin by presenting the robotic mechanisms and control techniques commonly employed in robotic US imaging, along with their clinical applications. Subsequently, we focus on the deployment of machine learning techniques in the development of robotic sonographers, emphasizing crucial developments aimed at enhancing the intelligence of these systems. The methods for achieving autonomous action reasoning are categorized into two sets of approaches: those relying on implicit environmental data interpretation and those using explicit interpretation. Throughout this exploration, we also discuss practical challenges, including the scarcity of medical data, the need for a deeper understanding of the physical aspects involved, and effective data representation approaches. Finally, we highlight open problems in the field and analyze possible perspectives on how the community could move forward in this research area.
Robot-Assisted Deep Venous Thrombosis Ultrasound Examination using Virtual Fixture
Huang, Dianye, Yang, Chenguang, Zhou, Mingchuan, Karlas, Angelos, Navab, Nassir, Jiang, Zhongliang
Deep Venous Thrombosis (DVT) is a common vascular disease in which blood clots form inside deep veins; these may block blood flow or even cause a life-threatening pulmonary embolism. A typical DVT exam using ultrasound (US) imaging presses the target vein until its lumen is fully compressed. However, the compression exam is highly operator-dependent. To alleviate intra- and inter-operator variations, we present a robotic US system with a novel hybrid force-motion control scheme ensuring position and force tracking accuracy as well as a soft landing of the probe onto the target surface. In addition, a path-based virtual fixture is proposed to enable easy human-robot interaction for repeated compression operations at the lesion location. To ensure that the biometric measurements obtained in different examinations are comparable, the 6D scanning path is determined in a coarse-to-fine manner using both an external RGB-D camera and US images. The RGB-D camera is first used to extract a rough scanning path on the object. Then, the vascular lumen segmented from US images is used to optimize the scanning path and ensure the visibility of the target object. To generate a continuous scan path for developing virtual fixtures, an arc-length based path-fitting model considering both position and orientation is proposed. Finally, the whole system is evaluated on a human-like arm phantom with an uneven surface.
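A minimal sketch of what arc-length parameterized path fitting over position and orientation can look like, using SciPy splines for positions and spherical linear interpolation for orientations. The waypoints and the specific interpolants are illustrative assumptions, not the authors' model.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.spatial.transform import Rotation, Slerp

# Waypoints: positions (N, 3) in meters and orientations as rotations.
pos = np.array([[0.00, 0.000, 0.000],
                [0.05, 0.010, 0.000],
                [0.10, 0.015, 0.005],
                [0.15, 0.010, 0.010]])
quat = Rotation.from_euler("z", [0, 5, 10, 15], degrees=True)

# Cumulative chord length serves as the arc-length parameter s.
seg = np.linalg.norm(np.diff(pos, axis=0), axis=1)
s = np.concatenate([[0.0], np.cumsum(seg)])

pos_spline = CubicSpline(s, pos)   # position as a smooth function of s
ori_slerp = Slerp(s, quat)         # orientation as a function of s

# Resample a dense, continuous 6D path for building the virtual fixture.
s_dense = np.linspace(0, s[-1], 200)
path_pos = pos_spline(s_dense)             # (200, 3)
path_ori = ori_slerp(s_dense).as_quat()    # (200, 4)
```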
On Discrepancies between Perturbation Evaluations of Graph Neural Network Attributions
Rezaei, Razieh, Dizaji, Alireza, Khakzar, Ashkan, Kazi, Anees, Navab, Nassir, Rueckert, Daniel
Neural networks are increasingly finding their way into the realm of graphs and modeling relationships between features. Concurrently, graph neural network explanation approaches are being developed to uncover relationships between the nodes of these graphs. However, there is a disparity between existing attribution methods, and it is unclear which attribution to trust. Therefore, research has introduced evaluation experiments that assess them from different perspectives. In this work, we assess attribution methods from a perspective not previously explored in the graph domain: retraining. The core idea is to retrain the network on the important (or unimportant) relationships identified by the attributions and evaluate how well networks can generalize based on these relationships. We reformulate the retraining framework to sidestep issues lurking in the previous formulation and propose guidelines for correct analysis. We run our analysis on four state-of-the-art GNN attribution methods and five synthetic and real-world graph classification datasets. The analysis reveals that attributions perform variably depending on the dataset and the network. Most importantly, we observe that the widely used GNNExplainer performs similarly to an arbitrary designation of edge importance. The study concludes that the retraining evaluation cannot be used as a generalized benchmark and recommends it as a toolset for evaluating attributions on a specific network, dataset, and sparsity.
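The retraining idea can be sketched in a few lines: keep only the edges an attribution method ranks as most important, retrain the GNN from a fresh initialization on the sparsified graphs, and track test accuracy across sparsity levels. The helper below is a placeholder sketch, not the paper's code.

```python
import torch

def sparsify(edge_index, edge_scores, keep_ratio=0.3):
    """Keep the top-scoring fraction of edges.

    edge_index: (2, E) graph connectivity; edge_scores: (E,) attribution
    scores, higher = more important. Returns the sparsified connectivity."""
    k = max(1, int(keep_ratio * edge_scores.numel()))
    top = torch.topk(edge_scores, k).indices
    return edge_index[:, top]

# Evaluation loop (pseudocode):
# for keep_ratio in (0.1, 0.3, 0.5, ...):
#     1. score all edges with the attribution method under evaluation
#     2. sparsify every graph with sparsify(edge_index, scores, keep_ratio)
#     3. retrain the GNN from scratch on the sparsified training graphs
#     4. record test accuracy; a faithful attribution keeps accuracy high
#        even at low keep_ratio, while a random one degrades quickly
```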
Deformable 3D Gaussian Splatting for Animatable Human Avatars
Jung, HyunJun, Brasch, Nikolas, Song, Jifei, Perez-Pellitero, Eduardo, Zhou, Yiren, Li, Zhihao, Navab, Nassir, Busam, Benjamin
Recent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting, where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: a first module that deforms canonical 3D Gaussians according to SMPL vertices and a second module that takes their designed joint encodings and predicts per-Gaussian deformations to handle dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. ParDy-Human constitutes an explicit model for realistic dynamic human avatars that requires significantly fewer training views and images. Training our avatars is free of additional annotations such as masks, works with variable backgrounds, and yields efficient full-resolution inference even on consumer hardware. We provide experimental evidence that ParDy-Human outperforms state-of-the-art methods on the ZJU-MoCap and THUman4.0 datasets both quantitatively and visually.
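A minimal sketch of the first module's core idea: deforming canonical Gaussian centers with linear blend skinning driven by per-joint transforms. Shapes, the random weights, and the identity-pose example are illustrative, not ParDy-Human's implementation.

```python
import torch

def lbs_deform(centers, skin_weights, joint_transforms):
    """Linear blend skinning of Gaussian centers.

    centers: (N, 3) canonical Gaussian means
    skin_weights: (N, J) per-Gaussian skinning weights (rows sum to 1)
    joint_transforms: (J, 4, 4) posed joint transforms
    returns: (N, 3) deformed means"""
    # Blend the per-joint transforms into one transform per Gaussian.
    T = torch.einsum("nj,jab->nab", skin_weights, joint_transforms)  # (N, 4, 4)
    homo = torch.cat([centers, torch.ones(len(centers), 1)], dim=-1)  # (N, 4)
    return torch.einsum("nab,nb->na", T, homo)[:, :3]

centers = torch.randn(1000, 3)
w = torch.softmax(torch.randn(1000, 24), dim=-1)   # 24 joints, as in SMPL
T = torch.eye(4).expand(24, 4, 4).clone()          # identity pose
deformed = lbs_deform(centers, w, T)               # identity -> unchanged centers
```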
Re-Nerfing: Enforcing Geometric Constraints on Neural Radiance Fields through Novel View Synthesis
Tristram, Felix, Gasperini, Stefano, Tombari, Federico, Navab, Nassir, Busam, Benjamin
Neural Radiance Fields (NeRFs) have shown remarkable novel view synthesis capabilities even in large-scale, unbounded scenes, albeit requiring hundreds of views or introducing artifacts in sparser settings. Their optimization suffers from shape-radiance ambiguities wherever only a small visual overlap is available. This leads to erroneous scene geometry and artifacts. In this paper, we propose Re-Nerfing, a simple and general multi-stage approach that leverages NeRF's own view synthesis to address these limitations. With Re-Nerfing, we increase the scene's coverage and enhance the geometric consistency of novel views as follows: First, we train a NeRF with the available views. Then, we use the optimized NeRF to synthesize pseudo-views next to the original ones to simulate a stereo or trifocal setup. Finally, we train a second NeRF with both original and pseudo views while enforcing structural, epipolar constraints via the newly synthesized images. Extensive experiments on the mip-NeRF 360 dataset show the effectiveness of Re-Nerfing across denser and sparser input scenarios, bringing improvements to the state-of-the-art Zip-NeRF, even when trained with all views.
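The pseudo-view placement of the second stage can be sketched as offsetting each training camera along its own x-axis to mimic a stereo pair. The baseline value and the camera-to-world pose convention below are assumptions for illustration.

```python
import numpy as np

def stereo_pseudo_poses(c2w, baseline=0.05):
    """c2w: (4, 4) camera-to-world pose. Returns left/right poses shifted
    along the camera's x-axis by -/+ baseline (in world units)."""
    right_axis = c2w[:3, 0]            # camera x-axis expressed in world frame
    left, right = c2w.copy(), c2w.copy()
    left[:3, 3] -= baseline * right_axis
    right[:3, 3] += baseline * right_axis
    return left, right

pose = np.eye(4)                        # stand-in training camera
left, right = stereo_pseudo_poses(pose)
# Render these poses with the first-stage NeRF to obtain pseudo-views, then
# train the second NeRF on original + pseudo views with epipolar constraints.
```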
RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance
Pellegrini, Chantal, Özsoy, Ege, Busam, Benjamin, Navab, Nassir, Keicher, Matthias
Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To preserve the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruction dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on GitHub: https://github.com/ChantalMP/RaDialog.
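As an illustration of parameter-efficient fine-tuning in this spirit, the sketch below wraps a frozen linear layer with a small low-rank (LoRA-style) adapter so that only the adapter weights are trained. It is a generic example, not RaDialog's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False       # keep the pretrained weights fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Output = base(x) + scaled low-rank correction B @ A @ x.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))          # only A and B receive gradients
```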
Intelligent Robotic Sonographer: Mutual Information-based Disentangled Reward Learning from Few Demonstrations
Jiang, Zhongliang, Bi, Yuan, Zhou, Mingchuan, Hu, Ying, Burke, Michael, Navab, Nassir
Ultrasound (US) imaging is widely used for biometric measurement and diagnosis of internal organs because it is real-time and radiation-free. However, due to inter-operator variations, the resulting images depend heavily on the experience of the sonographer. This work proposes an intelligent robotic sonographer that autonomously "explores" target anatomies and navigates a US probe to a relevant 2D plane by learning from experts. The underlying high-level physiological knowledge of experts is inferred by a neural reward function trained with a ranked pairwise image comparison approach in a self-supervised fashion. This process can be seen as understanding the "language of sonography". To generalize across inter-patient variations, mutual information is estimated by a network to explicitly disentangle the task-related and domain features in the latent space. Robotic localization is carried out in a coarse-to-fine manner based on the predicted reward associated with B-mode images. To validate the effectiveness of the proposed reward inference network, representative experiments were performed on vascular phantoms ("line" target), two types of ex-vivo animal organ phantoms (chicken heart and lamb kidney; "point" target), and in-vivo human carotids. To further validate the performance of the autonomous acquisition framework, physical robotic acquisitions were performed on three phantoms (vascular, chicken heart, and lamb kidney). The results demonstrate that the proposed framework works robustly on a variety of seen and unseen phantoms as well as on in-vivo human carotid data.
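The ranked pairwise comparison idea maps naturally onto a Bradley-Terry style objective: a reward network scores two images, and training pushes the image ranked closer to the target plane toward a higher reward. The tiny network and random data below are placeholders, not the paper's reward inference network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy reward network: flatten a 64x64 image and regress a scalar reward.
reward_net = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128),
                           nn.ReLU(), nn.Linear(128, 1))

better = torch.randn(8, 1, 64, 64)   # images ranked closer to the target plane
worse = torch.randn(8, 1, 64, 64)    # images ranked farther away

r_better = reward_net(better)         # (8, 1) scalar rewards
r_worse = reward_net(worse)

# Bradley-Terry: P(better preferred) = sigmoid(r_better - r_worse);
# labels are all ones because the ranking fixes the preferred side.
loss = F.binary_cross_entropy_with_logits(
    r_better - r_worse, torch.ones_like(r_better))
loss.backward()                       # gradients flow into reward_net only
```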