AITopics | Susladkar, Onkar

Collaborating Authors

Susladkar, Onkar

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods

Susladkar, Onkar, Deshmukh, Gayatri, Mittal, Sparsh, Shastri, Parth

arXiv.org Artificial IntelligenceAug-7-2024

In image processing, one of the most challenging tasks is to render an image's semantic meaning using a variety of artistic approaches. Existing techniques for arbitrary style transfer (AST) frequently experience mode-collapse, over-stylization, or under-stylization due to a disparity between the style and content images. We propose a novel framework called D$^2$Styler (Discrete Diffusion Styler) that leverages the discrete representational capability of VQ-GANs and the advantages of discrete diffusion, including stable training and avoidance of mode collapse. Our method uses Adaptive Instance Normalization (AdaIN) features as a context guide for the reverse diffusion process. This makes it easy to move features from the style image to the content image without bias. The proposed method substantially enhances the visual quality of style-transferred images, allowing the combination of content and style in a visually appealing manner. We take style images from the WikiArt dataset and content images from the COCO dataset. Experimental results demonstrate that D$^2$Styler produces high-quality style-transferred images and outperforms twelve existing methods on nearly all the metrics. The qualitative results and ablation studies provide further insights into the efficacy of our technique. The code is available at https://github.com/Onkarsus13/D2Styler.

content image, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2408.03558

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Ethical Framework for Responsible Foundational Models in Medical Imaging

Das, Abhijit, Jha, Debesh, Sanjotra, Jasmer, Susladkar, Onkar, Sarkar, Suramyaa, Rauniyar, Ashish, Tomar, Nikhil, Sharma, Vanshali, Bagci, Ulas

arXiv.org Artificial IntelligenceApr-13-2024

Foundational models (FMs) have tremendous potential to revolutionize medical imaging. However, their deployment in real-world clinical settings demands extensive ethical considerations. This paper aims to highlight the ethical concerns related to FMs and propose a framework to guide their responsible development and implementation within medicine. We meticulously examine ethical issues such as privacy of patient data, bias mitigation, algorithmic transparency, explainability and accountability. The proposed framework is designed to prioritize patient welfare, mitigate potential risks, and foster trust in AI-assisted healthcare.

arxiv preprint arxiv, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2406.11868

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

MOVES: Movable and Moving LiDAR Scene Segmentation in Label-Free settings using Static Reconstruction

Kumar, Prashant, Makwana, Dhruv, Susladkar, Onkar, Mittal, Anurag, Kalra, Prem Kumar

arXiv.org Artificial IntelligenceOct-15-2023

Accurate static structure reconstruction and segmentation of non-stationary objects is of vital importance for autonomous navigation applications. These applications assume a LiDAR scan to consist of only static structures. In the real world however, LiDAR scans consist of non-stationary dynamic structures - moving and movable objects. Current solutions use segmentation information to isolate and remove moving structures from LiDAR scan. This strategy fails in several important use-cases where segmentation information is not available. In such scenarios, moving objects and objects with high uncertainty in their motion i.e. movable objects, may escape detection. This violates the above assumption. We present MOVES, a novel GAN based adversarial model that segments out moving as well as movable objects in the absence of segmentation information. We achieve this by accurately transforming a dynamic LiDAR scan to its corresponding static scan. This is obtained by replacing dynamic objects and corresponding occlusions with static structures which were occluded by dynamic objects. We leverage corresponding static-dynamic LiDAR pairs.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Artificial Intelligence

2306.14812

Country: Asia > India (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Add feedback

Towards Scene-Text to Scene-Text Translation

Susladkar, Onkar, Gatti, Prajwal, Mishra, Anand

arXiv.org Artificial IntelligenceAug-6-2023

In this work, we study the task of ``visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the text, such as font, size, and background. There are several challenges associated with this task, such as interpolating font to unseen characters and preserving text size and the background. To address these, we introduce VTNet, a novel conditional diffusion-based method. To train the VTNet, we create a synthetic cross-lingual dataset of 600K samples of scene text images in six popular languages, including English, Hindi, Tamil, Chinese, Bengali, and German. We evaluate the performance of VTnet through extensive experiments and comparisons to related methods. Our model also surpasses the previous state-of-the-art results on the conventional scene-text editing benchmarks. Further, we present rigorous qualitative studies to understand the strengths and shortcomings of our model. Results show that our approach generalizes well to unseen words and fonts. We firmly believe our work can benefit real-world applications, such as text translation using a phone camera and translating educational materials. Code and data will be made publicly available.

machine learning, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2308.03024

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback