AITopics | tango

Creation of 3D content by stylization is a promising yet challenging problem in computer vision and graphics research. In this work, we focus on stylizing photorealistic appearance renderings of a given surface mesh of arbitrary topology. Motivated by the recent surge of cross-modal supervision of the Contrastive Language-Image Pre-training (CLIP) model, we propose TANGO, which transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner. Technically, we propose to disentangle the appearance style as the spatially varying bidirectional reflectance distribution function, the local geometric variation, and the lighting condition, which are jointly optimized, via supervision of the CLIP loss, by a spherical Gaussians based differentiable renderer. As such, TANGO enables photorealistic 3D style transfer by automatically predicting reflectance effects even for bare, low-quality meshes, without training on a task-specific dataset. Extensive experiments show that TANGO outperforms existing methods of text-driven 3D style transfer in terms of photorealistic quality, consistency of 3D geometry, and robustness when stylizing low-quality meshes. Our codes and results are available at our project webpage https://cyw-3d.github.io/tango/.

name change, stylization, text-driven photorealistic, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

c7b925e600ae4880f5c5d7557f70a72b-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-18-2025, 20:51:53 GMT

artificial intelligence, machine learning, vertex displacement framework, (10 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.05)
Asia > China > Hong Kong (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

Add feedback

TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition

Neural Information Processing SystemsJan-18-2025, 21:02:18 GMT

Creation of 3D content by stylization is a promising yet challenging problem in computer vision and graphics research. In this work, we focus on stylizing photorealistic appearance renderings of a given surface mesh of arbitrary topology. Motivated by the recent surge of cross-modal supervision of the Contrastive Language-Image Pre-training (CLIP) model, we propose TANGO, which transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner. Technically, we propose to disentangle the appearance style as the spatially varying bidirectional reflectance distribution function, the local geometric variation, and the lighting condition, which are jointly optimized, via supervision of the CLIP loss, by a spherical Gaussians based differentiable renderer. As such, TANGO enables photorealistic 3D style transfer by automatically predicting reflectance effects even for bare, low-quality meshes, without training on a task-specific dataset.

lighting decomposition, stylization, text-driven photorealistic, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Add feedback

TANGO: Training-free Embodied AI Agents for Open-world Tasks

Ziliotto, Filippo, Campari, Tommaso, Serafini, Luciano, Ballan, Lamberto

arXiv.org Artificial IntelligenceDec-5-2024

Large Language Models (LLMs) have demonstrated excellent capabilities in composing various modules together to create programs that can perform complex reasoning tasks on images. In this paper, we propose TANGO, an approach that extends the program composition via LLMs already observed for images, aiming to integrate those capabilities into embodied agents capable of observing and acting in the world. Specifically, by employing a simple PointGoal Navigation model combined with a memory-based exploration policy as a foundational primitive for guiding an agent through the world, we show how a single model can address diverse tasks without additional training. We task an LLM with composing the provided primitives to solve a specific task, using only a few in-context examples in the prompt. We evaluate our approach on three key Embodied AI tasks: Open-Set ObjectGoal Navigation, Multi-Modal Lifelong Navigation, and Open Embodied Question Answering, achieving state-of-the-art results without any specific fine-tuning in challenging zero-shot scenarios.

large language model, natural language, navigation, (13 more...)

arXiv.org Artificial Intelligence

2412.10402

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Tango*: Constrained synthesis planning using chemically informed value functions

Armstrong, Daniel, Joncev, Zlatko, Guo, Jeff, Schwaller, Philippe

arXiv.org Artificial IntelligenceDec-4-2024

Computer-aided synthesis planning (CASP) has made significant strides in generating retrosynthetic pathways for simple molecules in a non-constrained fashion. Recent work introduces a specialised bidirectional search algorithm with forward and retro expansion to address the starting material-constrained synthesis problem, allowing CASP systems to provide synthesis pathways from specified starting materials, such as waste products or renewable feed-stocks. In this work, we introduce a simple guided search which allows solving the starting material-constrained synthesis planning problem using an existing, uni-directional search algorithm, Retro*. We show that by optimising a single hyperparameter, Tango* outperforms existing methods in terms of efficiency and solve rate. We find the Tango* cost function catalyses strong improvements for the bidirectional DESP methods. Our method also achieves lower wall clock times while proposing synthetic routes of similar length, a common metric for route quality.

cost function, molecule, synthesis planning, (15 more...)

arXiv.org Artificial Intelligence

2412.03424

Country:

North America > United States > California > Orange County > Anaheim (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Materials > Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization

Ma, Haowen, Long, Zhiguo, Meng, Hua

arXiv.org Artificial IntelligenceAug-19-2024

Density-based clustering methods by mode-seeking usually achieve clustering by using local density estimation to mine structural information, such as local dependencies from lower density points to higher neighbors. However, they often rely too heavily on \emph{local} structures and neglect \emph{global} characteristics, which can lead to significant errors in peak selection and dependency establishment. Although introducing more hyperparameters that revise dependencies can help mitigate this issue, tuning them is challenging and even impossible on real-world datasets. In this paper, we propose a new algorithm (TANGO) to establish local dependencies by exploiting a global-view \emph{typicality} of points, which is obtained by mining further the density distributions and initial dependencies. TANGO then obtains sub-clusters with the help of the adjusted dependencies, and characterizes the similarity between sub-clusters by incorporating path-based connectivity. It achieves final clustering by employing graph-cut on sub-clusters, thus avoiding the challenging selection of cluster centers. Moreover, this paper provides theoretical analysis and an efficient method for the calculation of typicality. Experimental results on several synthetic and $16$ real-world datasets demonstrate the effectiveness and superiority of TANGO.

dependency relationship, similarity, typicality, (16 more...)

arXiv.org Artificial Intelligence

2408.10084

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports

Yan, Yanfu, Cooper, Nathan, Chaparro, Oscar, Moran, Kevin, Poshyvanyk, Denys

arXiv.org Artificial IntelligenceJul-11-2024

Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called JANUS, that adapts the scene-learning capabilities of vision transformers to capture subtle visual and textual patterns that manifest on app UI screens - which is key to differentiating between similar screens for accurate duplicate report detection. JANUS also makes use of a video alignment technique capable of adaptive weighting of video frames to account for typical bug manifestation patterns. In a comprehensive evaluation on a benchmark containing 7,290 duplicate detection tasks derived from 270 video-based bug reports from 90 Android app bugs, the best configuration of our approach achieves an overall mRR/mAP of 89.8%/84.7%, and for the large majority of duplicate detection tasks, outperforms prior work by around 9% to a statistically significant degree. Finally, we qualitatively illustrate how the scene-learning capabilities provided by Janus benefits its performance.

janus, representation, video, (14 more...)

arXiv.org Artificial Intelligence

2407.0861

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
North America > United States > Virginia > Williamsburg (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Improving Text-To-Audio Models with Synthetic Captions

Kong, Zhifeng, Lee, Sang-gil, Ghosal, Deepanway, Majumder, Navonil, Mehrish, Ambuj, Valle, Rafael, Poria, Soujanya, Catanzaro, Bryan

arXiv.org Artificial IntelligenceJul-8-2024

It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged \textit{text-only language models} to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an \textit{audio language model} to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named \texttt{AF-AudioSet}, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find leveraging our pipeline and synthetic captions leads to significant improvements on audio generation quality, achieving a new \textit{state-of-the-art}.

af-audioset, caption, language model, (15 more...)

arXiv.org Artificial Intelligence

2406.15487

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.05)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.34)

Add feedback

Filters

Collaborating Authors

tango

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

SupplementaryMaterialFor"TANGO: Text-driven PhotorealisticandRobust3DStylizationviaLighting Decomposition "

TANGO: Text-drivenPhotorealisticandRobust3D StylizationviaLightingDecomposition

TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition

c7b925e600ae4880f5c5d7557f70a72b-Supplemental-Conference.pdf

TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition

TANGO: Training-free Embodied AI Agents for Open-world Tasks

Tango*: Constrained synthesis planning using chemically informed value functions

TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization

Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports

Improving Text-To-Audio Models with Synthetic Captions