tango
TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition
Creation of 3D content by stylization is a promising yet challenging problem in computer vision and graphics research. In this work, we focus on stylizing photorealistic appearance renderings of a given surface mesh of arbitrary topology. Motivated by the recent surge of cross-modal supervision of the Contrastive Language-Image Pre-training (CLIP) model, we propose TANGO, which transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner. Technically, we propose to disentangle the appearance style as the spatially varying bidirectional reflectance distribution function, the local geometric variation, and the lighting condition, which are jointly optimized, via supervision of the CLIP loss, by a spherical-Gaussian-based differentiable renderer. As such, TANGO enables photorealistic 3D style transfer by automatically predicting reflectance effects even for bare, low-quality meshes, without training on a task-specific dataset. Extensive experiments show that TANGO outperforms existing methods of text-driven 3D style transfer in terms of photorealistic quality, consistency of 3D geometry, and robustness when stylizing low-quality meshes. Our code and results are available at our project webpage https://cyw-3d.github.io/tango/.
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.05)
- Asia > China > Hong Kong (0.05)
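The abstract above mentions a spherical Gaussians based renderer without giving formulas. As a rough illustration only, the sketch below evaluates the textbook spherical Gaussian lobe G(v) = amplitude * exp(sharpness * (v . axis - 1)) and sums several lobes into a scalar lighting value. The function names, the scalar amplitude, and the idea of summing lobes to approximate environment lighting are assumptions made for this example, not the paper's actual parameterization.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def spherical_gaussian(direction, axis, sharpness, amplitude):
    """Textbook spherical Gaussian lobe evaluated at a direction.

    Hypothetical helper: G(v) = amplitude * exp(sharpness * (v . axis - 1)),
    with v and axis normalized so the lobe peaks when v equals axis.
    """
    v = normalize(direction)
    a = normalize(axis)
    cos_angle = sum(x * y for x, y in zip(v, a))
    return amplitude * math.exp(sharpness * (cos_angle - 1.0))

def lighting(direction, lobes):
    """Sum of lobes, each given as (axis, sharpness, amplitude).

    Assumption: lighting is approximated as a mixture of lobes.
    """
    return sum(spherical_gaussian(direction, *lobe) for lobe in lobes)

# Two illustrative lobes: a sharp one pointing up, a broad one pointing along +x.
lobes = [((0.0, 0.0, 1.0), 20.0, 2.0), ((1.0, 0.0, 0.0), 2.0, 0.5)]
print(lighting((0.0, 0.0, 1.0), lobes))
```

Because every lobe is a smooth function of its axis, sharpness, and amplitude, such a mixture can in principle be optimized by gradient descent, which is one plausible reason this representation suits a differentiable renderer.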
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Ziliotto, Filippo, Campari, Tommaso, Serafini, Luciano, Ballan, Lamberto
Large Language Models (LLMs) have demonstrated excellent capabilities in composing various modules together to create programs that can perform complex reasoning tasks on images. In this paper, we propose TANGO, an approach that extends the program composition via LLMs already observed for images, aiming to integrate those capabilities into embodied agents capable of observing and acting in the world. Specifically, by employing a simple PointGoal Navigation model combined with a memory-based exploration policy as a foundational primitive for guiding an agent through the world, we show how a single model can address diverse tasks without additional training. We task an LLM with composing the provided primitives to solve a specific task, using only a few in-context examples in the prompt. We evaluate our approach on three key Embodied AI tasks: Open-Set ObjectGoal Navigation, Multi-Modal Lifelong Navigation, and Open Embodied Question Answering, achieving state-of-the-art results without any specific fine-tuning in challenging zero-shot scenarios.
Tango*: Constrained synthesis planning using chemically informed value functions
Armstrong, Daniel, Joncev, Zlatko, Guo, Jeff, Schwaller, Philippe
Computer-aided synthesis planning (CASP) has made significant strides in generating retrosynthetic pathways for simple molecules in a non-constrained fashion. Recent work introduces a specialised bidirectional search algorithm with forward and retro expansion to address the starting material-constrained synthesis problem, allowing CASP systems to provide synthesis pathways from specified starting materials, such as waste products or renewable feedstocks. In this work, we introduce a simple guided search which allows solving the starting material-constrained synthesis planning problem using an existing, unidirectional search algorithm, Retro*. We show that by optimising a single hyperparameter, Tango* outperforms existing methods in terms of efficiency and solve rate. We find the Tango* cost function catalyses strong improvements for the bidirectional DESP methods. Our method also achieves lower wall clock times while proposing synthetic routes of similar length, a common metric for route quality.
- North America > United States > California > Orange County > Anaheim (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Materials > Chemicals (0.46)
TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization
Ma, Haowen, Long, Zhiguo, Meng, Hua
Density-based clustering methods by mode-seeking usually achieve clustering by using local density estimation to mine structural information, such as local dependencies from lower density points to higher neighbors. However, they often rely too heavily on local structures and neglect global characteristics, which can lead to significant errors in peak selection and dependency establishment. Although introducing more hyperparameters that revise dependencies can help mitigate this issue, tuning them is challenging and even impossible on real-world datasets. In this paper, we propose a new algorithm (TANGO) to establish local dependencies by exploiting a global-view typicality of points, which is obtained by mining further the density distributions and initial dependencies. TANGO then obtains sub-clusters with the help of the adjusted dependencies, and characterizes the similarity between sub-clusters by incorporating path-based connectivity. It achieves final clustering by employing graph-cut on sub-clusters, thus avoiding the challenging selection of cluster centers. Moreover, this paper provides theoretical analysis and an efficient method for the calculation of typicality. Experimental results on several synthetic and 16 real-world datasets demonstrate the effectiveness and superiority of TANGO.
- Asia > Middle East > Jordan (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- North America > United States > New York (0.04)
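The clustering abstract above starts from the generic mode-seeking baseline in which each point depends on a nearby point of higher estimated density. The sketch below illustrates only that baseline, under stated assumptions: density is the inverse distance to the k-th nearest neighbour, and a point's parent is its closest higher-density point among its k nearest neighbours. It does not implement TANGO's typicality, sub-cluster similarity, or graph-cut steps, and all function names are hypothetical.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of point i, closest first."""
    others = (j for j in range(len(points)) if j != i)
    order = sorted(others, key=lambda j: (math.dist(points[i], points[j]), j))
    return order[:k]

def local_density(points, k):
    """k-NN density estimate: inverse distance to the k-th nearest neighbour."""
    density = []
    for i in range(len(points)):
        kth = knn(points, i, k)[-1]
        # Small constant guards against division by zero for duplicate points.
        density.append(1.0 / (math.dist(points[i], points[kth]) + 1e-12))
    return density

def dependencies(points, k):
    """Parent of each point: nearest higher-density point among its k-NN.

    A point with no denser neighbour is a local mode and gets parent None.
    """
    density = local_density(points, k)
    parent = []
    for i in range(len(points)):
        higher = [j for j in knn(points, i, k) if density[j] > density[i]]
        parent.append(higher[0] if higher else None)
    return parent

def mode_labels(parent):
    """Label each point by the mode reached when following parents uphill."""
    labels = []
    for start in range(len(parent)):
        node = start
        while parent[node] is not None:
            node = parent[node]
        labels.append(node)
    return labels

# Two well-separated 1D groups, each densest at its centre.
points = [(0.0,), (1.0,), (1.5,), (2.0,), (3.0,),
          (10.0,), (11.0,), (11.5,), (12.0,), (13.0,)]
parent = dependencies(points, k=2)
print(mode_labels(parent))
```

On this toy data the two group centres (indices 2 and 7) come out as modes and every other point climbs to one of them. Because each dependency is decided purely from a small neighbourhood, a noisy density estimate can create spurious modes, which is the kind of over-reliance on local structure the abstract criticizes.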
Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports
Yan, Yanfu, Cooper, Nathan, Chaparro, Oscar, Moran, Kevin, Poshyvanyk, Denys
Video-based bug reports are increasingly being used to document bugs for programs centered around a graphical user interface (GUI). However, developing automated techniques to manage video-based reports is challenging as it requires identifying and understanding often nuanced visual patterns that capture key information about a reported bug. In this paper, we aim to overcome these challenges by advancing the bug report management task of duplicate detection for video-based reports. To this end, we introduce a new approach, called JANUS, that adapts the scene-learning capabilities of vision transformers to capture subtle visual and textual patterns that manifest on app UI screens, which is key to differentiating between similar screens for accurate duplicate report detection. JANUS also makes use of a video alignment technique capable of adaptive weighting of video frames to account for typical bug manifestation patterns. In a comprehensive evaluation on a benchmark containing 7,290 duplicate detection tasks derived from 270 video-based bug reports from 90 Android app bugs, the best configuration of our approach achieves an overall mRR/mAP of 89.8%/84.7%, and for the large majority of duplicate detection tasks, outperforms prior work by around 9% to a statistically significant degree. Finally, we qualitatively illustrate how the scene-learning capabilities provided by JANUS benefit its performance.
- Europe > Portugal > Lisbon > Lisbon (0.05)
- North America > United States > Virginia > Williamsburg (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
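The duplicate-detection abstract above reports mRR and mAP without defining them. The sketch below assumes the standard retrieval definitions: each task yields a ranked candidate list, reciprocal rank is 1 over the position of the first true duplicate, and average precision averages the precision at every position holding a true duplicate. The function names and the boolean-list input format are assumptions for illustration; the paper's evaluation code may differ.

```python
def reciprocal_rank(relevance):
    """1 / rank of the first true duplicate, or 0.0 if none is ranked."""
    for rank, is_duplicate in enumerate(relevance, start=1):
        if is_duplicate:
            return 1.0 / rank
    return 0.0

def average_precision(relevance):
    """Mean of precision@rank taken at each rank holding a true duplicate."""
    hits = 0
    precisions = []
    for rank, is_duplicate in enumerate(relevance, start=1):
        if is_duplicate:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_over_tasks(metric, tasks):
    """Average a per-task metric over all duplicate detection tasks."""
    return sum(metric(task) for task in tasks) / len(tasks)

# Each inner list is one task: ranked candidates, True marks a real duplicate.
tasks = [
    [False, True, False, True],
    [True, False, False],
]
mrr = mean_over_tasks(reciprocal_rank, tasks)
m_ap = mean_over_tasks(average_precision, tasks)
print(mrr, m_ap)
```

mRR rewards only the position of the first correct hit, while mAP also accounts for where the remaining duplicates land, so the two can diverge when a task has several true duplicates.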
Improving Text-To-Audio Models with Synthetic Captions
Kong, Zhifeng, Lee, Sang-gil, Ghosal, Deepanway, Majumder, Navonil, Mehrish, Ambuj, Valle, Rafael, Poria, Soujanya, Catanzaro, Bryan
It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named AF-AudioSet, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find leveraging our pipeline and synthetic captions leads to significant improvements on audio generation quality, achieving a new state-of-the-art.
- Leisure & Entertainment (1.00)
- Media > Music (0.68)
TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification
Time series classification (TSC) on multivariate time series is a critical problem. We propose a novel multi-view approach integrating frequency-domain and time-domain features to provide complementary contexts for TSC. Our method fuses continuous wavelet transform spectral features with temporal convolutional or multilayer perceptron features. We leverage the Mamba state space model for efficient and scalable sequence modeling. We also introduce a novel tango scanning scheme to better model sequence relationships. Experiments on 10 standard benchmark datasets demonstrate our approach achieves an average 6.45% accuracy improvement over state-of-the-art TSC models.
- North America > United States > Kentucky (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.86)