AITopics | Canberk, Alper

Collaborating Authors

Canberk, Alper


AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation

Haji-Ali, Moayed, Menapace, Willi, Siarohin, Aliaksandr, Skorokhodov, Ivan, Canberk, Alper, Lee, Kwot Sin, Ordonez, Vicente, Tulyakov, Sergey

arXiv.org Artificial Intelligence, Dec-19-2024

We propose AV-Link, a unified framework for Video-to-Audio and Audio-to-Video generation that leverages the activations of frozen video and audio diffusion models for temporally-aligned cross-modal conditioning. The key to our framework is a Fusion Block that enables bidirectional information exchange between our backbone video and audio diffusion models through a temporally-aligned self-attention operation. Unlike prior work that uses feature extractors pretrained for other tasks for the conditioning signal, AV-Link can directly leverage features obtained by the complementary modality in a single framework, i.e., video features to generate audio, or audio features to generate video. We extensively evaluate our design choices and demonstrate the ability of our method to achieve synchronized and high-quality audiovisual content, showcasing its potential for applications in immersive media generation. Project Page: snap-research.github.io/AVLink/

artificial intelligence, machine learning, natural language, (21 more...)

arXiv:2412.15191

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Differentiable Robot Rendering

Liu, Ruoshi, Canberk, Alper, Song, Shuran, Vondrick, Carl

arXiv.org Artificial Intelligence, Oct-17-2024

Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings. A key challenge in applying them to robotic tasks is the modality gap between visual data and action data. We introduce differentiable robot rendering, a method allowing the visual appearance of a robot body to be directly differentiable with respect to its control parameters. Our model integrates a kinematics-aware deformable model and Gaussian Splatting and is compatible with any robot form factor and number of degrees of freedom. We demonstrate its capability and usage in applications including reconstruction of robot poses from images and controlling robots through vision language models. Quantitative and qualitative results show that our differentiable rendering model provides effective gradients for robotic control directly from pixels, laying the foundation for future applications of vision foundation models in robotics.
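As a toy illustration of what "differentiable with respect to control parameters" makes possible, the sketch below recovers joint angles by gradient descent through a forward model. It is not the paper's method: there is no rendering or Gaussian Splatting here, the "observation" is just the end-effector position of a two-joint planar arm rather than an image, and the link lengths, function names, learning rate, and step count are all hypothetical choices made for this example.

```python
import math

# Hypothetical link lengths for a toy 2-joint planar arm (not from the paper).
LINK1, LINK2 = 1.0, 0.8


def forward(q1, q2):
    """Forward kinematics: joint angles (radians) -> end-effector (x, y)."""
    x = LINK1 * math.cos(q1) + LINK2 * math.cos(q1 + q2)
    y = LINK1 * math.sin(q1) + LINK2 * math.sin(q1 + q2)
    return x, y


def loss_and_grad(q1, q2, target):
    """Squared position error and its analytic gradient w.r.t. (q1, q2)."""
    x, y = forward(q1, q2)
    ex, ey = x - target[0], y - target[1]
    # Jacobian of the end-effector position with respect to the joint angles.
    dx_dq1 = -LINK1 * math.sin(q1) - LINK2 * math.sin(q1 + q2)
    dx_dq2 = -LINK2 * math.sin(q1 + q2)
    dy_dq1 = LINK1 * math.cos(q1) + LINK2 * math.cos(q1 + q2)
    dy_dq2 = LINK2 * math.cos(q1 + q2)
    loss = ex * ex + ey * ey
    grad = (
        2.0 * (ex * dx_dq1 + ey * dy_dq1),
        2.0 * (ex * dx_dq2 + ey * dy_dq2),
    )
    return loss, grad


def fit_pose(target, q_init=(0.1, 0.1), lr=0.05, steps=5000):
    """Plain gradient descent on the joint angles to match the observation."""
    q1, q2 = q_init
    for _ in range(steps):
        _, (g1, g2) = loss_and_grad(q1, q2, target)
        q1 -= lr * g1
        q2 -= lr * g2
    return q1, q2


if __name__ == "__main__":
    observed = forward(0.7, 0.9)  # stand-in for an observed image of the robot
    q1, q2 = fit_pose(observed)
    print("recovered joint angles:", q1, q2)
    print("reprojected position:", forward(q1, q2), "observed:", observed)
```

A two-joint arm generally has two joint configurations that reach the same point, so the recovered angles may differ from the ones used to generate the observation even when the reprojected position matches. In the setting the abstract describes, the forward model would be a renderer and the error would be measured in pixels, but the optimization loop has the same shape.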

artificial intelligence, gaussian, robot, (16 more...)

arXiv:2410.13851

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)