 azimuth


Deep Convolutional Inverse Graphics Network

Tejas D. Kulkarni, William F. Whitney, Pushmeet Kohli, Josh Tenenbaum

Neural Information Processing Systems

This paper presents the Deep Convolution Inverse Graphics Network (DC-IGN), a model that aims to learn an interpretable representation of images, disentangled with respect to three-dimensional scene structure and viewing transformations such as depth rotations and lighting variations. The DC-IGN model is composed of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm [10]. We propose a training procedure to encourage neurons in the graphics code layer to represent a specific transformation (e.g. pose or light).
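
As a rough illustration of the clamping idea behind that training procedure, the sketch below varies one scene factor per mini-batch and clamps every other latent unit to its batch mean, so only the designated unit can explain the variation. It is a minimal toy with hypothetical encoder/decoder sizes and a plain reconstruction loss; it omits the variational/KL machinery of SGVB used in the paper.

# Minimal sketch (not the authors' code) of the DC-IGN-style clamping idea:
# each mini-batch varies a single scene factor, the latent unit assigned to
# that factor keeps its per-example value, and all other latent units are
# clamped to their batch mean so they cannot absorb the varying factor.
import torch
import torch.nn as nn

latent_dim = 8
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU(), nn.Linear(64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 32 * 32))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(batch, active_unit):
    """batch: images that differ only in one factor (e.g. azimuth);
    active_unit: index of the latent unit assigned to that factor."""
    z = encoder(batch)
    z_clamped = z.mean(dim=0, keepdim=True).expand_as(z).clone()  # clamp every unit to the batch mean...
    z_clamped[:, active_unit] = z[:, active_unit]                 # ...except the unit for the varying factor
    recon = decoder(z_clamped)
    loss = ((recon - batch.flatten(1)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# e.g. a batch of 16 renders of one face that differ only in azimuth:
loss = train_step(torch.rand(16, 1, 32, 32), active_unit=0)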


Generating Moving 3D Soundscapes with Latent Diffusion Models

Templin, Christian, Zhu, Yanda, Wang, Hao

arXiv.org Artificial Intelligence

Spatial audio has become central to immersive applications such as VR/AR, cinema, and music. Existing generative audio models are largely limited to mono or stereo formats and cannot capture the full 3D localization cues available in first-order Ambisonics (FOA). Recent FOA models extend text-to-audio generation but remain restricted to static sources. In this work, we introduce SonicMotion, the first end-to-end latent diffusion framework capable of generating FOA audio with explicit control over moving sound sources. SonicMotion is implemented in two variations: 1) a descriptive model conditioned on natural language prompts, and 2) a parametric model conditioned on both text and spatial trajectory parameters for higher precision. To support training and evaluation, we construct a new dataset of over one million simulated FOA caption pairs that include both static and dynamic sources with annotated azimuth, elevation, and motion attributes. Experiments show that SonicMotion achieves state-of-the-art semantic alignment and perceptual quality comparable to leading text-to-audio systems, while uniquely attaining low spatial localization error.
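
For intuition about the FOA targets that such a dataset pairs with azimuth/elevation trajectories, here is a minimal sketch (not SonicMotion's pipeline) of encoding a mono source moving along a per-sample trajectory into the four first-order Ambisonics channels, assuming the AmbiX convention (ACN channel order W, Y, Z, X with SN3D gains).

# Minimal sketch of encoding a moving mono source into first-order Ambisonics.
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """mono: (n,) samples; azimuth/elevation: (n,) radians, per-sample trajectory."""
    w = mono                                        # omnidirectional component
    y = mono * np.sin(azimuth) * np.cos(elevation)
    z = mono * np.sin(elevation)
    x = mono * np.cos(azimuth) * np.cos(elevation)
    return np.stack([w, y, z, x])                   # (4, n) FOA signal

# Example: a 1 s, 440 Hz tone sweeping from -90 to +90 degrees azimuth at 0 degrees elevation.
sr = 16000
t = np.arange(sr) / sr
foa = encode_foa(np.sin(2 * np.pi * 440 * t),
                 azimuth=np.linspace(-np.pi / 2, np.pi / 2, sr),
                 elevation=np.zeros(sr))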


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Summary: the paper proposes a CNN for learning explicit image representations as an inverse graphics problem. The image representation has interpretable explicit components, in particular pose angles and lighting angles, along with implicit components (texture, appearance). This is done in an autoencoder framework with reconstruction error. To make a particular latent dimension focus on one aspect (e.g. pose), training mini-batches vary only that factor while the remaining latent dimensions are clamped. Experiments on two datasets show reconstructions of a 3D object at varying poses and illumination directions.


ImmerseDiffusion: A Generative Spatial Audio Latent Diffusion Model

Heydari, Mojtaba, Souden, Mehrez, Conejo, Bruno, Atkins, Joshua

arXiv.org Artificial Intelligence

We introduce ImmerseDiffusion, an end-to-end generative audio model that produces 3D immersive soundscapes conditioned on the spatial, temporal, and environmental conditions of sound objects. ImmerseDiffusion is trained to generate first-order ambisonics (FOA) audio, a conventional spatial audio format comprising four channels that can be rendered to multichannel spatial output. The proposed generative system is composed of a spatial audio codec that maps FOA audio to latent components; a latent diffusion model trained on various user input types, namely text prompts and spatial, temporal, and environmental acoustic parameters; and, optionally, a spatial audio and text encoder trained in a Contrastive Language and Audio Pretraining (CLAP) style. We propose metrics to evaluate the quality and spatial adherence of the generated spatial audio. Finally, we assess the model performance in terms of generation quality and spatial conformance, comparing the two proposed modes: "descriptive", which uses spatial text prompts, and "parametric", which uses non-spatial text prompts and spatial parameters. Our evaluations demonstrate promising results that are consistent with the user conditions and reflect reliable spatial fidelity.
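
The CLAP-style encoder mentioned above is trained contrastively on paired audio and text; below is a minimal sketch of the standard symmetric contrastive (InfoNCE) objective over such pairs. Embedding sizes and names are assumptions for illustration, not ImmerseDiffusion's implementation.

# Minimal sketch of a CLAP-style symmetric contrastive loss over paired audio/text embeddings.
import torch
import torch.nn.functional as F

def clap_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """audio_emb, text_emb: (batch, dim) embeddings of matching pairs."""
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = audio_emb @ text_emb.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(audio_emb.size(0))          # i-th audio matches i-th text
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = clap_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))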


Are Doppler Velocity Measurements Useful for Spinning Radar Odometry?

Lisus, Daniil, Burnett, Keenan, Yoon, David J., Poulton, Richard, Marshall, John, Barfoot, Timothy D.

arXiv.org Artificial Intelligence

Spinning, frequency-modulated continuous-wave (FMCW) radars with 360 degree coverage have been gaining popularity for autonomous-vehicle navigation. However, unlike 'fixed' automotive radar, commercially available spinning radar systems typically do not produce radial velocities due to the lack of repeated measurements in the same direction and the fundamental hardware setup. To make these radial velocities observable, we modified the firmware of a commercial spinning radar to use triangular frequency modulation. In this paper, we develop a novel way to use this modulation to extract radial Doppler velocity measurements from single raw radar intensity scans without any required data association. We show that these noisy, error-prone measurements contain enough information to provide good ego-velocity estimates, and incorporate these estimates into different modern odometry pipelines. We extensively evaluate the pipelines on over 110 km of driving data in progressively more geometrically challenging autonomous-driving environments. We show that Doppler velocity measurements improve odometry in well-defined geometric conditions and enable it to continue functioning even in severely geometrically degenerate environments, such as long tunnels.
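
Once per-azimuth radial Doppler velocities are available, a standard way to turn them into an ego-velocity estimate is a linear least-squares fit of v_r(theta) ≈ v_x cos(theta) + v_y sin(theta) over a scan, assuming a mostly static scene. The sketch below illustrates only that baseline idea; it is not the estimator or outlier handling used in the paper.

# Minimal sketch of ego-velocity estimation from per-azimuth radial Doppler measurements.
import numpy as np

def ego_velocity_from_doppler(azimuths, radial_velocities):
    """azimuths: (n,) radians in the sensor frame; radial_velocities: (n,) m/s."""
    A = np.column_stack([np.cos(azimuths), np.sin(azimuths)])
    v, *_ = np.linalg.lstsq(A, radial_velocities, rcond=None)
    return v  # (v_x, v_y) of the sensor, up to sign convention

# Example: vehicle moving at 10 m/s forward, noisy Doppler over a full rotation.
az = np.linspace(0, 2 * np.pi, 400, endpoint=False)
vr = 10.0 * np.cos(az) + 0.3 * np.random.randn(az.size)
print(ego_velocity_from_doppler(az, vr))   # approx [10, 0]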


Improving Chinese Character Representation with Formation Tree

Hong, Yang, Li, Yinfei, Qiao, Xiaojun, Li, Rui, Zhang, Junsong

arXiv.org Artificial Intelligence

Learning effective representations for Chinese characters presents unique challenges, primarily due to the vast number of characters and their continuous growth, which requires models to handle an expanding category space. Additionally, the inherent sparsity of character usage complicates the generalization of learned representations. Prior research has explored radical-based sequences to overcome these issues, achieving progress in recognizing unseen characters. However, these approaches fail to fully exploit the inherent tree structure of such sequences. To address these limitations and leverage established data properties, we propose Formation Tree-CLIP (FT-CLIP). This model utilizes formation trees to represent characters and incorporates a dedicated tree encoder, significantly improving performance in both seen and unseen character recognition tasks. We further introduce masking for both character images and tree nodes, enabling efficient and effective training. This approach accelerates training significantly (by a factor of 2 or more) while enhancing accuracy. Extensive experiments show that processing characters through formation trees aligns better with their inherent properties than direct sequential methods, significantly enhancing the generality and usability of the representations.


Simulating Nighttime Visible Satellite Imagery of Tropical Cyclones Using Conditional Generative Adversarial Networks

Yao, Jinghuai, Du, Puyuan, Zhao, Yucheng, Wang, Yubo

arXiv.org Artificial Intelligence

Visible (VIS) satellite imagery has various important applications in meteorology, including monitoring Tropical Cyclones (TCs). However, it is unavailable at night because of the lack of sunlight. This study presents a Conditional Generative Adversarial Network (CGAN) model that generates highly accurate nighttime visible reflectance using infrared (IR) bands and sunlight-direction parameters as input. The model was trained and validated using daytime target-area observations of the Advanced Himawari Imager (AHI). This study also presents the first nighttime model validation using the Day/Night Band (DNB) of the Visible/Infrared Imager Radiometer Suite (VIIRS). The daytime results for the Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), Root Mean Square Error (RMSE), Correlation Coefficient (CC), and Bias are 0.885, 28.3, 0.0428, 0.984, and -0.0016 respectively, surpassing the performance of models in previous studies. The nighttime results for SSIM, PSNR, RMSE, and CC are 0.821, 24.4, 0.0643, and 0.969 respectively, slightly degraded by the parallax between the satellites. Full-disk validation shows that the model can also be readily applied over TC-free tropical ocean in the Northern Hemisphere. This model contributes to nighttime monitoring of meteorological phenomena by providing accurate AI-generated visible imagery with adjustable virtual sunlight directions.
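
For reference, scores of this kind can be computed with standard image-quality metrics; the sketch below shows one plausible way to evaluate a generated reflectance field against the observed one. Array names and the 0-1 reflectance range are assumptions, not the authors' evaluation code.

# Minimal sketch of the listed verification metrics (SSIM, PSNR, RMSE, CC, Bias).
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def reflectance_metrics(generated, observed, data_range=1.0):
    """generated, observed: 2D float arrays of VIS reflectance in [0, 1]."""
    diff = generated - observed
    return {
        "SSIM": structural_similarity(observed, generated, data_range=data_range),
        "PSNR": peak_signal_noise_ratio(observed, generated, data_range=data_range),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "CC": float(np.corrcoef(generated.ravel(), observed.ravel())[0, 1]),
        "Bias": float(np.mean(diff)),
    }

print(reflectance_metrics(np.random.rand(64, 64), np.random.rand(64, 64)))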


Doppler-aware Odometry from FMCW Scanning Radar

Rennie, Fraser, Williams, David, Newman, Paul, De Martini, Daniele

arXiv.org Artificial Intelligence

This work explores Doppler information from a millimetre-wave (mm-W) Frequency-Modulated Continuous-Wave (FMCW) scanning radar to make odometry estimation more robust and accurate. Firstly, Doppler information is added to the scan-masking process to enhance correlative scan matching. Secondly, we train a Neural Network (NN) to regress forward velocity directly from a single radar scan; we fuse this estimate with the correlative scan-matching estimate and show improved robustness to bad estimates caused by challenging environment geometries. We test our method on a novel custom dataset, which is released with this work at https://ori.ox.ac.uk/publications/datasets. Index Terms: radar odometry, Doppler, navigation, dataset. [Figure 1: Radar scan from the RDD dataset; the two extracted regions show the "zig-zag" pattern caused by the alternating modulation patterns in conjunction with the ego-vehicle speed.]
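
The fusion step described above combines two forward-velocity estimates; one common minimal recipe is inverse-variance weighting, sketched below with illustrative numbers. The form of the fusion and the variances are assumptions for illustration, not the paper's implementation.

# Minimal sketch of fusing two forward-velocity estimates by inverse-variance weighting.
def fuse_velocities(v_scan, var_scan, v_nn, var_nn):
    w_scan, w_nn = 1.0 / var_scan, 1.0 / var_nn
    v_fused = (w_scan * v_scan + w_nn * v_nn) / (w_scan + w_nn)
    var_fused = 1.0 / (w_scan + w_nn)
    return v_fused, var_fused

# Example: scan matching is degenerate (high variance), so the NN estimate dominates.
print(fuse_velocities(v_scan=2.0, var_scan=4.0, v_nn=9.5, var_nn=0.25))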


Decentralized shape formation and force-based interactive formation control in robot swarms

S, Akshaya C, Soma, Karthik, B, Visweswaran, Ravichander, Aditya, PM, Venkata Nagarjun

arXiv.org Artificial Intelligence

Swarm robotic systems utilize collective behaviour to achieve goals that might be too complex for a lone entity, but become attainable with localized communication and collective decision making. In this paper, a behaviour-based distributed approach to shape formation is proposed. Flocking into strategic formations is observed in migratory birds and fish to avoid predators and also for energy conservation. The formation is maintained throughout long periods without collapsing and is advantageous for communicating within the flock. Similar behaviour can be deployed in multi-agent systems to enhance coordination within the swarm. Existing methods for formation control are either dependent on the size and geometry of the formation or rely on maintaining the formation with a single reference in the swarm (the leader). These methods are not resilient to failure and involve a high degree of deformation upon obstacle encounter before the shape is recovered again. To improve performance, we introduce artificial force-based interactions amongst the entities of the swarm that maintain shape integrity while encountering obstacles.
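
As a rough sketch of what such artificial force-based interaction can look like (a generic potential-field formulation, not the controller proposed in the paper), each agent below is pulled toward its assigned formation slot and pushed away from nearby agents and obstacles.

# Minimal sketch of force-based formation keeping with obstacle repulsion.
import numpy as np

def formation_forces(positions, slots, obstacles, k_att=1.0, k_rep=0.5, radius=1.0):
    """positions, slots: (n, 2); obstacles: (m, 2). Returns per-agent force (n, 2)."""
    forces = k_att * (slots - positions)                       # spring pull toward assigned slot
    points = np.vstack([positions, obstacles])
    for i, p in enumerate(positions):
        d = p - points                                         # vectors pointing away from agents/obstacles
        dist = np.linalg.norm(d, axis=1)
        near = (dist > 1e-9) & (dist < radius)                 # ignore self and far-away points
        forces[i] += k_rep * np.sum(d[near] / dist[near, None] ** 3, axis=0)
    return forces

# One Euler step for 4 agents forming a square around the origin, with one obstacle.
pos = np.random.rand(4, 2) * 4 - 2
slots = np.array([[1, 1], [1, -1], [-1, -1], [-1, 1]], float)
pos = pos + 0.1 * formation_forces(pos, slots, obstacles=np.array([[0.0, 0.0]]))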