AITopics | gallery

gallery

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Couple to Control: Joint Initial Noise Design in Diffusion Models

Jia, Jing, Shen, Liyue, Wang, Guanyang

arXiv.org Machine Learning, May-13-2026

Diffusion models typically generate image batches from independent Gaussian initial noises. We argue that this independence assumption is only one choice within a broader class of valid joint noise designs. Instead, one can specify a coupling of the initial noises: each noise remains marginally standard Gaussian, so the pretrained diffusion model receives the same single-sample input distribution, while the dependence across samples is chosen by design. This reframes initial-noise control from selecting or optimizing individual seeds to designing the dependence structure of a multi-sample gallery. This view gives a general framework for initial-noise design, covering several existing methods as special cases and leading naturally to new coupled-noise constructions. Coupled noise can improve generation on its own without adding sampling cost, and it is flexible enough to serve as a structured initialization for optimization-based pipelines when additional computation is available. Empirically, repulsive Gaussian coupling improves gallery diversity on SD1.5, SDXL, and SD3 while largely preserving prompt alignment and image quality. It matches or outperforms recent test-time noise-optimization baselines on several diversity metrics at the same sampling cost as independent generation. Subspace couplings also support fixed-object background generation, producing diverse, natural backgrounds compared with specialized inpainting baselines, with a tunable trade-off in foreground fidelity.
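
As a rough illustration of the coupling idea in this abstract, the sketch below draws a small gallery of initial noises that are each marginally standard Gaussian but negatively correlated with one another. The equicorrelated covariance, the function name `coupled_gaussian_noise`, and the parameter `rho` are assumptions made for illustration; the paper's own constructions may differ.

```python
import numpy as np


def coupled_gaussian_noise(num_samples, dim, rho, seed=0):
    """Draw `num_samples` noise vectors of size `dim` with pairwise correlation `rho`.

    Hypothetical helper: each returned row is marginally N(0, I), and any two
    rows have correlation `rho` coordinate by coordinate. rho < 0 pushes the
    rows apart ("repulsive"); rho = 0 recovers independent noise.
    """
    if num_samples < 2:
        raise ValueError("need at least two samples to couple")
    # The equicorrelated covariance is positive semidefinite only in this range.
    if not (-1.0 / (num_samples - 1) <= rho <= 1.0):
        raise ValueError("rho must lie in [-1/(num_samples-1), 1]")

    # Cross-sample covariance: ones on the diagonal, rho elsewhere.
    cov = (1.0 - rho) * np.eye(num_samples) + rho * np.ones((num_samples, num_samples))

    # Mixing matrix with mix @ mix.T == cov. An eigendecomposition is used
    # instead of Cholesky so the boundary case (singular cov) still works.
    eigvals, eigvecs = np.linalg.eigh(cov)
    mix = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

    # Independent standard Gaussians, then mix across the sample axis.
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((num_samples, dim))
    return mix @ base


if __name__ == "__main__":
    noise = coupled_gaussian_noise(num_samples=4, dim=20000, rho=-0.3)
    print("per-sample variance:", noise.var(axis=1).round(3))
    print("empirical correlation:\n", np.corrcoef(noise).round(3))
```

Because every row of the mixing matrix has unit norm, each sample keeps the single-sample input distribution the pretrained model expects; only the dependence across samples changes.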

artificial intelligence, coupling, machine learning, (19 more...)

arXiv:2605.11311

Country: North America > United States > Michigan (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Supplementary for: "GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization"

Neural Information Processing Systems, Apr-25-2026, 12:56:49 GMT

We organize our supplementary document as follows:

1. Results on additional dataset
2. Results for limited data settings on YFCC26k and GWS15k datasets
3. Additional Ablations
   (a) Gallery Size
   (b) Queue Length
   (c) ση for Batch GPS noise
   (d) ση for Queue GPS noise
   (e) σ for Random Fourier Features
   (f) Number of hierarchies (M)
4. Different selection choices for GPS Gallery Construction
   (a) Evenly Spaced GPS Coordinates
   (b) Test Set GPS Coordinates
5. Analysis of Runtime and Memory Footprint
6. Motivations for using Pretrained CLIP as Image encoder Backbone
7. Qualitative Demonstration
   (a) Hierarchical learning in our location encoder L(·)
   (b) GeoCLIP with Image Query
   (c) Distribution of correct predictions of GeoCLIP on different datasets
   (d) GeoCLIP with Text Query
8. Discussion on Ethical Issues and Possible Mitigation

In section 4.1 of the main paper, we demonstrated the performance of our GeoCLIP method on the Im2GPS3k [2] and GWS15k [1] datasets and compared it with the state-of-the-art methods. Here, we perform experiments on another dataset, YFCC26k [6]. The results are provided in Table 1. This result highlights that GeoCLIP performs well across datasets, being useful across different data distributions.

GeoCLIP achieves decent performance across datasets even when the training data is significantly reduced. We show the efficacy of GeoCLIP on limited training samples of Im2GPS3k in section 4.2 of the main paper. Now, we further investigate the performance of GeoCLIP for limited data settings on other datasets (YFCC26k and GWS15k).

artificial intelligence, deep learning, machine learning, (15 more...)

Country:

Asia (0.68)
North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

1b57aaddf85ab01a2445a79c9edc1f4b-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 12:56:46 GMT

artificial intelligence, deep learning, machine learning, (17 more...)

Country:

Europe (0.67)
North America > United States (0.28)

Genre: Research Report (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)

1b57aaddf85ab01a2445a79c9edc1f4b-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 13:06:31 GMT

artificial intelligence, deep learning, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Poland (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

1b57aaddf85ab01a2445a79c9edc1f4b-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 13:06:27 GMT

encoder, geoclip, gps coordinate, (14 more...)

Country:

North America > United States (0.14)
Europe > Poland (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(3 more...)

Genre: Research Report (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)

3 things Will Douglas Heaven is into right now

MIT Technology Review, Jan-2-2026, 11:00:00 GMT

MIT Technology Review's senior editor for AI shares what he's been thinking about lately. My daughter introduced me to El Estepario Siberiano's YouTube channel a few months back, and I have been obsessed ever since. The Spanish drummer (real name: Jorge Garrido) posts videos of himself playing supercharged cover versions of popular tracks, hitting his drums with such jaw-dropping speed and technique that he makes other pro drummers shake their heads in disbelief. The dozens of reaction videos posted by other musicians are a joy in themselves. Garrido is up-front about the countless hours that it took to get this good. He says he sat behind his kit almost all day, every day for years.

douglas heaven, mit technology review, share story, (7 more...)

Country:

North America > United States > Massachusetts (0.05)
Europe > United Kingdom (0.05)
Asia > China (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.31)
Health & Medicine > Therapeutic Area > Immunology (0.31)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

From Hubs to Deserts: Urban Cultural Accessibility Patterns with Explainable AI

Pranto, Protik Bose, Islam, Minhazul, Saha, Ripon Kumar, Rivera, Abimelec Mercado, Abbasov, Namig

arXiv.org Artificial Intelligence, Nov-12-2025

Cultural infrastructures, such as libraries, museums, theaters, and galleries, support learning, civic life, health, and local economies, yet access is uneven across cities. We present a novel, scalable, and open-data framework to measure spatial equity in cultural access. We map cultural infrastructures and compute a metric called Cultural Infrastructure Accessibility Score (CIAS) using exponential distance decay at fine spatial resolution, then aggregate the score per capita and integrate socio-demographic indicators. Interpretable tree-ensemble models with SHapley Additive exPlanation (SHAP) are used to explain associations between accessibility, income, density, and tract-level racial/ethnic composition. Results show a pronounced core-periphery gradient, where non-library cultural infrastructures cluster near urban cores, while libraries track density and provide broader coverage. Non-library accessibility is modestly higher in higher-income tracts, and library accessibility is slightly higher in denser, lower-income areas.
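
The exponential distance decay mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's CIAS definition: straight-line distance, the function name `distance_decay_scores`, and the `decay_rate` parameter are all assumptions, and the per-capita aggregation and socio-demographic steps are left out.

```python
import numpy as np


def distance_decay_scores(origins, facilities, decay_rate):
    """Accessibility of each origin as a sum of exp(-decay_rate * distance).

    Hypothetical helper. `origins` has shape (n, 2) and `facilities` has
    shape (m, 2), both in the same planar units (for example kilometres).
    Nearby facilities contribute close to 1 each; distant ones contribute
    close to 0.
    """
    origins = np.asarray(origins, dtype=float)
    facilities = np.asarray(facilities, dtype=float)

    # Pairwise straight-line distances, shape (n, m).
    diffs = origins[:, None, :] - facilities[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Sum the decayed contribution of every facility for each origin.
    return np.exp(-decay_rate * dists).sum(axis=1)


if __name__ == "__main__":
    grid_cells = [[0.0, 0.0], [3.0, 4.0]]        # made-up cell centroids
    venues = [[0.0, 0.0], [1.0, 0.0]]            # made-up venue locations
    print(distance_decay_scores(grid_cells, venues, decay_rate=0.5))
```

A larger `decay_rate` makes the score more local, so the choice of rate controls how far a venue's influence is assumed to reach.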

data mining, machine learning, natural language, (20 more...)

arXiv:2511.07475

Country: North America > United States > New York > Bronx County (0.29)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.94)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Communications > Social Media (0.68)
(2 more...)

$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning

Ricci, Simone, Biondi, Niccolò, Pernici, Federico, Patras, Ioannis, Del Bimbo, Alberto

arXiv.org Artificial Intelligence, Oct-22-2025

Retrieval systems rely on representations learned by increasingly powerful models. However, due to the high training cost and inconsistencies in learned representations, there is significant interest in facilitating communication between representations and ensuring compatibility across independently trained neural networks. In the literature, two primary approaches are commonly used to adapt different learned representations: affine transformations, which adapt well to specific distributions but can significantly alter the original representation, and orthogonal transformations, which preserve the original structure with strict geometric constraints but limit adaptability. A key challenge is adapting the latent spaces of updated models to align with those of previous models on downstream distributions while preserving the newly learned representation spaces. In this paper, we impose a relaxed orthogonality constraint, namely $\lambda$-Orthogonality regularization, while learning an affine transformation, to obtain distribution-specific adaptation while retaining the original learned representations. Extensive experiments across various architectures and datasets validate our approach, demonstrating that it preserves the model's zero-shot performance and ensures compatibility across model updates. Code available at: https://github.com/miccunifi/lambda_orthogonality.
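
To make the contrast between a strict and a relaxed orthogonality constraint concrete, here is a generic soft orthogonality penalty of the kind widely used in the literature. It is not the paper's λ-Orthogonality regularizer, whose exact form the abstract does not give; the names `soft_orthogonality_penalty`, `regularized_loss`, and `reg_strength` are hypothetical.

```python
import numpy as np


def soft_orthogonality_penalty(weight):
    """Squared Frobenius distance between W^T W and the identity.

    Generic penalty (an assumption, not the paper's definition): it is zero
    exactly when the columns of `weight` are orthonormal and grows as the
    linear map departs from an orthogonal transformation.
    """
    weight = np.asarray(weight, dtype=float)
    gram = weight.T @ weight
    return float(np.sum((gram - np.eye(weight.shape[1])) ** 2))


def regularized_loss(task_loss, weight, reg_strength):
    """Task loss plus a weighted orthogonality penalty on the linear map.

    reg_strength = 0 leaves the affine map unconstrained; a very large value
    pushes it towards a strictly orthogonal map.
    """
    return task_loss + reg_strength * soft_orthogonality_penalty(weight)


if __name__ == "__main__":
    angle = 0.3
    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle),  np.cos(angle)]])
    scaled = 2.0 * np.eye(2)
    print(soft_orthogonality_penalty(rotation))  # numerically close to 0
    print(soft_orthogonality_penalty(scaled))    # 18.0
```

The weight on the penalty is what makes the constraint "relaxed": the learned map may deviate from orthogonality when the task loss benefits enough to offset the penalty.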

artificial intelligence, machine learning, natural language, (16 more...)

arXiv:2509.16664

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation

Pan, Zhenyu, Lu, Yucheng, Liu, Han

arXiv.org Artificial Intelligence, Oct-7-2025

We present MetaFind, a scene-aware tri-modal compositional retrieval framework designed to enhance scene generation in the metaverse by retrieving 3D assets from large-scale repositories. MetaFind addresses two core challenges: (i) inconsistent asset retrieval that overlooks spatial, semantic, and stylistic constraints, and (ii) the absence of a standardized retrieval paradigm specifically tailored for 3D asset retrieval, as existing approaches mainly rely on general-purpose 3D shape representation models. Our key innovation is a flexible retrieval mechanism that supports arbitrary combinations of text, image, and 3D modalities as queries, enhancing spatial reasoning and style consistency by jointly modeling object-level features (including appearance) and scene-level layout structures. Methodologically, MetaFind introduces a plug-and-play equivariant layout encoder ESSGNN that captures spatial relationships and object appearance features, ensuring retrieved 3D assets are contextually and stylistically coherent with the existing scene, regardless of coordinate frame transformations. The framework supports iterative scene construction by continuously adapting retrieval results to current scene updates. Empirical evaluations demonstrate the improved spatial and stylistic consistency of MetaFind in various retrieval tasks compared to baseline methods.

large language model, machine learning, natural language, (20 more...)

arXiv:2510.04057

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Audio Geolocation: A Natural Sounds Benchmark

Chasmai, Mustafa, Liu, Wuao, Maji, Subhransu, Van Horn, Grant

arXiv.org Artificial Intelligence, Jul-23-2025

Can we determine someone's geographic location purely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? We tackle the challenge of global-scale audio geolocation, formalize the problem, and conduct an in-depth analysis with wildlife audio from the iNatSounds dataset. Adopting a vision-inspired approach, we convert audio recordings to spectrograms and benchmark existing image geolocation techniques. We hypothesize that species vocalizations offer strong geolocation cues due to their defined geographic ranges and propose an approach that integrates species range prediction with retrieval-based geolocation. We further evaluate whether geolocation improves when analyzing species-rich recordings or when aggregating across spatiotemporal neighborhoods. Finally, we introduce case studies from movies to explore multimodal geolocation using both audio and visual content. Our work highlights the advantages of integrating audio and visual cues, and sets the stage for future research in audio geolocation.

large language model, machine learning, natural language, (22 more...)

arXiv:2505.18726

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Communications > Social Media (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)