AITopics | flux

Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models

Bradbury, Rowan, Zhong, Dazhi

arXiv.org Artificial Intelligence, Dec-8-2025

Latent inpainting in diffusion models still relies almost universally on linearly interpolating VAE latents under a downsampled mask. We propose a key principle for compositing image latents: Pixel-Equivalent Latent Compositing (PELC). Under this principle, compositing in latent space should give the same result as compositing in pixel space. It enables full-resolution mask control and true soft-edge alpha compositing, even though VAEs compress images 8x spatially. Modern VAEs capture global context beyond patch-aligned local structure, so linear latent blending cannot be pixel-equivalent: it produces large artifacts at mask seams, global degradation, and color shifts. We introduce DecFormer, a 7.7M-parameter transformer that predicts per-channel blend weights and an off-manifold residual correction to realize mask-consistent latent fusion. DecFormer is trained so that decoding after fusion matches pixel-space alpha compositing, is plug-compatible with existing diffusion pipelines, requires no backbone finetuning, and adds only 0.07% of FLUX.1-Dev's parameters and 3.5% FLOP overhead. On the FLUX.1 family, DecFormer restores global color consistency, soft-mask support, sharp boundaries, and high-fidelity masking, reducing error metrics around edges by up to 53% over standard mask interpolation. Used as an inpainting prior, a lightweight LoRA on FLUX.1-Dev with DecFormer achieves fidelity comparable to FLUX.1-Fill, a fully finetuned inpainting model. While we focus on inpainting, PELC is a general recipe for pixel-equivalent latent editing, as we demonstrate on a complex color-correction task.
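The baseline that this abstract criticizes, linear blending of latents under a downsampled mask, can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes the mask is downsampled by average pooling (the abstract does not say which method is used), it treats one latent channel as a plain 2D grid, and the helper names `downsample_mask` and `linear_blend` are hypothetical. DecFormer itself is not modeled here.

```python
from typing import List

Grid = List[List[float]]


def downsample_mask(mask: Grid, factor: int = 8) -> Grid:
    """Average-pool a full-resolution mask by `factor` along each axis.

    Average pooling is an assumption made for this sketch.
    """
    h, w = len(mask), len(mask[0])
    assert h % factor == 0 and w % factor == 0, "mask size must divide evenly"
    out: Grid = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [mask[y][x]
                     for y in range(i, i + factor)
                     for x in range(j, j + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def linear_blend(a: Grid, b: Grid, m: Grid) -> Grid:
    """Element-wise blend m*a + (1-m)*b.

    Applied to pixels with the full-resolution mask, this is alpha compositing.
    Applied to one latent channel with the downsampled mask, it is the
    baseline the abstract describes.
    """
    return [[m[i][j] * a[i][j] + (1.0 - m[i][j]) * b[i][j]
             for j in range(len(m[0]))]
            for i in range(len(m))]
```

In this sketch, every mask detail finer than one pooled cell collapses into a single blend weight, which is one way to see why the baseline cannot offer full-resolution mask control.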

artificial intelligence, latent, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2512.05198

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)


InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models

Wu, Zihao

arXiv.org Artificial Intelligence, Dec-8-2025

Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe feature invariance in deterministic sampling and present InvarDiff, a training-free acceleration method that exploits the relative temporal invariance across timestep-scale and layer-scale. From a few deterministic runs, we compute a per-timestep, per-layer, per-module binary cache plan matrix and use a re-sampling correction to avoid drift when consecutive caches occur. Using quantile-based change metrics, this matrix specifies which module at which step is reused rather than recomputed. The same invariance criterion is applied at the step scale to enable cross-timestep caching, deciding whether an entire step can reuse cached results. During inference, InvarDiff performs step-first and layer-wise caching guided by this matrix. When applied to DiT and FLUX, our approach reduces redundant compute while preserving fidelity. Experiments show that InvarDiff achieves $2$-$3\times$ end-to-end speed-ups with minimal impact on standard quality metrics. Qualitatively, we observe almost no degradation in visual quality compared with full computations.
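The idea of turning change metrics into a binary cache plan can be illustrated with a small sketch. It is not InvarDiff's actual procedure: it assumes a single global quantile threshold, collapses the per-module axis into one value per layer, ignores the re-sampling correction, and always computes the first step. The function names `quantile` and `cache_plan` and the `change[t][l]` layout are hypothetical.

```python
from typing import List


def quantile(values: List[float], q: float) -> float:
    """Linearly interpolated quantile of a non-empty list, with 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def cache_plan(change: List[List[float]], q: float = 0.5) -> List[List[int]]:
    """Build plan[t][l], where 1 means "reuse the cached output of layer l at step t".

    change[t][l] is assumed to measure how much layer l's output moved between
    step t-1 and step t in a few calibration runs (hypothetical layout).
    """
    n_layers = len(change[0])
    plan = [[0] * n_layers]            # step 0 has no earlier cache to reuse
    later = change[1:]
    if not later:
        return plan
    threshold = quantile([c for row in later for c in row], q)
    for row in later:
        plan.append([1 if c <= threshold else 0 for c in row])
    return plan
```

Raising `q` marks more entries for reuse, trading fidelity for speed in this toy setting.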

artificial intelligence, machine learning, threshold, (17 more...)

arXiv.org Artificial Intelligence

2512.05134

Genre:

Workflow (0.69)
Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

Zhang, Ruoxuan, Wen, Bin, Xie, Hongxia, Yao, Yi, Zuo, Songhan, Jiang-Lin, Jian-Yu, Shuai, Hong-Han, Cheng, Wen-Huang

arXiv.org Artificial Intelligence, Dec-8-2025

Cooking is a sequential and visually grounded activity, where each step such as chopping, mixing, or frying carries both procedural logic and visual semantics. While recent diffusion models have shown strong capabilities in text-to-image generation, they struggle to handle structured multi-step scenarios like recipe illustration. Additionally, current recipe illustration methods are unable to adjust to the natural variability in recipe length, generating a fixed number of images regardless of the actual instruction structure. To address these limitations, we present CookAnything, a flexible and consistent diffusion-based framework that generates coherent, semantically distinct image sequences from textual cooking instructions of arbitrary length. The framework introduces three key components: (1) Step-wise Regional Control (SRC), which aligns textual steps with corresponding image regions within a single denoising process; (2) Flexible RoPE, a step-aware positional encoding mechanism that enhances both temporal coherence and spatial diversity; and (3) Cross-Step Consistency Control (CSCC), which maintains fine-grained ingredient consistency across steps. Experimental results on recipe illustration benchmarks show that CookAnything performs better than existing methods in training-based and training-free settings. The proposed framework supports scalable, high-quality visual synthesis of complex multi-step instructions and holds significant potential for broad applications in instructional media and procedural content creation.

artificial intelligence, machine learning, proceedings, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3746027.3755174

2512.0354

Country: Asia > China (0.28)

Genre: Workflow (0.98)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Consumer Health (0.46)
Education > Educational Technology > Audio & Video (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


BioPro: On Difference-Aware Gender Fairness for Vision-Language Models

Lin, Yujie, Ma, Jiayao, Hu, Qingguo, Wong, Derek F., Su, Jinsong

arXiv.org Artificial Intelligence, Dec-2-2025

Vision-Language Models (VLMs) inherit significant social biases from their training data, notably in gender representation. Current fairness interventions often adopt a difference-unaware perspective that enforces uniform treatment across demographic groups. These approaches, however, fail to distinguish between contexts where neutrality is required and those where group-specific attributes are legitimate and must be preserved. Building upon recent advances in difference-aware fairness for text-only models, we extend this concept to the multimodal domain and formalize the problem of difference-aware gender fairness for image captioning and text-to-image generation. We advocate for selective debiasing, which aims to mitigate unwanted bias in neutral contexts while preserving valid distinctions in explicit ones. To achieve this, we propose BioPro (Bias Orthogonal Projection), an entirely training-free framework. BioPro identifies a low-dimensional gender-variation subspace through counterfactual embeddings and applies projection to selectively neutralize gender-related information. Experiments show that BioPro effectively reduces gender bias in neutral cases while maintaining gender faithfulness in explicit ones, thus providing a promising direction toward achieving selective fairness in VLMs. Beyond gender bias, we further demonstrate that BioPro can effectively generalize to continuous bias variables, such as scene brightness, highlighting its broader applicability.
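The projection step mentioned in this abstract can be illustrated with a deliberately simplified sketch. It is not the BioPro algorithm: it assumes a one-dimensional subspace estimated as the mean difference between counterfactual embedding pairs, whereas the abstract speaks of a low-dimensional subspace and does not specify how it is estimated or when projection is applied. The names `bias_direction` and `project_out` are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def bias_direction(pairs: List[Tuple[Vec, Vec]]) -> Vec:
    """Unit vector along the mean difference of counterfactual embedding pairs.

    Each pair holds two embeddings that differ only in the attribute of
    interest (an assumption of this sketch). The mean difference must be
    non-zero, otherwise the normalization divides by zero.
    """
    dim = len(pairs[0][0])
    mean = [sum(a[k] - b[k] for a, b in pairs) / len(pairs) for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


def project_out(x: Vec, u: Vec) -> Vec:
    """Orthogonal projection onto the complement of unit vector u: x - (x . u) u."""
    dot = sum(xi * ui for xi, ui in zip(x, u))
    return [xi - dot * ui for xi, ui in zip(x, u)]
```

A selective variant, in the spirit of the abstract, would call `project_out` only for inputs judged to be neutral and leave explicit ones untouched. How that decision is made is not described in the abstract.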

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.00807

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


ToMA: Token Merge with Attention for Diffusion Models

Lu, Wenbo, Zheng, Shaoyi, Xia, Yuxuan, Wang, Shengjie

arXiv.org Artificial Intelligence, Dec-2-2025

Diffusion models excel in high-fidelity image generation but face scalability limits due to transformers' quadratic attention complexity. Plug-and-play token reduction methods like ToMeSD and ToFu reduce FLOPs by merging redundant tokens in generated images but rely on GPU-inefficient operations (e.g., sorting, scattered writes), introducing overheads that negate theoretical speedups when paired with optimized attention implementations (e.g., FlashAttention). To bridge this gap, we propose Token Merge with Attention (ToMA), an off-the-shelf method that redesigns token reduction for GPU-aligned efficiency, with three key contributions: 1) a reformulation of token merge as a submodular optimization problem to select diverse tokens; 2) merge/unmerge as an attention-like linear transformation via GPU-friendly matrix operations; and 3) exploiting latent locality and sequential redundancy (pattern reuse) to minimize overhead. ToMA reduces SDXL/Flux generation latency by 24%/23%, respectively (with DINO $\Delta < 0.07$), outperforming prior methods. This work bridges the gap between theoretical and practical efficiency for transformers in diffusion. Code available at https://github.com/WenboLuu/ToMA.

artificial intelligence, machine learning, toma, (19 more...)

arXiv.org Artificial Intelligence

2509.10918

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)


Modeling X-ray photon pile-up with a normalizing flow

König, Ole, Huppenkothen, Daniela, Finkbeiner, Douglas, Kirsch, Christian, Wilms, Jörn, Yang, Justina R., Steiner, James F., Martínez-Galarza, Juan Rafael

arXiv.org Artificial Intelligence, Nov-18-2025

The dynamic range of imaging detectors flown on board X-ray observatories often covers only a limited flux range of extrasolar X-ray sources. The analysis of bright X-ray sources is complicated by so-called pile-up, which results from high incident photon flux. This nonlinear effect distorts the measured spectrum, resulting in biases in the inferred physical parameters, and can even lead to a complete signal loss in extreme cases. Piled-up data are commonly discarded due to the resulting intractability of the likelihood. As a result, a large number of archival observations remain underexplored. We present a machine learning solution to this problem, using a simulation-based inference framework that allows us to estimate posterior distributions of physical source parameters from piled-up eROSITA data. We show that a normalizing flow produces better-constrained posterior densities than traditional mitigation techniques, as more data can be leveraged. We consider model- and calibration-dependent uncertainties and the applicability of such an algorithm to real data in the eROSITA archive.

artificial intelligence, machine learning, spectra, (14 more...)

arXiv.org Artificial Intelligence

2511.11863

Country: Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


FSampler: Training Free Acceleration of Diffusion Sampling via Epsilon Extrapolation

Vladimir, Michael A.

arXiv.org Artificial Intelligence, Nov-13-2025

FSampler is a training-free, sampler-agnostic execution layer that accelerates diffusion sampling by reducing the number of function evaluations (NFE). FSampler maintains a short history of denoising signals (epsilon) from recent real model calls and extrapolates the next epsilon using finite difference predictors at second order, third order, or fourth order, falling back to lower order when history is insufficient. On selected steps the predicted epsilon substitutes the model call while keeping each sampler's update rule unchanged. Predicted epsilons are validated for finiteness and magnitude; a learning stabilizer rescales predictions on skipped steps to correct drift, and an optional gradient estimation stabilizer compensates local curvature. Protected windows, periodic anchors, and a cap on consecutive skips bound deviation over the trajectory. Operating at the sampler level, FSampler integrates with Euler/DDIM, DPM++ 2M/2S, LMS/AB2, and RES family exponential multistep methods and drops into standard workflows. On FLUX.1 dev, Qwen Image, and Wan 2.2, FSampler reduces time by 8 to 22% and model calls by 15 to 25% at high fidelity (Structural Similarity Index (SSIM) 0.95 to 0.99), without altering sampler formulas. With an aggressive adaptive gate, reductions can reach 45 to 50% fewer model calls at lower fidelity (SSIM 0.73 to 0.74).
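The extrapolation-with-fallback idea can be sketched as follows. This is an illustration under stated assumptions, not FSampler's implementation: it assumes uniformly spaced steps, it assumes that an "order k" predictor uses the k most recent epsilons with the standard polynomial extrapolation weights, and it only checks finiteness (the magnitude check, stabilizers, protected windows, and skip caps from the abstract are omitted). The name `predict_epsilon` is hypothetical.

```python
import math
from typing import List, Optional

# Polynomial extrapolation weights on a uniform grid, newest sample first.
# With 2 points this is linear extrapolation; 3 and 4 points fit a quadratic
# and a cubic through the history, respectively.
WEIGHTS = {
    2: [2.0, -1.0],
    3: [3.0, -3.0, 1.0],
    4: [4.0, -6.0, 4.0, -1.0],
}


def predict_epsilon(history: List[List[float]], order: int = 4) -> Optional[List[float]]:
    """Extrapolate the next epsilon from past real model outputs.

    `history` is ordered oldest to newest. Returns None when the caller
    should fall back to a real model call (too little history or a
    non-finite prediction).
    """
    usable = min(order, len(history), 4)   # fall back to a lower order if needed
    if usable < 2:
        return None
    weights = WEIGHTS[usable]
    recent = history[::-1][:usable]        # newest first, to match WEIGHTS
    pred = [sum(w * e[i] for w, e in zip(weights, recent))
            for i in range(len(recent[0]))]
    if not all(math.isfinite(p) for p in pred):
        return None
    return pred
```

On a skipped step, a sampler would feed the returned vector into its usual update rule in place of the model output, which matches the abstract's statement that sampler formulas stay unchanged.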

artificial intelligence, machine learning, sampler, (18 more...)

arXiv.org Artificial Intelligence

2511.0918

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Realizable Circuit Complexity: Embedding Computation in Space-Time

Prada, Benjamin, Mali, Ankur

arXiv.org Artificial Intelligence, Nov-11-2025

Classical circuit complexity characterizes parallel computation in purely combinatorial terms, ignoring the physical constraints that govern real hardware. The standard classes $\mathbf{NC}$, $\mathbf{AC}$, and $\mathbf{TC}$ treat unlimited fan-in, free interconnection, and polynomial gate counts as feasible -- assumptions that conflict with geometric, energetic, and thermodynamic realities. We introduce the family of realizable circuit classes $\mathbf{RC}_d$, which model computation embedded in physical $d$-dimensional space. Each circuit in $\mathbf{RC}_d$ obeys conservative realizability laws: volume scales as $\mathcal{O}(t^d)$, cross-boundary information flux is bounded by $\mathcal{O}(t^{d-1})$ per unit time, and growth occurs through local, physically constructible edits. These bounds apply to all causal systems, classical or quantum. Within this framework, we show that algorithms with runtime $\omega(n^{d/(d-1)})$ cannot scale to inputs of maximal entropy, and that any $d$-dimensional parallel implementation offers at most a polynomial speed-up of degree $(d-1)$ over its optimal sequential counterpart. In the limit $d\to\infty$, $\mathbf{RC}_\infty(\mathrm{polylog})=\mathbf{NC}$, recovering classical parallelism as a non-physical idealization. By unifying geometry, causality, and information flow, $\mathbf{RC}_d$ extends circuit complexity into the physical domain, revealing universal scaling laws for computation.

artificial intelligence, computation, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.19161

Country: North America > United States > Florida (0.28)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.67)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.46)


Machine Learning for Electron-Scale Turbulence Modeling in W7-X

Farcas, Ionut-Gabriel, Fernando, Don Lawrence Carl Agapito, Navarro, Alejandro Banon, Merlo, Gabriele, Jenko, Frank

arXiv.org Artificial Intelligence, Nov-7-2025

Constructing reduced models for turbulent transport is essential for accelerating profile predictions and enabling many-query tasks such as uncertainty quantification, parameter scans, and design optimization. This paper presents machine-learning-driven reduced models for Electron Temperature Gradient (ETG) turbulence in the Wendelstein 7-X (W7-X) stellarator. Each model predicts the ETG heat flux as a function of three plasma parameters: the normalized electron temperature radial gradient ($\omega_{T_e}$), the ratio of normalized electron temperature and density radial gradients ($\eta_e$), and the electron-to-ion temperature ratio ($\tau$). We first construct models across seven radial locations using regression and an active machine-learning-based procedure. This process initializes models using low-cardinality sparse-grid training data and then iteratively refines their training sets by selecting the most informative points from a pre-existing simulation database. We evaluate the prediction capabilities of our models using out-of-sample datasets with over $393$ points per location, and $95\%$ prediction intervals are estimated via bootstrapping to assess prediction uncertainty. We then investigate the construction of generalized reduced models, including a generic, position-independent model, and assess their heat flux prediction capabilities at three additional locations. Our models demonstrate robust performance and predictive accuracy comparable to the original reference simulations, even when applied beyond the training domain.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2511.04567

Country:

Europe (0.68)
North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)


Accelerating Radiative Transfer for Planetary Atmospheres by Orders of Magnitude with a Transformer-Based Machine Learning Model

Malsky, Isaac, Kataria, Tiffany, Batalha, Natasha E., Graham, Matthew

arXiv.org Artificial Intelligence, Nov-3-2025

Submitted to ApJ

ABSTRACT

Radiative transfer calculations are essential for modeling planetary atmospheres. However, standard methods are computationally demanding and impose accuracy-speed trade-offs. High computational costs force numerical simplifications in large models (e.g., General Circulation Models) that degrade the accuracy of the simulation. Radiative transfer calculations are an ideal candidate for machine learning emulation: fundamentally, it is a well-defined physical mapping from a static atmospheric profile to the resulting fluxes, and high-fidelity training data can be created from first principles calculations. We developed a radiative transfer emulator using an encoder-only transformer neural network architecture, trained on 1D profiles representative of solar-composition hot Jupiter atmospheres. Our emulator reproduced bolometric two-stream layer fluxes with mean test set errors of 1% compared to the traditional method and achieved speedups of more than 100x. Emulating radiative transfer with machine learning opens up the possibility for faster and more accurate routines within planetary atmospheric models such as GCMs.

INTRODUCTION

At the heart of almost every computational model of an exoplanet atmosphere lies a radiative transfer routine that determines how radiation is scattered, absorbed, and emitted as it propagates through the atmosphere. These methods are computationally demanding, as they require solutions to integro-differential equations in many distinct wavelength bins.

artificial intelligence, flux, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.2705

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry:

Government > Space Agency (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
