- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > United Kingdom > England > Staffordshire (0.04)
- Europe > Greece (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- North America > United States > California > Riverside County > Riverside (0.04)
- Asia (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.68)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks - Supplementary Material - A. Complementary Experiments
However, with deep networks, initialization can have an important effect on the final results. While designing an initialization strategy specifically for compact networks is an unexplored research direction, our ExpandNets can be initialized in a natural manner. Note that this strategy yields an additional accuracy boost to our approach. The output of the last layer is passed through a fully-connected layer with 64 units, followed by a logit layer with either 10 or 100 units. We used standard stochastic gradient descent (SGD) with a momentum of 0.9 and a learning rate of 0.01, divided by 10 at epochs 50 and 100.
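The training recipe above is concrete enough to sketch in code. The following is a minimal PyTorch sketch of the head and optimizer schedule as described; the backbone feature width (128), the `ReLU` between the two linear layers, and the epoch count are assumptions, not details from the text.

```python
import torch
import torch.nn as nn

num_classes = 10          # 10 for CIFAR-10, 100 for CIFAR-100
backbone_out = 128        # assumed width of the last conv layer's output

head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(backbone_out, 64),  # fully-connected layer with 64 units
    nn.ReLU(),                    # assumed nonlinearity, not stated in the text
    nn.Linear(64, num_classes),   # logit layer (10 or 100 units)
)

optimizer = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
# Divide the learning rate by 10 at epochs 50 and 100, as described.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 100], gamma=0.1
)

for epoch in range(150):  # total epoch count is an assumption
    # ... one training pass (forward, loss, optimizer.step()) would go here ...
    scheduler.step()
```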
Safety-Efficacy Trade-Off: Robustness against Data Poisoning
Backdoor and data-poisoning attacks can achieve high attack success while evading existing spectral and optimisation-based defences. We show that this behaviour is not incidental, but arises from a fundamental geometric mechanism in input space. Using kernel ridge regression as an exact model of wide neural networks, we prove that clustered dirty-label poisons induce a rank-one spike in the input Hessian whose magnitude scales quadratically with attack efficacy. Crucially, for nonlinear kernels we identify a near-clone regime in which poison efficacy remains order one while the induced input curvature vanishes, making the attack provably spectrally undetectable. We further show that input-gradient regularisation contracts poison-aligned Fisher and Hessian eigenmodes under gradient flow, yielding an explicit and unavoidable safety-efficacy trade-off by reducing data-fitting capacity. For exponential kernels, this defence admits a precise interpretation as an anisotropic high-pass filter that increases the effective length scale and suppresses near-clone poisons. Extensive experiments on linear models and deep convolutional networks across MNIST, CIFAR-10, and CIFAR-100 validate the theory, demonstrating consistent lags between attack success and spectral visibility, and showing that regularisation and data augmentation jointly suppress poisoning. Our results establish when backdoors are inherently invisible, and provide the first end-to-end characterisation of poisoning, detectability, and defence through input-space curvature.
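The core quantities in this abstract (a kernel ridge regression predictor and the spectrum of its input Hessian near a poison cluster) are easy to probe on toy data. The sketch below is a minimal NumPy illustration, not the paper's construction: the data, the RBF length scale `ell`, and the ridge parameter `lam` are all illustrative choices.

```python
import numpy as np

def rbf(X, Z, ell):
    # Gaussian kernel matrix k(x, z) = exp(-||x - z||^2 / (2 ell^2)).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell**2))

rng = np.random.default_rng(0)
ell, lam, d = 1.0, 1e-3, 5

# Clean data plus a small cluster of dirty-label poisons near one clean point.
X_clean = rng.normal(size=(200, d))
y_clean = np.sign(X_clean[:, 0])
poison_center = X_clean[0] + 0.05 * rng.normal(size=d)
X_poison = poison_center + 0.01 * rng.normal(size=(10, d))
y_poison = -np.ones(10) * y_clean[0]      # flipped ("dirty") labels

X = np.vstack([X_clean, X_poison])
y = np.concatenate([y_clean, y_poison])

# Kernel ridge regression: alpha = (K + lam I)^{-1} y, f(x) = sum_i alpha_i k(x, x_i).
K = rbf(X, X, ell)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

def input_hessian(x):
    # Analytic input Hessian of f for the RBF kernel:
    # H(x) = sum_i alpha_i * k(x, x_i)/ell^2 * ((x_i - x)(x_i - x)^T / ell^2 - I).
    k = rbf(x[None], X, ell)[0]
    diff = X - x
    H = np.zeros((d, d))
    for a, ki, di in zip(alpha, k, diff):
        H += a * ki / ell**2 * (np.outer(di, di) / ell**2 - np.eye(d))
    return H

# Compare the top input-Hessian eigenvalue near the poison cluster against a
# random point: the poison-induced spike shows up as a gap between the two.
for x0, name in [(poison_center, "near poison"), (rng.normal(size=d), "random")]:
    lam_max = np.max(np.abs(np.linalg.eigvalsh(input_hessian(x0))))
    print(f"{name}: top |eigenvalue| = {lam_max:.3f}")
```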
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows
Nguyen, John, Havasi, Marton, Berrada, Tariq, Zettlemoyer, Luke, Chen, Ricky T. Q.
We present OneFlow, the first non-autoregressive multimodal model that enables variable-length and concurrent mixed-modal generation. Unlike autoregressive models that enforce rigid causal ordering between text and image generation, OneFlow combines an insertion-based Edit Flow for discrete text tokens with Flow Matching for image latents. OneFlow enables concurrent text-image synthesis with hierarchical sampling that prioritizes content over grammar. Through controlled experiments across model sizes from 1B to 8B, we demonstrate that OneFlow outperforms autoregressive baselines on both generation and understanding tasks while using up to 50% fewer training FLOPs. OneFlow surpasses both autoregressive and diffusion-based approaches while unlocking new capabilities for concurrent generation, iterative refinement, and natural reasoning-like generation.
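For readers unfamiliar with the Flow Matching half of this design, the sketch below shows the standard training objective on image latents under a linear interpolation path. It is a generic illustration under stated assumptions: the placeholder MLP, the latent dimension, and the uniform time sampling are not OneFlow's actual architecture or schedule, and the Edit Flow text component is not shown.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    # Placeholder velocity field v(x_t, t); OneFlow's real model is far larger.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim)
        )

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))

def flow_matching_loss(model, x1):
    # x1: clean image latents; x0: Gaussian noise. Interpolate
    # x_t = (1 - t) x0 + t x1 and regress the model onto the constant
    # target velocity x1 - x0 along that path.
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.size(0), 1)
    x_t = (1 - t) * x0 + t * x1
    target_v = x1 - x0
    return ((model(x_t, t) - target_v) ** 2).mean()

model = VelocityNet(dim=64)
latents = torch.randn(32, 64)  # stand-in for a batch of VAE image latents
loss = flow_matching_loss(model, latents)
loss.backward()
```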
Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings
Akbarian, Fatemeh, Baninajjar, Anahita, Zhang, Yingyi, Balashankar, Ananth, Aminifar, Amir
Abstract--Multi-modal foundation models align images, text, and other modalities in a shared embedding space but remain vulnerable to adversarial illusions [35], where imperceptible perturbations disrupt cross-modal alignment and mislead downstream tasks. To counteract the effects of adversarial illusions, we propose a task-agnostic mitigation mechanism that reconstructs the input from the attacker's perturbed input through generative models, e.g., Variational Autoencoders (VAEs), to maintain natural alignment. To further enhance our proposed defense mechanism, we adopt a generative sampling strategy combined with a consensus-based aggregation scheme over the outcomes of the generated samples. Our experiments on state-of-the-art multi-modal encoders show that our approach substantially reduces illusion attack success rates to near zero and improves cross-modal alignment by 4% (from 42 to 46) and 11% (from 32 to 43) in unperturbed and perturbed input settings, respectively, providing an effective and model-agnostic defense against adversarial illusions.

Multi-modal foundation models have rapidly advanced the frontier of visual and linguistic understanding. Foundation models such as CLIP [19], ALIGN [11], and ImageBind [8] align a variety of heterogeneous modalities, including images and text, within a shared embedding space, thereby enabling zero-shot classification, cross-modal retrieval, and generative conditioning. The shared embedding space that underpins cross-modal flexibility simultaneously introduces a new attack surface, giving rise to adversarial illusions [35]. As downstream tasks directly rely on the integrity of this shared representation, even small perturbations in one modality can induce semantic misalignment across others, misleading models that depend on the embedding for retrieval, captioning, or generative conditioning. Defending against such cross-modal attacks presents unique challenges.
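The defense pipeline described in the abstract (stochastic VAE reconstruction followed by consensus aggregation) can be sketched compactly. Below is a hedged Python illustration: `TinyVAE`, the linear stand-in encoder `embed`, the sample count, and the mean-of-normalized-embeddings aggregation rule are all assumptions standing in for the paper's actual models and scheme.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    # Toy VAE stand-in; a real defense would use a pretrained image VAE.
    def __init__(self, dim=512, z=32):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * z)   # outputs mean and log-variance
        self.dec = nn.Linear(z, dim)

    def sample_reconstruction(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z)

vae = TinyVAE()
embed = nn.Linear(512, 128)  # stand-in for a multi-modal image encoder

def consensus_embedding(x_perturbed, n_samples=8):
    # Draw several stochastic reconstructions of the (possibly perturbed)
    # input, embed each, and aggregate by averaging the normalized
    # embeddings -- one simple instance of consensus-based aggregation.
    embs = []
    with torch.no_grad():
        for _ in range(n_samples):
            recon = vae.sample_reconstruction(x_perturbed)
            e = embed(recon)
            embs.append(e / e.norm(dim=-1, keepdim=True))
    return torch.stack(embs).mean(0)

x_adv = torch.randn(1, 512)   # stand-in for a perturbed input image
robust_emb = consensus_embedding(x_adv)
```

The averaging step is what makes the defense consensus-based: any single reconstruction may still carry residual perturbation, but agreement across independent samples concentrates the embedding back toward the natural alignment.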
- North America > United States (0.04)
- Europe > Sweden (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
CoMind: Towards Community-Driven Agents for Machine Learning Engineering
Li, Sijie, Sun, Weiwei, Li, Shanda, Talwalkar, Ameet, Yang, Yiming
Large language model (LLM) agents show promise in automating machine learning (ML) engineering. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a multi-agent system designed to actively integrate external knowledge. CoMind employs an iterative parallel exploration mechanism, developing multiple solutions simultaneously to balance exploratory breadth with implementation depth. On 75 past Kaggle competitions within our MLE-Live framework, CoMind achieves a 36% medal rate, establishing a new state of the art. Critically, when deployed in eight live, ongoing competitions, CoMind outperforms 92.6% of human competitors on average, placing in the top 5% on three official leaderboards and the top 1% on one.
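The "iterative parallel exploration" loop lends itself to a small control-flow sketch. The scaffold below is purely illustrative: `propose` and `evaluate` are stubs for what would be LLM calls and validation runs in CoMind, and the fan-out of 3, pool size of 4, and iteration count are invented parameters.

```python
import random

def propose(parent, knowledge):
    # Stub: derive a new candidate solution from a parent and shared notes.
    return {"desc": parent["desc"] + "+tweak", "score": None}

def evaluate(candidate):
    # Stub: stand-in for scoring the candidate on a validation split.
    candidate["score"] = random.random()
    return candidate

pool = [{"desc": "baseline", "score": 0.0}]
knowledge = []                      # shared "community" knowledge base

for iteration in range(5):
    # Breadth: expand several candidates per parent in parallel ...
    children = [evaluate(propose(p, knowledge)) for p in pool for _ in range(3)]
    # ... then depth: keep only the strongest few for further refinement,
    # and feed the best finding back into the shared knowledge base.
    pool = sorted(pool + children, key=lambda c: c["score"], reverse=True)[:4]
    knowledge.append(max(children, key=lambda c: c["score"])["desc"])

print(pool[0])
```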
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- Africa > Cameroon > Gulf of Guinea (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Education (0.67)