Conformal Risk Training: End-to-End Optimization of Conformal Risk Control
While deep learning models often achieve high predictive accuracy, their predictions typically do not come with any provable guarantees on risk or reliability, which are critical for deployment in high-stakes applications. The framework of conformal risk control (CRC) provides a distribution-free, finite-sample method for controlling the expected value of any bounded monotone loss function and can be conveniently applied post-hoc to any pre-trained deep learning model. However, many real-world applications are sensitive to tail risks, as opposed to just expected loss. In this work, we develop a method for controlling the general class of Optimized Certainty-Equivalent (OCE) risks, a broad class of risk measures that includes as special cases the expected loss (generalizing the original CRC method) and common tail risks like the conditional value-at-risk (CVaR).
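As a rough illustration of the post-hoc CRC step mentioned above, the sketch below picks the smallest threshold λ on a grid whose inflated empirical risk, (n/(n+1))·R̂(λ) + B/(n+1), stays at or below a target level α, for a loss bounded by B and non-increasing in λ. The function names, the toy 0/1 loss, and the grid are assumptions made for illustration. The CVaR helper uses the standard Rockafellar-Uryasev form and is not the paper's OCE procedure.

```python
def crc_threshold(loss_fn, calib, lambdas, alpha, B=1.0):
    """Smallest lambda on the grid meeting the CRC condition, or None.

    loss_fn(example, lam) is assumed bounded by B and non-increasing in lam.
    """
    n = len(calib)
    for lam in sorted(lambdas):
        emp_risk = sum(loss_fn(z, lam) for z in calib) / n
        # Finite-sample correction: inflate the empirical risk before comparing.
        if (n / (n + 1)) * emp_risk + B / (n + 1) <= alpha:
            return lam
    return None


def empirical_cvar(losses, beta):
    """Empirical CVaR at level beta in (0, 1): min_t t + mean((l - t)_+) / (1 - beta)."""
    n = len(losses)

    def objective(t):
        return t + sum(max(l - t, 0.0) for l in losses) / ((1 - beta) * n)

    # The objective is convex and piecewise linear with kinks at the samples,
    # so it is enough to evaluate it at the sample points.
    return min(objective(t) for t in losses)


# Toy calibration set (hypothetical): scores 0.05, 0.10, ..., 0.95.
calib = [i / 20 for i in range(1, 20)]
grid = [i / 20 for i in range(0, 21)]
miss = lambda score, lam: 1.0 if score > lam else 0.0  # 0/1 loss, non-increasing in lam

lam_hat = crc_threshold(miss, calib, grid, alpha=0.22)
tail_risk = empirical_cvar([1.0, 2.0, 3.0, 4.0], beta=0.5)
```

In this toy setup the expected-loss guarantee and the tail-risk quantity are computed separately; controlling an OCE risk such as CVaR with a conformal guarantee is what the abstract describes and is not attempted here.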
Collapse and simplex ETF
Neural collapse [26] is an intuitive observation about the terminal phase of training a well-trained model on a balanced dataset: last-layer features converge to their within-class means, and all within-class means and their corresponding classifier vectors converge to a simplex equiangular tight frame (ETF), as shown in Figure 6. The main result can be summarized as follows: (NC1) The within-class variability of the last-layer features, $\Sigma := \mathrm{Avg}_{i,c}\{(h_{i,c} - h_c)(h_{i,c} - h_c)^\top\}$, collapses: $\Sigma \to 0$, where $h_{i,c}$ is the last-layer feature of the $i$-th sample in the $c$-th class and $h_c$ is the within-class mean of the features of the $c$-th class. To analyze this phenomenon, some studies simplify deep neural networks to their last-layer features and classifier (the layer-peeled model) [9, 12, 40, 53] with suitable constraints or regularization. In the view of the layer-peeled model (LPM), training $W$ with constraints on the weights can be seen as training the $C$-class classification head $W_L = \{W_1, \dots, W_C\}$ and the features $H = \{h_1, \dots, h_N\}$ of all $N$ samples output by the last layer of the backbone, subject to the constraints $E_W$ and $E_H$, respectively (Eq. (6)). In the balanced case, as described in Lemma 1, any solution to this model exhibits neural collapse and forms a simplex ETF, which means that the simplex ETF is the optimal classifier in the balanced case of the LPM.
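To make the simplex ETF geometry concrete, the sketch below builds the canonical C-vector simplex ETF, sqrt(C/(C-1)) · (I - (1/C)·11ᵀ), and computes its Gram matrix. Every vector has unit norm and every pair has inner product -1/(C-1). The function names are illustrative, and any rotation of these vectors is an equally valid simplex ETF.

```python
import math


def simplex_etf(num_classes):
    """Rows are the C vectors of the canonical simplex ETF in R^C."""
    C = num_classes
    scale = math.sqrt(C / (C - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / C) for j in range(C)]
        for i in range(C)
    ]


def gram(vectors):
    """Matrix of pairwise inner products."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


etf = simplex_etf(4)
G = gram(etf)
# Diagonal entries are 1 (unit norm); off-diagonal entries are -1/(C-1) = -1/3.
```

The equal pairwise angle is the largest separation that C unit vectors can achieve simultaneously, which is why this configuration appears as the limiting classifier geometry in the balanced case.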
Cross City Traffic Flow Generation via Retrieval Augmented Diffusion Model
Traffic flow data are of great value in smart city applications. However, data collection costs and privacy sensitivity make it difficult to obtain large-scale traffic flow data. Therefore, various data generation methods have been proposed in the literature. Nevertheless, these methods often require data from a specific city for training and are difficult to apply directly to new cities that lack data. To address this problem, this paper proposes a retrieval-augmented diffusion generation model with geographic representation alignment. We use data from multiple source cities for training, extract consistent representations across multiple cities, and leverage retrieval-augmented generation (RAG) technology to incorporate dynamic traffic flow patterns into the condition, aiming to improve the accuracy of data generation in the target city. Experiments on four real-world datasets demonstrate that, compared to existing generation methods, our method achieves the best cross-city zero-shot performance.
Bit-swapping Oriented Twin-memory Multi-view Clustering in Lifelong Incomplete Scenarios
Despite notable progress, current multi-view clustering (MVC) techniques generally rely on feature library mechanisms to propagate accumulated knowledge from historical views to newly arrived data, which overlooks the information pertaining to the basis embedding within each view. Moreover, the mapping paradigm inevitably alters the values of the learned landmarks and the built affinities because of its uninterrupted nature, thereby disrupting the hierarchical cluster structures. To mitigate these two issues, we propose an algorithm named BSTM. Concretely, we first reconcile the distinct dimensions by introducing a group of specialized projectors, and then establish unified anchors for all views collected so far to capture intrinsic patterns. Afterwards, departing from per-view architectures, we devise a shared bipartite graph construction via indicators to quantify similarity, which not only avoids redundant data recalculations but also alleviates the representation distortion caused by fusion.
Fast Projection-Free Approach (without Optimization Oracle) for Optimization over Compact Convex Set
Projection-free first-order methods, e.g., the celebrated Frank-Wolfe (FW) algorithms, have emerged as powerful tools for optimization over simple convex sets such as polyhedra, because of their scalability, fast convergence, and iteration-wise feasibility without costly projections. However, extending these methods effectively to general compact convex sets remains challenging and largely open, as FW methods rely on expensive linear optimization oracles (LOO), while penalty-based methods often struggle with poor feasibility. We tackle this open challenge by presenting Hom-PGD, a novel projection-free method without expensive (optimization) oracles. Our method constructs a homeomorphism between the convex constraint set and a unit ball, transforming the original problem into an equivalent ball-constrained formulation, thus enabling efficient gradient-based optimization while preserving the original problem structure. We prove that Hom-PGD attains optimal convergence rates matching gradient descent with constant step-size to find an $\epsilon$-approximate (stationary) solution: $O(\log(1/\epsilon))$ for strongly convex objectives, $O(\epsilon^{-1})$ for convex objectives, and $O(\epsilon^{-2})$ for non-convex objectives. Meanwhile, Hom-PGD enjoys a low per-iteration complexity of $O(n^2)$, without expensive oracles like LOO or projection, where $n$ is the input size. Our framework further extends to certain non-convex sets, broadening its applicability in practical optimization scenarios with complex constraints. Extensive numerical experiments demonstrate that Hom-PGD achieves comparable convergence rates to state-of-the-art projection-free methods, while significantly reducing per-iteration runtime (up to 5 orders of magnitude faster) and thus the total problem-solving time.
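The abstract does not spell out the homeomorphism. One common choice, for a compact convex set with the origin in its interior, is the gauge (Minkowski functional) map, sketched below for a polytope K = {x : Ax ≤ b} with b > 0. This is an assumption made for illustration and not necessarily the construction used in Hom-PGD; all function names are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def gauge(x, A, b):
    """Minkowski gauge of K = {x : A x <= b}, b > 0: the smallest t >= 0 with x in t*K."""
    return max(max(sum(a * c for a, c in zip(row, x)) / bi, 0.0) for row, bi in zip(A, b))


def ball_to_set(z, A, b):
    """Map the unit ball onto K: rescale z so that gauge(result) == ||z||."""
    r = norm(z)
    if r == 0.0:
        return list(z)
    # For compact K with 0 in its interior, gauge(z) > 0 whenever z != 0.
    return [c * r / gauge(z, A, b) for c in z]


def set_to_ball(x, A, b):
    """Inverse map: rescale x so that ||result|| == gauge(x)."""
    r = norm(x)
    if r == 0.0:
        return list(x)
    return [c * gauge(x, A, b) / r for c in x]


def project_ball(z):
    """Closed-form Euclidean projection onto the unit ball."""
    return [c / max(1.0, norm(z)) for c in z]


# Example set (hypothetical): the box [-1, 1] x [-2, 2].
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 1.0, 2.0, 2.0]
x = ball_to_set([0.3, -0.4], A, b)  # a point of K at gauge 0.5
```

Under this map, points on the unit sphere land on the boundary of K, so an iterate kept inside the ball by the cheap closed-form projection always corresponds to a feasible point of K.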
Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via In-the-wild Cascading Flow Optimization
Adversarial attacks are widely used to evaluate model robustness, and in black-box scenarios, the transferability of these attacks becomes crucial. Existing generator-based attacks have excellent generalization and transferability due to their instance-agnostic nature. However, when training generators for multi-target tasks, the success rate of transfer attacks is relatively low due to the limitations of the model's capacity. To address these challenges, we propose a novel Dual-Flow framework for multi-target instance-agnostic adversarial attacks, utilizing Cascading Distribution Shift Training to develop an adversarial velocity function. Extensive experiments demonstrate that Dual-Flow significantly improves transferability over previous multi-target generative attacks. For example, it increases the success rate from Inception-v3 to ResNet-152 by 34.58%. Furthermore, our attack method shows substantially stronger robustness against defense mechanisms, such as adversarially trained models. The code of Dual-Flow is available at: https://github.com/Chyxx/Dual-Flow.
Direct Alignment with Heterogeneous Preferences
Alignment with human preferences is commonly framed using a universal reward function, even though human preferences are inherently heterogeneous. We formalize this heterogeneity by introducing user types and examine the limits of the homogeneity assumption. We show that aligning to heterogeneous preferences with a single policy is best achieved using the average reward across user types. However, this requires additional information about annotators. We examine improvements under different information settings, focusing on direct alignment methods. We find that minimal information can yield first-order improvements, while full feedback from each user type leads to consistent learning of the optimal policy. Surprisingly, however, no sample-efficient consistent direct loss exists in this latter setting. These results reveal a fundamental tension between consistency and sample efficiency in direct policy alignment.
Scalable Cross-View Sample Alignment for Multi-View Clustering with View Structure Similarity
Most existing multi-view clustering methods aim to generate a consensus partition across all views, based on the assumption that all views share the same sample arrangement. However, in real-world scenarios, the collected data across different views is often unsynchronized, making it difficult to ensure consistent sample correspondence between views. To address this issue, we propose a scalable sample-alignment-based multi-view clustering method, referred to as SSA-MVC. Specifically, we first employ a cluster-label matching (CLM) algorithm to select the view whose clustering labels best match those of the others as the benchmark view. Then, for each of the remaining views, we construct representations of non-aligned samples by computing their similarities with aligned samples. Based on these representations, we build a similarity graph between the non-aligned samples of each view and those in the benchmark view, which serves as the alignment criterion. This alignment criterion is then integrated into a late-fusion framework to enable clustering without requiring aligned samples. Notably, the learned sample alignment matrix can be used to enhance existing multi-view clustering methods in scenarios where sample correspondence is unavailable. The effectiveness of the proposed SSA-MVC algorithm is validated through extensive experiments conducted on eight real-world multi-view datasets.
RiboFlow: Conditional De Novo RNA Co-Design via Synergistic Flow Matching
Ribonucleic acid (RNA) binds to molecules to achieve specific biological functions. While generative models are advancing biomolecule design, existing methods for designing RNA that target specific ligands face limitations in capturing RNA's conformational flexibility, ensuring structural validity, and overcoming data scarcity. To address these challenges, we introduce RiboFlow, a synergistic flow matching model to co-design RNA structures and sequences based on target molecules. By integrating RNA backbone frames, torsion angles, and sequence features in a unified architecture, RiboFlow explicitly models RNA's dynamic conformations while enforcing sequence-structure consistency to improve validity. Additionally, we curate RiboBind, a large-scale dataset of RNA-molecule interactions, to resolve the scarcity of high-quality structural data. Extensive experiments reveal that RiboFlow not only outperforms state-of-the-art RNA design methods by a large margin but also showcases controllable capabilities for achieving high binding affinity to target ligands.
Reproducing Kernel Banach Space Models for Neural Networks with Application to Rademacher Complexity Analysis
This paper explores the use of Hermite-transform-based reproducing kernel Banach space methods to construct exact (un-approximated) models of feedforward neural networks of arbitrary width, depth and topology, including ResNet and Transformer networks, assuming only a feedforward topology, finite-energy activations, and finite (spectral-)norm weights and biases. Using this model, two straightforward but surprisingly tight bounds on Rademacher complexity are derived, namely (1) a general bound that is width-independent and scales exponentially with depth; and (2) a width- and depth-independent bound for networks with appropriately constrained (below-threshold) weights and biases.
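For readers unfamiliar with the quantity being bounded, the sketch below computes the empirical Rademacher complexity exactly for a much simpler class than the networks above: linear predictors x ↦ ⟨w, x⟩ with ‖w‖₂ ≤ B. For this class the supremum has the closed form (B/n)·‖Σᵢ σᵢ xᵢ‖₂, so the expectation over sign vectors can be enumerated for small n. This is the textbook definition applied to a toy class, not the paper's RKBS bounds; the function names are illustrative.

```python
import itertools
import math


def rademacher_linear(xs, B):
    """Exact empirical Rademacher complexity of {x -> <w, x> : ||w||_2 <= B}.

    Enumerates all 2^n sign vectors, so it is only practical for small n.
    """
    n, d = len(xs), len(xs[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        # Signed sum of the sample points, coordinate by coordinate.
        s = [sum(sig * x[k] for sig, x in zip(signs, xs)) for k in range(d)]
        # sup over ||w|| <= B of <w, s> equals B * ||s||.
        total += math.sqrt(sum(v * v for v in s))
    return B * total / (n * 2 ** n)


def norm_bound(xs, B):
    """Standard upper bound B * sqrt(sum_i ||x_i||^2) / n (via Jensen's inequality)."""
    return B * math.sqrt(sum(v * v for x in xs for v in x)) / len(xs)


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
exact = rademacher_linear(xs, B=1.0)
bound = norm_bound(xs, B=1.0)
```

The bound depends only on the norm constraint and the data norms, not on the input dimension, which is the same flavor of statement as the width-independent bounds described in the abstract.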