mig
Alignment Unlocks Complementarity: A Framework for Multiview Circuit Representation Learning
Shi, Zhengyuan, Wang, Jingxin, Jiang, Wentao, Ma, Chengyu, Zheng, Ziyang, Chu, Zhufei, Qian, Weikang, Xu, Qiang
Multiview learning on Boolean circuits holds immense promise, as different graph-based representations offer complementary structural and semantic information. However, the vast structural heterogeneity between views--such as an And-Inverter Graph (AIG) versus an XOR-Majority Graph (XMG)--poses a critical barrier to effective fusion, especially for self-supervised techniques like masked modeling. Naively applying such methods fails, as the cross-view context is perceived as noise. Our key insight is that functional alignment is a necessary precondition to unlock the power of multiview self-supervision. We introduce MixGate, a framework built on a principled training curriculum that first teaches the model a shared, function-aware representation space via an Equivalence Alignment Loss. Only then do we introduce a multiview masked modeling objective, which can now leverage the aligned views as a rich, complementary signal. Extensive experiments, including a crucial ablation study, demonstrate that our alignment-first strategy transforms masked modeling from an ineffective technique into a powerful performance driver. Multiview learning on Boolean circuits holds immense promise, as different graph-based representations offer complementary structural and semantic insights. While an And-Inverter Graph (AIG) provides a detailed structural view, a format like an XOR-Majority Graph (XMG) offers a semantically richer, high-level abstraction. This multiview approach has shown remarkable empirical success, surpassing earlier models that relied on single representations (Li et al., 2022; Wang et al., 2022; Wu et al., 2023; Shi et al., 2023; Deng et al., 2024; Wang et al., 2024). The key challenge, however, arises from the vast structural heterogeneity between these views.
Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution
Zaher, Eslam, Trzaskowski, Maciej, Nguyen, Quan, Roosta, Fred
In this paper, we dive into the reliability concerns of Integrated Gradients (IG), a prevalent feature attribution method for black-box deep learning models. We particularly address two predominant challenges associated with IG: the generation of noisy feature visualizations for vision models and the vulnerability to adversarial attributional attacks. Our approach involves an adaptation of path-based feature attribution, aligning the path of attribution more closely to the intrinsic geometry of the data manifold. Our experiments utilise deep generative models applied to several real-world image datasets. They demonstrate that IG along the geodesics conforms to the curved geometry of the Riemannian data manifold, generating more perceptually intuitive explanations and, subsequently, substantially increasing robustness to targeted attributional attacks.
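For orientation, the sketch below shows standard Integrated Gradients along a straight-line path, which is the baseline that the abstract above proposes to replace with a geodesic path on the data manifold. It is a minimal illustration, not the authors' method: the toy model `f`, its gradient `grad_f`, the all-zeros baseline, and the midpoint Riemann sum with `steps=50` are all assumptions chosen for clarity.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG along the straight line from `baseline` to `x`.

    Uses a midpoint Riemann sum over `steps` sub-intervals of [0, 1].
    `grad_fn` is a hypothetical callable returning the model gradient.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Average the gradient over the interpolated points on the path.
    avg_grad = np.mean(
        [grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    # Scale by the input difference to obtain per-feature attributions.
    return (x - baseline) * avg_grad


# Toy stand-in for a model (assumption): f(x) = sum(x_i^2), gradient 2x.
def f(x):
    return float(np.sum(np.asarray(x, dtype=float) ** 2))


def grad_f(x):
    return 2.0 * np.asarray(x, dtype=float)


x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(grad_f, x, baseline)
```

For this toy function the attributions sum to `f(x) - f(baseline)`, which is the completeness property of IG. A manifold-aware variant would swap the linear interpolation for points sampled along a geodesic; how that path is computed is not specified in the abstract and is not attempted here.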
Learning Disentangled Discrete Representations
Friede, David, Reimers, Christian, Stuckenschmidt, Heiner, Niepert, Mathias
Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear. We explore the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical variational autoencoder. We show that the underlying grid structure of categorical distributions mitigates the problem of rotational invariance associated with multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations. We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations. Furthermore, we introduce the first unsupervised model selection strategy that favors disentangled representations.
An Analysis of Collocation on GPUs for Deep Learning Training
Robroek, Ties, Yousefzadeh-Asl-Miandoab, Ehsan, Tözün, Pınar
Deep learning training is an expensive process that extensively uses GPUs, but not all model training saturates modern powerful GPUs. Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that can partition a GPU to better fit workloads that do not require all the memory and compute resources of a full GPU. In this paper, we examine the performance of a MIG-enabled A100 GPU under deep learning workloads containing various sizes and combinations of models. We contrast the benefits of MIG to older workload collocation methods on GPUs: naïvely submitting multiple processes on the same GPU and utilizing Multi-Process Service (MPS). Our results demonstrate that collocating multiple model training runs may yield significant benefits. In certain cases, it can lead to up to four times the training throughput despite increased epoch time. On the other hand, the aggregate memory footprint and compute needs of the models trained in parallel must fit the available memory and compute resources of the GPU. MIG can be beneficial thanks to its interference-free partitioning, especially when the sizes of the models align with the MIG partitioning options. MIG's rigid partitioning, however, may create sub-optimal GPU utilization for more dynamic mixed workloads. In general, we recommend MPS as the best performing and most flexible form of collocation for model training for a single user submitting training jobs.
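The claim that throughput can rise even though each epoch takes longer is easy to check with a little arithmetic. The sketch below is illustrative only: the function name and the numbers (7 collocated jobs, 100 s per epoch in isolation, 175 s per epoch when collocated) are hypothetical and are not measurements from the paper.

```python
def aggregate_speedup(num_jobs, isolated_epoch_s, collocated_epoch_s):
    """Throughput of `num_jobs` collocated runs relative to one isolated run.

    Each collocated run is slower (longer epoch), but `num_jobs` runs
    progress at once, so aggregate throughput scales with the job count.
    """
    return num_jobs * isolated_epoch_s / collocated_epoch_s


# Hypothetical numbers: every job slows down by 1.75x, yet the GPU
# completes four times as many epochs per unit time overall.
speedup = aggregate_speedup(num_jobs=7, isolated_epoch_s=100.0, collocated_epoch_s=175.0)
slowdown_per_job = 175.0 / 100.0
```

Under these assumed numbers, `speedup` is 4.0 while each individual job runs 1.75 times slower.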
NVIDIA Crushes Latest Artificial Intelligence Benchmarking Tests
In its third round of submissions, MLCommons released results for MLPerf Inference v1.0. MLPerf is a set of standard AI inference benchmarking tests using seven different applications. The seven tests span a range of workloads, including computer vision, medical imaging, recommender systems, speech recognition, and natural language processing. MLPerf benchmarking measures how fast a trained neural network can process data for each application and its form factor. The results allow unbiased comparison between systems.
Robust Disentanglement of a Few Factors at a Time
Estermann, Benjamin, Marks, Markus, Yanik, Mehmet Fatih
Disentanglement is at the forefront of unsupervised learning, as disentangled representations of data improve generalization, interpretability, and performance in downstream tasks. Current unsupervised approaches remain inapplicable for real-world datasets since they are highly variable in their performance and fail to reach levels of disentanglement of (semi-)supervised approaches. We introduce population-based training (PBT) for improving consistency in training variational autoencoders (VAEs) and demonstrate the validity of this approach in a supervised setting (PBT-VAE). We then use Unsupervised Disentanglement Ranking (UDR) as an unsupervised heuristic to score models in our PBT-VAE training and show how models trained this way tend to consistently disentangle only a subset of the generative factors. Building on this observation, we introduce the recursive rPU-VAE approach. We train the model until convergence, remove the learned factors from the dataset, and reiterate. In doing so, we can label subsets of the dataset with the learned factors and subsequently use these labels to train one model that fully disentangles the whole dataset. With this approach, we show striking improvement in state-of-the-art unsupervised disentanglement performance and robustness across multiple datasets and metrics.
Disentangling Factors of Variation Using Few Labels
Locatello, Francesco, Tschannen, Michael, Bauer, Stefan, Rätsch, Gunnar, Schölkopf, Bernhard, Bachem, Olivier
Learning disentangled representations is considered a cornerstone problem in representation learning. Recently, Locatello et al. (2019) demonstrated that unsupervised disentanglement learning without inductive biases is theoretically impossible and that existing inductive biases and unsupervised methods do not allow disentangled representations to be learned consistently. However, in many practical settings, one might have access to a very limited amount of supervision, for example through manual labeling of training examples. In this paper, we investigate the impact of such supervision on state-of-the-art disentanglement methods and perform a large-scale study, training over 29 000 models under well-defined and reproducible experimental conditions. We first observe that a very limited number of labeled examples (0.01-0.5% of the data set) is sufficient to perform model selection on state-of-the-art unsupervised models. Yet, if one has access to labels for supervised model selection, this raises the natural question of whether they should also be incorporated into the training process. As a case study, we test the benefit of introducing (very limited) supervision into existing state-of-the-art unsupervised disentanglement methods, exploiting both the values of the labels and the ordinal information that can be deduced from them. Overall, we empirically validate that with very little and potentially imprecise supervision it is possible to reliably learn disentangled representations.
Human Rights and Artificial Intelligence Forum
The Montreal Institute for Genocide and Human Rights Studies (MIGS) is organizing the Human Rights and Artificial Intelligence Forum on April 5. The event will take place at Concordia's 4TH SPACE, an innovative and immersive venue for state-of-the-art installations, which will permit leading experts from around the world to gather to discuss this emerging technology's implication for human rights. MIGS has convened thought leaders and practitioners with the goal of understanding how new technologies are disrupting global affairs. MIGS has worked with Global Affairs Canada and Tech Against Terrorism to explore how artificial intelligence (AI) can counter online extremism and how non-state actors might use AI for nefarious purposes. MIGS has also presented work on AI at the Hague Digital Diplomacy Camp organized by the Dutch Foreign Ministry.