Goto

Collaborating Authors

 conda


Conda: Column-Normalized Adam for Training Large Language Models Faster

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive generalization and emergent capabilities, yet their pre-training remains computationally expensive and sensitive to optimization dynamics. While Adam-based optimizers offer fast convergence by adapting learning rates coordinate-wise, recent studies reveal that their updates often suffer from poor spectral conditioning and low-rank structures, hindering efficiency. Muon addresses this issue via global spectral normalization but lacks the per-coordinate adaptivity of Adam. In this work, we propose Column-Normalized Adam (Conda), a novel optimizer that bridges the strengths of both approaches. Conda projects updates into an orthogonal subspace and applies column-wise second moment normalization based on the projected gradients, thereby achieving both improved spectral conditioning and maintaining coordinate-wise adaptivity. This design alleviates the spectral pathologies of Adam while preserving its fast convergence behavior. Extensive experiments on the LLaMA and GPT-2 series show that Conda consistently outperforms AdamW, Muon, and other baselines in pre-training. Remarkably, on the LLaMA series, Conda achieves 2-2.5 the convergence speed of AdamW, measured in both training steps and training time. Further ablations demonstrate its robustness under diverse training setups. These results collectively highlight Conda as an effective and broadly applicable optimizer for large-scale LLM training. The code is released on https://github.com/jie040109/Conda


Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts

arXiv.org Artificial Intelligence

Advancements in foundation models (FMs) have led to a paradigm shift in machine learning. The rich, expressive feature representations from these pre-trained, large-scale FMs are leveraged for multiple downstream tasks, usually via lightweight fine-tuning of a shallow fully-connected network following the representation. However, the non-interpretable, black-box nature of this prediction pipeline can be a challenge, especially in critical domains such as healthcare, finance, and security. In this paper, we explore the potential of Concept Bottleneck Models (CBMs) for transforming complex, non-interpretable foundation models into interpretable decision-making pipelines using high-level concept vectors. Specifically, we focus on the test-time deployment of such an interpretable CBM pipeline "in the wild", where the input distribution often shifts from the original training distribution. We first identify the potential failure modes of such a pipeline under different types of distribution shifts. Then we propose an adaptive concept bottleneck framework to address these failure modes, that dynamically adapts the concept-vector bank and the prediction layer based solely on unlabeled data from the target domain, without access to the source (training) dataset. Empirical evaluations with various real-world distribution shifts show that our adaptation method produces concept-based interpretations better aligned with the test data and boosts post-deployment accuracy by up to 28%, aligning the CBM performance with that of non-interpretable classification.


Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Mode

arXiv.org Artificial Intelligence

Continuous-Time Dynamic Graph (CTDG) precisely models evolving real-world relationships, drawing heightened interest in dynamic graph learning across academia and industry. However, existing CTDG models encounter challenges stemming from noise and limited historical data. Graph Data Augmentation (GDA) emerges as a critical solution, yet current approaches primarily focus on static graphs and struggle to effectively address the dynamics inherent in CTDGs. Moreover, these methods often demand substantial domain expertise for parameter tuning and lack theoretical guarantees for augmentation efficacy. To address these issues, we propose Conda, a novel latent diffusion-based GDA method tailored for CTDGs. Conda features a sandwich-like architecture, incorporating a Variational Auto-Encoder (VAE) and a conditional diffusion model, aimed at generating enhanced historical neighbor embeddings for target nodes. Unlike conventional diffusion models trained on entire graphs via pre-training, Conda requires historical neighbor sequence embeddings of target nodes for training, thus facilitating more targeted augmentation. We integrate Conda into the CTDG model and adopt an alternating training strategy to optimize performance. Extensive experimentation across six widely used real-world datasets showcases the consistent performance improvement of our approach, particularly in scenarios with limited historical data.


ConDA: Contrastive Domain Adaptation for AI-generated Text Detection

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly being used for generating text in a variety of use cases, including journalistic news articles. Given the potential malicious nature in which these LLMs can be used to generate disinformation at scale, it is important to build effective detectors for such AI-generated text. Given the surge in development of new LLMs, acquiring labeled training data for supervised detectors is a bottleneck. However, there might be plenty of unlabeled text data available, without information on which generator it came from. In this work we tackle this data problem, in detecting AI-generated news text, and frame the problem as an unsupervised domain adaptation task. Here the domains are the different text generators, i.e. LLMs, and we assume we have access to only the labeled source data and unlabeled target data. We develop a Contrastive Domain Adaptation framework, called ConDA, that blends standard domain adaptation techniques with the representation power of contrastive learning to learn domain invariant representations that are effective for the final unsupervised detection task. Our experiments demonstrate the effectiveness of our framework, resulting in average performance gains of 31.7% from the best performing baselines, and within 0.8% margin of a fully supervised detector. All our code and data is available at https://github.com/AmritaBh/ConDA-gen-text-detection.


Explaining and Adapting Graph Conditional Shift

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have shown remarkable performance on graph-structured data. However, recent empirical studies suggest that GNNs are very susceptible to distribution shift. There is still significant ambiguity about why graph-based models seem more vulnerable to these shifts. In this work we provide a thorough theoretical analysis on it by quantifying the magnitude of conditional shift between the input features and the output label. Our findings show that both graph heterophily and model architecture exacerbate conditional shifts, leading to performance degradation. To address this, we propose an approach that involves estimating and minimizing the conditional shift for unsupervised domain adaptation on graphs. In our controlled synthetic experiments, our algorithm demonstrates robustness towards distribution shift, resulting in up to 10% absolute ROC AUC improvement versus the second-best algorithm. Furthermore, comprehensive experiments on both node classification and graph classification show its robust performance under various distribution shifts.


How to run Segment Anything on Linux

#artificialintelligence

This post shows how to run Segment Anything Model (SAM) developed by Meta Research. SAM is explained like below. The Segment Anything Model (SAM) produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks. How SAM works is explained in the blog below.


ConDA: Unsupervised Domain Adaptation for LiDAR Segmentation via Regularized Domain Concatenation

arXiv.org Artificial Intelligence

Transferring knowledge learned from the labeled source domain to the raw target domain for unsupervised domain adaptation (UDA) is essential to the scalable deployment of autonomous driving systems. State-of-the-art methods in UDA often employ a key idea: utilizing joint supervision signals from both source and target domains for self-training. In this work, we improve and extend this aspect. We present ConDA, a concatenation-based domain adaptation framework for LiDAR segmentation that: 1) constructs an intermediate domain consisting of fine-grained interchange signals from both source and target domains without destabilizing the semantic coherency of objects and background around the ego-vehicle; and 2) utilizes the intermediate domain for self-training. To improve the network training on the source domain and self-training on the intermediate domain, we propose an anti-aliasing regularizer and an entropy aggregator to reduce the negative effect caused by the aliasing artifacts and noisy pseudo labels. Through extensive studies, we demonstrate that ConDA significantly outperforms prior arts in mitigating domain gaps.


Managing Machine Learning Lifecycles with MLflow

#artificialintelligence

Model development and experimentation is part of any machine learning lifecycle. However, without careful planning, keeping track of experiments can become tedious and challenging; especially given the number of configurations we typically deal with. MLflow is a machine learning lifecycle framework that allows ML engineers and teams to keep track of their experiments. In PART 1 of the series, we are going to focus on the first two steps -- tracking experiments and sharing code. PART 2 will be dedicated to model packaging, while PART 3 will show how the concepts outlined in the previous parts can be used in a React web application. For now, let's try to understand what MLflow is, and what it can do for us!


In-game Toxic Language Detection: Shared Task and Attention Residuals

arXiv.org Artificial Intelligence

In-game toxic language becomes the hot potato in the gaming industry and community. There have been several online game toxicity analysis frameworks and models proposed. However, it is still challenging to detect toxicity due to the nature of in-game chat, which has extremely short length. In this paper, we describe how the in-game toxic language shared task has been established using the real-world in-game chat data. In addition, we propose and introduce the model/framework for toxic language token tagging (slot filling) from the in-game chat. The data and code will be released.


GitHub - RubensZimbres/best-of-ml-python: ๐Ÿ† A ranked list of awesome machine learning Python libraries. Updated weekly.

#artificialintelligence

A ranked list of awesome machine learning Python libraries. This curated list contains 830 awesome open-source projects with a total of 2.6M stars grouped into 32 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Discover other best-of lists or create your own.