clr
Donor-Aware scRNA-seq Benchmarks for IBD Classification
Donor-level disease classification from single-cell RNA sequencing (scRNA-seq) requires strict donor-aware cross-validation: naive pipelines that split cells randomly conflate training and test donors, inflating reported performance through pseudoreplication. We present a donor-aware benchmark evaluating three feature representations across two independent IBD cohorts: centered log-ratio (CLR) transformed cell-type composition, GatedStructuralCFN dependency embeddings, and scVI variational autoencoder latent embeddings. The cohorts are the SCP259 ulcerative colitis atlas (UC vs. Healthy, n=30 donors, 51 cell types) and the Kong 2023 Crohn's disease atlas (CD vs. Healthy, n=71 donors, 55-68 cell types across three intestinal regions). Compartment-stratified CLR composition achieves AUROC 0.956 +/- 0.061 on SCP259; GatedStructuralCFN on the same features achieves 0.978 +/- 0.050. In the Kong cohort, CFN achieves its best performance in the colon region (0.960 +/- 0.055 after feature filtering), exceeding linear CLR (0.900 +/- 0.100), while terminal ileum classification is dominated by linear models (CatBoost CLR 0.967 +/- 0.075 vs. CFN 0.811 +/- 0.164). Cross-dataset transfer (CD->UC, four shared cell types) achieves AUC 0.833 with XGBoost CLR; the reverse direction performs at chance. CFN edge stability analysis shows that compartment-wise composition eliminates spurious unit-sum-induced instability present in global composition (Jaccard 0.026 vs. top-20 recurrence 1.0). CFN shows a consistent numerical advantage over linear models in the colon region of CD (AUROC 0.960 vs. 0.900), though no inter-method comparison reached statistical significance at n<=34 donors per region. Compartment-aware feature construction is critical for both classification performance and structural interpretability. Code: https://github.com/Jonathan-321/sfn-scrna-study
KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment
Maintaining comprehensive and up-to-date knowledge graphs (KGs) is critical for modern AI systems, but manual curation struggles to scale with the rapid growth of scientific literature. This paper presents KARMA, a novel framework employing multi-agent large language models (LLMs) to automate KG enrichment through structured analysis of unstructured text. Our approach employs nine collaborative agents, spanning entity discovery, relation extraction, schema alignment, and conflict resolution that iteratively parse documents, verify extracted knowledge, and integrate it into existing graph structures while adhering to domain-specific schema. Experiments on 1,200 PubMed articles from three different domains demonstrate the effectiveness of KARMA in knowledge graph enrichment, with the identification of up to 38,230 new entities while achieving 83.1\% LLM-verified correctness and reducing conflict edges by 18.6\% through multi-layer assessments.
Metric as Transform: Exploring beyond Affine Transform for Interpretable Neural Network
Artificial Neural Networks of varying architectures are generally paired with affine transformation at the core. However, we find dot product neurons with global influence less interpretable as compared to local influence of euclidean distance (as used in Radial Basis Function Network). We find that metrics as transform performs similarly to affine transform when used in MultiLayer Perceptron or Convolutional Neural Network. Moreover, we explore various properties of Metrics, compare it with Affine, and present multiple cases where metrics seem to provide better interpretability. We develop an interpretable local dictionary based Neural Networks and use it to understand and reject adversarial examples. Artificial Neural Networks (ANN) are used end-to-end and generally as black-box function approximators. This is partly due to the vast number of parameters, the underlying function used, and the high dimension of input and hidden neurons. The backbone of Deep Networks including MLP, CNN Krizhevsky et al. (2012), Transformers Vaswani et al. (2017), and MLP-Mixers Tolstikhin et al. (2021) has been a linear transform of form y = Wx + b (or per neuron: y However, matrix multiplication has won both software and hardware-lottery Hooker (2021), is highly optimized and tightly integrated in current Deep Neural Networks. Since metrics have a point of minima, the lower contour set of metrics is generally bounded and the minima represents that two points are similar. We can also generalize the metrics by relaxing different axioms (see Section 2.9). In this paper, we explore various properties of metrics such as voronoi partitioning, metrics as transform, their generalization, invertibility, application in low dimensional embeddings, dictionary learning, adversarial rejection and noisy optimization. First, we analyze various properties of Metrics and how it could help us with interpretability as compared to linear transforms generally used in most neural network architectures. The dot product is simple to understand but we find the dot product neuron, with non-linear activation, is difficult to interpret. It represents a planar neuron rather than a local neuron as shown in Figure 1.
Modeling Comparative Logical Relation with Contrastive Learning for Text Generation
Dan, Yuhao, Tian, Junfeng, Zhou, Jie, Yan, Ming, Zhang, Ji, Chen, Qin, He, Liang
Data-to-Text Generation (D2T), a classic natural language generation problem, aims at producing fluent descriptions for structured input data, such as a table. Existing D2T works mainly focus on describing the superficial associative relations among entities, while ignoring the deep comparative logical relations, such as A is better than B in a certain aspect with a corresponding opinion, which is quite common in our daily life. In this paper, we introduce a new D2T task named comparative logical relation generation (CLRG). Additionally, we propose a Comparative Logic (CoLo) based text generation method, which generates texts following specific comparative logical relations with contrastive learning. Specifically, we first construct various positive and negative samples by fine-grained perturbations in entities, aspects and opinions. Then, we perform contrastive learning in the encoder layer to have a better understanding of the comparative logical relations, and integrate it in the decoder layer to guide the model to correctly generate the relations. Noting the data scarcity problem, we construct a Chinese Comparative Logical Relation Dataset (CLRD), which is a high-quality human-annotated dataset and challenging for text generation with descriptions of multiple entities and annotations on their comparative logical relations. Extensive experiments show that our method achieves impressive performance in both automatic and human evaluations.
Edgewise outliers of network indexed signals
Rieser, Christopher, Ruiz-Gazen, Anne, Thomas-Agnan, Christine
We consider models for network indexed multivariate data involving a dependence between variables as well as across graph nodes. In the framework of these models, we focus on outliers detection and introduce the concept of edgewise outliers. For this purpose, we first derive the distribution of some sums of squares, in particular squared Mahalanobis distances that can be used to fix detection rules and thresholds for outlier detection. We then propose a robust version of the deterministic MCD algorithm that we call edgewise MCD. An application on simulated data shows the interest of taking the dependence structure into account. We also illustrate the utility of the proposed method with a real data set.
Explainable Artificial Intelligence Architecture for Melanoma Diagnosis Using Indicator Localization and Self-Supervised Learning
Sun, Ruitong, Rostami, Mohammad
Melanoma is a prevalent lethal type of cancer that is treatable if diagnosed at early stages of development. Skin lesions are a typical indicator for diagnosing melanoma but they often led to delayed diagnosis due to high similarities of cancerous and benign lesions at early stages of melanoma. Deep learning (DL) can be used as a solution to classify skin lesion pictures with a high accuracy, but clinical adoption of deep learning faces a significant challenge. The reason is that the decision processes of deep learning models are often uninterpretable which makes them black boxes that are challenging to trust. We develop an explainable deep learning architecture for melanoma diagnosis which generates clinically interpretable visual explanations for its decisions. Our experiments demonstrate that our proposed architectures matches clinical explanations significantly better than existing architectures.
Mixture Representation Learning with Coupled Autoencoding Agents
Marghi, Yeganeh M., Gala, Rohan, Sümbül, Uygar
Jointly identifying a mixture of discrete and continuous factors of variability can help unravel complex phenomena. We study this problem by proposing an unsupervised framework called coupled mixture VAE (cpl-mixVAE), which utilizes multiple interacting autoencoding agents. The individual agents operate on augmented copies of training samples to learn mixture representations, while being encouraged to reach consensus on the categorical assignments. We provide theoretical justification to motivate the use of a multi-agent framework, and formulate it as a variational inference problem. We benchmark our approach on MNIST and dSprites, achieving state-of-the-art categorical assignments while preserving interpretability of the continuous factors. We then demonstrate the utility of this approach in jointly identifying cell types and type-specific, activity-regulated genes for a single-cell gene expression dataset profiling over 100 cortical neuron types.
Faster Gradient-based NAS Pipeline Combining Broad Scalable Architecture with Confident Learning Rate
Ding, Zixiang, Chen, Yaran, Li, Nannan, Zhao, Dongbin
In order to further improve the search efficiency of Neural Architecture Search (NAS), we propose B-DARTS, a novel pipeline combining broad scalable architecture with Confident Learning Rate (CLR). In B-DARTS, Broad Convolutional Neural Network (BCNN) is employed as the scalable architecture for DARTS, a popular differentiable NAS approach. On one hand, BCNN is a broad scalable architecture whose topology achieves two advantages compared with the deep one, mainly including faster single-step training speed and higher memory efficiency (i.e. larger batch size for architecture search), which are all contributed to the search efficiency improvement of NAS. On the other hand, DARTS discovers the optimal architecture by gradient-based optimization algorithm, which benefits from two superiorities of BCNN simultaneously. Similar to vanilla DARTS, B-DARTS also suffers from the performance collapse issue, where those weight-free operations are prone to be selected by the search strategy. Therefore, we propose CLR, that considers the confidence of gradient for architecture weights update increasing with the training time of over-parameterized model, to mitigate the above issue. Experimental results on CIFAR-10 and ImageNet show that 1) B-DARTS delivers state-of-the-art efficiency of 0.09 GPU day using first order approximation on CIFAR-10; 2) the learned architecture by B-DARTS achieves competitive performance using state-of-the-art composite multiply-accumulate operations and parameters on ImageNet; and 3) the proposed CLR is effective for performance collapse issue alleviation of both B-DARTS and DARTS.
Applying Cyclical Learning Rate to Neural Machine Translation
Lee, Choon Meng, Liu, Jianfeng, Peng, Wei
In training deep learning networks, the optimizer and related learning rate are often used without much thought or with minimal tuning, even though it is crucial in ensuring a fast convergence to a good quality minimum of the loss function that can also generalize well on the test dataset. Drawing inspiration from the successful application of cyclical learning rate policy for computer vision related convolutional networks and datasets, we explore how cyclical learning rate can be applied to train transformer-based neural networks for neural machine translation. From our carefully designed experiments, we show that the choice of optimizers and the associated cyclical learning rate policy can have a significant impact on the performance. In addition, we establish guidelines when applying cyclical learning rates to neural machine translation tasks. Thus with our work, we hope to raise awareness of the importance of selecting the right optimizers and the accompanying learning rate policy, at the same time, encourage further research into easy-to-use learning rate policies.
Novel Prediction Techniques Based on Clusterwise Linear Regression
Gitman, Igor, Chen, Jieshi, Lei, Eric, Dubrawski, Artur
In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle preventing to use found regression models for prediction on the unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of target variable are unknown. In this paper we propose two novel approaches on how to solve this problem. The first approach, predictive CLR builds a separate classification model to predict test CLR labels. The second approach, constrained CLR utilizes a set of user-specified constraints that enforce certain points to go to the same clusters. Assuming the constraint values are known for the test points, they can be directly used to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLR-based regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while enjoying only $\approx 20$ times increased computational time over linear regression.