Goto

Collaborating Authors

 mog




Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

Integrating information from different reference data sources is a major challenge for Retrieval-Augmented Generation (RAG) systems because each knowledge source adopts a unique data structure and follows different conventions. Retrieving from multiple knowledge sources with one fixed strategy usually leads to under-exploitation of information. To mitigate this drawback, inspired by Mix-of-Expert, we introduce Mix-of-Granularity (MoG), a method that dynamically determines the optimal granularity of a knowledge database based on input queries using a router. The router is efficiently trained with a newly proposed loss function employing soft labels. We further extend MoG to Mix-of-Granularity-Graph (MoGG), where reference documents are pre-processed into graphs, enabling the retrieval of relevant information from distantly situated chunks. Extensive experiments demonstrate that both MoG and MoGG effectively predict optimal granularity levels, significantly enhancing the performance of the RAG system in downstream tasks. The code of both MoG and MoGG will be made public.


Graph Sparsification via Mixture of Graphs

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have demonstrated superior performance across various graph learning tasks but face significant computational challenges when applied to large-scale graphs. One effective approach to mitigate these challenges is graph sparsification, which involves removing non-essential edges to reduce computational overhead. However, previous graph sparsification methods often rely on a single global sparsity setting and uniform pruning criteria, failing to provide customized sparsification schemes for each node's complex local context. In this paper, we introduce Mixture-of-Graphs (MoG), leveraging the concept of Mixtureof-Experts (MoE), to dynamically select tailored pruning solutions for each node. Specifically, MoG incorporates multiple sparsifier experts, each characterized by unique sparsity levels and pruning criteria, and selects the appropriate experts for each node. Subsequently, MoG performs a mixture of the sparse graphs produced by different experts on the Grassmann manifold to derive an optimal sparse graph. One notable property of MoG is its entirely local nature, as it depends on the specific circumstances of each individual node. Extensive experiments on four large-scale OGB datasets and two superpixel datasets, equipped with five GNN backbones, demonstrate that MoG (I) identifies subgraphs at higher sparsity levels (8.67% 50.85%), with performance equal to or better than the dense graph, (II) achieves 1.47 2.62 speedup in GNN inference with negligible performance drop, and (III) boosts "top-student" GNN performance (1.02% on RevGNN+


A New Probabilistic Distance Metric With Application In Gaussian Mixture Reduction

arXiv.org Artificial Intelligence

This paper presents a new distance metric to compare two continuous probability density functions. The main advantage of this metric is that, unlike other statistical measurements, it can provide an analytic, closed-form expression for a mixture of Gaussian distributions while satisfying all metric properties. These characteristics enable fast, stable, and efficient calculations, which are highly desirable in real-world signal processing applications. The application in mind is Gaussian Mixture Reduction (GMR), which is widely used in density estimation, recursive tracking, and belief propagation. To address this problem, we developed a novel algorithm dubbed the Optimization-based Greedy GMR (OGGMR), which employs our metric as a criterion to approximate a high-order Gaussian mixture with a lower order. Experimental results show that the OGGMR algorithm is significantly faster and more efficient than state-of-the-art GMR algorithms while retaining the geometric shape of the original mixture.


Improving the Utility of Differentially Private Clustering through Dynamical Processing

arXiv.org Artificial Intelligence

This study aims to alleviate the trade-off between utility and privacy in the task of differentially private clustering. Existing works focus on simple clustering methods, which show poor clustering performance for non-convex clusters. By utilizing Morse theory, we hierarchically connect the Gaussian sub-clusters to fit complex cluster distributions. Because differentially private sub-clusters are obtained through the existing methods, the proposed method causes little or no additional privacy loss. We provide a theoretical background that implies that the proposed method is inductive and can achieve any desired number of clusters. Experiments on various datasets show that our framework achieves better clustering performance at the same privacy level, compared to the existing methods.


To Understand Representation of Layer-aware Sequence Encoders as Multi-order-graph

arXiv.org Artificial Intelligence

Abstract--In this paper, we propose an explanation of representation for self-attention network (SAN) based neural sequence encoders, which regards the information captured by the model and the encoding of the model as graph structure and the generation of these graph structures respectively. The proposed explanation applies to existing works on SAN-based models and can explain the relationship among the ability to capture the structural or linguistic information, depth of model, and length of sentence, and can also be extended to other models such as recurrent neural network based models. We also propose a revisited multigraph called Multi-order-Graph (MoG) based on our explanation to model the graph structures in the SAN-based model as subgraphs in MoG and convert the encoding of SAN-based model to the generation of MoG. Based on our explanation, we further introduce a Graph-Transformer by enhancing the ability to capture multiple subgraphs of different orders and focusing on subgraphs of high orders. Experimental results on multiple neural machine translation tasks show that the Graph-Transformer can yield effective performance improvement. These works show that SAN-based models can embed structural which the encoder takes a sentence as input and generates the and linguistic information, and the information embedding ability corresponding contextualized representations for the decoder for is related to the model depth and sentence length. So far, although NLP tasks with various we may get intuitions as follows, (1) different layers in SANbased modeling ways, generally, there are mainly three types of encoder models may deliver different sorts of information, (2) architectures, recurrent neural network (RNN) [1], [2], [3], convolutional increasing the depth of the model can improve the performance neural network (CNN), and self-attention network (SAN) while improvement may be tiny when the model is too deep, from Transformer [4].


Adaptive Quantile Low-Rank Matrix Factorization

arXiv.org Machine Learning

Low-rank matrix factorization (LRMF) has received much popularity owing to its successful applications in both computer vision and data mining. By assuming the noise term to come from a Gaussian, Laplace or a mixture of Gaussian distributions, significant efforts have been made on optimizing the (weighted) $L_1$ or $L_2$-norm loss between an observed matrix and its bilinear factorization. However, the type of noise distribution is generally unknown in real applications and inappropriate assumptions will inevitably deteriorate the behavior of LRMF. On the other hand, real data are often corrupted by skew rather than symmetric noise. To tackle this problem, this paper presents a novel LRMF model called AQ-LRMF by modeling noise with a mixture of asymmetric Laplace distributions. An efficient algorithm based on the expectation-maximization (EM) algorithm is also offered to estimate the parameters involved in AQ-LRMF. The AQ-LRMF model possesses the advantage that it can approximate noise well no matter whether the real noise is symmetric or skew. The core idea of AQ-LRMF lies in solving a weighted $L_1$ problem with weights being learned from data. The experiments conducted with synthetic and real datasets show that AQ-LRMF outperforms several state-of-the-art techniques. Furthermore, AQ-LRMF also has the superiority over the other algorithms that it can capture local structural information contained in real images.


Variational Memory Encoder-Decoder

Neural Information Processing Systems

Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. Standard neural encoder-decoder models and their extensions using conditional variational autoencoder often result in either trivial or digressive responses. To overcome this, we explore a novel approach that injects variability into neural encoder-decoder via the use of external memory as a mixture model, namely Variational Memory Encoder-Decoder (VMED). By associating each memory read with a mode in the latent mixture distribution at each timestep, our model can capture the variability observed in sequential data such as natural conversations. We empirically compare the proposed model against other recent approaches on various conversational datasets. The results show that VMED consistently achieves significant improvement over others in both metric-based and qualitative evaluations.


Variational Memory Encoder-Decoder

Neural Information Processing Systems

Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. Standard neural encoder-decoder models and their extensions using conditional variational autoencoder often result in either trivial or digressive responses. To overcome this, we explore a novel approach that injects variability into neural encoder-decoder via the use of external memory as a mixture model, namely Variational Memory Encoder-Decoder (VMED). By associating each memory read with a mode in the latent mixture distribution at each timestep, our model can capture the variability observed in sequential data such as natural conversations. We empirically compare the proposed model against other recent approaches on various conversational datasets. The results show that VMED consistently achieves significant improvement over others in both metric-based and qualitative evaluations.