AITopics

2410.00408

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment (0.68)
Information Technology > Services (0.47)
Media > Film (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence, Oct-1-2024

Linear Projections of Teacher Embeddings for Few-Class Distillation

Loo, Noel, Iliopoulos, Fotis, Hu, Wei, Vee, Erik

Knowledge Distillation (KD) has emerged as a promising approach for transferring knowledge from a larger, more complex teacher model to a smaller student model. Traditionally, KD involves training the student to mimic the teacher's output probabilities, while more advanced techniques have explored guiding the student to adopt the teacher's internal representations. Despite its widespread success, the performance of KD in binary classification and few-class problems has been less satisfactory. This is because the information about the teacher model's generalization patterns scales directly with the number of classes. Moreover, several sophisticated distillation methods may not be universally applicable or effective for data types beyond Computer Vision. Consequently, effective distillation techniques remain elusive for a range of key real-world applications, such as sentiment analysis, search query understanding, and advertisement-query relevance assessment. Taking these observations into account, we introduce a novel method for distilling knowledge from the teacher model's representations, which we term Learning Embedding Linear Projections (LELP). Inspired by recent findings about the structure of final-layer representations, LELP works by identifying informative linear subspaces in the teacher's embedding space and splitting them into pseudo-subclasses. The student model is then trained to replicate these pseudo-subclasses. Our experimental evaluation on large-scale NLP benchmarks like Amazon Reviews and Sentiment140 demonstrates that LELP is consistently competitive with, and typically superior to, existing state-of-the-art distillation algorithms for binary and few-class problems, where most KD methods suffer.
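The pseudo-subclass idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the informative directions have already been found (for example by PCA on teacher embeddings), uses a hard sign-based split rather than whatever assignment the paper actually uses, and the names `pseudo_class`, `directions`, and `center` are hypothetical.

```python
def pseudo_class(label, embedding, directions, center):
    """Map (original label, teacher embedding) to a pseudo-subclass id.

    Assumption: `directions` is a list of m vectors spanning an informative
    linear subspace of the teacher's embedding space, and `center` is the
    point the embeddings are centered on. Each original class is split into
    2**m pseudo-subclasses by the signs of the m projections.
    """
    bits = 0
    for i, direction in enumerate(directions):
        # Projection of the centered embedding onto this direction.
        proj = sum((x - c) * w for x, c, w in zip(embedding, center, direction))
        if proj > 0:
            bits |= 1 << i
    # Pseudo-class ids for one label occupy a contiguous block of size 2**m.
    return label * (1 << len(directions)) + bits


# Two axis-aligned directions in a 2-D embedding space (toy values).
directions = [[1.0, 0.0], [0.0, 1.0]]
center = [0.0, 0.0]
positive_example = pseudo_class(1, [0.5, -0.2], directions, center)
negative_example = pseudo_class(0, [-1.0, 2.0], directions, center)
```

With two directions, a binary problem becomes an eight-way problem for the student, which gives the student more targets to match than the two original labels would.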

dataset, lelp, teacher model, (14 more...)

2409.20449

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Ortiz-Bouza, Meiby, Aviyente, Selin

Discriminative community detection for multiplex networks

Multiplex networks have emerged as a promising approach for modeling complex systems, where each layer represents a different mode of interaction among entities of the same type. A core task in analyzing these networks is to identify the community structure for a better understanding of the overall functioning of the network. While different methods have been proposed to detect the community structure of multiplex networks, the majority deal with extracting the consensus community structure across layers. In this paper, we address the community detection problem across two closely related multiplex networks. For example, in neuroimaging studies, it is common to have multiple multiplex brain networks where each layer corresponds to an individual and each group to a different experimental condition. In this setting, one may be interested in learning both the community structure representing each experimental condition and the discriminative community structure between the two groups. We introduce two discriminative community detection algorithms based on spectral clustering. The first approach aims to identify the discriminative subgraph structure between the groups, while the second learns the discriminative and the consensus community structures simultaneously. The proposed approaches are evaluated on both simulated and real-world multiplex networks.

data mining, machine learning, multiplex network, (19 more...)

doi: 10.1109/MLSP58920.2024.10734717

2410.00724

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.69)
Health & Medicine > Health Care Technology (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Silva, Fillipe dos Santos, Kakimoto, Gabriel Kenzo, Reis, Julio Cesar dos, Reis, Marcelo S.

ERASMO: Leveraging Large Language Models for Enhanced Clustering Segmentation

Cluster analysis plays a crucial role in various domains and applications, such as customer segmentation in marketing. These contexts often involve multimodal data, including both tabular and textual datasets, making it challenging to represent hidden patterns for obtaining meaningful clusters. This study introduces ERASMO, a framework designed to fine-tune a pretrained language model on textually encoded tabular data and generate embeddings from the fine-tuned model. ERASMO employs a textual converter to transform tabular data into a textual format, enabling the language model to process and understand the data more effectively. Additionally, ERASMO produces contextually rich and structurally representative embeddings through techniques such as random feature sequence shuffling and number verbalization. Extensive experimental evaluations were conducted using multiple datasets and baseline approaches. Our results demonstrate that ERASMO fully leverages the specific context of each tabular dataset, leading to more precise and nuanced embeddings for accurate clustering. This approach enhances clustering performance by capturing complex relationship patterns within diverse tabular data.

large language model, machine learning, natural language, (19 more...)

2410.03738

Country: South America > Brazil (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)

Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling

Grangier, David, Fan, Simin, Seto, Skyler, Ablin, Pierre

Specialist language models (LMs) focus on a specific task or domain on which they often outperform generalist LMs of the same size. However, the specialist data needed to pretrain these models is only available in limited amounts for most tasks. In this work, we build specialist models from large generalist training sets instead. We adjust the training distribution of the generalist data with guidance from the limited domain-specific data. We explore several approaches, with clustered importance sampling standing out. This method clusters the generalist dataset and samples from these clusters based on their frequencies in the smaller specialist dataset. It is scalable, suitable for pretraining and continued pretraining, and works well in multi-task settings. Our findings demonstrate improvements across different domains in terms of language modeling perplexity and accuracy on multiple-choice question tasks. We also present ablation studies that examine the impact of dataset sizes, clustering configurations, and model sizes. Generalist language models (LMs) can address a wide variety of tasks, but this generality comes at a cost (Brown et al., 2020). It necessitates a large training set representative of all prospective tasks, as well as a large model to fit such a comprehensive dataset.
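The sampling step described above (cluster the generalist data, then draw from clusters according to how often each cluster appears in the specialist data) might look roughly like the following. This is a minimal sketch, not the paper's code: it assumes every example has already been assigned a cluster id by some clustering model, and the function names `cluster_weights` and `sample_generalist` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def cluster_weights(specialist_clusters):
    """Relative frequency of each cluster id in the small specialist set."""
    counts = Counter(specialist_clusters)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


def sample_generalist(generalist_clusters, weights, k, seed=0):
    """Draw k generalist example indices, cluster by cluster.

    A cluster is picked with probability proportional to its specialist
    frequency, then one generalist example is picked uniformly inside it.
    Clusters with no generalist examples are skipped (an assumption of this
    sketch); if no cluster is usable, random.choices raises an error.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for index, cluster in enumerate(generalist_clusters):
        by_cluster[cluster].append(index)
    usable = [c for c in weights if by_cluster.get(c)]
    probs = [weights[c] for c in usable]
    picked_clusters = rng.choices(usable, weights=probs, k=k)
    return [rng.choice(by_cluster[c]) for c in picked_clusters]


# Toy cluster assignments: six generalist examples, four specialist examples.
generalist = [0, 0, 1, 1, 2, 2]
specialist = [1, 1, 1, 2]
weights = cluster_weights(specialist)
picks = sample_generalist(generalist, weights, k=50, seed=1)
```

In this toy run cluster 0 never appears in the specialist set, so none of its generalist examples are drawn, while cluster 1 is drawn about three times as often as cluster 2.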

computational linguistics, large language model, machine learning, (19 more...)

2410.03735

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Berlin (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(12 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Boral, Subhadip, Pal, Rikathi, Ghosh, Ashish

ACEV: Unsupervised Intersecting Manifold Segmentation using Adaptation to Angular Change of Eigenvectors in Intrinsic Dimension

Intersecting manifold segmentation has been a focus of research, where individual manifolds that intersect with other manifolds are separated to discover their distinct properties. The proposed method is based on the intuition that when a manifold in $D$-dimensional space with an intrinsic dimension of $d$ intersects with another manifold, the data variance grows in more than $d$ directions. The proposed method measures local data variances and determines their vector directions. It counts the number of vectors with non-zero variance, which determines the manifold's intrinsic dimension. To detect the intersection region, the method builds a tree structure and uses exponential moving averages to adapt to changes in the angular gaps between the corresponding direction vectors of child and parent. Accordingly, it includes those data points in the same manifold whose neighborhood is within the adaptive angular difference and eventually identifies the data points in the intersection area of manifolds. Data points whose inclusion in the neighborhood-identified data points increases their intrinsic dimensionality are removed based on data variance and distance. The proposed method performs better than 18 SOTA manifold segmentation methods in ARI and NMI scores over 14 real-world datasets, with lower time complexity and better stability.

algorithm, manifold, neighbourhood, (13 more...)

2410.0093

Country:

Asia > India > West Bengal > Kolkata (0.14)
Oceania > Australia (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > Experimental Study (0.47)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Chaves, Anderson, Ogasawara, Eduardo, Valduriez, Patrick, Porto, Fabio

StreamEnsemble: Predictive Queries over Spatiotemporal Streaming Data

arXiv.org Machine Learning, Sep-30-2024

Predictive queries over spatiotemporal (ST) stream data pose significant data processing and analysis challenges. ST data streams involve a set of time series whose data distributions may vary in space and time, exhibiting multiple distinct patterns. In this context, assuming a single machine learning model would adequately handle such variations is likely to lead to failure. To address this challenge, we propose StreamEnsemble, a novel approach to predictive queries over ST data that dynamically selects and allocates Machine Learning models according to the underlying time series distributions and model characteristics. Our experimental evaluation reveals that this method markedly outperforms traditional ensemble methods and single model approaches in terms of accuracy and time, demonstrating a significant reduction in prediction error of more than 10 times compared to traditional approaches.

data distribution, data stream, time series, (15 more...)

arXiv.org Machine Learning

2410.00933

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
South America > Brazil > São Paulo (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine (0.46)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

arXiv.org Artificial Intelligence, Sep-29-2024

RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models

Chen, Shuhao, Jiang, Weisen, Lin, Baijiong, Kwok, James T., Zhang, Yu

Recent works show that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. To achieve this, routing is a promising method, which learns a router to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well for a query. To address this problem, in this paper, we propose a method called query-based Router by Dual Contrastive learning (RouterDC). The RouterDC model consists of an encoder and LLM embeddings, and we propose two contrastive learning losses to train the RouterDC model. Experimental results show that RouterDC is effective in assembling LLMs and largely outperforms individual top-performing LLMs as well as existing routing methods on both in-distribution (+2.76%) and out-of-distribution (+1.90%) tasks. Source code is available at https://github.com/shuhao02/RouterDC.
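At inference time, a router built from a query encoder and one embedding per LLM can select a model by comparing the encoded query against each LLM embedding. The sketch below illustrates only that selection step under stated assumptions: the similarity measure (cosine) is an assumption here, the query embedding is supplied directly instead of coming from a trained encoder, and the names `route`, `model_a`, and `model_b` are hypothetical. It does not cover the two contrastive training losses.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route(query_embedding, llm_embeddings):
    """Return the name of the LLM whose embedding is closest to the query."""
    return max(
        llm_embeddings,
        key=lambda name: cosine(query_embedding, llm_embeddings[name]),
    )


# Toy learned embeddings for two candidate models (made-up values).
llm_embeddings = {"model_a": [1.0, 0.0], "model_b": [0.0, 1.0]}
chosen = route([0.9, 0.1], llm_embeddings)
```

Because routing reduces to one encoder pass plus a handful of dot products, adding another candidate LLM only adds one more embedding to compare against.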

large language model, machine learning, natural language, (16 more...)

2409.19886

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Abdullah, Abdulhady Abas, Ahmed, Aram Mahmood, Rashid, Tarik, Veisi, Hadi, Rassul, Yassin Hussein, Hassan, Bryar, Fattah, Polla, Ali, Sabat Abdulhameed, Shamsaldin, Ahmed S.

Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods

arXiv.org Artificial Intelligence, Sep-28-2024

Speech signal processing is a cornerstone of modern communication technologies, tasked with improving the clarity and comprehensibility of audio data in noisy environments. The primary challenge in this field is the effective separation and recognition of speech from background noise, crucial for applications ranging from voice-activated assistants to automated transcription services. The quality of speech recognition directly impacts user experience and accessibility in technology-driven communication. This review paper explores advanced clustering techniques, particularly focusing on the Kernel Fuzzy C-Means (KFCM) method, to address these challenges. Our findings indicate that KFCM, compared to traditional methods like K-Means (KM) and Fuzzy C-Means (FCM), provides superior performance in handling non-linear and non-stationary noise conditions in speech signals. The most notable outcome of this review is the adaptability of KFCM to various noisy environments, making it a robust choice for speech enhancement applications. Additionally, the paper identifies gaps in current methodologies, such as the need for more dynamic clustering algorithms that can adapt in real time to changing noise conditions without compromising speech recognition quality. Key contributions include a detailed comparative analysis of current clustering algorithms and suggestions for further integrating hybrid models that combine KFCM with neural networks to enhance speech recognition accuracy. Through this review, we advocate for a shift towards more sophisticated, adaptive clustering techniques that can significantly improve speech enhancement and pave the way for more resilient speech processing systems.

artificial intelligence, fuzzy c-means, machine learning, (15 more...)

2409.19448

Country:

Asia > Middle East > Iraq > Erbil Governorate > Erbil (0.04)
Asia > Middle East > Iraq > Kurdistan Region (0.04)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Solomon, Indu, Aung, Aye Phyu Phyu, Kumar, Uttam, Jayavelu, Senthilnath

Continual learning with task specialist

arXiv.org Artificial Intelligence, Sep-26-2024

Continual learning (CL) adapts deep learning models to scenarios in which datasets are updated over time. However, existing CL models suffer from catastrophic forgetting, where new knowledge replaces past learning. In this paper, we propose Continual Learning with Task Specialists (CLTS) to address catastrophic forgetting and limited labelled data in real-world datasets by performing class-incremental learning on the incoming stream of data. The model consists of Task Specialists (TS) and a Task Predictor (TP) with a pre-trained Stable Diffusion (SD) module. We introduce a new specialist to handle each new task sequence, and each TS has three blocks: i) a variational autoencoder (VAE) to learn the task distribution in a low-dimensional latent space, ii) a K-Means block to perform data clustering, and iii) a Bootstrapping Language-Image Pre-training (BLIP) model to generate a small batch of captions from the input data. These captions are fed as input to the pre-trained SD model to generate task samples. The proposed model does not store any task samples for replay; instead, it uses samples generated by SD to train the TP module. A comparison study with four SOTA models conducted on three real-world datasets shows that the proposed model outperforms all the selected baselines.

architecture, dataset, learning, (14 more...)

2409.17806

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Singapore (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)