AITopics

2407.0936

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Beijing > Beijing (0.05)
Asia > China > Jiangsu Province > Nanjing (0.05)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Wang, Bo, Mine, Tsunenori

One Stone, Four Birds: A Comprehensive Solution for QA System Using Supervised Contrastive Learning

arXiv.org Artificial IntelligenceJul-12-2024

This paper presents a novel and comprehensive solution to enhance both the robustness and efficiency of question answering (QA) systems through supervised contrastive learning (SCL). Training a high-performance QA system has become straightforward with pre-trained language models, requiring only a small amount of data and simple fine-tuning. However, despite recent advances, existing QA systems still exhibit significant deficiencies in functionality and training efficiency. We address the functionality issue by defining four key tasks: user input intent classification, out-of-domain input detection, new intent discovery, and continual learning. We then leverage a unified SCL-based representation learning method to efficiently build an intra-class compact and inter-class scattered feature space, facilitating both known intent classification and unknown intent detection and discovery. Consequently, with minimal additional tuning on downstream tasks, our approach significantly improves model efficiency and achieves new state-of-the-art performance across all tasks.

computational linguistic, learning, proceedings, (15 more...)

2407.09011

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)
(10 more...)

Genre: Research Report (0.50)

Industry:

Media (0.67)
Information Technology (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
(2 more...)

Watteau, Timothé, Bonnefoy, Aubin, Illouz-Laurent, Simon, Jusseau, Joaquim, Iovleff, Serge

Advanced Graph Clustering Methods: A Comprehensive and In-Depth Analysis

arXiv.org Machine LearningJul-12-2024

Graph clustering, which aims to divide a graph into several homogeneous groups, is a critical area of study with applications that span various fields such as social network analysis, bioinformatics, and image segmentation. This paper explores both traditional and more recent approaches to graph clustering. Firstly, key concepts and definitions in graph theory are introduced. The background section covers essential topics, including graph Laplacians and the integration of Deep Learning in graph analysis. The paper then delves into traditional clustering methods, including Spectral Clustering and the Leiden algorithm. Following this, state-of-the-art clustering techniques that leverage deep learning are examined. A comprehensive comparison of these methods is made through experiments. The paper concludes with a discussion of the practical applications of graph clustering and potential future research directions.

graph, matrix, node, (12 more...)

arXiv.org Machine Learning

2407.09055

Country:

Europe > Netherlands > South Holland > Leiden (0.25)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry:

Information Technology (0.66)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Rigaud, Christophe, Burie, Jean-Christophe, Petit, Samuel

Toward accessible comics for blind and low vision readers

This work explores how to fine-tune large language models using prompt engineering techniques with contextual information for generating an accurate text description of the full story, ready to be forwarded to off-the-shelve speech synthesis tools. We propose to use existing computer vision and optical character recognition techniques to build a grounded context from the comic strip image content, such as panels, characters, text, reading order and the association of bubbles and characters. Then we infer character identification and generate comic book script with context-aware panel description including character's appearance, posture, mood, dialogues etc. We believe that such enriched content description can be easily used to produce audiobook and eBook with various voices for characters, captions and playing sound effects. Keywords: comics understanding large language model prompt engineering character identification comic book script accessible comics.

information, large language model, machine learning, (17 more...)

2407.08248

Country:

North America > United States > Massachusetts (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Italy > Sardinia > Cagliari (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.93)
Media > Publishing (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Local Clustering for Lung Cancer Image Classification via Sparse Solution Technique

Hamel, Jackson, Lai, Ming-Jun, Shen, Zhaiming, Tian, Ye

local clustering, lung cancer image classification, sparse solution technique

In this work, we propose to use a local clustering approach based on the sparse solution technique to study the medical image, especially the lung cancer image classification task. We view images as the vertices in a weighted graph and the similarity between a pair of images as the edges in the graph. The vertices within the same cluster can be assumed to share similar features and properties, thus making the applications of graph clustering techniques very useful for image classification. Recently, the approach based on the sparse solutions of linear systems for graph clustering has been found to identify clusters more efficiently than traditional clustering methods such as spectral clustering. We propose to use the two newly developed local clustering methods based on sparse solution of linear system for image classification. In addition, we employ a box spline-based tight-wavelet-framelet method to clean these images and help build a better adjacency matrix before clustering. The performance of our methods is shown to be very effective in classifying images. Our approach is significantly more efficient and either favorable or equally effective compared with other state-of-the-art approaches. Finally, we shall make a remark by pointing out two image deformation methods to build up more artificial image data to increase the number of labeled images.

2407.088

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.60)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.60)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Transforming Movie Recommendations with Advanced Machine Learning: A Study of NMF, SVD,and K-Means Clustering

Yan, Yubing, Moreau, Camille, Wang, Zhuoyue, Fan, Wenhan, Fu, Chengqian

Keywords-recommendation system; machine learning; Non-groups based on their viewing patterns. Agent Recurrent Deterministic Policy Gradient (MA-RDPG) The proliferation of digital content has necessitated the algorithm, as suggested by Zhao et al., this research aims to development of effective recommendation systems to aid users optimize overall system performance through enhanced in navigating vast amounts of data. This research aims to explore and implement advanced machine Previous studies have extensively explored collaborative learning techniques [1-6] to create a high-performing movie filtering techniques for recommendation systems. The study addresses the following (2001) [13] demonstrated the effectiveness of matrix research questions: What are the most effective machine factorization in uncovering latent user-item interactions. How do et al. (2009) [14] further refined these techniques, leading to these models compare in terms of accuracy and relevance?

arxiv preprint arxiv, recommendation, recommendation system, (9 more...)

2407.08916

Country:

North America > United States > New York (0.05)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

Yang, Yudong, Wu, Kai, Teng, Xiangyi, Wang, Handing, Yu, He, Liu, Jing

The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task evaluations. We introduce a novel framework that employs a complex network to comprehensively analyze the dynamics of knowledge transfer between tasks within EMaTO. By extracting and scrutinizing the knowledge transfer network from existing EMaTO algorithms, we evaluate the influence of network modifications on overall algorithmic efficacy. Our findings indicate that these networks are diverse, displaying community-structured directed graph characteristics, with their network density adapting to different task sets. This research underscores the viability of integrating complex network concepts into EMaTO to refine knowledge transfer processes, paving the way for future advancements in the domain.

algorithm, knowledge transfer, optimization, (13 more...)

doi: 10.1145/3638530.3654232

2407.08918

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Vietnam > Long An Province > Tân An (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

Schäfer, Karla, Choi, Jeong-Eun, Vogel, Inna, Steinebach, Martin

Unveiling the Potential of BERTopic for Multilingual Fake News Analysis -- Use Case: Covid-19

Topic modeling is frequently being used for analysing large text corpora such as news articles or social media data. BERTopic, consisting of sentence embedding, dimension reduction, clustering, and topic extraction, is the newest and currently the SOTA topic modeling method. However, current topic modeling methods have room for improvement because, as unsupervised methods, they require careful tuning and selection of hyperparameters, e.g., for dimension reduction and clustering. This paper aims to analyse the technical application of BERTopic in practice. For this purpose, it compares and selects different methods and hyperparameters for each stage of BERTopic through density based clustering validation and six different topic coherence measures. Moreover, it also aims to analyse the results of topic modeling on real world data as a use case. For this purpose, the German fake news dataset (GermanFakeNCovid) on Covid-19 was created by us and in order to experiment with topic modeling in a multilingual (English and German) setting combined with the FakeCovid dataset. With the final results, we were able to determine thematic similarities between the United States and Germany. Whereas, distinguishing the topics of fake news from India proved to be more challenging.

bertopic, dataset, united states, (14 more...)

2407.08417

Country:

Asia > India (0.28)
North America > Mexico > Yucatán > Mérida (0.05)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
(9 more...)

Genre: Research Report (1.00)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Yakymovych, Andrey, Singh, Abhishek

Uncovering Semantics and Topics Utilized by Threat Actors to Deliver Malicious Attachments and URLs

Recent threat reports highlight that email remains the top vector for delivering malware to endpoints. Despite these statistics, detecting malicious email attachments and URLs often neglects semantic cues linguistic features and contextual clues. Our study employs BERTopic unsupervised topic modeling to identify common semantics and themes embedded in email to deliver malicious attachments and call-to-action URLs. We preprocess emails by extracting and sanitizing content and employ multilingual embedding models like BGE-M3 for dense representations, which clustering algorithms(HDBSCAN and OPTICS) use to group emails by semantic similarity. Phi3-Mini-4K-Instruct facilitates semantic and hLDA aid in thematic analysis to understand threat actor patterns. Our research will evaluate and compare different clustering algorithms on topic quantity, coherence, and diversity metrics, concluding with insights into the semantics and topics commonly used by threat actors to deliver malicious attachments and URLs, a significant contribution to the field of threat detection.

email, representation, threat actor, (9 more...)

2407.08888

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Xiong, Zhangci, Cao, Meng

SLRL: Structured Latent Representation Learning for Multi-view Clustering

In recent years, Multi-View Clustering (MVC) has attracted increasing attention for its potential to reduce the annotation burden associated with large datasets. The aim of MVC is to exploit the inherent consistency and complementarity among different views, thereby integrating information from multiple perspectives to improve clustering outcomes. Despite extensive research in MVC, most existing methods focus predominantly on harnessing complementary information across views to enhance clustering effectiveness, often neglecting the structural information among samples, which is crucial for exploring sample correlations. To address this gap, we introduce a novel framework, termed Structured Latent Representation Learning based Multi-View Clustering method (SLRL). SLRL leverages both the complementary and structural information. Initially, it learns a common latent representation for all views. Subsequently, to exploit the structural information among samples, a k-nearest neighbor graph is constructed from this common latent representation. This graph facilitates enhanced sample interaction through graph learning techniques, leading to a structured latent representation optimized for clustering. Extensive experiments demonstrate that SLRL not only competes well with existing methods but also sets new benchmarks in various multi-view datasets.

information, latent representation, representation, (11 more...)

2407.0834

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)