AITopics

2404.00883

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Visualizing Routes with AI-Discovered Street-View Patterns

Wu, Tsung Heng, Amiruzzaman, Md, Zhao, Ye, Bhati, Deepshikha, Yang, Jing

Street-level visual appearances play an important role in studying social systems, such as understanding the built environment, driving routes, and associated social and economic factors. It has not been integrated into a typical geographical visualization interface (e.g., map services) for planning driving routes. In this paper, we study this new visualization task with several new contributions. First, we experiment with a set of AI techniques and propose a solution of using semantic latent vectors for quantifying visual appearance features. Second, we calculate image similarities among a large set of street-view images and then discover spatial imagery patterns. Third, we integrate these discovered patterns into driving route planners with new visualization techniques. Finally, we present VivaRoutes, an interactive visualization prototype, to show how visualizations leveraged with these discovered patterns can help users effectively and interactively explore multiple routes. Furthermore, we conducted a user study to assess the usefulness and utility of VivaRoutes.

machine learning, natural language, street-view image, (20 more...)

2404.00431

Country:

North America > United States > New York > New York County > Manhattan (0.14)
North America > United States > Ohio (0.04)
North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)
(3 more...)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Transportation (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)

Technology:

Information Technology > Visualization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
(4 more...)

Clustering for Protein Representation Learning

Quan, Ruijie, Wang, Wenguan, Ma, Fan, Fan, Hehe, Yang, Yi

Protein representation learning is a challenging task that aims to capture the structure and function of proteins from their amino acid sequences. Previous methods largely ignored the fact that not all amino acids are equally important for protein folding and activity. In this article, we propose a neural clustering framework that can automatically discover the critical components of a protein by considering both its primary and tertiary structure information. Our framework treats a protein as a graph, where each node represents an amino acid and each edge represents a spatial or sequential connection between amino acids. We then apply an iterative clustering strategy to group the nodes into clusters based on their 1D and 3D positions and assign scores to each cluster. We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein. We evaluate on four protein-related tasks: protein fold classification, enzyme reaction classification, gene ontology term prediction, and enzyme commission number prediction. Experimental results demonstrate that our method achieves state-of-the-art performance.

amino acid, protein, representation, (16 more...)

2404.00254

Country:

North America > United States (0.14)
Europe > France (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Hematology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

Leveraging Pre-trained and Transformer-derived Embeddings from EHRs to Characterize Heterogeneity Across Alzheimer's Disease and Related Dementias

West, Matthew, Magdamo, Colin, Cheng, Lily, He, Yingnan, Das, Sudeshna

Alzheimer's disease is a progressive, debilitating neurodegenerative disease that affects 50 million people globally. Despite this substantial health burden, available treatments for the disease are limited and its fundamental causes remain poorly understood. Previous work has suggested the existence of clinically-meaningful sub-types, which it is suggested may correspond to distinct etiologies, disease courses, and ultimately appropriate treatments. Here, we use unsupervised learning techniques on electronic health records (EHRs) from a cohort of memory disorder patients to characterise heterogeneity in this disease population. Pre-trained embeddings for medical codes as well as transformer-derived Clinical BERT embeddings of free text are used to encode patient EHRs. We identify the existence of sub-populations on the basis of comorbidities and shared textual features, and discuss their clinical significance.

dementia, prevalence, representation, (13 more...)

2404.00464

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report > Experimental Study (0.69)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Data Science (0.93)

From Similarity to Superiority: Channel Clustering for Time Series Forecasting

Chen, Jialin, Lenssen, Jan Eric, Feng, Aosong, Hu, Weihua, Fey, Matthias, Tassiulas, Leandros, Leskovec, Jure, Ying, Rex

Time series forecasting has attracted significant attention in recent decades. Previous studies have demonstrated that the Channel-Independent (CI) strategy improves forecasting performance by treating different channels individually, while it leads to poor generalization on unseen instances and ignores potentially necessary interactions between channels. Conversely, the Channel-Dependent (CD) strategy mixes all channels with even irrelevant and indiscriminate information, which, however, results in oversmoothing issues and limits forecasting accuracy. There is a lack of channel strategy that effectively balances individual channel treatment for improved forecasting performance without overlooking essential interactions between channels. Motivated by our observation of a correlation between the time series model's performance boost against channel mixing and the intrinsic similarity on a pair of channels, we developed a novel and adaptable Channel Clustering Module (CCM). CCM dynamically groups channels characterized by intrinsic similarities and leverages cluster identity instead of channel identity, combining the best of CD and CI worlds. Extensive experiments on real-world datasets demonstrate that CCM can (1) boost the performance of CI and CD models by an average margin of 2.4% and 7.2% on long-term and short-term forecasting, respectively; (2) enable zero-shot forecasting with mainstream time series forecasting models; (3) uncover intrinsic time series patterns among channels and improve interpretability of complex time series models.

dataset, forecasting, series forecasting, (10 more...)

2404.0134

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > North Carolina (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(2 more...)

Genre: Research Report > New Finding (0.74)

Industry: Banking & Finance > Trading (0.68)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

arXiv.org Artificial IntelligenceMar-29-2024

Learning using granularity statistical invariants for classification

Zhu, Ting-Ting, Shao, Yuan-Hai, Li, Chun-Na, Liu, Tian

Learning using statistical invariants (LUSI) is a new learning paradigm, which adopts weak convergence mechanism, and can be applied to a wider range of classification problems. However, the computation cost of invariant matrices in LUSI is high for large-scale datasets during training. To settle this issue, this paper introduces a granularity statistical invariant for LUSI, and develops a new learning paradigm called learning using granularity statistical invariants (LUGSI). LUGSI employs both strong and weak convergence mechanisms, taking a perspective of minimizing expected risk. As far as we know, it is the first time to construct granularity statistical invariants. Compared to LUSI, the introduction of this new statistical invariant brings two advantages. Firstly, it enhances the structural information of the data. Secondly, LUGSI transforms a large invariant matrix into a smaller one by maximizing the distance between classes, achieving feasibility for large-scale datasets classification problems and significantly enhancing the training speed of model operations. Experimental results indicate that LUGSI not only exhibits improved generalization capabilities but also demonstrates faster training speed, particularly for large-scale datasets.

dataset, invariant, statistical invariant, (15 more...)

2403.20122

Country: Asia > China > Hainan Province (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Guedes, Gustavo Bartz, da Silva, Ana Estela Antunes

Classification and Clustering of Sentence-Level Embeddings of Scientific Articles Generated by Contrastive Learning

arXiv.org Artificial IntelligenceMar-29-2024

Scientific articles are long text documents organized into sections, each describing aspects of the research. Analyzing scientific production has become progressively challenging due to the increase in the number of available articles. Within this scenario, our approach consisted of fine-tuning transformer language models to generate sentence-level embeddings from scientific articles, considering the following labels: background, objective, methods, results, and conclusion. We trained our models on three datasets with contrastive learning. Two datasets are from the article's abstracts in the computer science and medical domains. Also, we introduce PMC-Sents-FULL, a novel dataset of sentences extracted from the full texts of medical articles. We compare the fine-tuned and baseline models in clustering and classification tasks to evaluate our approach. On average, clustering agreement measures values were five times higher. For the classification measures, in the best-case scenario, we had an average improvement in F1-micro of 30.73\%. Results show that fine-tuning sentence transformers with contrastive learning and using the generated embeddings in downstream tasks is a feasible approach to sentence classification in scientific articles. Our experiment codes are available on GitHub.

classification, computer science & information technology, dataset, (11 more...)

doi: 10.5121/csit.2023.131923

2404.00224

Country:

Asia > China > Hong Kong (0.04)
South America > Brazil > São Paulo > Campinas (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

arXiv.org Artificial IntelligenceMar-28-2024

A Two-Phase Recall-and-Select Framework for Fast Model Selection

Cui, Jianwei, Shi, Wenhang, Tao, Honglin, Lu, Wei, Du, Xiaoyong

As the ubiquity of deep learning in various machine learning applications has amplified, a proliferation of neural network models has been trained and shared on public model repositories. In the context of a targeted machine learning assignment, utilizing an apt source model as a starting point typically outperforms the strategy of training from scratch, particularly with limited training data. Despite the investigation and development of numerous model selection strategies in prior work, the process remains time-consuming, especially given the ever-increasing scale of model repositories. In this paper, we propose a two-phase (coarse-recall and fine-selection) model selection framework, aiming to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets. Specifically, the coarse-recall phase clusters models showcasing similar training performances on benchmark datasets in an offline manner. A light-weight proxy score is subsequently computed between this model cluster and the target dataset, which serves to recall a significantly smaller subset of potential candidate models in a swift manner. In the following fine-selection phase, the final model is chosen by fine-tuning the recalled models on the target dataset with successive halving. To accelerate the process, the final fine-tuning performance of each potential model is predicted by mining the model's convergence trend on the benchmark datasets, which aids in filtering lower performance models more earlier during fine-tuning. Through extensive experimentation on tasks covering natural language processing and computer vision, it has been demonstrated that the proposed methodology facilitates the selection of a high-performing model at a rate about 3x times faster than conventional baseline methods. Our code is available at https://github.com/plasware/two-phase-selection.

artificial intelligence, deep learning, machine learning, (19 more...)

2404.00069

Country:

Asia (0.46)
North America > United States (0.46)
Europe (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.74)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Roantree, Mark, Murphi, Niamh, Cuong, Dinh Viet, Ngo, Vuong Minh

Graph-Based Optimisation of Network Expansion in a Dockless Bike Sharing System

arXiv.org Artificial IntelligenceMar-28-2024

Bike-sharing systems (BSSs) are deployed in over a thousand cities worldwide and play an important role in many urban transportation systems. BSSs alleviate congestion, reduce pollution and promote physical exercise. It is essential to explore the spatiotemporal patterns of bike-sharing demand, as well as the factors that influence these patterns, in order to optimise system operational efficiency. In this study, an optimised geo-temporal graph is constructed using trip data from Moby Bikes, a dockless BSS operator. The process of optimising the graph unveiled prime locations for erecting new stations during future expansions of the BSS. The Louvain algorithm, a community detection technique, is employed to uncover usage patterns at different levels of temporal granularity. The community detection results reveal largely self-contained sub-networks that exhibit similar usage patterns at their respective levels of temporal granularity. Overall, this study reinforces that BSSs are intrinsically spatiotemporal systems, with community presence driven by spatiotemporal dynamics. These findings may aid operators in improving redistribution efficiency.

algorithm, detection, graph, (14 more...)

2404.0132

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.68)
Transportation > Infrastructure & Services (0.48)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial IntelligenceMar-28-2024

CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

Wen, Jie, Zhang, Zheng, Xu, Yong, Zhang, Bob, Fei, Lunke, Xie, Guo-Sen

In recent years, incomplete multi-view clustering, which studies the challenging multi-view clustering problem on missing views, has received growing research interests. Although a series of methods have been proposed to address this issue, the following problems still exist: 1) Almost all of the existing methods are based on shallow models, which is difficult to obtain discriminative common representations. 2) These methods are generally sensitive to noise or outliers since the negative samples are treated equally as the important samples. In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues. Specifically, it captures the high-level features and local structure of each view by incorporating the view-specific deep encoders and graph embedding strategy into a framework. Moreover, based on the human cognition, i.e., learning from easy to hard, it introduces a self-paced strategy to select the most confident samples for model training, which can reduce the negative influence of outliers. Experimental results on several incomplete datasets show that CDIMC-net outperforms the state-of-the-art incomplete multi-view clustering methods.

cdimc-net, database, graph, (14 more...)

2403.19514

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > China > Guangdong Province > Shenzhen (0.05)
South America > Paraguay > Asunción > Asunción (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)