AITopics

2102.10263

Country: Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Jahanshahi, Hadi, Baydogan, Mustafa Gokce

nTreeClus: a Tree-based Sequence Encoder for Clustering Categorical Series

arXiv.org Machine Learning, Feb-19-2021

The overwhelming presence of categorical/sequential data in diverse domains emphasizes the importance of sequence mining. The challenging nature of sequences calls for continuing research into more accurate and faster approaches that provide a better understanding of their (dis)similarities. This paper proposes a new model-based approach for clustering sequence data, namely nTreeClus. The proposed method deploys tree-based learners, k-mers, and autoregressive models for categorical time series, culminating in a novel numerical representation of the categorical sequences. Adopting this new representation, we cluster sequences, considering the inherent patterns in categorical time series. The model showed robustness to its parameter. Under different simulated scenarios, nTreeClus improved on the baseline methods by up to 10.7% and 2.7% on various internal and external cluster validation metrics, respectively. The empirical evaluation using synthetic and real datasets, protein sequences, and categorical time series showed that nTreeClus is competitive with or superior to most state-of-the-art algorithms.
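As a rough illustration of the k-mer and autoregressive ingredients named in the abstract, the sketch below slides a window of length n over a categorical sequence and treats the last symbol of each window as the target to be predicted from the preceding symbols. The function name `kmer_rows` and the row layout are assumptions made for illustration; the abstract does not specify the actual encoding, and the tree-based learner that would consume these rows is not shown.

```python
from typing import List, Sequence, Tuple


def kmer_rows(seq: Sequence[str], n: int) -> List[Tuple[Tuple[str, ...], str]]:
    """Split a categorical sequence into overlapping windows of length n.

    Each row pairs the first n-1 symbols (predictors) with the last symbol
    (target), which is one way to set up an autoregressive problem.
    This layout is a hypothetical sketch, not the paper's exact encoding.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rows = []
    for start in range(len(seq) - n + 1):
        window = tuple(seq[start:start + n])
        rows.append((window[:-1], window[-1]))
    return rows


# Example: the sequence "abcab" with n = 3 yields three rows.
for predictors, target in kmer_rows("abcab", 3):
    print(predictors, "->", target)
```

Rows of this kind could be fed to any classifier that accepts categorical predictors; sequences shorter than n simply produce no rows.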

complexity, dataset, sequence, (14 more...)

2102.10252

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Florida > Hillsborough County > Tampa (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(5 more...)

arXiv.org Artificial Intelligence, Feb-19-2021

Analytics and Machine Learning in Vehicle Routing Research

Bai, Ruibin, Chen, Xinan, Chen, Zhi-Long, Cui, Tianxiang, Gong, Shuhui, He, Wentao, Jiang, Xiaoping, Jin, Huan, Jin, Jiahuan, Kendall, Graham, Li, Jiawei, Lu, Zheng, Ren, Jianfeng, Weng, Paul, Xue, Ning, Zhang, Huayan

The Vehicle Routing Problem (VRP) is one of the most intensively studied combinatorial optimisation problems for which numerous models and algorithms have been proposed. To tackle the complexities, uncertainties and dynamics involved in real-world VRP applications, Machine Learning (ML) methods have been used in combination with analytical approaches to enhance problem formulations and algorithmic performance across different problem solving scenarios. However, the relevant papers are scattered in several traditional research fields with very different, sometimes confusing, terminologies. This paper presents a first, comprehensive review of hybrid methods that combine analytical techniques with ML tools in addressing VRP problems. Specifically, we review the emerging research streams on ML-assisted VRP modelling and ML-assisted VRP optimisation. We conclude that ML can be beneficial in enhancing VRP modelling, and improving the performance of algorithms for both online and offline VRP optimisations. Finally, challenges and future opportunities of VRP research are discussed.

algorithm, vehicle, vehicle routing problem, (15 more...)

2102.10012

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)
(12 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)
Research Report > Promising Solution (0.45)

Industry: Transportation > Freight & Logistics Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(4 more...)

Chen, Irene Y., Krishnan, Rahul G., Sontag, David

Clustering Left-Censored Multivariate Time-Series

arXiv.org Machine Learning, Feb-18-2021

Unsupervised learning seeks to uncover patterns in data. However, different kinds of noise may impede the discovery of useful substructure from real-world time-series data. In this work, we focus on mitigating the interference of left-censorship in the task of clustering. We provide conditions under which clusters and left-censorship may be identified; motivated by this result, we develop a deep generative, continuous-time model of time-series data that clusters while correcting for censorship time. We demonstrate accurate, stable, and interpretable results on synthetic data that outperform several benchmarks. To showcase the utility of our framework on real-world problems, we study how left-censorship can adversely affect the task of disease phenotyping, resulting in the often incorrect assumption that longitudinal patient data are aligned by disease stage. In reality, patients at the time of diagnosis are at different stages of the disease, both late and early, because of differences in when they seek medical care, and this discrepancy can confound unsupervised learning algorithms. On two clinical datasets, our model corrects for this form of censorship and recovers known clinical subtypes.

dataset, sublign, subtype, (17 more...)

2102.07005

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Pennsylvania (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

del Amor, Rocío, Colomer, Adrián, Monteagudo, Carlos, Naranjo, Valery

A Deep Embedded Refined Clustering Approach for Breast Cancer Distinction based on DNA Methylation

Epigenetic alterations have an important role in the development of several types of cancer. Epigenetic studies generate a large amount of data, which makes it essential to develop novel models capable of dealing with large-scale data. In this work, we propose a deep embedded refined clustering method for breast cancer differentiation based on DNA methylation. Specifically, the deep learning system presented here uses the levels of CpG island methylation, which lie between 0 and 1. The proposed approach is composed of two main stages. The first stage consists of dimensionality reduction of the methylation data based on an autoencoder. The second stage is a clustering algorithm based on the soft assignment of the latent space provided by the autoencoder. The whole method is optimized through a weighted loss function composed of two terms: a reconstruction term and a classification term. To the best of the authors' knowledge, no previous studies have focused on dimensionality reduction algorithms linked to classification and trained end-to-end for DNA methylation analysis. The proposed method achieves an unsupervised clustering accuracy of 0.9927 and an error rate (%) of 0.73 on 137 breast tissue samples. After a second test of the deep-learning-based method using a different methylation database, an accuracy of 0.9343 and an error rate (%) of 6.57 on 45 breast tissue samples are obtained. Based on these results, the proposed algorithm outperforms other state-of-the-art methods evaluated under the same conditions for breast cancer classification based on DNA methylation data.
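To make the "soft assignment of the latent space" step concrete, the sketch below computes cluster membership probabilities for one latent vector using a Student's t kernel, as is common in the deep embedded clustering literature. Whether this paper uses exactly this kernel is an assumption; the function name `soft_assign`, the parameter `alpha`, and the toy centroids are hypothetical.

```python
from typing import List


def soft_assign(z: List[float], centroids: List[List[float]],
                alpha: float = 1.0) -> List[float]:
    """Soft cluster assignment of a latent point z with a Student's t kernel.

    q_j is proportional to (1 + ||z - mu_j||^2 / alpha) ** (-(alpha + 1) / 2),
    normalised so the q_j sum to 1. This kernel choice is an assumption,
    not a detail confirmed by the abstract.
    """
    weights = []
    for mu in centroids:
        sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
        weights.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: a point sitting on the first of two centroids.
q = soft_assign([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]])
print(q)  # the first cluster receives the larger probability
```

In an end-to-end setup, probabilities like these would enter the clustering term of the loss, weighted against the autoencoder's reconstruction term.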

algorithm, autoencoder, dimensionality reduction, (13 more...)

2102.09563

Country: Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre:

Research Report > Promising Solution (0.54)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rodriguez, Sara Ines Rizo, de Carvalho, Francisco de Assis Tenorio

Fuzzy clustering algorithms with distance metric learning and entropy regularization

Clustering methods have been used in a variety of fields such as image processing, data mining, pattern recognition, and statistical analysis. Generally, clustering algorithms treat all variables as equally relevant and uncorrelated for the clustering task. Nevertheless, in real situations, some variables can be correlated, or may be more or less relevant or even irrelevant for this task. This paper proposes partitioning fuzzy clustering algorithms based on Euclidean, City-block and Mahalanobis distances and entropy regularization. These methods are iterative three-step algorithms that provide a fuzzy partition, a representative for each fuzzy cluster, and the relevance weights of the variables or their correlation by minimizing a suitable objective function. Several experiments on synthetic and real datasets, including an application to noisy image texture segmentation, demonstrate the usefulness of these adaptive clustering methods.
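The fuzzy-partition step of an entropy-regularized objective can be sketched as follows. For an objective of the form sum of u_ik * d_ik plus T times sum of u_ik * ln(u_ik), with memberships of each object summing to 1, the minimizing memberships are a softmax of the negative distances scaled by T. The exact objective in the paper, including its adaptive distances and variable weights, is not given in the abstract, so this form, the function name `entropy_memberships`, and the parameter `temperature` are assumptions.

```python
import math
from typing import List


def entropy_memberships(dists: List[float], temperature: float) -> List[float]:
    """Memberships of one object given its distances to each cluster.

    Implements u_k = exp(-d_k / T) / sum_j exp(-d_j / T), the closed-form
    minimiser of an entropy-regularised assignment cost (assumed form).
    The minimum distance is subtracted first for numerical stability.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    d_min = min(dists)
    weights = [math.exp(-(d - d_min) / temperature) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# A small temperature gives a nearly crisp partition; a large one flattens it.
print(entropy_memberships([0.2, 1.5, 3.0], temperature=0.1))
print(entropy_memberships([0.2, 1.5, 3.0], temperature=10.0))
```

In a three-step scheme of the kind described, this update would alternate with recomputing cluster representatives and the variable weights.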

algorithm, dataset, matrix, (14 more...)

2102.09529

Country:

South America > Paraguay > Asunción > Asunción (0.04)
South America > Brazil > Pernambuco > Recife (0.04)
North America > United States > Wisconsin (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Straka, Milan, Piatriková, Lucia, van Bokhoven, Peter, Buzna, Ľuboš

A matrix approach to detect temporal behavioral patterns at electric vehicle charging stations

Based on the electric vehicle (EV) arrival times and the duration of EV connection to the charging station, we identify charging patterns and derive groups of charging stations with similar charging patterns using two approaches. The rule-based approach derives the charging patterns by specifying a set of time intervals and a threshold value. In the second approach, we combine the modified l-p norm (as a matrix dissimilarity measure) with hierarchical clustering and apply them to automatically identify charging patterns and groups of charging stations associated with such patterns. A dataset collected in a large network of public charging stations is used to test both approaches. Using both methods, we derived charging patterns. The first, rule-based approach performed well at deriving predefined patterns, and the second, hierarchical clustering, showed the capability of delivering unexpected charging patterns.
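As a minimal sketch of a matrix dissimilarity measure of the kind mentioned above, the code below computes the plain entrywise l-p distance between two equally shaped matrices. The abstract does not describe how the paper modifies the l-p norm, so the unmodified form is used here as a stand-in; the function name `lp_dissimilarity` and the toy station matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def lp_dissimilarity(a: Matrix, b: Matrix, p: float = 2.0) -> float:
    """Entrywise l-p distance between two matrices of the same shape.

    This is the unmodified l-p form; the paper's modification is not
    specified in the abstract and is not reproduced here.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    total = sum(abs(x - y) ** p
                for ra, rb in zip(a, b)
                for x, y in zip(ra, rb))
    return total ** (1.0 / p)


# Hypothetical 2x2 matrices, e.g. arrival-time bin by connection-duration bin.
station_a = [[0.0, 0.0], [0.0, 0.0]]
station_b = [[3.0, 0.0], [0.0, 4.0]]
print(lp_dissimilarity(station_a, station_b, p=2.0))
```

Pairwise values of such a measure could form the distance matrix passed to a hierarchical clustering routine.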

charging pattern, matrix, straka transportation research procedia 00, (10 more...)

2102.0926

Country:

Europe > Slovakia > Žilina > Žilina (0.04)
Europe > Netherlands > Gelderland > Arnhem (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Chaudhari, Shreyas, Nair, Harideep, Moura, José M. F., Shen, John Paul

Unsupervised Clustering of Time Series Signals using Neuromorphic Energy-Efficient Temporal Neural Networks

Unsupervised time series clustering is a challenging problem with diverse industrial applications such as anomaly detection and bio-wearables. These applications typically involve small, low-power devices on the edge that collect and process real-time sensory signals. State-of-the-art time-series clustering methods perform some form of loss minimization that is extremely computationally intensive from the perspective of edge devices. In this work, we propose a neuromorphic approach to unsupervised time series clustering based on Temporal Neural Networks that is capable of ultra-low-power, continuous online learning. We demonstrate its clustering performance on a subset of UCR Time Series Archive datasets. Our results show that the proposed approach either outperforms or performs similarly to most of the existing algorithms while being far more amenable to efficient hardware implementation. Our hardware assessment analysis shows that in 7 nm CMOS the proposed architecture, on average, consumes only about 0.005 mm^2 die area and 22 uW power and can process each signal with about 5 ns latency.

neural network, neuron, shapelet, (13 more...)

2102.092

Genre: Research Report > New Finding (0.54)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Education > Educational Setting (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.85)

Sosa, Juan, Betancourt, Brenda

A Latent Space Model for Multilayer Network Data

arXiv.org Machine Learning, Feb-17-2021

In this work, we propose a Bayesian statistical model to simultaneously characterize two or more social networks defined over a common set of actors. The key feature of the model is a hierarchical prior distribution that allows us to represent the entire system jointly, achieving a compromise between dependent and independent networks. Among other things, such a specification easily allows us to visualize multilayer network data in a low-dimensional Euclidean space, generate a weighted network that reflects the consensus affinity between actors, establish a measure of correlation between networks, assess cognitive judgements that subjects form about the relationships among actors, and perform clustering tasks at different social instances. Our model's capabilities are illustrated using several real-world data sets, taking into account different types of actors, sizes, and relations.

actor, mnlpm, network data, (15 more...)

2102.0956

Country:

South America > Colombia (0.04)
North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Telecommunications > Networks (0.72)
Information Technology > Networks (0.72)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

arXiv.org Artificial Intelligence, Feb-16-2021

FIXME: Enhance Software Reliability with Hybrid Approaches in Cloud

Hwang, Jinho, Shwartz, Larisa, Wang, Qing, Batta, Raghav, Kumar, Harshit, Nidd, Michael

With the promise of reliability in the cloud, more enterprises are migrating to the cloud. The process of continuous integration/deployment (CICD) in the cloud connects developers, who need to deliver value faster and more transparently, with site reliability engineers (SREs), who need to manage applications reliably. SREs feed development issues back to developers, and developers commit fixes and trigger CICD to redeploy. The release cycle is more continuous than ever, so the path from code to production is faster and more automated. To provide this higher level of agility, cloud platforms have become more complex, with deeper layers of virtualization added for flexibility. However, reliability does not come for free with all these complexities. Software engineers and SREs need to deal with a wider spectrum of information from the virtualized layers. Therefore, providing correlated information with true-positive evidence is critical to identifying the root cause of issues quickly and so reducing mean time to recover (MTTR), a performance metric for SREs. Similarity-, knowledge-, or statistics-driven approaches have been effective, but with increasing data volume and types, an individual approach is limited in its ability to correlate semantic relations across different data sources. In this paper, we introduce FIXME to enhance software reliability with hybrid diagnosis approaches for enterprises. Our evaluation results show that the hybrid diagnosis approach is about 17% better in precision. The results are helpful for both practitioners and researchers developing hybrid diagnosis in the highly dynamic cloud environment.

alert, correlation, information, (16 more...)

2102.09336

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > North Carolina (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services (0.66)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Cloud Computing (1.00)
(6 more...)