
Collaborating Authors

Rajasegarar, Sutharshan


Scalable Bottom-up Subspace Clustering using FP-Trees for High Dimensional Data

arXiv.org Machine Learning

Subspace clustering aims to find groups of similar objects (clusters) that exist in lower-dimensional subspaces of a high-dimensional dataset. It has a wide range of applications, such as analysing high-dimensional sensor data or DNA sequences. However, existing algorithms have limitations in finding clusters in non-disjoint subspaces and in scaling to large data, which limits their applicability in areas such as bioinformatics and the Internet of Things. We aim to address these limitations by proposing a subspace clustering algorithm that uses a bottom-up strategy. Our algorithm first searches for base clusters in low-dimensional subspaces. It then forms clusters in higher-dimensional subspaces from these base clusters, a step we formulate as a frequent pattern mining problem. This formulation enables an efficient search for clusters in higher-dimensional subspaces, which is carried out using FP-trees. The proposed algorithm is evaluated against traditional bottom-up clustering algorithms and state-of-the-art subspace clustering algorithms. The experimental results show that the proposed algorithm produces clusters with high accuracy and scales well to large volumes of data. We also demonstrate the algorithm's performance on real-life data, including ten genomic datasets and a car parking occupancy dataset.
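The abstract casts the step from base clusters to higher-dimensional clusters as frequent pattern mining over FP-trees. Below is a minimal Python sketch of that idea. It rests on assumptions the abstract does not state. First, each point has already been assigned to 1-D base clusters, encoded as hypothetical labels such as `"d0:a"` (dimension 0, base cluster a). Second, each point becomes one transaction made of its labels. Third, any combination of base clusters spanning two or more dimensions that is shared by at least `min_points` points counts as a candidate higher-dimensional cluster. The FP-growth routine is a textbook version, not the authors' implementation, and `subspace_clusters` is a hypothetical helper.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count, its parent and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (header table, frequent items)."""
    counts = defaultdict(int)
    for items, n in transactions:
        for item in items:
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that holds this item
    for items, n in transactions:
        # Keep only frequent items, ordered by descending support (ties by name).
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every itemset with support >= min_support."""
    header, frequent = build_fp_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = suffix + (item,)
        yield itemset, frequent[item]
        # Conditional pattern base: the prefix path above each node holding `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)


def subspace_clusters(point_labels, min_points):
    """Hypothetical helper: frequent combinations of base clusters from >= 2 dimensions."""
    transactions = [(labels, 1) for labels in point_labels]
    result = []
    for itemset, support in fp_growth(transactions, min_points):
        dims = [item.split(":")[0] for item in itemset]
        # A multi-dimensional cluster needs at most one base cluster per dimension.
        if len(itemset) >= 2 and len(set(dims)) == len(dims):
            result.append((tuple(sorted(itemset)), support))
    return sorted(result)


# Hypothetical 1-D base-cluster labels for four points in a 3-D dataset.
points = [
    {"d0:a", "d1:x", "d2:p"},
    {"d0:a", "d1:x", "d2:q"},
    {"d0:a", "d1:x", "d2:p"},
    {"d0:b", "d1:y", "d2:p"},
]
clusters = subspace_clusters(points, min_points=2)
print(clusters)
```

With `min_points=2`, the first three points share base clusters `d0:a` and `d1:x`, so the sketch reports a 2-D cluster in dimensions 0 and 1, along with the combinations that also include `d2:p`. Support here is simply the count of shared points. The paper's algorithm may apply further checks, such as density in the combined subspace, that this sketch omits.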


A Scalable Framework for Trajectory Prediction

arXiv.org Artificial Intelligence

Trajectory prediction (TP) is of great importance for a wide range of location-based applications in intelligent transport systems, such as location-based advertising, route planning, traffic management, and early warning systems. In the last few years, the widespread use of GPS navigation systems and wireless-communication-enabled vehicles has produced huge volumes of trajectory data. Using spatio-temporal techniques to exploit this data for efficient and accurate trajectory prediction remains an open research problem. Existing TP approaches are limited to short-term predictions. Moreover, they cannot handle large volumes of trajectory data for long-term prediction. To address these limitations, we propose a scalable hybrid framework based on clustering and Markov chains, called Traj-clusiVAT-based TP, for both short-term and long-term trajectory prediction; it can handle a large number of overlapping trajectories in a dense road network. In addition, Traj-clusiVAT can determine the number of clusters, which represent different movement behaviours in the input trajectory data. In our experiments, we compare the proposed approach with a mixed Markov model (MMM)-based scheme and with a TP method based on NETSCAN trajectory clustering, for both short- and long-term trajectory prediction. We performed our experiments on two real vehicle trajectory datasets, including a large-scale dataset of 3.28 million trajectories obtained from 15,061 taxis in Singapore over a period of one month. The results on both datasets show that the proposed approach outperforms the existing approaches for both short- and long-term prediction, in terms of both prediction accuracy and distance error (in km).
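The framework pairs clustering with Markov chains. The sketch below illustrates only the Markov-chain half and rests on stated assumptions. Trajectories are assumed to be already mapped to sequences of discrete location IDs (for example, road segments) and already grouped by some clustering step. The cluster labels, location IDs and helper names (`train_markov`, `predict_path`) are hypothetical, and clusiVAT itself is not reproduced. A first-order chain is fitted per cluster. A short-term prediction follows the most probable transition for one step; a long-term prediction repeats this for several steps.

```python
from collections import Counter, defaultdict


def train_markov(trajectories):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for current, nxt in zip(traj, traj[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for current, nexts in counts.items()
    }


def predict_path(model, start, steps):
    """Greedy multi-step prediction: follow the most probable transition each step."""
    path, current = [], start
    for _ in range(steps):
        nexts = model.get(current)
        if not nexts:  # this location was never left in the training data: stop early
            break
        # Highest probability wins; ties are broken by the smaller location ID.
        current = min(nexts, key=lambda loc: (-nexts[loc], loc))
        path.append(current)
    return path


# Hypothetical clustering output: each cluster groups trajectories with similar movement.
clustered = {
    "northbound": [["A", "B", "C", "D"], ["A", "B", "C", "E"], ["B", "C", "D"]],
    "eastbound": [["A", "F", "G"], ["F", "G", "H"]],
}
models = {label: train_markov(trajs) for label, trajs in clustered.items()}
print(predict_path(models["northbound"], "A", 1))  # short-term
print(predict_path(models["northbound"], "A", 5))  # long-term
print(predict_path(models["eastbound"], "A", 3))
```

Because the chains are trained per cluster, the same start location `A` leads to different predicted paths for the two movement behaviours. Greedy decoding is the simplest choice. Errors compound over long horizons, so a complete framework might instead score whole candidate paths.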


R1SVM: A Randomised Nonlinear Approach to Large-Scale Anomaly Detection

AAAI Conferences

The problem of unsupervised anomaly detection arises in a wide variety of practical applications. While one-class support vector machines have demonstrated their effectiveness as an anomaly detection technique, their ability to model large datasets is limited by the memory and time complexity of training. To address this issue for supervised learning of kernel machines, there has been growing interest in random projection methods as an alternative to the computationally expensive problems of kernel matrix construction and support vector optimisation. In this paper we leverage the theory of nonlinear random projections and propose the Randomised One-class SVM (R1SVM), an efficient and scalable anomaly detection technique that can be trained on large-scale datasets. Our empirical analysis on several real-life and synthetic datasets shows that our randomised 1SVM algorithm achieves comparable or better accuracy than deep autoencoder and traditional kernelised approaches for anomaly detection, while being approximately 100 times faster in training and testing.
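Random Fourier features are one common way to realise nonlinear random projections. They approximate an RBF kernel with an explicit finite-dimensional map, so a linear model can stand in for the kernel machine and no kernel matrix is ever built. The pure-Python sketch below assumes that choice; the abstract does not say which projection R1SVM uses. It fits a linear one-class SVM in the projected space by stochastic subgradient descent on the primal objective. The class name, the parameters (`n_features`, `gamma`, `nu`, `epochs`) and the step-size schedule are illustrative, not the paper's.

```python
import math
import random


class RandomisedOneClassSVM:
    """Sketch: random Fourier features + linear one-class SVM trained by SGD.

    Assumed design (not necessarily the paper's exact R1SVM): the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2) is approximated by random Fourier features,
    and the one-class SVM primal is minimised directly in that feature space.
    """

    def __init__(self, n_features=300, gamma=0.2, nu=0.1, epochs=20, seed=0):
        self.n_features, self.gamma, self.nu = n_features, gamma, nu
        self.epochs, self.rng = epochs, random.Random(seed)

    def _features(self, x):
        # z(x) = sqrt(2/D) * cos(W x + b), so that E[z(x) . z(y)] = k(x, y).
        scale = math.sqrt(2.0 / self.n_features)
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.W, self.b)]

    def fit(self, X):
        dim = len(X[0])
        std = math.sqrt(2.0 * self.gamma)  # frequencies drawn from N(0, 2*gamma*I)
        self.W = [[self.rng.gauss(0.0, std) for _ in range(dim)]
                  for _ in range(self.n_features)]
        self.b = [self.rng.uniform(0.0, 2.0 * math.pi) for _ in range(self.n_features)]
        Z = [self._features(x) for x in X]
        self.w, self.rho = [0.0] * self.n_features, 0.0
        order, t = list(range(len(Z))), 0
        for _ in range(self.epochs):
            self.rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / t
                z = Z[i]
                violated = self.rho - sum(a * c for a, c in zip(self.w, z)) > 0
                # Subgradient step on 0.5*||w||^2 + (1/nu)*max(0, rho - w.z) - rho.
                coef = eta / self.nu if violated else 0.0
                self.w = [(1.0 - eta) * a + coef * c for a, c in zip(self.w, z)]
                self.rho += eta * (1.0 - (1.0 / self.nu if violated else 0.0))
        return self

    def decision_function(self, x):
        """Positive inside the learned support region, negative for likely anomalies."""
        z = self._features(x)
        return sum(a * c for a, c in zip(self.w, z)) - self.rho

    def predict(self, x):
        """Return 1 for inliers and -1 for anomalies."""
        return 1 if self.decision_function(x) >= 0 else -1


# Synthetic "normal" data: 200 points around the origin.
rng = random.Random(42)
X = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(200)]
model = RandomisedOneClassSVM().fit(X)
print(model.decision_function([0.0, 0.0]), model.decision_function([5.0, 5.0]))
```

Training cost grows linearly with the number of samples and with `n_features`, rather than quadratically with the number of samples as with an explicit kernel matrix. `nu` roughly controls the fraction of training points left outside the boundary. scikit-learn offers `RBFSampler` and `SGDOneClassSVM`, which can be combined in a similar way.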