Baniya, Arbind Agrahari
Enabling clustering algorithms to detect clusters of varying densities through scale-invariant data preprocessing
Aryal, Sunil, Wells, Jonathan R., Baniya, Arbind Agrahari, Santosh, KC
In this paper, we show that preprocessing data using a variant of rank transformation called 'Average Rank over an Ensemble of Sub-samples (ARES)' makes clustering algorithms robust to data representation and enables them to detect clusters of varying densities. Our empirical results, obtained using three of the most widely used clustering algorithms, namely KMeans, DBSCAN, and DP (Density Peak), across a wide range of real-world datasets, show that clustering after ARES transformation produces better and more consistent results.
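The abstract names the transformation but does not spell out its steps, so the following is only a minimal sketch of what an "average rank over an ensemble of sub-samples" could look like. It assumes that the rank of a value is the number of sub-sample values strictly smaller than it, that sub-samples are drawn without replacement, and that each sub-sample is shared by all features. The function name `ares_transform` and the parameters `n_subsamples`, `subsample_size`, and `seed` are hypothetical and are not taken from the paper.

```python
import numpy as np

def ares_transform(X, n_subsamples=10, subsample_size=8, seed=0):
    """Replace each feature value by its average rank over random sub-samples.

    Assumption: rank = number of sub-sample values strictly smaller than x.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    size = min(subsample_size, n)
    ranks = np.zeros((n, d))
    for _ in range(n_subsamples):
        # Draw one sub-sample of row indices without replacement.
        idx = rng.choice(n, size=size, replace=False)
        for j in range(d):
            ref = np.sort(X[idx, j])
            # searchsorted(side="left") counts reference values < x.
            ranks[:, j] += np.searchsorted(ref, X[:, j], side="left")
    return ranks / n_subsamples

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(1.0, 100.0, size=(50, 3))
    R = ares_transform(X)
    # Rescaling each feature leaves the ranks unchanged for the same seed.
    R_scaled = ares_transform(X * np.array([2.0, 8.0, 1024.0]))
    print(np.array_equal(R, R_scaled))
```

Because only the ordering of values within each feature enters the computation, any order-preserving rescaling of a feature yields the same transformed data under this sketch, which is the sense in which such preprocessing is insensitive to data representation. The transformed matrix can then be passed to a clustering algorithm in place of the raw data.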
Requirements Engineering Framework for Human-centered Artificial Intelligence Software Systems
Ahmad, Khlood, Abdelrazek, Mohamed, Arora, Chetan, Baniya, Arbind Agrahari, Bano, Muneera, Grundy, John
AI-based software systems are rapidly becoming essential in many organizations [1]. However, the focus is most commonly on the technical side of building artificial intelligence (AI)-based systems, and projects more often than not fail to address critical human aspects during the development phases [2, 3]. These aspects include, but are not limited to, age, gender, ethnicity, socio-economic status, education, language, culture, emotions, and personality [4]. Ignoring human-centered aspects in AI-based software tends to produce biased and non-inclusive outcomes [5]. Shneiderman [6] emphasizes the dangers of autonomy-first design in AI and the hidden biases that follow. Misrepresenting human aspects in the requirements for model selection and in the data used to train AI algorithms can lead to discriminatory decision procedures even if the underlying computational processes are unbiased [7]. For example, a study by Carnegie Mellon revealed that women were far less likely than men to receive high-paying job ads from Google [8] due to the under-representation of people of color and women in high-paying IT jobs. Studies on human-centered design aim to develop systems that put human needs and values at the center of software development and that clearly understand the context in which the software system is used [2, 9].
Improved histogram-based anomaly detector with the extended principal component features
Aryal, Sunil, Baniya, Arbind Agrahari, Santosh, KC
In this era of big data, databases are growing rapidly in terms of the number of records. Fast automatic detection of anomalous records in these massive databases is a challenging task. Traditional distance-based anomaly detectors are not applicable to these massive datasets. Recently, a simple but extremely fast anomaly detector using one-dimensional histograms has been introduced. The anomaly score of a data instance is computed as the product of the probability masses of the histogram bins into which it falls in each dimension. It has been shown to produce results competitive with many state-of-the-art methods on many datasets. Because it assumes that data features are independent of each other, its detection accuracy is poor when features are correlated. To address this issue, we propose to extend the feature set by adding features based on principal components. Our results show that using the original input features together with principal components significantly improves the detection accuracy of the histogram-based anomaly detector without compromising much in terms of run-time.
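The scoring rule and the feature extension described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses equal-width bins, keeps all principal components, scores the same data the histograms were built from, and sums log masses instead of multiplying masses (an equivalent ordering that avoids numerical underflow). The function names `add_principal_components` and `histogram_log_scores` and the parameter `n_bins` are hypothetical.

```python
import numpy as np

def add_principal_components(X):
    """Append the principal-component projections to the original features."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.hstack([X, centered @ vt.T])

def histogram_log_scores(X, n_bins=10):
    """Sum over features of the log probability mass of each point's bin.

    Lower values indicate more anomalous points.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    log_scores = np.zeros(n)
    for j in range(d):
        counts, edges = np.histogram(X[:, j], bins=n_bins)
        # Interior edges only, so bin indices fall in 0..n_bins-1.
        bins = np.digitize(X[:, j], edges[1:-1])
        log_scores += np.log(counts[bins] / n)
    return log_scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=500)
    y = x + rng.uniform(-0.02, 0.02, size=500)   # strongly correlated features
    X = np.vstack([np.column_stack([x, y]), [[0.5, -0.5]]])  # last row is off the trend
    plain = histogram_log_scores(X)
    extended = histogram_log_scores(add_principal_components(X))
    print(np.argmin(plain), np.argmin(extended))
```

In the demonstration data, the last point lies within the normal range of each original feature taken separately but far from the line along which the two features vary together. Its deviation becomes visible to a one-dimensional histogram only along the second principal component, which is the motivation given above for adding principal-component features.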