AITopics

doi: 10.1109/TPAMI.2022.3230414

2103.01511

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)

Genre: Research Report (1.00)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Gupta, Shivam, Ghalme, Ganesh, Krishnan, Narayanan C., Jain, Shweta

Efficient Algorithms For Fair Clustering with a New Fairness Notion

arXiv.org Artificial Intelligence Sep-3-2021

We revisit the problem of fair clustering, first introduced by Chierichetti et al., which requires each protected attribute to have approximately equal representation in every cluster, i.e., a balance property. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between the clustering objective and fairness. In this paper, we propose a new notion of fairness, which we call $\tau$-fair fairness, that strictly generalizes the balance property and enables a fine-grained efficiency vs. fairness trade-off. Furthermore, we show that simple greedy round-robin based algorithms achieve this trade-off efficiently. Under a more general setting of multi-valued protected attributes, we rigorously analyze the theoretical properties of our algorithms. Our experimental results suggest that the proposed solution outperforms all the state-of-the-art algorithms and works exceptionally well even for a large number of clusters.
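To illustrate the round-robin idea and the balance property, here is a minimal Python sketch. It is not the authors' algorithm and does not reproduce their $\tau$-parameterised trade-off: it assumes the cluster centers are already fixed (for example from an ordinary k-means run) and lets the centers take turns greedily picking their nearest unassigned point from each protected group, so every cluster receives an almost equal share of each group. The function names `round_robin_fair_assign` and `balance` are hypothetical; `balance` uses the min/max group-count ratio per cluster, which for two groups matches the balance of Chierichetti et al.

```python
import numpy as np


def balance(labels, groups):
    """Worst-case (over clusters) ratio of smallest to largest group count.

    For two protected groups this is min(#a/#b, #b/#a), minimised over clusters.
    A cluster missing some group gets balance 0.
    """
    values = np.unique(groups)
    worst = 1.0
    for c in np.unique(labels):
        counts = [int(np.sum((labels == c) & (groups == v))) for v in values]
        worst = min(worst, min(counts) / max(counts))
    return float(worst)


def round_robin_fair_assign(X, groups, centers):
    """Hypothetical greedy round-robin assignment to fixed centers.

    For each protected group, centers take turns picking their nearest
    still-unassigned point of that group until the group is exhausted.
    """
    n, k = X.shape[0], centers.shape[0]
    labels = np.full(n, -1, dtype=int)
    # Squared distance from every point to every center, shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    turn = 0  # carried across groups so leftover points rotate between centers
    for g in np.unique(groups):
        remaining = list(np.flatnonzero(groups == g))
        while remaining:
            c = turn % k
            # Center c greedily takes its nearest remaining point of group g.
            best = min(remaining, key=lambda i: d2[i, c])
            labels[best] = c
            remaining.remove(best)
            turn += 1
    return labels
```

Because every center gets the same number of turns per group (up to one), each cluster's group counts differ by at most one; the price is a possibly higher clustering cost than plain nearest-center assignment.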

algorithm, fairness, new fairness notion

2109.00708

Country:

Asia > India (0.04)
North America > United States > California (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

#artificialintelligence Sep-1-2021, 09:15:07 GMT

Fuzzy Clustering Using HDBSCAN

Like most undergraduates fresh out of college, with little to no first-hand experience of industry ML projects but loads of ML/Python certifications, I joined the Business Intelligence team at Samsung. There were 3 new hires on the team and only 1 Data Scientist (DS) position available; the other 2 were Data Engineering positions. With all 3 of us riding the ML wave, we all sought the Data Scientist position. You can imagine how much malarkey the candidates spouted during our first meeting with the manager to get the position. We were given a 3-week trial period, during which each of us had to build a Data Engineering pipeline and perform an Exploratory Data Analysis on a given dataset.

data engineering, fuzzy clustering, hdbscan

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Hassan, Bryar A., Rashid, Tarik A.

Artificial Intelligence Algorithms for Natural Language Processing and the Semantic Web Ontology Learning

arXiv.org Artificial Intelligence Aug-31-2021

Evolutionary clustering algorithms have been considered among the most popular and widely used evolutionary algorithms for solving optimisation and practical problems in nearly all fields. In this thesis, a new evolutionary clustering algorithm star (ECA*) is proposed. Additionally, a number of experiments were conducted to evaluate ECA* against five state-of-the-art approaches. For this, 32 heterogeneous and multi-featured datasets were used to examine their performance using internal and external clustering measures, and to measure the sensitivity of their performance to dataset features in the form of an operational framework. The results indicate that ECA* outperforms its competing techniques in terms of the ability to find the right clusters. Given its superior performance, ECA* was a promising candidate for adaptation to ontology learning. In the process of deriving concept hierarchies from corpora, generating the formal context can be time-consuming. Reducing the size of the formal context removes uninteresting and erroneous pairs, so extracting the concept lattice and concept hierarchies takes less time. On this premise, this work proposes a framework that uses an adaptive version of ECA* to reduce the ambiguity of the formal context of the existing framework. In turn, an experiment was conducted by applying 385 sample corpora from Wikipedia to the two frameworks to examine the reduction in formal context size and the concept lattices and concept hierarchies it yields. The resulting lattice of the reduced formal context was evaluated against the original one using concept lattice invariants. Accordingly, the homomorphism between the two lattices preserves the quality of the resulting concept hierarchies by 89% relative to the basic ones, and the reduced concept lattice inherits the structural relation of the original one.
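To make the terms "formal context" and "concept lattice" concrete, here is a brute-force Python sketch of formal concept analysis. It is not ECA* and not the thesis's framework: it simply enumerates every (extent, intent) pair of a toy object-attribute context, where an intent is the set of attributes shared by a set of objects and its extent is the set of all objects having those attributes. The toy context and the function name `formal_concepts` are made up for illustration; the exponential enumeration also hints at why a smaller context is cheaper to process.

```python
from itertools import combinations


def formal_concepts(context):
    """Return every formal concept (extent, intent) of a small formal context.

    context maps each object to the set of attributes it has. This brute-forces
    all object subsets, so it is only suitable for toy contexts.
    """
    objects = sorted(context)
    attributes = frozenset().union(*context.values())

    def intent(objs):
        # Attributes shared by all objects in objs (every attribute if objs is empty).
        shared = attributes
        for o in objs:
            shared = shared & context[o]
        return frozenset(shared)

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            b = intent(objs)
            concepts.add((extent(b), b))  # closing the object set yields a concept
    return concepts


# Hypothetical toy context: terms as objects, verb-derived attributes.
toy_context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "rideable"},
}
```

Ordering these concepts by extent inclusion gives the concept lattice; here the top concept groups all four terms under "bookable", and "car" and "bike" sit below the "bookable, rentable" concept, which is the kind of hierarchy ontology learning reads off the lattice.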

algorithm, bsa, dataset

2108.13772

Country:

Asia > Middle East > Iraq > Kurdistan Region (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine (1.00)
Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

#artificialintelligence Aug-30-2021, 11:08:13 GMT

Machine Learning in R & Predictive Models

My course will be your complete guide to the theory and applications of supervised & unsupervised machine learning and predictive modeling using the R programming language. Unlike other courses, it offers NOT ONLY guided demonstrations of the R scripts but also covers the theoretical background that will allow you to FULLY UNDERSTAND & APPLY MACHINE LEARNING & PREDICTIVE MODELS (K-means, Random Forest, SVM, logistic regression, etc.) in R (many R packages incl. …). This course also covers all the main aspects of practical, highly applied data science related to machine learning (classification & regression) and unsupervised clustering techniques. Thus, if you take this course, you will save lots of time & money on other expensive materials in the R-based data science and machine learning domain. In this age of big data, companies across the globe use R to analyze large volumes of data for business and research.

learning, machine learning, unsupervised machine learning

Genre: Instructional Material > Course Syllabus & Notes (0.76)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.58)

Zaman, S. M Mehedi, Qureshi, Wasay Mahmood, Raihan, Md. Mohsin Sarker, Monjur, Ocean, Shams, Abdullah Bin

Survival Prediction of Heart Failure Patients using Stacked Ensemble Machine Learning Algorithm

arXiv.org Machine Learning Aug-30-2021

Cardiovascular disease, especially heart failure, is one of the major health hazards of our time and a leading cause of death worldwide. Advances in data mining techniques using machine learning (ML) models are paving the way for promising prediction approaches. Data mining is the process of converting the massive volumes of raw data created by healthcare institutions into meaningful information that can aid in making predictions and crucial decisions. The key aim of this study is to collect various follow-up data from patients who have had heart failure, analyze those data, and use several ML models to predict the survival possibility of cardiovascular patients. Because the classes in the dataset are imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) was applied. Two unsupervised models (K-Means and Fuzzy C-Means clustering) and three supervised classifiers (Random Forest, XGBoost and Decision Tree) were used in our study. After thorough investigation, our results demonstrate superior performance of the supervised ML algorithms over the unsupervised models. Moreover, we design and propose a supervised stacked ensemble learning model that can achieve an accuracy, precision, recall and F1 score of 99.98%. Our study shows that only certain attributes collected from the patients are essential for predicting the survival possibility after heart failure using supervised ML algorithms.
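The core step of SMOTE can be sketched in a few lines of NumPy: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is a minimal illustration with assumed defaults (k = 5, Euclidean distance), not the study's pipeline; in practice a library implementation such as imbalanced-learn's `SMOTE` would typically be used. The function name `smote` here is hypothetical.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=None):
    """Generate n_synthetic SMOTE-style samples from minority-class rows X_min."""
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)
    # Pairwise squared Euclidean distances between minority samples.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]  # k nearest minority neighbours per row
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(n)                  # random minority sample
        j = neighbours[i, rng.integers(k)]   # one of its k nearest neighbours
        gap = rng.random()                   # position along the segment, in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic
```

Oversampling like this should be applied to the training split only, so that synthetic points derived from test patients cannot leak into the evaluation.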

algorithm, artificial intelligence, machine learning


2108.13367

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence Aug-29-2021, 09:10:25 GMT

K-means clustering using Scala Spark

K-means clustering is a method of vector quantization that partitions n observations into k clusters, in which each observation belongs to the cluster with the nearest mean. Suppose a real estate firm wants to build a recommendation engine that recommends suitable houses to its customers; for this, we consider the following features. Clustering is an unsupervised learning method in which we try to find the relationships among n observations. In the example above, we try to find the relationships among the three features and use them to give the customer a recommendation.
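The assign-then-update loop behind k-means (Lloyd's algorithm) can be sketched in plain NumPy. This is not the article's Scala Spark code (Spark MLlib ships its own `KMeans` estimator in `org.apache.spark.ml.clustering`); it is a minimal sketch assuming numeric, comparably scaled features and random initialisation from the data points. Since the house features are not listed here, the usage below is left to toy 2-D data.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for k clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(c):
        # Index of the nearest centroid (squared Euclidean distance) for each row.
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: move each centroid to the mean of its assigned observations
        # (an empty cluster keeps its previous centroid).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(centroids), centroids
```

For a recommender, a new customer's preference vector would be assigned to its nearest centroid, and houses from that cluster would be suggested; the result depends on initialisation, so real jobs usually run several seeds or k-means++ initialisation.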

algorithm, centroid, scala spark

Industry: Banking & Finance > Real Estate (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Liu, Jinxin, Simsek, Murat, Kantarci, Burak, Erol-Kantarci, Melike, Malton, Andrew, Walenstein, Andrew

Risk-Aware Fine-Grained Access Control in Cyber-Physical Contexts

arXiv.org Artificial Intelligence Aug-28-2021

Access to resources by users may need to be granted only under certain conditions and contexts, perhaps particularly in cyber-physical settings. Unfortunately, creating and modifying context-sensitive access control solutions in dynamic environments poses ongoing challenges in managing the authorization contexts. This paper proposes RASA, a context-sensitive access authorization approach and mechanism that leverages unsupervised machine learning to automatically infer risk-based authorization decision boundaries. We explore RASA in a healthcare usage environment, wherein cyber and physical conditions create context-specific risks for protecting private health information. The risk levels are associated with access control decisions recommended by a security policy. A coupling method is introduced to track the coexistence of objects within a context using the frequency and duration of coexistence; these coupling features are clustered to reveal sets of actions with common risk levels, which are then used to create authorization decision boundaries. In addition, we propose a method for assessing the risk level and labelling the clusters with their corresponding risk levels. We evaluate the promise of RASA-generated policies against a heuristic rule-based policy. Using three different coupling features (frequency-based, duration-based, and combined features), the decisions of the unsupervised method and those of the policy are more than 99% consistent.
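The abstract describes the coupling features only at a high level, so the sketch below is one hypothetical reading rather than RASA's actual method. It assumes presence logs of `(object_id, start, end)` intervals within a single context and, for every pair of objects, counts how often their presence intervals overlapped (frequency) and for how long in total (duration). The log format, the function name `coupling_features`, and the example objects are all made up.

```python
from collections import defaultdict
from itertools import combinations


def coupling_features(intervals):
    """Per-pair coexistence features from presence records in one context.

    intervals: list of (object_id, start, end) tuples (hypothetical log format).
    Returns {(a, b): (frequency, duration)}, where frequency counts overlapping
    presence intervals and duration sums the overlap time.
    """
    feats = defaultdict(lambda: [0, 0.0])
    for (a, s1, e1), (b, s2, e2) in combinations(intervals, 2):
        if a == b:
            continue  # an object does not couple with itself
        overlap = min(e1, e2) - max(s1, s2)
        if overlap > 0:
            key = tuple(sorted((a, b)))
            feats[key][0] += 1
            feats[key][1] += overlap
    return {pair: (freq, dur) for pair, (freq, dur) in feats.items()}


# Hypothetical presence log for one room: (object, start minute, end minute).
room_log = [
    ("nurse", 0, 10),
    ("patient_record", 5, 15),
    ("nurse", 20, 30),
    ("patient_record", 25, 27),
    ("visitor", 40, 50),
]
```

Here the nurse and the patient record coexist twice for 7 minutes in total, while the visitor never overlaps with either. Frequency-only, duration-only, or combined vectors like these could then be passed to any clustering algorithm (k-means, for instance) to group pairs into candidate risk levels.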

artificial intelligence, machine learning, risk level

doi: 10.1145/3480468

2108.12739

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Commercial Services & Supplies > Security & Alarm Services (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Cho, Minsik, Vahid, Keivan A., Adya, Saurabh, Rastegari, Mohammad

DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

arXiv.org Artificial Intelligence Aug-28-2021

Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight-clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers a superior compression and accuracy trade-off on the ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on the ResNet50 DNN model with a 3.3MB model size (29.4x model compression factor). For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 62.8% top-1 ImageNet1k accuracy with a 0.74 MB model size (22.4x model compression factor). This is 6.8% higher top-1 accuracy with a 33% relatively smaller model size than the current state-of-the-art DNN compression algorithms. Additionally, DKM compresses the DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on the GLUE NLP benchmarks.
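The "k-means as attention" idea can be illustrated with a forward-pass-only NumPy sketch: distances between each weight and each centroid become softmax attention scores, centroids are re-estimated as attention-weighted means of the weights, and the compressed weights are attention-weighted mixes of the centroids. This is a simplification under stated assumptions, not the authors' layer: there is no autograd or training loop, it uses absolute distance on scalar weights, and the temperature, iteration count, and the names `dkm_forward` and `_attention` are arbitrary choices for illustration. In an actual train-time setup, the same computation would run inside an autodiff framework so gradients flow through the attention.

```python
import numpy as np


def _attention(w, c, temperature):
    # Softmax over centroids of -|w_i - c_j| / temperature, one row per weight.
    logits = -np.abs(w[:, None] - c[None, :]) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    return a / a.sum(axis=1, keepdims=True)


def dkm_forward(weights, k, temperature=0.05, n_iter=10, seed=0):
    """Soft (attention-based) k-means over scalar weights; forward pass only.

    Returns (compressed_weights, centroids). compressed_weights has the shape of
    `weights`, with each entry replaced by an attention-weighted mix of centroids.
    """
    w = np.asarray(weights, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    c = rng.choice(w, size=k, replace=False)  # initial centroids drawn from the weights
    for _ in range(n_iter):
        a = _attention(w, c, temperature)
        # Centroid update: attention-weighted mean of the weights.
        c = (a * w[:, None]).sum(axis=0) / np.maximum(a.sum(axis=0), 1e-12)
    a = _attention(w, c, temperature)
    compressed = a @ c
    return compressed.reshape(np.shape(weights)), c
```

A lower temperature makes the attention closer to hard k-means assignment; with k centroids, each weight can then be stored as a short index into a k-entry codebook, which is where the compression comes from.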

centroid, compression, dkm

2108.12659

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

#artificialintelligence Aug-24-2021, 10:55:27 GMT

Important Clustering Algorithms in Machine Learning

Clustering is an unsupervised machine learning task in which we draw inferences from datasets consisting of input data without labelled responses. With a clustering algorithm, we give the algorithm a lot of unlabelled input data and let it find whatever groupings it can. We can then use the clustering algorithm to assign each data point to a specific group.

important clustering algorithm, input data, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)