AITopics

doi: 10.1007/s10618-018-0564-z

2010.0543

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Connecticut > Tolland County > Storrs (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(8 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Consumer Health (1.00)
Education > Educational Setting (0.67)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
(2 more...)

P, Deepak, Abraham, Savitha Sam

Representativity Fairness in Clustering

arXiv.org Artificial Intelligence Oct-11-2020

Incorporating fairness constructs into machine learning algorithms is a topic of much societal importance and recent interest. Clustering, a fundamental task in unsupervised learning that manifests across a number of web data scenarios, has also been a subject of attention within fair ML research. In this paper, we develop a novel notion of fairness in clustering, called representativity fairness. Representativity fairness is motivated by the need to alleviate disparity across objects' proximity to their assigned cluster representatives, to aid fairer decision making. We illustrate the importance of representativity fairness in real-world decision making scenarios involving clustering and provide ways of quantifying objects' representativity and fairness over it. We develop a new clustering formulation, RFKM, that aims to optimize for representativity fairness along with clustering quality. Inspired by the $K$-Means framework, RFKM incorporates novel loss terms to formulate an objective function. The RFKM objective and optimization approach guide it towards clustering configurations that yield higher representativity fairness. Through an empirical evaluation over a variety of public datasets, we establish the effectiveness of our method. We illustrate that we are able to significantly improve representativity fairness at only marginal impact to clustering quality.
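To make the idea of "disparity across objects' proximity to their assigned cluster representatives" concrete, here is a minimal sketch. It is not the RFKM algorithm: the abstract does not state the paper's exact measures, so the two summaries used here (worst-case distance and variance of distances) are assumptions chosen for illustration, and the points, centroids, and labels are hypothetical.

```python
import math

def representativity_costs(points, centroids, labels):
    """Distance from each object to its assigned cluster representative."""
    return [math.dist(p, centroids[c]) for p, c in zip(points, labels)]

def disparity(costs):
    """Two illustrative disparity summaries (assumed, not from the paper):
    the worst-case distance and the variance of the distances."""
    mean = sum(costs) / len(costs)
    variance = sum((c - mean) ** 2 for c in costs) / len(costs)
    return max(costs), variance

# Hypothetical toy data: cluster 0 is tight, cluster 1 is spread out.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 6.0)]
centroids = [(0.0, 1.0), (10.0, 3.0)]
labels = [0, 0, 1, 1]

costs = representativity_costs(points, centroids, labels)
worst, var = disparity(costs)
print(costs, worst, var)
```

In this toy case the objects in the second cluster sit three times farther from their representative than those in the first, which is the kind of gap a representativity-aware objective would try to narrow.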

artificial intelligence, fairness, machine learning, (17 more...)

doi: 10.1145/3394231.3397910

2010.07054

Country:

Asia > India (0.14)
Europe > United Kingdom > England > Hampshire > Southampton (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Perrot, Michaël, Esser, Pascal Mattia, Ghoshdastidar, Debarghya

Near-Optimal Comparison Based Clustering

arXiv.org Machine Learning Oct-9-2020

The goal of clustering is to group similar objects into meaningful partitions. This process is well understood when an explicit similarity measure between the objects is given. However, far less is known when this information is not readily available and, instead, one only observes ordinal comparisons such as "object i is more similar to j than to k." In this paper, we tackle this problem using a two-step procedure: we estimate a pairwise similarity matrix from the comparisons before using a clustering method based on semi-definite programming (SDP). We theoretically show that our approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. We empirically validate our theoretical findings and demonstrate the good behaviour of our method on real data.

artificial intelligence, machine learning, similarity, (17 more...)

2010.03918

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Germany > Baden-Württemberg (0.04)
Europe > France (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Esmaeili, Ahmad, Gallagher, John C., Springer, John A., Matson, Eric T.

HAMLET: A Hierarchical Agent-based Machine Learning Platform

arXiv.org Artificial Intelligence Oct-9-2020

Hierarchical Multi-Agent Systems provide a convenient and relevant way to analyze, model, and simulate complex systems in which a large number of entities are interacting at different levels of abstraction. In this paper, we introduce HAMLET (Hierarchical Agent-based Machine LEarning plaTform), a platform based on hierarchical multi-agent systems, to facilitate the research and democratization of machine learning entities distributed geographically or locally. This is carried out by first modeling the machine learning solutions as a hypergraph and then autonomously setting up a multi-level structure composed of heterogeneous agents based on their innate capabilities and learned skills. HAMLET aids the design and management of machine learning systems and provides analytical capabilities for the research communities to assess the existing and/or new algorithms/datasets through flexible and customizable queries. The proposed platform does not assume restrictions on the type of machine learning algorithms/datasets and is theoretically proven to be sound and complete with polynomial computational requirements. Additionally, it is examined empirically on 120 training and four generalized batch testing tasks performed on 24 machine learning algorithms and 9 standard datasets. The experimental results provided not only establish confidence in the platform's consistency and correctness but also demonstrate its testing and analytical capacity.

artificial intelligence, holon, machine learning, (18 more...)

2010.04894

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > United States > Ohio > Hamilton County > Cincinnati (0.04)
(3 more...)

Genre: Research Report (0.63)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Zantedeschi, Valentina, Kusner, Matt J., Niculae, Vlad

Learning Binary Trees via Sparse Relaxation

arXiv.org Artificial Intelligence Oct-9-2020

One of the most classical problems in machine learning is how to learn binary trees that split data into useful partitions. From classification/regression via decision trees to hierarchical clustering, binary trees are useful because they (a) are often easy to visualize; (b) make computationally-efficient predictions; and (c) allow for flexible partitioning. Because of this, there has been extensive research on how to learn such trees, which generally falls into one of three categories: 1. greedy node-by-node optimization; 2. probabilistic relaxations for differentiability; 3. mixed-integer programs (MIP). Each of these has downsides: greedy methods can myopically choose poor splits, probabilistic relaxations do not have principled ways to prune trees, and MIP methods can be slow on large problems and may not generalize. In this work we derive a novel sparse relaxation for binary tree learning. By deriving a new MIP and sparsely relaxing it, our approach is able to learn tree splits and tree pruning using argmin differentiation. We demonstrate how our approach is easily visualizable and is competitive with current tree-based approaches in classification/regression and hierarchical clustering. Source code is available at http://github.com/vzantedeschi/LatentTrees .

artificial intelligence, machine learning, optimization problem, (16 more...)

2010.04627

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Portugal (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Haeffele, Benjamin D., You, Chong, Vidal, René

A Critique of Self-Expressive Deep Subspace Clustering

arXiv.org Artificial Intelligence Oct-7-2020

Subspace clustering is an unsupervised clustering technique designed to cluster data that is supported on a union of linear subspaces, with each subspace defining a cluster with dimension lower than the ambient space. Many existing formulations for this problem are based on exploiting the self-expressive property of linear subspaces, where any point within a subspace can be represented as a linear combination of other points within the subspace. To extend this approach to data supported on a union of non-linear manifolds, numerous studies have proposed learning an appropriate kernel embedding of the original data using a neural network, which is regularized by a self-expressive loss function on the data in the embedded space to encourage a union of linear subspaces prior on the data in the embedded space. Here we show that there are a number of potential flaws with this approach which have not been adequately addressed in prior work. In particular, we show the model formulation is often ill-posed in multiple ways, which can lead to a degenerate embedding of the data, which need not correspond to a union of subspaces at all. We validate our theoretical results experimentally and additionally repeat prior experiments reported in the literature, where we conclude that a significant portion of the previously claimed performance benefits can be attributed to an ad-hoc post processing step rather than the clustering model.

artificial intelligence, machine learning, subspace, (18 more...)

2010.03697

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Maryland > Baltimore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Zhao, Jieyu, Chang, Kai-Wei

LOGAN: Local Group Bias Detection by Clustering

arXiv.org Artificial Intelligence Oct-6-2020

Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
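The claim that similar aggregate performance can hide local differences can be illustrated with a small sketch. This is not LOGAN itself: it assumes cluster assignments are already available and simply compares the accuracy gap between two groups globally and within each cluster. The group names `"a"` and `"b"`, the records, and the cluster ids are hypothetical, and the helper assumes both groups appear in every cluster.

```python
from collections import defaultdict

def accuracy_gap(records):
    """Accuracy of group 'a' minus accuracy of group 'b'.
    Each record is (group, prediction_was_correct)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    return hits["a"] / totals["a"] - hits["b"] / totals["b"]

def local_gaps(cluster_ids, records):
    """Accuracy gap computed separately inside each cluster."""
    by_cluster = defaultdict(list)
    for cid, rec in zip(cluster_ids, records):
        by_cluster[cid].append(rec)
    return {cid: accuracy_gap(recs) for cid, recs in by_cluster.items()}

# Hypothetical predictions: both groups are 75% accurate overall.
records = [("a", True), ("a", True), ("b", False), ("b", True),
           ("a", False), ("a", True), ("b", True), ("b", True)]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]

global_gap = accuracy_gap(records)
gaps = local_gaps(cluster_ids, records)
print(global_gap, gaps)
```

Here the corpus-level gap is zero, yet each cluster shows a gap of 0.5 in opposite directions, which is the kind of local effect a corpus-level metric would miss.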

artificial intelligence, machine learning, natural language, (17 more...)

2010.02867

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Africa > Eswatini > Manzini > Manzini (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Machine Learning Oct-4-2020

Ensemble Machine Learning Methods for Modeling COVID19 Deaths

Bathwal, R., Chitta, P., Tirumala, K., Varadarajan, V.

Using a hybrid of machine learning and epidemiological approaches, we propose a novel data-driven approach to predicting US COVID-19 deaths at a county level. The model gives a more complete description of the daily death distribution, outputting quantile estimates instead of mean deaths, where the model's objective is to minimize the pinball loss on deaths reported by the New York Times coronavirus county dataset. The resulting quantile estimates accurately forecast deaths at an individual-county level for a variable-length forecast period, and the approach generalizes well across different forecast period lengths. We won the Caltech-run modeling competition out of 50+ teams, and our aggregate is competitive with the best COVID-19 modeling systems (on root mean squared error).
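For readers unfamiliar with the pinball loss mentioned above, the following is a minimal sketch of its standard definition for a quantile level `q`. It is not the authors' code; the function names and the example numbers are illustrative assumptions.

```python
def pinball_loss(y_true, y_pred, q):
    """Standard pinball (quantile) loss for one observation.
    Under-prediction is weighted by q, over-prediction by (1 - q)."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1) * diff

def mean_pinball_loss(y_true, y_pred, q):
    """Average pinball loss over paired sequences of observations and forecasts."""
    return sum(pinball_loss(t, p, q) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical example at the 0.9 quantile: missing low costs far more than missing high.
under = pinball_loss(10, 8, 0.9)   # forecast below the observed value
over = pinball_loss(8, 10, 0.9)    # forecast above the observed value
print(under, over)
```

Minimizing this loss for several values of `q` yields a set of quantile forecasts rather than a single mean forecast.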

artificial intelligence, machine learning, prediction, (16 more...)

2010.04052

Country:

North America > United States > California (0.05)
Asia > China (0.04)
North America > United States > New York > New York County (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Epidemiology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial Intelligence Oct-4-2020

"Drunk Man" Saves Our Lives: Route Planning by a Biased Random Walk Mode

Hu, Xinyi, Miao, Quchen, Zhao, Zexuan

Based on the hurricane that struck Puerto Rico in 2017, we developed a transportable disaster response system, "DroneGo", featuring a drone fleet capable of delivering medical packages and recording video of roads. Assuming equal weight for both missions, we take the capability of carrying out the former mission as a constraint and a starting point from which reconnaissance routes are built. The feasibility of fitting packages into cargo bay 1 or 2 is tested by a genetic algorithm. In the scenario where drones carry packages out and return unloaded, we derive the maximum reachable distance of each loaded drone from the drone specifications and the loading weight. A k-means clustering algorithm is used for partitioning destinations and deriving centroids as locations of bases.
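The last step, partitioning destinations with k-means and using the centroids as base locations, can be sketched as follows. This is a plain Lloyd-style k-means written for illustration, not the DroneGo implementation; the destination coordinates and initial centroids are hypothetical.

```python
import math

def assign(points, centroids):
    """Group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, iterations=20):
    """Alternate assignment and centroid updates for a fixed number of rounds."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)

# Hypothetical delivery destinations forming two well-separated groups.
destinations = [(0, 0), (0, 2), (2, 0), (2, 2),
                (10, 10), (10, 12), (12, 10), (12, 12)]
bases, partitions = kmeans(destinations, centroids=[(0, 0), (12, 12)])
print(bases)
```

Each returned centroid is a candidate base, and its partition lists the destinations that base would serve.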

artificial intelligence, drone, machine learning, (18 more...)

2010.03365

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > California > Orange County > Irvine (0.14)
North America > Puerto Rico > Arecibo > Arecibo (0.05)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Energy (0.68)
Transportation (0.47)
Health & Medicine > Health Care Providers & Services (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)

arXiv.org Machine Learning Oct-3-2020

EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering

Jiao, Lianmeng, Denoeux, Thierry, Liu, Zhun-ga, Pan, Quan

The Gaussian mixture model (GMM) provides a convenient yet principled framework for clustering, with properties suitable for statistical inference. In this paper, we propose a new model-based clustering algorithm, called EGMM (evidential GMM), in the theoretical framework of belief functions to better characterize cluster-membership uncertainty. With a mass function representing the cluster membership of each object, the evidential Gaussian mixture distribution composed of the components over the powerset of the desired clusters is proposed to model the entire dataset. The parameters in EGMM are estimated by a specially designed Expectation-Maximization (EM) algorithm. A validity index allowing automatic determination of the proper number of clusters is also provided. The proposed EGMM is as convenient as the classical GMM, but can generate a more informative evidential partition for the considered dataset. Experiments with synthetic and real datasets demonstrate the good performance of the proposed method as compared with some other prototype-based and model-based clustering techniques.

algorithm, artificial intelligence, machine learning, (14 more...)

2010.01333

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > Germany (0.04)
(9 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)