AITopics

2006.11118

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(11 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Lartigue, Thomas, Durrleman, Stanley, Allassonnière, Stéphanie

Mixture of Conditional Gaussian Graphical Models for unlabelled heterogeneous populations in the presence of co-factors

arXiv.org Machine Learning Jun-19-2020

Conditional correlation networks, within Gaussian Graphical Models (GGM), are widely used to describe the direct interactions between the components of a random vector. In the case of an unlabelled heterogeneous population, Expectation Maximisation (EM) algorithms for Mixtures of GGM have been proposed to estimate both each sub-population's graph and the class labels. However, we argue that, with most real data, class affiliation cannot be described with a Mixture of Gaussians, which mostly groups data points according to their geometrical proximity. In particular, there often exist external co-features whose values affect the features' average value, scattering data points that belong to the same sub-population across the feature space. Additionally, if the co-features' effect on the features is heterogeneous, then the estimation of this effect cannot be separated from the sub-population identification. In this article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the heterogeneous effects of the co-features to regroup the data points into clusters corresponding to the sub-populations. We develop a penalised EM algorithm to estimate graph-sparse model parameters. We demonstrate on synthetic and real data how this method fulfils its goal and succeeds in identifying the sub-populations where the Mixtures of GGM are disrupted by the effect of the co-features.

algorithm, artificial intelligence, machine learning, (17 more...)

2006.11094

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Esmaeili, Seyed A., Brubach, Brian, Tsepenekas, Leonidas, Dickerson, John P.

Probabilistic Fair Clustering

arXiv.org Artificial Intelligence Jun-18-2020

In clustering problems, a central decision-maker is given a complete metric graph over vertices and must provide a clustering of vertices that minimizes some objective function. In fair clustering problems, vertices are endowed with a color (e.g., membership in a group), and the features of a valid clustering might also include the representation of colors in that clustering. Prior work in fair clustering assumes complete knowledge of group membership. In this paper, we generalize prior work by assuming imperfect knowledge of group membership through probabilistic assignments. We present clustering algorithms in this more general setting with approximation ratio guarantees. We also address the problem of "metric membership", where different groups have a notion of order and distance. Experiments are conducted using our proposed algorithms as well as baselines to validate our approach and also surface nuanced concerns when group membership is not known deterministically.
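As a rough illustration of what probabilistic group membership means for a representation constraint, the sketch below computes the expected share of each color in one cluster and checks it against bounds. It is not the paper's algorithm: the function names, the per-vertex probability dictionary, and the `lower`/`upper` bounds are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence


def expected_color_shares(
    cluster: List[int],
    prob: Dict[int, Dict[str, float]],
    colors: Sequence[str],
) -> Dict[str, float]:
    """Expected fraction of each color among the vertices of one cluster.

    prob[v][c] is the (assumed) probability that vertex v has color c.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one vertex")
    totals = {c: 0.0 for c in colors}
    for v in cluster:
        for c in colors:
            # A color missing from a vertex's dictionary counts as probability 0.
            totals[c] += prob[v].get(c, 0.0)
    return {c: totals[c] / len(cluster) for c in colors}


def is_fair_in_expectation(
    cluster: List[int],
    prob: Dict[int, Dict[str, float]],
    colors: Sequence[str],
    lower: float,
    upper: float,
) -> bool:
    """True if every color's expected share lies in [lower, upper].

    The bounds are hypothetical parameters, not values from the paper.
    """
    shares = expected_color_shares(cluster, prob, colors)
    return all(lower <= s <= upper for s in shares.values())
```

With deterministic memberships (probabilities of 0 or 1) the expected shares reduce to ordinary color proportions, which is the setting the abstract attributes to prior work.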

artificial intelligence, data mining, machine learning, (19 more...)

2006.10916

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ghanem, Sally, Panahi, Ashkan, Krim, Hamid, Kerekes, Ryan A.

Robust Group Subspace Recovery: A New Approach for Multi-Modality Data Fusion

arXiv.org Machine Learning Jun-18-2020

The Robust Subspace Recovery (RoSuRe) algorithm was recently introduced as a principled and numerically efficient algorithm that unfolds the underlying Union of Subspaces (UoS) structure present in the data. The UoS model is capable of identifying more complex trends in data sets than simple linear models. We build on and extend RoSuRe to prospect the structure of different data modalities individually. We propose a novel multi-modal data fusion approach based on group sparsity, which we refer to as Robust Group Subspace Recovery (RoGSuRe). Relying on a bi-sparsity pursuit paradigm and non-smooth optimization techniques, the introduced framework learns a new joint representation of the time series from different data modalities, respecting an underlying UoS model. We subsequently integrate the obtained structures to form a unified subspace structure. The proposed approach exploits the structural dependencies between the different modalities' data to cluster the associated target objects. The resulting fusion of the unlabeled sensors' data from experiments on audio and magnetic data has shown that our method is competitive with other state-of-the-art subspace clustering methods. The resulting UoS structure is employed to classify newly observed data points, highlighting the abstraction capacity of the proposed method.

matrix, modality, subspace, (15 more...)

doi: 10.1109/JSEN.2020.2999461

2006.10657

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.14)
North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
(8 more...)

Genre:

Research Report (1.00)
Overview (0.68)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Energy (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

arXiv.org Machine Learning Jun-18-2020

Fair Hierarchical Clustering

Ahmadian, Sara, Epasto, Alessandro, Knittel, Marina, Kumar, Ravi, Mahdian, Mohammad, Moseley, Benjamin, Pham, Philip, Vassilvitskii, Sergei, Wang, Yuyan

As machine learning has become more prevalent, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering. In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a specific objective. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.

algorithm, artificial intelligence, machine learning, (15 more...)

2006.10221

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Jun-17-2020

Robust Unsupervised Learning of Temporal Dynamic Interactions

Guha, Aritra, Lei, Rayleigh, Zhu, Jiacheng, Nguyen, XuanLong, Zhao, Ding

Robust representation learning of temporal dynamic interactions is an important problem in robotic learning in general and automated unsupervised learning in particular. Temporal dynamic interactions can be described by (multiple) geometric trajectories in a suitable space over which unsupervised learning techniques may be applied to extract useful features from raw and high-dimensional data measurements. Taking a geometric approach to robust representation learning for temporal dynamic interactions, it is necessary to develop suitable metrics and a systematic methodology for comparison and for assessing the stability of an unsupervised learning method with respect to its tuning parameters. Such metrics must account for the (geometric) constraints in the physical world as well as the uncertainty associated with the learned patterns. In this paper we introduce a model-free metric based on the Procrustes distance for robust representation learning of interactions, and an optimal transport based distance metric for comparing distributions of interaction primitives. These distance metrics can serve as an objective for assessing the stability of an interaction learning algorithm. They are also used for comparing the outcomes produced by different algorithms. Moreover, they may also be adopted as an objective function to obtain clusters and representative interaction primitives. These concepts and techniques will be introduced, along with mathematical properties, while their usefulness will be demonstrated in unsupervised learning of vehicle-to-vehicle interactions extracted from the Safety Pilot database, the world's largest database for connected vehicles.
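To make the idea of a Procrustes-type distance between trajectories concrete, here is a minimal sketch for planar trajectories sampled at the same number of points. It removes translation and the best rotation before measuring the residual. The abstract does not give the paper's exact definition, so the restriction to 2D, the rotation-only alignment (no scaling or reflection), and the function name are assumptions of this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _center(points: List[Point]) -> List[Point]:
    """Translate a trajectory so that its centroid sits at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def procrustes_distance_2d(a: List[Point], b: List[Point]) -> float:
    """Residual Frobenius distance after optimally rotating a onto b.

    Assumes both trajectories have the same number of points, matched by index.
    """
    if not a or len(a) != len(b):
        raise ValueError("trajectories must be non-empty and of equal length")
    a0, b0 = _center(a), _center(b)
    # Closed-form optimal rotation angle in 2D: maximise sum of b_i . (R a_i).
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a0, b0))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a0, b0))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    sq = 0.0
    for (ax, ay), (bx, by) in zip(a0, b0):
        rx, ry = c * ax - s * ay, s * ax + c * ay  # rotated point of a
        sq += (rx - bx) ** 2 + (ry - by) ** 2
    return math.sqrt(sq)
```

Two trajectories that differ only by a rigid motion get a distance of (numerically) zero, which is the invariance that makes this kind of metric attractive for comparing interaction shapes.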

change point, interaction, trajectory, (16 more...)

2006.10241

Country:

North America > United States > New Jersey > Hudson County > Hoboken (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Ghadiri, Mehrdad, Samadi, Samira, Vempala, Santosh

Fair k-Means Clustering

arXiv.org Artificial Intelligence Jun-17-2020

We show that the popular $k$-means clustering algorithm (Lloyd's heuristic), used for a variety of scientific data, can result in outcomes that are unfavorable to subgroups of data (e.g., demographic groups). Such biased clusterings can have deleterious implications for human-centric applications such as resource allocation. We present a fair $k$-means objective and algorithm to choose cluster centers that provide equitable costs for different groups. The algorithm, Fair-Lloyd, is a modification of Lloyd's heuristic for $k$-means, inheriting its simplicity, efficiency, and stability. In comparison with standard Lloyd's, we find that on benchmark data sets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have balanced costs in the output $k$-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever $k$-means is currently used.
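The notion of per-group cost can be sketched as follows: for a fixed set of centers, compute the average squared distance to the nearest center separately for each group. Scoring a set of centers by the worst group average is an assumption made here to illustrate "equitable costs"; the abstract does not spell out the objective, and this is not the Fair-Lloyd algorithm itself. All names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def sq_dist(p: Vector, q: Vector) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def group_average_costs(
    points: List[Vector], groups: List[Hashable], centers: List[Vector]
) -> Dict[Hashable, float]:
    """Average k-means cost (squared distance to nearest center) per group."""
    totals: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for p, g in zip(points, groups):
        cost = min(sq_dist(p, c) for c in centers)
        totals[g] = totals.get(g, 0.0) + cost
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


def worst_group_cost(
    points: List[Vector], groups: List[Hashable], centers: List[Vector]
) -> float:
    """Assumed fairness score: the largest per-group average cost."""
    return max(group_average_costs(points, groups, centers).values())
```

Comparing `worst_group_cost` for two candidate sets of centers shows how centers with a similar overall cost can still treat groups very differently.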

algorithm, artificial intelligence, machine learning, (17 more...)

2006.10085

Country: North America > United States > California (0.46)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Jun-15-2020

Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction

Yu, Yaodong, Chan, Kwan Ho Ryan, You, Chong, Song, Chaobing, Ma, Yi

To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class. We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features. The coding rate can be accurately computed from finite samples of degenerate subspace-like distributions and can learn intrinsic representations in supervised, self-supervised, and unsupervised settings in a unified manner. Empirically, the representations learned using this principle alone are significantly more robust to label corruptions in classification than those using cross-entropy, and can lead to state-of-the-art results in clustering mixed data from self-learned invariant features.
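For orientation, the "coding rate difference" described above is usually written in the following form. The symbols are not defined in the abstract and are assumptions of this sketch: $Z \in \mathbb{R}^{d \times m}$ collects $m$ learned features of dimension $d$, $\epsilon$ is a distortion level, and each $\Pi_j$ is a diagonal $m \times m$ matrix encoding membership of the samples in class $j$.

```latex
% Coding rate of the whole feature set
R(Z, \epsilon) = \frac{1}{2} \log\det\!\left( I + \frac{d}{m\,\epsilon^{2}}\, Z Z^{\top} \right)

% Coding rate summed over the k classes
R_{c}(Z, \epsilon \mid \Pi) = \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2m}
  \log\det\!\left( I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top} \right)

% Rate reduction to be maximised
\Delta R(Z, \Pi, \epsilon) = R(Z, \epsilon) - R_{c}(Z, \epsilon \mid \Pi)
```

Maximising $\Delta R$ favours features that span a large volume overall while each class stays compact, which matches the "diverse and discriminative" wording of the title.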

artificial intelligence, deep learning, machine learning, (12 more...)

2006.08558

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial Intelligence Jun-15-2020

Explainable AI for a No-Teardown Vehicle Component Cost Estimation: A Top-Down Approach

Moawad, Ayman, Islam, Ehsan, Kim, Namdoo, Vijayagopal, Ram, Rousseau, Aymeric, Wu, Wei Biao

The broader ambition of this article is to popularize an approach for the fair distribution of the quantity of a system's output to its subsystems, while allowing for underlying complex subsystem level interactions. Particularly, we present a data-driven approach to vehicle price modeling and its component price estimation by leveraging a combination of concepts from machine learning and game theory. We show an alternative to common teardown methodologies and surveying approaches for component and vehicle price estimation at the manufacturer's suggested retail price (MSRP) level that has the advantage of bypassing the uncertainties involved in 1) the gathering of teardown data, 2) the need to perform expensive and biased surveying, and 3) the need to perform retail price equivalent (RPE) or indirect cost multiplier (ICM) adjustments to mark up direct manufacturing costs to MSRP. This novel exercise not only provides accurate pricing of the technologies at the customer level, but also shows the large gaps, known a priori, in pricing strategies between manufacturers, vehicle sizes, classes, market segments, and other factors. There is also clear synergism or interaction between the price of certain technologies and other specifications present in the same vehicle. Those (unsurprising) results are an indication that old methods of manufacturer-level component costing, aggregation, and the application of a flat and rigid RPE or ICM adjustment factor should be carefully examined. The findings are based on an extensive database, developed by Argonne National Laboratory, that includes more than 64,000 vehicles covering MY1990 to MY2020 over hundreds of vehicle specs.

artificial intelligence, data mining, machine learning, (20 more...)

2006.08828

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > France (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Okulewicz, Michał, Mańdziuk, Jacek

Dynamic Vehicle Routing Problem: A Monte Carlo approach

arXiv.org Artificial Intelligence Jun-15-2020

In this work we solve the Dynamic Vehicle Routing Problem (DVRP). DVRP is a modification of the Vehicle Routing Problem, in which the number and location of the clients' requests (cities) might not be known at the beginning of the working day. Additionally, all requests must be served during one working day by a fleet of vehicles with limited capacity. In this work we propose a Monte Carlo method (MCTree), which directly approaches the dynamic nature of arriving requests in the DVRP. The method is also hybridized (MCTree+PSO) with our previous Two-Phase Multi-swarm Particle Swarm Optimization (2MPSO) algorithm. Our method is based on two assumptions. First, that we know a bounding rectangle of the area in which the requests might appear. Second, that the initial requests' sizes and frequency of appearance are representative of the yet unknown clients' requests. In order to solve the DVRP we divide the working day into several time slices in which we solve a static problem. In our Monte Carlo approach we randomly generate the unknown clients' requests with uniform spatial distribution over the bounding rectangle and requests' sizes uniformly sampled from the already known requests' sizes. The solution proposal is constructed with the application of a clustering algorithm and a route construction algorithm. The MCTree method is tested on a well-established set of benchmarks proposed by Kilby et al. and is compared with the results achieved by applying our previous 2MPSO algorithm and other literature results. The proposed MCTree approach achieves a better time to quality trade-off than plain heuristic algorithms. Moreover, a hybrid MCTree+PSO approach achieves a better time to quality trade-off than 2MPSO for small optimization time limits, making the hybrid a good candidate for handling real world scale goods delivery problems.
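The request-generation step described in the abstract (locations uniform over the bounding rectangle, sizes drawn uniformly from the already known sizes) can be sketched as below. This covers only the sampling step, not MCTree itself; the function name, the tuple layout, and the idea of passing the number of requests `n` as an input are assumptions of this example.

```python
import random
from typing import List, Sequence, Tuple

Request = Tuple[float, float, float]  # (x, y, size)
Bounds = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def sample_unknown_requests(
    n: int,
    bounds: Bounds,
    known_sizes: Sequence[float],
    rng: random.Random,
) -> List[Request]:
    """Draw n artificial requests for one Monte Carlo scenario.

    Locations are uniform over the bounding rectangle; each size is picked
    uniformly at random from the sizes of the requests seen so far.
    """
    if not known_sizes:
        raise ValueError("need at least one known request size to sample from")
    x_min, y_min, x_max, y_max = bounds
    return [
        (
            rng.uniform(x_min, x_max),
            rng.uniform(y_min, y_max),
            rng.choice(known_sizes),
        )
        for _ in range(n)
    ]
```

Passing an explicit `random.Random` instance keeps each scenario reproducible, which is convenient when several sampled scenarios are compared within one time slice.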

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2006.09996

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.50)

Industry: Transportation > Freight & Logistics Services (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)