We examine various methods for data clustering and data classification that are based on the minimization of the so-called cluster function and its modifications. These functions are nonsmooth and nonconvex. We use the discrete gradient method for their local minimization. We also consider a combination of this method with the cutting angle method for global minimization. We present and discuss results of numerical experiments.
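
As a rough illustration of the objective involved, the sketch below assumes the common form of the cluster function: the mean, over all data points, of the squared Euclidean distance to the nearest cluster center. The abstract does not state the exact definition, so this form, the function name `cluster_function`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def cluster_function(centers, data):
    """Mean squared distance from each data point to its nearest center.

    Assumed form of the cluster function; the abstract does not define it.
    """
    # Pairwise squared Euclidean distances, shape (n_points, n_centers).
    diffs = data[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # The inner minimum over centers is the source of nonsmoothness.
    return float(np.mean(np.min(sq_dists, axis=1)))

# Toy data: two well-separated groups of two points each.
data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good_centers = np.array([[0.0, 1.0], [10.0, 1.0]])
poor_centers = np.array([[5.0, 1.0], [5.0, 1.0]])

value_good = cluster_function(good_centers, data)
value_poor = cluster_function(poor_centers, data)
print(value_good, value_poor)
```

Centers placed inside each group give a much smaller value than centers placed between the groups, which is what a minimization method is expected to exploit.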

Hu, Yi-Qi (Nanjing University) | Qian, Hong (Nanjing University) | Yu, Yang (Nanjing University)

Classification-based optimization is a recently developed framework for derivative-free optimization, which has been shown to be effective for non-convex optimization problems with many local optima. This framework requires sampling a batch of solutions for every update of the search model. However, in reinforcement learning, direct policy search often offers only sequential policy evaluation. Thus, classification-based optimization is not efficient for direct policy search, where solutions have to be sampled sequentially. In this paper, we adapt classification-based optimization to sequentially sampled solutions by forming the batch from reused historical solutions. Experiments on a helicopter hovering control task and reinforcement learning benchmark tasks in OpenAI Gym show that the new algorithm is superior to state-of-the-art derivative-free optimization approaches.

Esbroeck, Alex Van (University of Michigan) | Singh, Satinder (University of Michigan) | Rubinfeld, Ilan (Henry Ford Hospital) | Syed, Zeeshan (University of Michigan)

Missing values are a common problem when applying classification algorithms to real-world medical data. This is especially true for trauma patients, where the emergent nature of the cases makes it difficult to collect all of the relevant data for each patient. Standard methods for handling missingness first learn a model to estimate missing data values, and subsequently train and evaluate a classifier using data imputed with this model. Recently, several proposed methods have demonstrated the benefits of jointly estimating the imputation model and classifier parameters. However, these methods make assumptions that limit their utility with many real-world medical datasets. For example, the assumption that data elements are missing at random is often invalid. We address this situation by exploring a novel approach for jointly learning the imputation model and classifier. Unlike previous algorithms, our approach makes no assumptions about the missingness of the data, can be used with arbitrary probabilistic data models and classification loss functions, and can be used when both the training and testing data have missing values. We investigate the utility of this approach on the prediction of several patient outcomes in a large national registry of trauma patients, and find that it significantly outperforms standard sequential methods.
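
For orientation, the standard sequential baseline described above (impute first, then classify) can be sketched as follows. Column-mean imputation and a nearest-centroid classifier are stand-ins chosen for brevity; they are assumptions, not the models used in the paper, and the toy data and all function names are hypothetical.

```python
import numpy as np

def fit_mean_imputer(X):
    # Step 1: learn the imputation model (here, per-column means over observed values).
    return np.nanmean(X, axis=0)

def impute(X, col_means):
    # Replace each missing entry (NaN) with the mean of its column.
    return np.where(np.isnan(X), col_means, X)

def fit_centroids(X, y):
    # Step 2: train a classifier on the imputed data (here, one centroid per class).
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    # Assign each row to the class with the nearest centroid.
    sq_dists = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    return classes[np.argmin(sq_dists, axis=1)]

# Toy data with missing entries in both the training and the test set.
X_train = np.array([[1.0, np.nan], [1.2, 0.9], [np.nan, 5.1], [4.8, 5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[np.nan, 0.8], [5.0, np.nan]])

col_means = fit_mean_imputer(X_train)
classes, centroids = fit_centroids(impute(X_train, col_means), y_train)
predictions = predict(impute(X_test, col_means), classes, centroids)
print(predictions)
```

The imputer here is fit without reference to the classification loss, which is the decoupling that joint approaches aim to remove.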

Shimada, Takuya, Bao, Han, Sato, Issei, Sugiyama, Masashi

In supervised classification, we need a vast amount of labeled training data to train our classifiers. However, it is often not easy to obtain labels due to high labeling costs [Chapelle et al., 2010], privacy concerns [Warner, 1965], social bias [Nederhof, 1985], and the difficulty of labeling data. For such reasons, in some real-world classification problems, pairwise similarities (i.e., pairs of samples in the same class) and pairwise dissimilarities (i.e., pairs of samples in different classes) might be easier to collect than fully labeled data. For example, in the task of protein function prediction [Klein et al., 2002], knowledge about similarities/dissimilarities can be obtained by experimental means and used as additional supervision. To handle such pairwise information, similar-unlabeled (SU) classification [Bao et al., 2018] has been proposed, where the classification risk is estimated in an unbiased fashion from only similar pairs and unlabeled data. Although that work assumes that only similar pairs and unlabeled data are available, we may also obtain dissimilar pairs in practice. In this case, a method that can handle similarities, dissimilarities, and unlabeled data together is desirable. Semi-supervised clustering [Wagstaff et al., 2001] is one of the methods that can handle both similar and dissimilar pairs, where must-link pairs (i.e., similar pairs) and cannot-link pairs (i.e., dissimilar pairs) are used to obtain meaningful clusters.
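
To make the form of this supervision concrete, the sketch below builds similar (must-link) and dissimilar (cannot-link) pairs from a small set of class labels. In the setting described above the pairs are observed directly and the labels are not available; the labels are used here only to generate a toy example, and the function name `pairwise_supervision` is hypothetical.

```python
from itertools import combinations

def pairwise_supervision(labels):
    """Split all index pairs into similar and dissimilar pairs."""
    similar, dissimilar = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            similar.append((i, j))      # same class: must-link
        else:
            dissimilar.append((i, j))   # different classes: cannot-link
    return similar, dissimilar

# Hidden labels, used only to simulate the pairwise information.
labels = [1, 1, -1, -1]
similar, dissimilar = pairwise_supervision(labels)
print(similar)
print(dissimilar)
```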

Liu, Yun (University of Texas at Arlington) | Guo, Yiming (Illinois Institute of Technology) | Wang, Hua (Colorado School of Mines) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)

Transductive semi-supervised learning can only predict labels for unlabeled data appearing in the training data, and cannot predict labels for testing data that never appear in the training set. To handle this out-of-sample problem, many inductive methods impose a constraint that the predicted label matrix be exactly equal to the output of a linear model. In practice, this constraint might be too rigid to capture the manifold structure of the data. In this paper, we relax this rigid constraint and propose to use an elastic constraint on the predicted label matrix so that the manifold structure can be better explored. Moreover, since unlabeled data are often abundant in practice and usually contain some outliers, we use a non-squared loss instead of the traditional squared loss to learn a robust model. Although the derived problem is convex, it has many nonsmooth terms, which make it very challenging to solve. We propose an efficient optimization algorithm to solve a more general problem, based on which we find the optimal solution to the derived problem.
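
The robustness argument for a non-squared loss can be illustrated with a small numerical sketch. It assumes the non-squared loss is the sum of unsquared Euclidean norms of the per-sample residuals; the abstract does not give the exact form, so this choice, the function names, and the toy residuals are assumptions.

```python
import numpy as np

def squared_loss(residuals):
    # Sum of squared Euclidean norms of the per-sample residual rows.
    return float(np.sum(residuals ** 2))

def non_squared_loss(residuals):
    # Sum of unsquared Euclidean norms of the rows (assumed form).
    return float(np.sum(np.linalg.norm(residuals, axis=1)))

# Two well-fit samples (residual norm 0.5 each) and one outlier (norm 5).
clean = np.array([[0.3, 0.4], [0.0, 0.5]])
with_outlier = np.vstack([clean, [[3.0, 4.0]]])

# Fraction of the total loss contributed by the single outlier.
outlier_share_squared = 25.0 / squared_loss(with_outlier)
outlier_share_non_squared = 5.0 / non_squared_loss(with_outlier)
print(outlier_share_squared, outlier_share_non_squared)
```

Under the squared loss the outlier accounts for a larger share of the objective than under the non-squared loss, so it pulls the fitted model more strongly.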