University of Vermont
Imbalanced Multiple Noisy Labeling for Supervised Learning
Zhang, Jing (Hefei University of Technology) | Wu, Xindong (University of Vermont) | Sheng, Victor Shengli (University of Central Arkansas)
When objects are labeled via Internet-based outsourcing systems, the labelers may be biased because of a lack of expertise or dedication, or because of personal preference. This bias causes Imbalanced Multiple Noisy Labeling. To deal with the imbalanced labeling issue, we propose an agnostic algorithm, PLAT (Positive LAbel frequency Threshold), which does not need any information about the quality of labelers or the underlying class distribution. Simulations on eight real-world datasets with different underlying class distributions demonstrate that PLAT not only effectively deals with the imbalanced multiple noisy labeling problem that off-the-shelf agnostic methods cannot cope with, but also performs nearly the same as majority voting when labelers have no bias.
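The idea of thresholding the positive label frequency can be illustrated with a minimal sketch. The abstract does not say how PLAT chooses its threshold, so the fixed value 0.3 below is a hypothetical stand-in, and the function names and toy data are assumptions made only for illustration. The sketch contrasts majority voting (threshold 0.5) with a lowered threshold when labelers lean toward the negative class.

```python
from typing import List


def positive_frequency(labels: List[int]) -> float:
    """Fraction of positive (1) labels among one example's noisy labels."""
    return sum(labels) / len(labels)


def integrate(label_sets: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Assign 1 when the positive label frequency exceeds `threshold`, else 0.

    threshold=0.5 behaves like majority voting (ties go to the negative class).
    Any other value is a hand-picked assumption, not PLAT's estimated threshold.
    """
    return [1 if positive_frequency(ls) > threshold else 0 for ls in label_sets]


# Toy data (invented): five noisy binary labels per example, from labelers
# who tend to answer "negative".
label_sets = [
    [1, 0, 0, 0, 1],  # positive frequency 0.4
    [0, 0, 0, 0, 1],  # positive frequency 0.2
    [0, 0, 0, 0, 0],  # positive frequency 0.0
    [1, 1, 0, 1, 0],  # positive frequency 0.6
]

majority = integrate(label_sets, threshold=0.5)
lowered = integrate(label_sets, threshold=0.3)  # hypothetical threshold
```

With these toy labels, majority voting marks only the last example as positive, while the lowered threshold also recovers the first one. How such a threshold would be estimated without knowing labeler quality or the class distribution is the part the abstract attributes to PLAT and is not reproduced here.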
Research about 3-Color, 2 Direction Mobile Automata
Manukyan, Narine (University of Vermont)
This paper studies 3-state, 2-direction Mobile Automata. The results of this study show that although it is more difficult to find complexity in Mobile Automata than in Cellular Automata, 3-color Mobile Automata can still be divided into four classes of complexity, thus producing complex behavior. There are 627 such 3-color Mobile Automata, which were studied and filtered to demonstrate the complexity of Mobile Automata. The results of this study imply that it is possible to observe complexity in systems that contain only one active cell, if the system has more than two states.
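A system with a single active cell can be simulated in a few lines. The sketch below assumes one common formulation of a mobile automaton, in which the rule reads the colors of the active cell and its two neighbors, recolors the active cell, and moves it one step left or right. The abstract does not give the paper's exact rule definition, and the rule table built by `make_rule` is an arbitrary example, not one of the automata studied.

```python
from itertools import product

COLORS = 3  # three cell colors (states)


def make_rule():
    """Build an arbitrary example rule table (an assumption for illustration).

    Maps (left, active, right) colors to (new active color, move), where
    move is +1 (right) or -1 (left).
    """
    rule = {}
    for left, center, right in product(range(COLORS), repeat=3):
        new_color = (left + center + 2 * right + 1) % COLORS
        move = 1 if (left + 2 * center + right) % 2 == 0 else -1
        rule[(left, center, right)] = (new_color, move)
    return rule


def run(rule, width=21, steps=10):
    """Evolve a mobile automaton on a ring of `width` cells.

    Returns a list of (cells, active_position) snapshots, one per step,
    starting from an all-zero row with the active cell in the middle.
    """
    cells = [0] * width
    pos = width // 2
    history = []
    for _ in range(steps):
        history.append((list(cells), pos))
        left = cells[(pos - 1) % width]
        right = cells[(pos + 1) % width]
        new_color, move = rule[(left, cells[pos], right)]
        cells[pos] = new_color          # only the active cell is updated
        pos = (pos + move) % width      # the active cell then moves
    return history


history = run(make_rule())
```

Unlike a cellular automaton, where every cell is updated at each step, at most one cell changes per step here, which is why complex behavior is harder to find in such systems.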
Learning from Concept Drifting Data Streams with Unlabeled Data
Li, Peipei (Hefei University of Technology) | Wu, Xindong (University of Vermont) | Hu, Xuegang (Hefei University of Technology)
Contrary to the common assumption that all arriving streaming data are labeled and that the class labels are immediately available, we propose a Semi-supervised classification algorithm for data streams with concept drifts and UNlabeled data, called SUN. SUN is based on an evolved decision tree. Concept drifts are distinguished from noise at the leaves according to the deviation between historical concept clusters and new ones generated by a clustering algorithm developed from k-Modes. Extensive studies on both synthetic and real data demonstrate that SUN performs well compared to several known online algorithms on unlabeled data. We therefore conclude that SUN provides a feasible reference framework for tackling concept drifting data streams with unlabeled data.
A Phrase-Based Method for Hierarchical Clustering of Web Snippets
Li, Zhao (University of Vermont) | Wu, Xindong (University of Vermont)
Document clustering has been applied in web information retrieval, where it facilitates quick browsing by organizing retrieved results into different groups. A tree-like hierarchical structure is well suited for presenting the retrieved results to web users. In this regard, we introduce a new method for hierarchical clustering of web snippets that exploits a phrase-based document index. In our method, a hierarchy of web snippets is built based on phrases instead of all snippets, and the snippets are then assigned to the corresponding clusters consisting of phrases. We show that, compared with traditional hierarchical clustering, our method not only presents meaningful cluster labels but also improves clustering performance.