Education
Support Vector Machine Classification with Indefinite Kernels
Luss, Ronny, d'Aspremont, Alexandre
We propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our algorithm simultaneously computes support vectors and a proxy kernel matrix used in forming the loss. This can be interpreted as a penalized kernel learning problem where indefinite kernel matrices are treated as noisy observations of a true Mercer kernel. Our formulation keeps the problem convex, and relatively large problems can be solved efficiently using the projected gradient or analytic center cutting plane methods. We compare the performance of our technique with other methods on several classic data sets.
Streamed Learning: One-Pass SVMs
Rai, Piyush, Daumé, Hal III, Venkatasubramanian, Suresh
We present a streaming model for large-scale classification (in the context of $\ell_2$-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The $\ell_2$-SVM is known to have an equivalent formulation in terms of the minimum enclosing ball (MEB) problem, and an efficient algorithm based on the idea of \emph{core sets} exists (Core Vector Machine, CVM). CVM learns a $(1+\varepsilon)$-approximate MEB for a set of points and yields an approximate solution to the corresponding SVM instance. However, CVM works in batch mode, requiring multiple passes over the data. This paper presents a single-pass SVM which is based on the minimum enclosing ball of streaming data. We show that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to using online stochastic gradient updates. Our algorithm performs polylogarithmic computation at each example, and requires very small and constant storage. Experimental results show that, even in such restrictive settings, we can learn efficiently in just one pass and get accuracies comparable to other state-of-the-art SVM solvers (batch and online). We also give an analysis of the algorithm, and discuss some open issues and possible extensions.
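As an illustration of the kind of one-pass MEB update the abstract refers to, the following is a minimal sketch of a simple geometric streaming rule: when a new point falls outside the current ball, the ball is replaced by the smallest ball enclosing both the old ball and the point. This is an assumption about the general technique, not the authors' algorithm; it ignores labels and the SVM-specific feature space, and the function name `stream_enclosing_ball` is hypothetical.

```python
import math


def stream_enclosing_ball(points):
    """Single pass over `points`; returns (center, radius) of a ball that
    encloses every point seen. Only the center and radius are stored."""
    it = iter(points)
    center = list(next(it))  # the first point starts a zero-radius ball
    radius = 0.0
    for p in it:
        diff = [pi - ci for pi, ci in zip(p, center)]
        dist = math.sqrt(sum(d * d for d in diff))
        if dist <= radius:
            continue  # point is already enclosed; nothing to update
        # Smallest ball containing the old ball and p: move the center
        # toward p by (dist - radius) / 2 and average radius with dist.
        shift = (dist - radius) / (2.0 * dist)
        center = [ci + shift * di for ci, di in zip(center, diff)]
        radius = (radius + dist) / 2.0
    return center, radius
```

The resulting ball always encloses the data but is in general larger than the true minimum enclosing ball; the storage cost is one center vector and one scalar, regardless of stream length.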
A Unified Semi-Supervised Dimensionality Reduction Framework for Manifold Learning
Chatpatanasiri, Ratthachat, Kijsirikul, Boonserm
We present a general framework of semi-supervised dimensionality reduction for manifold learning that naturally generalizes existing supervised and unsupervised learning frameworks which apply the spectral decomposition. Algorithms derived under our framework are able to employ both labeled and unlabeled examples and are able to handle complex problems where data form separate clusters of manifolds. Our framework offers simple views, explains relationships among existing frameworks and provides further extensions which can improve existing algorithms. Furthermore, a new semi-supervised kernelization framework called ``KPCA trick'' is proposed to handle non-linear problems.
Automated Critique of Sketched Mechanisms
Wetzel, Jon William (Northwestern University) | Forbus, Ken (Northwestern University)
Designers often use a series of sketches to explain how their design goes through different states or modes to achieve its intended function. Learning how to create such explanations turns out to be a difficult problem for engineering students. An automated "crash test dummy" to let students practice explanations would be desirable. This paper describes how to carry out a core piece of the reasoning needed in such a system. We show how an open-domain sketch understanding system can be used to enter many aspects of such explanations, and how qualitative mechanics can be used to check the plausibility of the intended state transitions. The system is evaluated using a corpus of sketches based on designs from an engineering school design and communications course.
Hashigo: A Next-Generation Sketch Interactive System for Japanese Kanji
Taele, Paul (Texas A&M University) | Hammond, Tracy (Texas A&M University)
Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor-level critique and feedback on both the visual structure and written technique of students’ sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.
Not So Naive Online Bayesian Spam Filter
Su, Baojun (Zhejiang University) | Xu, Congfu (Zhejiang University)
Spam filtering, as a key problem in electronic communication, has drawn significant attention due to the increasingly huge amounts of junk email on the Internet. Content-based filtering is one reliable method for combating spammers' changing tactics. Naive Bayes (NB) is one of the earliest content-based machine learning methods, in both theory and practice, for combating spammers; it is easy to implement and can achieve considerable accuracy. In this paper, the traditional online Bayesian classifier is enhanced in two ways. First, from a theoretical point of view, we devise a self-adaptive mechanism to gradually weaken the assumption of independence required by the original NB in the online training process, and as a result our NSNB is no longer ``naive''. Second, we propose other engineering ways to make the classifier more robust and accurate. The experimental results show that our NSNB does give state-of-the-art classification performance on online spam filtering on large benchmark data sets, while it is extremely fast and takes up little memory in comparison with other statistical methods.
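For context, the traditional online Naive Bayes baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration under stated assumptions (token counts with add-one smoothing, labels `"spam"` and `"ham"`); it is not the NSNB method, whose self-adaptive mechanism is not described in the abstract. The class name `OnlineNaiveBayes` and its methods are hypothetical.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Baseline online Naive Bayes over tokens (assumed setup, not NSNB)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.token_counts = {label: defaultdict(int) for label in self.LABELS}
        self.total_tokens = {label: 0 for label in self.LABELS}
        self.vocab = set()

    def update(self, tokens, label):
        """Incorporate one labeled message; cost is linear in its length."""
        self.doc_counts[label] += 1
        for t in tokens:
            self.token_counts[label][t] += 1
            self.total_tokens[label] += 1
            self.vocab.add(t)

    def log_odds(self, tokens):
        """Return log P(spam | tokens) - log P(ham | tokens) under the
        independence assumption, with add-one smoothing throughout."""
        v = len(self.vocab) + 1  # +1 reserves mass for unseen tokens
        n = sum(self.doc_counts.values())
        score = (math.log((self.doc_counts["spam"] + 1) / (n + 2))
                 - math.log((self.doc_counts["ham"] + 1) / (n + 2)))
        for t in tokens:
            p_spam = (self.token_counts["spam"].get(t, 0) + 1) / (self.total_tokens["spam"] + v)
            p_ham = (self.token_counts["ham"].get(t, 0) + 1) / (self.total_tokens["ham"] + v)
            score += math.log(p_spam) - math.log(p_ham)
        return score
```

A positive score favors spam and a negative score favors ham. Each token contributes independently to the sum, which is exactly the "naive" assumption the abstract says NSNB gradually weakens.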
Pedagogical Discourse: Connecting Students to Past Discussions and Peer Mentors within an Online Discussion Board
The goal of the Pedagogical Discourse project is to develop instructional tools that will help students and instructors use discussion boards more effectively, with an emphasis on automatically assessing discussion activities and building tools for promoting student discussion participation and learning. In this paper, we present two related participation and learning scaffolding tools that exploit natural language processing and information retrieval techniques. The PedaBot tool is designed to aid student knowledge acquisition and promote reflection about course topics by connecting related discussions from a knowledge base of past discussions to the current discussion thread. The MentorMatch tool aims at promoting student participation using student mentors, i.e., course peers with a relatively good understanding of a particular topic. The system identifies students who often provide answers on a given topic and encourages classmates to invite mentors to participate in related discussions. Both tools have been integrated into a live discussion board that is used by an undergraduate computer science course. This paper describes our approaches to applying information retrieval and natural language processing techniques in the development of the tools and presents initial results from instrumentation and survey.
Evaluating User-Adaptive Systems: Lessons from Experiences with a Personalized Meeting Scheduling Assistant
Berry, Pauline M. (SRI International) | Donneau-Golencer, Thierry (SRI International) | Duong, Khang (SRI International) | Gervasio, Melinda (SRI International) | Peintner, Bart (SRI International) | Yorke-Smith, Neil (SRI International)
We discuss experiences from evaluating the learning performance of a user-adaptive personal assistant agent. We discuss the challenge of designing adequate evaluation and the tension of collecting adequate data without a fully functional, deployed system. Reflections on negative and positive experiences point to the challenges of evaluating user-adaptive AI systems. Lessons learned concern early consideration of evaluation and deployment, characteristics of AI technology and domains that make controlled evaluations appropriate or not, holistic experimental design, implications of "in the wild" evaluation, and the effect of AI-enabled functionality and its impact upon existing tools and work practices.
Archiving the Semantics of Digital Engineering Artifacts in CIBER-U
Regli, William C. (Drexel University) | Grauer, Michael (Drexel University) | Kopena, Joseph (Drexel University) | Wilkie, David (University of North Carolina) | Piecyk, Martin (Drexel University) | Osecki, Jordan (Drexel University)
This paper introduces the challenge of digital preservation in the area of engineering design and manufacturing and presents a methodology to apply knowledge representation and semantic techniques to develop Digital Engineering Archives. This work is part of an ongoing, multi-university effort to create Cyber-Infrastructure-Based Engineering Repositories for Undergraduates (CIBER-U) to support engineering design education. The technical approach is to use knowledge representation techniques to create formal models of engineering data elements, workflows and processes. With these formal models, engineering knowledge and processes can be captured and preserved with some guarantee of long-term interpretability. The paper presents examples of how the techniques can be used to encode specific engineering information packages and workflows. These techniques are being integrated into a semantic Wiki that supports the CIBER-U engineering education activities across nine universities and involving over 3,500 students since 2006.
An AI Framework to Teach English as a Foreign Language: CSIEC
Jia, Jiyou (Peking University)
CSIEC (Computer Simulation in Educational Communication) is not only an intelligent web-based human-computer dialogue system with natural language for English instruction, but also a learning assessment system for learners and teachers. Its multiple functions--including grammar-based gap filling exercises, scenario show, free chatting and chatting on a given topic--can satisfy the various requirements of students with different backgrounds and learning abilities. We will summarize the free Internet usage within a six-month period and its integration into English classes in universities and middle schools. The evaluation findings about the class integration show that the chatting function has been improved and frequently utilized by the users, and the application of the CSIEC system to English instruction can motivate the learners to practice English and enhance their learning process.