Yin, Yue
Dynamic Learning and Productivity for Data Analysts: A Bayesian Hidden Markov Model Perspective
Yin, Yue
Data analysts are essential in organizations, transforming raw data into insights that drive decision-making and strategy. This study explores how analysts' productivity evolves on a collaborative platform, focusing on two key learning activities: writing queries and viewing peer queries. While traditional research often assumes static models, where performance improves steadily with cumulative learning, such models fail to capture the dynamic nature of real-world learning. To address this, we propose a Hidden Markov Model (HMM) that tracks how analysts transition between distinct learning states based on their participation in these activities. Using an industry dataset with 2,001 analysts and 79,797 queries, this study identifies three learning states: novice, intermediate, and advanced. Productivity increases as analysts advance to higher states, reflecting the cumulative benefits of learning. Writing queries benefits analysts across all states, with the largest gains observed for novices. Viewing peer queries supports novices but may hinder analysts in higher states due to cognitive overload or inefficiencies. Transitions between states are also uneven, with progression from intermediate to advanced being particularly challenging. This study advances understanding of the dynamic learning behavior of knowledge workers and offers practical implications for designing systems, optimizing training, enabling personalized learning, and fostering effective knowledge sharing.
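As a rough illustration of the state-tracking idea in the abstract above, the following minimal sketch runs forward filtering on a three-state HMM. All probabilities, the discrete "low/medium/high" productivity levels, and the names TRANSITION, EMISSION, INITIAL, and forward_filter are hypothetical and are not estimates or definitions from the study. The study's model also lets transitions depend on learning activities, which this sketch leaves out.

```python
# Hypothetical three-state HMM; every number here is illustrative only.
STATES = ["novice", "intermediate", "advanced"]

# Row i gives P(next state | current state i).
TRANSITION = [
    [0.80, 0.18, 0.02],
    [0.05, 0.90, 0.05],
    [0.01, 0.04, 0.95],
]

# P(observed productivity level | state), assumed discrete for simplicity.
EMISSION = [
    {"low": 0.7, "medium": 0.2, "high": 0.1},
    {"low": 0.2, "medium": 0.6, "high": 0.2},
    {"low": 0.1, "medium": 0.3, "high": 0.6},
]

# Assumed initial state distribution: most analysts start as novices.
INITIAL = [0.90, 0.08, 0.02]


def forward_filter(observations):
    """Return P(state at t | observations up to t) for each time step t."""
    n = len(STATES)
    beliefs = []
    prior = INITIAL
    for obs in observations:
        # Weight the predicted state distribution by the observation likelihood.
        unnormalized = [prior[s] * EMISSION[s][obs] for s in range(n)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]
        beliefs.append(posterior)
        # Propagate one step through the transition matrix.
        prior = [
            sum(posterior[i] * TRANSITION[i][j] for i in range(n))
            for j in range(n)
        ]
    return beliefs


beliefs = forward_filter(["low", "low", "medium", "high", "high", "high"])
for step, belief in enumerate(beliefs):
    print(step, [round(p, 3) for p in belief])
```

Each printed row is the filtered belief over the three states after one more observation; a run of "high" observations shifts probability mass toward the advanced state.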
Component Segmentation of Engineering Drawings Using Graph Convolutional Networks
Zhang, Wentai, Joseph, Joe, Yin, Yue, Xie, Liuyue, Furuhata, Tomotake, Yamakawa, Soji, Shimada, Kenji, Kara, Levent Burak
Such drawings encode the topological information, dimensions, and manufacturing requirements of a product in a unified and standard form, which can then be utilized in various engineering applications including content-based part indexing (Fonseca et al., 2005; Kasimov et al., 2015), cost estimation (Sajadfar and Ma, 2015), and process planning (Kulkarni et al., 2000). Although the underlying designs are commonly created in a vector format through digital design tools, a raster drawing is more frequently used by manufacturers due to the ease of information exchange and quality assurance. According to a survey of Japan's manufacturing industry (Mitsubishi UFJ Research & Consulting Co., 2019), 84% of the customers use 2D raster-based drawings such as PDF, paper, or fax format when placing an order for manufacturing, which results in a major impediment in the automation of the aforementioned applications due to the need for human involvement in interpreting these drawings. For a modern online platform of part manufacturing, clients often upload their designs in raster image format for better quality assurance and IP protection since the information in image drawings is noneditable. Unlike a vector format, which enables trivial digital access to all stored information through a script file, raster drawings usually require manual inspection by technicians to extract the information required for quotation and manufacturing. The inspection process includes the identification of the part shape, dimensions, and manufacturing requirements. Here, we focus on the problem of semantic segmentation of the components in raster drawings. Common mechanical engineering components consist of straight lines, arcs, and circles. Our goal is to develop an automated data-driven framework that learns to distinguish between contour shapes, dimension sets, and text at the component level.
Explainable Recommendation via Multi-Task Learning in Opinionated Text Data
Wang, Nan, Wang, Hongning, Jia, Yiling, Yin, Yue
Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction. In this work, we develop a multi-task learning solution for explainable recommendation. Two companion learning tasks, user preference modeling for recommendation and opinionated content modeling for explanation, are integrated via a joint tensor factorization. As a result, the algorithm predicts not only a user's preference over a list of items, i.e., recommendation, but also how the user would appreciate a particular item at the feature level, i.e., opinionated textual explanation. Extensive experiments on two large collections of Amazon and Yelp reviews confirmed the effectiveness of our solution in both recommendation and explanation tasks, compared with several existing recommendation algorithms. Our extensive user study also clearly demonstrates the practical value of the explainable recommendations generated by our algorithm.
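To make the idea of one factorization serving both tasks concrete, here is a minimal sketch of scoring a (user, item, feature) triple from shared latent factors with a Tucker-style reconstruction. The choice of a Tucker form, the extra "overall" feature slot used as the recommendation score, and all values and names (CORE, FEATURES, tucker_score) are assumptions made for illustration; the abstract only states that the two tasks are integrated via a joint tensor factorization.

```python
def tucker_score(core, user_vec, item_vec, feature_vec):
    """Reconstruct one tensor entry: sum over a, b, c of core[a][b][c] * u[a] * v[b] * f[c]."""
    total = 0.0
    for a, u_a in enumerate(user_vec):
        for b, v_b in enumerate(item_vec):
            for c, f_c in enumerate(feature_vec):
                total += core[a][b][c] * u_a * v_b * f_c
    return total


# Hypothetical 2x2x2 core tensor and latent factors (illustrative values only).
CORE = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
user = [0.5, 1.0]
item = [2.0, 1.0]

# The same user and item factors are reused for every feature slot, so the
# recommendation score ("overall") and the feature-level scores share parameters.
FEATURES = {
    "overall": [1.0, 1.0],
    "battery": [1.0, 0.0],
    "screen": [0.0, 1.0],
}

scores = {name: tucker_score(CORE, user, item, vec) for name, vec in FEATURES.items()}
print(scores)
```

In this toy setup the "overall" entry would rank items for a user, while the per-feature entries indicate which aspects could be surfaced in an explanation.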
Computing Optimal Monitoring Strategy for Detecting Terrorist Plots
Wang, Zhen (Nanyang Technological University) | Yin, Yue (University of Chinese Academy of Sciences) | An, Bo (Nanyang Technological University)
In recent years, terrorist organizations (e.g., ISIS or al-Qaeda) have increasingly directed terrorists to launch coordinated attacks in their home countries. One example is the Paris shootings on January 7, 2015. By monitoring potential terrorists, security agencies are able to detect and stop terrorist plots at their planning stage. Although security agencies may have knowledge about potential terrorists (e.g., who they are, how they interact), they usually have limited resources and cannot monitor all terrorists. Moreover, a terrorist planner may strategically choose to arouse terrorists, taking the security agency's monitoring strategy into account. This paper makes five key contributions toward the challenging problem of computing optimal monitoring strategies: 1) A new Stackelberg game model for terrorist plot detection; 2) A modified double oracle framework for computing the optimal strategy effectively; 3) Complexity results for both defender and attacker oracle problems; 4) Novel mixed-integer linear programming (MILP) formulations for best response problems of both players; and 5) Effective approximation algorithms for generating suboptimal responses for both players. Experimental evaluation shows that our approach obtains sufficiently robust solutions that significantly outperform widely used centrality-based heuristics and scales up to realistic-sized problems.
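The leader-follower structure of a Stackelberg game can be sketched on a generic security game: the defender commits to coverage probabilities, and the attacker observes them and best-responds. This is not the paper's network model, its MILP formulations, or its double oracle procedure; the payoff numbers and the function names below are hypothetical, and ties are broken by lowest target index purely for simplicity.

```python
def attacker_best_response(coverage, attacker_reward, attacker_penalty):
    """Return the target index that maximizes the attacker's expected utility."""
    def expected(t):
        # Attacker is caught with probability coverage[t], succeeds otherwise.
        return coverage[t] * attacker_penalty[t] + (1 - coverage[t]) * attacker_reward[t]
    # max() returns the first maximizer, so ties go to the lowest index.
    return max(range(len(coverage)), key=expected)


def defender_utility(coverage, target, defender_reward, defender_penalty):
    """Defender's expected utility when the attacker chooses the given target."""
    return (coverage[target] * defender_reward[target]
            + (1 - coverage[target]) * defender_penalty[target])


# Hypothetical two-target example (all payoffs are illustrative).
coverage = [0.6, 0.4]            # defender's committed monitoring probabilities
attacker_reward = [10.0, 4.0]    # attacker payoff if not monitored
attacker_penalty = [-5.0, -2.0]  # attacker payoff if monitored
defender_reward = [5.0, 2.0]     # defender payoff if the plot is detected
defender_penalty = [-10.0, -4.0] # defender payoff if the plot succeeds

target = attacker_best_response(coverage, attacker_reward, attacker_penalty)
value = defender_utility(coverage, target, defender_reward, defender_penalty)
print(target, value)
```

An optimal defender strategy would be the coverage vector that maximizes this value given the attacker's best response; a double oracle approach, in general, builds up such strategies incrementally by repeatedly adding best responses for each player to a restricted game.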
Game-Theoretic Resource Allocation for Protecting Large Public Events
Yin, Yue (University of Chinese Academy of Sciences) | An, Bo (Nanyang Technological University) | Jain, Manish (Virginia Tech)
High-profile, large-scale public events are attractive targets for terrorist attacks. The Boston Marathon bombings on April 15, 2013, further emphasized the importance of protecting public events. The security challenge is exacerbated by the dynamic nature of such events: e.g., the impact of an attack at different locations changes over time as the Boston Marathon participants and spectators move along the race track. In addition, the defender can relocate security resources among potential attack targets at any time and the attacker may act at any time during the event. This paper focuses on developing efficient patrolling algorithms for such dynamic domains with continuous strategy spaces for both the defender and the attacker. We aim at computing optimal pure defender strategies, since an attacker does not have an opportunity to learn and respond to mixed strategies due to the relative infrequency of such events. We propose SCOUT-A, which makes assumptions on relocation cost, exploits payoff representation, and computes optimal solutions efficiently. We also propose SCOUT-C to compute the exact optimal defender strategy for general cases despite the continuous strategy spaces. SCOUT-C computes the optimal defender strategy by constructing an equivalent game with a discrete defender strategy space, then solving the constructed game. Experimental results show that both SCOUT-A and SCOUT-C significantly outperform other existing strategies.