Academia Sinica
On Organizing Online Soirees with Live Multi-Streaming
Shen, Chih-Ya (National Tsing Hua University) | Fotsing, C. P. Kankeu (Academia Sinica) | Yang, De-Nian (Academia Sinica) | Chen, Yi-Shin (National Tsing Hua University) | Lee, Wang-Chien (The Pennsylvania State University)
The popularity of live streaming has led to explosive growth in new video content and social communities on emerging platforms such as Facebook Live and Twitch. Viewers on these platforms are able to follow multiple streams of live events simultaneously while engaging in discussions with friends. However, existing approaches for selecting live streaming channels still focus on satisfying the individual preferences of users, without considering the need to accommodate real-time social interactions among viewers and to diversify the content of streams. In this paper, therefore, we formulate a new Social-aware Diverse and Preferred Live Streaming Channel Query (SDSQ) that jointly selects a set of diverse and preferred live streaming channels and a group of socially tight viewers. We prove that SDSQ is NP-hard and inapproximable within any factor, and design SDSSel, a 2-approximation algorithm with a guaranteed error bound. We perform a user study on Twitch with 432 participants to validate the need for SDSQ and the usefulness of SDSSel. We also conduct large-scale experiments on real datasets to demonstrate the superiority of the proposed algorithm over several baselines in terms of solution quality and efficiency.
Learning Adaptive Hidden Layers for Mobile Gesture Recognition
Hu, Ting-Kuei (Academia Sinica) | Lin, Yen-Yu (Academia Sinica) | Hsiu, Pi-Cheng (Academia Sinica)
This paper addresses two obstacles hindering advances in accurate gesture recognition on mobile devices. First, gesture recognition performance is highly dependent on feature selection, but optimal features typically vary from gesture to gesture. Second, diverse user behaviors and mobile environments result in extremely large intra-class variations. We tackle these issues by introducing a new network layer, called an adaptive hidden layer (AHL), to generalize a hidden layer in deep neural networks and dynamically generate an activation map conditioned on the input. To this end, an AHL is composed of multiple neuron groups and an extra selector. The former compiles multi-modal features captured by mobile sensors, while the latter adaptively picks a plausible group for each input sample. The AHL is end-to-end trainable and can generalize an arbitrary subset of hidden layers. Through a series of AHLs, the great expressive power from exponentially many forward paths allows us to choose proper multi-modal features in a sample-specific fashion and resolve the problems caused by the unfavorable variations in mobile gesture recognition. The proposed approach is evaluated on a benchmark for gesture recognition and a newly collected dataset. Superior performance demonstrates its effectiveness.
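The idea of several neuron groups plus a selector can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `AdaptiveHiddenLayerSketch`, the random weights, the softmax gate, and the hard argmax choice of a single group are all assumptions made for illustration, and training is omitted entirely.

```python
import math
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AdaptiveHiddenLayerSketch:
    """Hypothetical layer: several neuron groups and one selector."""

    def __init__(self, in_dim, out_dim, num_groups, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
                    for _ in range(rows)]

        # Each group is an ordinary dense layer (weights, bias).
        self.groups = [(rand_matrix(out_dim, in_dim), [0.0] * out_dim)
                       for _ in range(num_groups)]
        # The selector scores every group for a given input.
        self.selector = (rand_matrix(num_groups, in_dim), [0.0] * num_groups)

    def forward(self, x):
        gate = softmax(linear(*self.selector, x))
        # Assumption: pick the single highest-scoring group (hard selection).
        chosen = max(range(len(gate)), key=gate.__getitem__)
        weights, bias = self.groups[chosen]
        return relu(linear(weights, bias, x)), chosen, gate


layer = AdaptiveHiddenLayerSketch(in_dim=4, out_dim=3, num_groups=2)
activation, chosen_group, gate = layer.forward([0.5, -1.0, 0.25, 2.0])
```

Stacking several such layers gives one forward path per combination of chosen groups, which is the source of the "exponentially many forward paths" mentioned in the abstract.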
Generating Music Medleys via Playing Music Puzzle Games
Huang, Yu-Siang (Academia Sinica) | Chou, Szu-Yu (Academia Sinica) | Yang, Yi-Hsuan (Academia Sinica)
Generating music medleys is about finding an optimal permutation of a given set of music clips. Toward this goal, we propose a self-supervised learning task, called the music puzzle game, to train neural network models to learn the sequential patterns in music. In essence, such a game requires machines to correctly sort a few multisecond music fragments. In the training stage, we learn the model by sampling multiple non-overlapping fragment pairs from the same songs and seeking to predict whether a given pair is consecutive and in the correct chronological order. For testing, we design a number of puzzle games with different difficulty levels, the most difficult one being music medley, which requires sorting fragments from different songs. Building on a state-of-the-art Siamese convolutional network, we propose an improved architecture that learns to embed frame-level similarity scores computed from the input fragment pairs into a common space, where fragment pairs in the correct order can be more easily identified. Our results show that the resulting model, dubbed the similarity embedding network (SEN), performs better than competing models across different games, including music jigsaw puzzle, music sequencing, and music medley. Example results can be found at our project website, https://remyhuang.github.io/DJnet.
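The training-pair construction described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_fragment_pairs`, the fixed fragment length, the back-to-back cutting of the song, and the 50/50 split between positive and negative pairs are hypothetical choices, not details taken from the paper.

```python
import random


def sample_fragment_pairs(num_frames, fragment_len, num_pairs, seed=0):
    """Return (start_a, start_b, label) triples for one song.

    label is 1 only when fragment b starts exactly where fragment a ends,
    i.e. the pair is consecutive and in the correct chronological order.
    """
    rng = random.Random(seed)
    # Cut the song into back-to-back, non-overlapping fragments.
    starts = list(range(0, num_frames - fragment_len + 1, fragment_len))
    if len(starts) < 3:
        raise ValueError("song too short to form positive and negative pairs")

    pairs = []
    for _ in range(num_pairs):
        i = rng.randrange(len(starts) - 1)
        if rng.random() < 0.5:
            # Positive pair: fragment i followed directly by fragment i + 1.
            pairs.append((starts[i], starts[i + 1], 1))
        else:
            # Negative pair: any other fragment, which is either
            # non-adjacent or adjacent but in reversed order.
            candidates = [k for k in range(len(starts)) if k not in (i, i + 1)]
            j = rng.choice(candidates)
            pairs.append((starts[i], starts[j], 0))
    return pairs


pairs = sample_fragment_pairs(num_frames=1000, fragment_len=100, num_pairs=8)
```

A binary classifier trained on such triples only needs the song itself as supervision, which is what makes the task self-supervised.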
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment
Dong, Hao-Wen (Academia Sinica) | Hsiao, Wen-Yi (Academia Sinica) | Yang, Li-Chia (Academia Sinica) | Yang, Yi-Hsuan (Academia Sinica)
Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, so introducing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in the underlying assumptions and accordingly the network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e., without human input). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/.
Domain-Constraint Transfer Coding for Imbalanced Unsupervised Domain Adaptation
Tsai, Yao-Hung Hubert (Academia Sinica) | Hou, Cheng-An (Carnegie Mellon University) | Chen, Wei-Yu (National Taiwan University) | Yeh, Yi-Ren (National Kaohsiung Normal University) | Wang, Yu-Chiang Frank (Academia Sinica)
Unsupervised domain adaptation (UDA) deals with the task in which labeled training data and unlabeled test data are collected from source and target domains, respectively. In this paper, we particularly address the practical and challenging scenario of imbalanced cross-domain data. That is, we do not assume the label numbers across domains to be the same, and we also allow the data in each domain to be collected from multiple datasets/sub-domains. To solve the above task of imbalanced domain adaptation, we propose a novel algorithm, Domain-constraint Transfer Coding (DcTC). Our DcTC is able to exploit latent subdomains within and across data domains, and learns a common feature space for joint adaptation and classification purposes. Without assuming balanced cross-domain data as most existing UDA approaches do, we show that our method performs favorably against state-of-the-art methods on multiple cross-domain visual classification tasks.
Unfolding Temporal Dynamics: Predicting Social Media Popularity Using Multi-scale Temporal Decomposition
Wu, Bo (Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences) | Mei, Tao (Microsoft Research) | Cheng, Wen-Huang (Academia Sinica) | Zhang, Yongdong (Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences)
Time information plays a crucial role in social media popularity. Existing research on popularity prediction, though effective, ignores temporal information, which is highly related to user-item associations, and thus often achieves limited success. It is essential to consider all of these factors (user, item, and time), which together capture the dynamic nature of photo popularity. In this paper, we present a novel approach that factorizes popularity into user-item context and time-sensitive context to explore the mechanism of dynamic popularity. The user-item context provides a holistic view of popularity, while the time-sensitive context captures the temporal dynamics of popularity. Accordingly, we develop two kinds of time-sensitive features: user activeness variability and photo prevalence variability. To predict photo popularity, we propose a novel framework named Multi-scale Temporal Decomposition (MTD), which decomposes the popularity matrix in latent spaces based on contextual associations. Specifically, the proposed MTD models time-sensitive context on different time scales, which helps it learn temporal patterns automatically. Based on experiments conducted on a real-world dataset with 1.29M photos from Flickr, our proposed MTD achieves a prediction accuracy of 79.8% and outperforms the best three state-of-the-art methods with a relative improvement of 9.6% on average.
T-Gram: A Time-Aware Language Model to Predict Human Mobility
Hsieh, Hsun-Ping (National Taiwan University) | Li, Cheng-Te (Academia Sinica) | Gao, Xiaoqing (Xidian University)
This paper presents a novel time-aware language model, T-gram, to predict human mobility using location check-in data. While conventional n-gram language models, which use contextual co-occurrence to estimate the probability of a sequence of items, are often employed to predict human mobility, the time information of items is rarely considered. T-gram exploits the time information associated with each location, and aims to estimate the probability of visiting satisfaction for a given sequence of locations. For a location sequence, if locations are visited at the right times and the transitions between locations are proper as well, the T-gram probability gets higher. We also devise a T-gram Search algorithm to predict future locations. In experiments on human mobility prediction conducted on Gowalla check-in data, T-gram significantly outperforms a series of n-gram-based methods, which encourages its future use.
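To make the contrast with a plain n-gram model concrete, here is a minimal sketch of a bigram location model extended with a visiting-hour term. It is not the T-gram formulation from the paper: the functions `train`, `smoothed`, and `score`, the add-one smoothing, the hourly granularity, and the simple product of transition and hour probabilities are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train(checkins):
    """Count transitions and visiting hours.

    checkins: list of trajectories, each a list of (location, hour) tuples.
    """
    transitions = defaultdict(Counter)  # previous location -> next location
    hours = defaultdict(Counter)        # location -> hour of visit
    for trajectory in checkins:
        for location, hour in trajectory:
            hours[location][hour] += 1
        for (prev, _), (curr, _) in zip(trajectory, trajectory[1:]):
            transitions[prev][curr] += 1
    return transitions, hours


def smoothed(counter, key, num_outcomes):
    """Add-one (Laplace) smoothed probability of `key`."""
    return (counter[key] + 1) / (sum(counter.values()) + num_outcomes)


def score(sequence, transitions, hours, num_locations):
    """Assumed score: product of hour and transition probabilities."""
    prob = 1.0
    for index, (location, hour) in enumerate(sequence):
        # Time term: is this location usually visited at this hour?
        prob *= smoothed(hours[location], hour, 24)
        if index > 0:
            # Bigram term: does this location usually follow the previous one?
            prev = sequence[index - 1][0]
            prob *= smoothed(transitions[prev], location, num_locations)
    return prob


data = [
    [("cafe", 8), ("office", 9), ("bar", 20)],
    [("cafe", 8), ("office", 10)],
]
transitions, hours = train(data)
on_time = score([("cafe", 8), ("office", 9)], transitions, hours, 3)
off_time = score([("cafe", 20), ("office", 3)], transitions, hours, 3)
```

In this toy data both sequences share the same cafe-to-office transition, so `on_time` exceeds `off_time` purely because of the hour term, which mirrors the abstract's point that visiting locations at the right times should raise the probability.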
Semantical Clustering of Morphologically Related Chinese Words
Lee, Chia-Ling (National Taiwan University) | Chang, Ya-Ning (Academia Sinica) | Liu, Chao-Lin (National Chengchi University) | Lee, Chia-Ying (Academia Sinica) | Hsu, Jane Yung-jen (National Taiwan University)
A Chinese character embedded in different compound words may carry different meanings. In this paper, we aim at semantical clustering of a given family of morphologically related Chinese words. In Experiment 1, we employed linguistic features at the word, syntactic, semantic, and contextual levels in aggregated computational linguistics methods to handle the clustering task. In Experiment 2, we recruited adults and children to perform the clustering task. Experimental results indicate that our computational model achieved a level of performance similar to that of children.
Boosting OCR Accuracy Using Crowdsourcing
Wang, Shuo-Yang (Academia Sinica) | Wang, Ming-Hung (National Taiwan University) | Chen, Kuan-Ta (Academia Sinica)
Book digitization is important work in preserving ancient heritage. However, digitizing books involves a series of labor-intensive tasks, one of which is verifying optical character recognition (OCR) outcomes. In this paper, we propose a crowdsourceable OCR verification method. Using our method, content holders are able to leverage the power of crowds to complete verification tasks while avoiding content leakage. Experimental results show that our method is more efficient and reliable than the traditional method.
Disease Detection and Symptom Tracking by Retrieving Information from the Web
Ku, Lun-Wei (Academia Sinica) | Li, Wan-Lun (National Yunlin University of Science and Technology) | Chang, Ting-Chih (National Yunlin University of Science and Technology)
This paper proposes techniques for preliminary disease detection and personal symptom tracking that adopt concepts and methods from web information retrieval. The proposed approaches are inspired by web users’ behavior: people look for information about symptoms on the Internet. Therefore, using information in Web pages, the developed system proposes possible diseases related to one or more queried symptoms. Moreover, the queried symptoms are recorded in the query log so that users can use these records to trace the history of their symptoms, to manage their own health, or to provide them to doctors as a reference. As ranking detected diseases requires professional knowledge, we instead evaluate the relevancy of retrieved sentences containing detected diseases under both strict and lenient metrics. Experimental results support the proposed ranking approach. The techniques described in this paper are also implemented in an Android application called “Health Generation”. In this application, each detected disease is further linked to its Wikipedia introduction, and nearby clinics are listed. Users can utilize the GPS function provided by cell phones to plan a route to them. The aim of this research is to provide medical information and solutions according to users’ needs through the proposed approaches and the application, and further to help users manage their health.