AITopics | Ma, Wei-Ying

Collaborating Authors

Ma, Wei-Ying

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dual Learning for Machine Translation

He, Di, Xia, Yingce, Qin, Tao, Wang, Liwei, Yu, Nenghai, Liu, Tie-Yan, Ma, Wei-Ying

Neural Information Processing SystemsDec-31-2016

While neural machine translation (NMT) is making good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop, and generate informative feedback signals to train the translation models, even if without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and the other agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using the policy gradient methods). We call the corresponding approach to neural machine translation \emph{dual-NMT}. Experiments show that dual-NMT works very well on English$\leftrightarrow$French translation; especially, by learning from monolingual data (with 10\% bilingual data for warm start), it achieves a comparable accuracy to NMT trained from the full bilingual data for the French-to-English translation task.

deep learning, neural network, translation model, (18 more...)

Neural Information Processing Systems

Country:

Asia > China (0.14)
Europe > Spain (0.14)
Europe > Portugal (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report > New Finding (0.47)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Hashtag-Based Sub-Event Discovery Using Mutually Generative LDA in Twitter

Xing, Chen (Nankai University) | Wang, Yuan (Nankai University) | Liu, Jie (Nankai University) | Huang, Yalou (Nankai University) | Ma, Wei-Ying (Microsoft Research, China)

AAAI ConferencesApr-19-2016

Sub-event discovery is an effective method for social event analysis in Twitter. It can discover sub-events from large amount of noisy event-related information in Twitter and semantically represent them. The task is challenging because tweets are short, informal and noisy. To solve this problem, we consider leveraging event-related hashtags that contain many locations, dates and concise sub-event related descriptions to enhance sub-event discovery. To this end, we propose a hashtag-based mutually generative Latent Dirichlet Allocation model(MGe-LDA). In MGe-LDA, hashtags and topics of a tweet are mutually generated by each other. The mutually generative process models the relationship between hashtags and topics of tweets, and highlights the role of hashtags as a semantic representation of the corresponding tweets. Experimental results show that MGe-LDA can significantly outperform state-of-the-art methods for sub-event discovery.

hashtag, social media, text processing, (20 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East (1.00)
Africa > Middle East > Egypt (0.15)
Africa > Middle East > Tunisia (0.14)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment (0.49)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

LightLDA: Big Topic Models on Modest Compute Clusters

Yuan, Jinhui, Gao, Fei, Ho, Qirong, Dai, Wei, Wei, Jinliang, Zheng, Xun, Xing, Eric P., Liu, Tie-Yan, Ma, Wei-Ying

arXiv.org Machine LearningDec-4-2014

When building large-scale machine learning (ML) programs, such as big topic models or deep neural nets, one usually assumes such tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for most practitioners or academic researchers. We consider this challenge in the context of topic modeling on web-scale corpora, and show that with a modest cluster of as few as 8 machines, we can train a topic model with 1 million topics and a 1-million-word vocabulary (for a total of 1 trillion parameters), on a document collection with 200 billion tokens -- a scale not yet reported even with thousands of machines. Our major contributions include: 1) a new, highly efficient O(1) Metropolis-Hastings sampling algorithm, whose running cost is (surprisingly) agnostic of model size, and empirically converges nearly an order of magnitude faster than current state-of-the-art Gibbs samplers; 2) a structure-aware model-parallel scheme, which leverages dependencies within the topic model, yielding a sampling strategy that is frugal on machine memory and network communication; 3) a differential data-structure for model storage, which uses separate data structures for high- and low-frequency words to allow extremely large models to fit in memory, while maintaining high inference speed; and 4) a bounded asynchronous data-parallel scheme, which allows efficient distributed processing of massive data via a parameter server. Our distribution strategy is an instance of the model-and-data-parallel programming model underlying the Petuum framework for general distributed ML, and was implemented on top of the Petuum open-source system. We provide experimental evidence showing how this development puts massive models within reach on a small cluster while still enjoying proportional time cost reductions with increasing cluster size, in comparison with alternative options.

artificial intelligence, lightlda, natural language, (20 more...)

arXiv.org Machine Learning

1412.1576

Country:

Asia (0.14)
North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Collaborative Ensemble Learning: Combining Collaborative and Content-Based Information Filtering via Hierarchical Bayes

Yu, Kai, Schwaighofer, Anton, Tresp, Volker, Ma, Wei-Ying, Zhang, HongJiang

arXiv.org Machine LearningOct-19-2012

Collaborative filtering (CF) and content-based filtering (CBF) have widely been used in information filtering applications. Both approaches have their strengths and weaknesses which is why researchers have developed hybrid systems. This paper proposes a novel approach to unify CF and CBF in a probabilistic framework, named collaborative ensemble learning. It uses probabilistic SVMs to model each user's profile (as CBF does).At the prediction phase, it combines a society OF users profiles, represented by their respective SVM models, to predict an active users preferences(the CF idea).The combination scheme is embedded in a probabilistic framework and retains an intuitive explanation.Moreover, collaborative ensemble learning does not require a global training stage and thus can incrementally incorporate new data.We report results based on two data sets. For the Reuters-21578 text data set, we simulate user ratings under the assumption that each user is interested in only one category. In the second experiment, we use users' opinions on a set of 642 art images that were collected through a web-based survey. For both data sets, collaborative ensemble achieved excellent performance in terms of recommendation accuracy.

artificial intelligence, data mining, information, (18 more...)

arXiv.org Machine Learning

1212.2508

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.50)

Add feedback