Sun, Min
Meta-CPR: Generalize to Unseen Large Number of Agents with Communication Pattern Recognition Module
Tseng, Wei-Cheng, Wei, Wei, Juan, Da-Cheng, Sun, Min
Designing an effective communication mechanism among agents in reinforcement learning is challenging, especially for real-world applications, where the number of agents can grow or the environment must interact with a changing number of agents. A practical multi-agent framework therefore needs to handle agent populations that vary in both scale and dynamics. We formulate environments with different numbers of agents as a multi-tasking problem and propose a meta reinforcement learning (meta-RL) framework to tackle it. The proposed framework employs a meta-learned Communication Pattern Recognition (CPR) module to identify communication behavior and extract information that facilitates the training process. Experimental results demonstrate that the proposed framework (a) generalizes to an unseen, larger number of agents and (b) allows the number of agents to change between episodes. An ablation study further justifies the proposed CPR design and shows that it is effective.
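A minimal sketch of the idea behind a communication-pattern module, not the authors' code: messages from a varying number of agents are encoded and pooled into a fixed-size context vector that conditions each agent's policy. All module names, sizes, and the mean-pooling choice are assumptions.

```python
import torch
import torch.nn as nn

class CPRModule(nn.Module):
    """Summarizes messages from any number of agents into one context vector."""
    def __init__(self, msg_dim=8, ctx_dim=16):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(msg_dim, 32), nn.ReLU(),
                                    nn.Linear(32, ctx_dim))

    def forward(self, messages):            # messages: (n_agents, msg_dim)
        per_agent = self.encode(messages)   # (n_agents, ctx_dim)
        # Mean pooling keeps the context size fixed as the agent count varies.
        return per_agent.mean(dim=0)        # (ctx_dim,)

class Policy(nn.Module):
    def __init__(self, obs_dim=10, ctx_dim=16, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + ctx_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs, ctx):
        return self.net(torch.cat([obs, ctx], dim=-1))  # action logits

cpr, policy = CPRModule(), Policy()
for n_agents in (3, 5, 12):                 # episodes with different agent counts
    ctx = cpr(torch.randn(n_agents, 8))
    logits = policy(torch.randn(10), ctx)
```

Because the pooled context has a fixed size, the same policy can be reused when the number of agents changes between episodes.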
Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction
Tseng, Wei-Cheng, Chi, Po-Han, Wu, Jia-Hua, Sun, Min
The ability to accurately predict protein functions and properties is essential in the biotechnology industry, e.g., for drug development and artificial protein synthesis. The main challenges of protein function prediction are the large label space and the lack of labeled training data. Our method leverages unsupervised sequence embeddings and deep convolutional neural networks to overcome these challenges. In contrast, most existing methods discard rare protein functions to reduce the label space, and some require additional biological information (e.g., the 3-dimensional structure of the protein) that is difficult to determine experimentally. Our method significantly outperforms other methods on the publicly available benchmark while using only protein sequences as input, which speeds up the process of identifying protein functions.
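A minimal sketch, not the paper's model, of sequence-only multi-label function prediction: learned residue embeddings feed a 1-D CNN whose pooled features drive independent sigmoid outputs, one per function label. The vocabulary size, kernel width, and label count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProteinCNN(nn.Module):
    def __init__(self, n_residues=26, emb_dim=32, n_labels=1000):
        super().__init__()
        self.embed = nn.Embedding(n_residues, emb_dim)
        self.conv = nn.Conv1d(emb_dim, 64, kernel_size=9, padding=4)
        self.head = nn.Linear(64, n_labels)

    def forward(self, seq_ids):                  # (batch, seq_len) residue ids
        x = self.embed(seq_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))             # (batch, 64, seq_len)
        x = x.max(dim=2).values                  # global max pooling over the sequence
        return torch.sigmoid(self.head(x))       # independent per-label probabilities

model = ProteinCNN()
probs = model(torch.randint(0, 26, (2, 500)))    # two sequences of length 500
```

Sigmoid outputs with a per-label loss let the model keep the full label space, including rare functions, instead of pruning it.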
Toward Robust Long Range Policy Transfer
Tseng, Wei-Cheng, Lin, Jin-Siang, Feng, Yao-Min, Sun, Min
Humans can master a new task within a few trials by drawing upon skills acquired through prior experience. To mimic this capability, hierarchical models combining primitive policies learned from prior tasks have been proposed. However, these methods fall short of humans' range of transferability. We propose a method that leverages the hierarchical structure to alternately train the combination function and adapt the set of diverse primitive policies, efficiently producing a range of complex behaviors on challenging new tasks. We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase. We demonstrate that our method outperforms other recent policy transfer methods by combining and adapting these reusable primitives in tasks with continuous action spaces. The experimental results further show that our approach provides a broader transfer range. An ablation study also shows that the regularization terms are critical for long-range policy transfer. Finally, we show that our method consistently outperforms other methods when the quality of the primitives varies.
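A minimal sketch, with assumed shapes and layers, of the hierarchical structure the abstract describes: a combination function produces weights over K primitive policies, and the executed action is their weighted mixture.

```python
import torch
import torch.nn as nn

K, obs_dim, act_dim = 4, 12, 3
primitives = nn.ModuleList(nn.Linear(obs_dim, act_dim) for _ in range(K))
combiner = nn.Sequential(nn.Linear(obs_dim, K), nn.Softmax(dim=-1))

def act(obs):
    weights = combiner(obs)                                   # (K,) mixture weights
    actions = torch.stack([p(obs) for p in primitives])       # (K, act_dim)
    return (weights.unsqueeze(-1) * actions).sum(dim=0)       # weighted mixture

# Training would alternate between updating `combiner` on the new task and
# adapting `primitives`, with the diversity/utilization regularizers omitted here.
action = act(torch.randn(obs_dim))
```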
Learning a Multi-Modal Policy via Imitating Demonstrations with Mixed Behaviors
Hsiao, Fang-I, Kuo, Jui-Hsuan, Sun, Min
We propose a novel approach to train a multi-modal policy from mixed demonstrations without their behavior labels. We develop a method to discover the latent factors of variation in the demonstrations. Specifically, our method is based on the variational autoencoder with a categorical latent variable. The encoder infers discrete latent factors corresponding to different behaviors from demonstrations. The decoder, as a policy, performs the behaviors accordingly. Once learned, the policy is able to reproduce a specific behavior by simply conditioning on a categorical vector. We evaluate our method on three different tasks, including a challenging task with high-dimensional visual inputs. Experimental results show that our approach outperforms various baseline methods and is competitive with a multi-modal policy trained with ground-truth behavior labels.
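A minimal sketch of a policy conditioned on a categorical latent behavior code, as the abstract describes; the encoder that infers the code from a demonstration and the variational training objective are omitted, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

n_behaviors, obs_dim, act_dim = 3, 8, 2

class LatentConditionedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + n_behaviors, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim))

    def forward(self, obs, behavior_id):
        code = torch.zeros(n_behaviors)
        code[behavior_id] = 1.0                        # one-hot categorical vector
        return self.net(torch.cat([obs, code], dim=-1))

policy = LatentConditionedPolicy()
# Reproduce a specific learned behavior simply by picking its category.
action = policy(torch.randn(obs_dim), behavior_id=1)
```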
InstaNAS: Instance-aware Neural Architecture Search
Cheng, An-Chieh, Lin, Chieh Hubert, Juan, Da-Cheng, Wei, Wei, Sun, Min
Neural Architecture Search (NAS) aims at finding one "single" architecture that achieves the best accuracy for a given task such as image recognition. In this paper, we study instance-level variation and demonstrate that instance-awareness is an important yet currently missing component of NAS. Based on this observation, we propose InstaNAS, which searches for instance-level architectures; the controller is trained to search for and form a "distribution of architectures" instead of a single final architecture. During the inference phase, the controller selects an architecture from the distribution, tailored to each unseen image, to achieve both high accuracy and short latency. The experimental results show that InstaNAS reduces the inference latency without compromising classification accuracy. On average, InstaNAS achieves 48.9% latency reduction on CIFAR-10 and 40.2% latency reduction on CIFAR-100 with respect to the MobileNetV2 architecture.
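A minimal sketch of instance-aware inference: a lightweight controller looks at each input image and picks one architecture from a small pool, trading accuracy against latency per example. The pool, controller, and model choices below are placeholders, not the InstaNAS search space.

```python
import torch
import torch.nn as nn

pool = nn.ModuleList([
    nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)),      # "fast" branch
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Flatten(), nn.Linear(16 * 32 * 32, 10)),      # "accurate" branch
])
controller = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, len(pool)))

def infer(image):                                      # image: (1, 3, 32, 32)
    choice = controller(image).argmax(dim=-1).item()   # pick an architecture per image
    return pool[choice](image)                         # run only the chosen one

logits = infer(torch.randn(1, 3, 32, 32))
```

Easy images can be routed to the cheap branch, which is where the per-instance latency savings come from.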
Searching Toward Pareto-Optimal Device-Aware Neural Architectures
Cheng, An-Chieh, Dong, Jin-Dong, Hsu, Chi-Hung, Chang, Shu-Huan, Sun, Min, Chang, Shih-Chieh, Pan, Jia-Yu, Chen, Yu-Ting, Wei, Wei, Juan, Da-Cheng
Recent breakthroughs in Neural Architecture Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding. However, most existing works optimize only for model accuracy and largely ignore other important factors imposed by the underlying hardware and devices, such as latency and energy, at inference time. In this paper, we first introduce the problem of NAS and provide a survey of recent works. We then take a deep dive into two recent advancements that extend NAS to multiple-objective frameworks: MONAS and DPP-Net. Both MONAS and DPP-Net are capable of optimizing accuracy along with other objectives imposed by devices, searching for neural architectures that can be best deployed on a wide spectrum of devices, from embedded systems and mobile devices to workstations. Experimental results show that architectures found by MONAS and DPP-Net achieve Pareto optimality w.r.t. the given objectives for various devices.
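A minimal sketch of how a device-aware search can fold accuracy and a hardware cost (e.g., measured latency) into one scalar reward for the controller; the weighting scheme here is an illustrative assumption, not the exact MONAS or DPP-Net formulation.

```python
def device_aware_reward(accuracy, latency_ms, latency_budget_ms=20.0, alpha=0.7):
    """Higher accuracy raises the reward; exceeding the latency budget lowers it."""
    latency_penalty = max(0.0, latency_ms - latency_budget_ms) / latency_budget_ms
    return alpha * accuracy - (1.0 - alpha) * latency_penalty

# An architecture that is accurate but too slow for the target device scores lower.
print(device_aware_reward(accuracy=0.93, latency_ms=25.0))
print(device_aware_reward(accuracy=0.91, latency_ms=12.0))
```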
Compatibility Family Learning for Item Recommendation and Generation
Shih, Yong-Siang (Appier Inc.) | Chang, Kai-Yueh (Appier Inc.) | Lin, Hsuan-Tien (Appier Inc.) | Sun, Min (National Tsing Hua University)
Compatibility between items, such as clothes and shoes, is a major factor in customers' purchasing decisions. However, learning "compatibility" is challenging because (1) the notion of compatibility is broader than that of similarity, (2) compatibility is asymmetric, and (3) only a small set of compatible and incompatible items is observed. We propose an end-to-end trainable system that embeds each item into a latent vector and projects a query item into K compatible prototypes in the same space. These prototypes reflect the broad notions of compatibility. We refer to both the embedding and the prototypes as the "Compatibility Family." In our learned space, we introduce a novel Projected Compatibility Distance (PCD) function which is differentiable and ensures diversity by requiring at least one prototype to be close to a compatible item, whereas none of the prototypes are close to an incompatible item. We evaluate our system on a toy dataset, two Amazon product datasets, and the Polyvore outfit dataset. Our method consistently achieves state-of-the-art performance. Finally, we show that we can visualize the candidate compatible prototypes using a Metric-regularized Conditional Generative Adversarial Network (MrCGAN), where the input is a projected prototype and the output is a generated image of a compatible item. We ask human evaluators to judge the relative compatibility between our generated images and images generated by CGANs conditioned directly on query items. Our generated images are significantly preferred, receiving roughly twice as many votes as the others.
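A minimal sketch of the Projected Compatibility Distance idea: a query item is projected into K prototype vectors, and the distance to a candidate item is governed by the closest prototype, so only one prototype needs to match a compatible item. The soft-min form below is an illustrative assumption, not the paper's exact definition.

```python
import numpy as np

def pcd(prototypes, item, temperature=1.0):
    """prototypes: (K, d) projected from the query; item: (d,) candidate embedding."""
    sq_dists = np.sum((prototypes - item) ** 2, axis=1)   # distance to each prototype
    weights = np.exp(-sq_dists / temperature)
    weights /= weights.sum()                              # soft assignment to prototypes
    return float(np.dot(weights, sq_dists))               # differentiable soft minimum

prototypes = np.random.randn(3, 16)                 # K = 3 prototypes for one query
compatible = prototypes[0] + 0.1 * np.random.randn(16)
print(pcd(prototypes, compatible))                  # small: one prototype is close
print(pcd(prototypes, 5.0 * np.random.randn(16)))   # larger for an unrelated item
```

Because the distance is dominated by the nearest prototype, the remaining prototypes are free to cover other compatible styles, which is what yields the diversity the abstract mentions.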
Self-View Grounding Given a Narrated 360° Video
Chou, Shih-Han (National Tsing Hua University) | Chen, Yi-Chun (National Tsing Hua University) | Zeng, Kuo-Hao (National Tsing Hua University) | Hu, Hou-Ning (National Tsing Hua University) | Fu, Jianlong (Microsoft Research, Beijing) | Sun, Min (National Tsing Hua University)
Narrated 360° videos are typically provided in many touring scenarios to mimic real-world experience. However, previous work has shown that smart assistance (i.e., providing visual guidance) can significantly help users follow the Normal Field of View (NFoV) corresponding to the narrative. In this project, we aim at automatically grounding the NFoVs of a 360° video given subtitles of the narrative (referred to as "NFoV-grounding"). We propose a novel Visual Grounding Model (VGM) to implicitly and efficiently predict the NFoVs given the video content and subtitles. Specifically, at each frame, we efficiently encode the panorama into a feature map of candidate NFoVs using a Convolutional Neural Network (CNN) and encode the subtitles into the same hidden space using an RNN with Gated Recurrent Units (GRU). Then, we apply soft-attention over the candidate NFoVs to trigger a sentence decoder that minimizes the reconstruction loss between the generated and given sentences. Finally, we obtain the NFoV as the candidate NFoV with the maximum attention, without any human supervision. To train the VGM more robustly, we also generate a reverse sentence conditioned on one minus the soft-attention, so that the attention focuses on candidate NFoVs less relevant to the given sentence. The negative log reconstruction loss of the reverse sentence (referred to as the "irrelevant loss") is jointly minimized to encourage the reverse sentence to differ from the given sentence. To evaluate our method, we collect the first narrated 360° video dataset and achieve state-of-the-art NFoV-grounding performance.
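A minimal sketch, with all shapes and layers assumed, of the attention step the abstract describes: candidate-NFoV features from a CNN and a GRU-encoded subtitle are scored together, and the grounded NFoV is simply the candidate with the largest attention weight.

```python
import torch
import torch.nn as nn

n_candidates, feat_dim, txt_dim = 12, 256, 128
score = nn.Linear(feat_dim + txt_dim, 1)

def ground_nfov(nfov_feats, sentence_vec):
    # nfov_feats: (n_candidates, feat_dim); sentence_vec: (txt_dim,)
    expanded = sentence_vec.expand(n_candidates, txt_dim)
    logits = score(torch.cat([nfov_feats, expanded], dim=-1)).squeeze(-1)
    attn = torch.softmax(logits, dim=0)      # soft-attention over candidate NFoVs
    return attn, attn.argmax().item()        # weights also feed the sentence decoder loss

attn, best = ground_nfov(torch.randn(n_candidates, feat_dim), torch.randn(txt_dim))
```

The training signal comes from reconstructing the subtitle through the attention-weighted features (and the "irrelevant loss" on one minus the attention), so no NFoV labels are needed.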
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
Chen, Tseng-Hung, Liao, Yuan-Hong, Chuang, Ching-Yao, Hsu, Wan-Ting, Fu, Jianlong, Sun, Min
Impressive image captioning results are achieved in domains with plenty of training image and sentence pairs (e.g., MSCOCO). However, transferring to a target domain with significant domain shifts but no paired training data (referred to as cross-domain image captioning) remains largely unexplored. We propose a novel adversarial training procedure to leverage unpaired data in the target domain. Two critic networks are introduced to guide the captioner, namely a domain critic and a multi-modal critic. The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain. The multi-modal critic assesses whether an image and its generated sentence are a valid pair. During training, the critics and captioner act as adversaries -- the captioner aims to generate indistinguishable sentences, whereas the critics aim to distinguish them. The assessment improves the captioner through policy gradient updates. During inference, we further propose a novel critic-based planning method to select high-quality sentences without additional supervision (e.g., tags). To evaluate, we use MSCOCO as the source domain and four other datasets (CUB-200-2011, Oxford-102, TGIF, and Flickr30k) as the target domains. Our method consistently performs well on all datasets. In particular, on CUB-200-2011, we achieve 21.8% CIDEr-D improvement after adaptation. Utilizing critics during inference further gives another 4.5% boost.
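A minimal sketch of how critic scores could become a policy-gradient signal for the captioner, as the abstract outlines; the critics here are stand-in scalars and the single-sample REINFORCE form is an assumption, not the paper's exact objective.

```python
import torch

def captioner_loss(log_probs, domain_score, multimodal_score):
    """log_probs: summed log-probability of the sampled sentence (a tensor).
    The critic scores (how in-domain the sentence looks, and how well it matches
    the image) act as the reward, so no paired target-domain captions are needed."""
    reward = 0.5 * (domain_score + multimodal_score)
    return -(reward * log_probs)             # REINFORCE: higher reward, lower loss

log_probs = torch.tensor(-12.3, requires_grad=True)   # from the captioner's sampled sentence
loss = captioner_loss(log_probs, domain_score=0.8, multimodal_score=0.6)
loss.backward()                                        # gradient flows back to the captioner
```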
Leveraging Video Descriptions to Learn Video Question Answering
Zeng, Kuo-Hao (Stanford University and National Tsing Hua University) | Chen, Tseng-Hung (National Tsing Hua University) | Chuang, Ching-Yao (National Tsing Hua University) | Liao, Yuan-Hong (National Tsing Hua University) | Niebles, Juan Carlos (Stanford University) | Sun, Min (National Tsing Hua University)
We propose a scalable approach to learn video-based question answering (QA): to answer a free-form natural language question about the contents of a video. Our approach automatically harvests a large number of videos and descriptions freely available online. Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated. Next, we use these candidate QA pairs to train a number of video-based QA methods extended from MN (Sukhbaatar et al. 2015), VQA (Antol et al. 2015), SA (Yao et al. 2015), and SS (Venugopalan et al. 2015). In order to handle non-perfect candidate QA pairs, we propose a self-paced learning procedure to iteratively identify them and mitigate their effects in training. Finally, we evaluate performance on manually generated video-based QA pairs. The results show that our self-paced learning procedure is effective, and the extended SS model outperforms various baselines.
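A minimal sketch of a self-paced selection rule for noisy, automatically generated QA pairs: in each round only examples whose current loss falls below a threshold contribute to training, and the threshold grows so harder (and possibly noisier) pairs enter gradually. The schedule and hard 0/1 weights are illustrative assumptions, not the paper's exact procedure.

```python
def self_paced_weights(losses, threshold):
    """Return 1.0 for examples easy enough to trust this round, else 0.0."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]

losses = [0.2, 1.5, 0.7, 3.0]            # per-QA-pair losses from the current model
for round_idx, threshold in enumerate([0.5, 1.0, 2.0]):
    weights = self_paced_weights(losses, threshold)
    print(round_idx, weights)             # more candidate pairs are admitted each round
```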