
Collaborating Authors

 Zheng, Nanning


Learning Algebraic Recombination for Compositional Generalization

arXiv.org Artificial Intelligence

Neural sequence models exhibit limited compositional generalization ability in semantic parsing tasks. Compositional generalization requires algebraic recombination, i.e., dynamically recombining structured expressions in a recursive manner. However, most previous studies mainly concentrate on recombining lexical units, which is an important but not sufficient part of algebraic recombination. In this paper, we propose LeAR, an end-to-end neural model to learn algebraic recombination for compositional generalization. The key insight is to model the semantic parsing task as a homomorphism between a latent syntactic algebra and a semantic algebra, thus encouraging algebraic recombination. Specifically, we learn two modules jointly: a Composer for producing latent syntax, and an Interpreter for assigning semantic operations. Experiments on two realistic and comprehensive compositional generalization benchmarks demonstrate the effectiveness of our model. The source code is publicly available at https://github.com/microsoft/ContextualSP.
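
A minimal toy sketch (not the paper's actual Composer/Interpreter, and with a hypothetical SEMANTIC_OPS/LEXICON algebra) of the homomorphism idea the abstract describes: the meaning of a composed syntax node is obtained by applying a semantic operation to the meanings of its children.

```python
# Toy illustration of a syntax-to-semantics homomorphism; all names here are
# hypothetical and only stand in for the latent syntactic and semantic algebras.
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    label: str                      # latent syntactic category / operation name
    children: List["Node"]          # empty for lexical leaves

# Hypothetical semantic algebra: operation name -> function over child meanings.
SEMANTIC_OPS = {
    "twice": lambda args: args[0] * 2,       # e.g. "jump twice" -> [JUMP, JUMP]
    "and":   lambda args: args[0] + args[1], # sequence two sub-programs
}

LEXICON = {"jump": ["JUMP"], "walk": ["WALK"]}   # leaf meanings

def interpret(node: Node):
    """Homomorphic interpretation: meaning(parent) = op(meanings(children))."""
    if not node.children:                    # lexical leaf
        return LEXICON[node.label]
    child_meanings = [interpret(c) for c in node.children]
    return SEMANTIC_OPS[node.label](child_meanings)

# "jump twice and walk" under one latent tree:
tree = Node("and", [Node("twice", [Node("jump", [])]), Node("walk", [])])
print(interpret(tree))   # ['JUMP', 'JUMP', 'WALK']
```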


Rethinking Learnable Tree Filter for Generic Feature Transform

arXiv.org Artificial Intelligence

The Learnable Tree Filter presents a remarkable approach to modeling structure-preserving relations for semantic segmentation. Nevertheless, its intrinsic geometric constraint forces it to focus on regions with close spatial distance, hindering effective long-range interactions. To relax this constraint, we reformulate the filter as a Markov Random Field and introduce a learnable unary term. In addition, we propose a learnable spanning tree algorithm to replace the original non-differentiable one, which further improves flexibility and robustness. With these improvements, our method better captures long-range dependencies and preserves structural details with linear complexity, and it extends to several vision tasks as a more generic feature transform. Extensive experiments on object detection and instance segmentation demonstrate consistent improvements over the original version. For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells and whistles. Code is available at https://github.com/StevenGrove/LearnableTreeFilterV2.
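
For reference, the generic pairwise Markov Random Field energy that the reformulation alludes to takes the form below, with a unary term (learnable here) and a pairwise term restricted to spanning-tree edges; the paper's exact parameterization may differ.

```latex
% Generic pairwise MRF energy over a spanning tree T = (V, E_T); the exact form
% used in the paper may differ. \phi_u is the (learnable) unary term and
% \phi_p the pairwise term defined on tree edges.
E(\mathbf{x}) \;=\; \sum_{i \in V} \phi_u(x_i)
            \;+\; \sum_{(i,j) \in E_T} \phi_p(x_i, x_j)
```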


Fine-Grained Dynamic Head for Object Detection

arXiv.org Artificial Intelligence

The Feature Pyramid Network (FPN) presents a remarkable approach to alleviating scale variance in object representation by performing instance-level assignments. Nevertheless, this strategy ignores the distinct characteristics of different sub-regions within an instance. To this end, we propose a fine-grained dynamic head that conditionally selects a pixel-level combination of FPN features from different scales for each instance, which further unleashes the ability of multi-scale feature representation. Moreover, we design a spatial gate with a new activation function that dramatically reduces computational complexity through spatially sparse convolutions. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks.
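
A minimal sketch of pixel-level gating over FPN levels, given only to make the idea concrete; it is not the paper's exact head (the class name, the softmax gate, and the bilinear resizing are assumptions of this illustration).

```python
# Sketch: every FPN level is resized to a common resolution, a 1x1 conv predicts
# a per-pixel weight for each level, and the gated features are summed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelLevelFPNGate(nn.Module):
    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        # One gate logit per FPN level at every spatial position.
        self.gate = nn.Conv2d(channels * num_levels, num_levels, kernel_size=1)

    def forward(self, feats):            # feats: list of (N, C, Hi, Wi) tensors
        target = feats[0].shape[-2:]     # use the finest level's resolution
        resized = [F.interpolate(f, size=target, mode="bilinear",
                                 align_corners=False) for f in feats]
        stacked = torch.cat(resized, dim=1)              # (N, C*L, H, W)
        weights = torch.softmax(self.gate(stacked), 1)   # (N, L, H, W)
        fused = sum(w.unsqueeze(1) * f                   # broadcast over C
                    for w, f in zip(weights.unbind(1), resized))
        return fused                                     # (N, C, H, W)

# Usage with three hypothetical FPN levels:
levels = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]
out = PixelLevelFPNGate(channels=256, num_levels=3)(levels)
print(out.shape)   # torch.Size([2, 256, 64, 64])
```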


End-to-End Object Detection with Fully Convolutional Network

arXiv.org Artificial Intelligence

Mainstream object detectors based on fully convolutional networks have achieved impressive performance. However, most of them still require hand-designed non-maximum suppression (NMS) post-processing, which impedes fully end-to-end training. In this paper, we analyze how NMS can be discarded, and the results reveal that a proper label assignment plays a crucial role. To this end, for fully convolutional detectors, we introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection, which obtains performance comparable to NMS. In addition, a simple 3D Max Filtering (3DMF) is proposed to utilize multi-scale features and improve the discriminability of convolutions in the local region. With these techniques, our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on the COCO and CrowdHuman datasets. The code is available at https://github.com/Megvii-BaseDetection/DeFCN.
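
A rough sketch of the 3D max-filtering idea (not the official DeFCN code): classification score maps from adjacent levels are stacked along a scale axis and a 3D max filter over (scale, height, width) suppresses duplicate activations. For simplicity, all levels are assumed to share the same resolution here, whereas the paper interpolates neighboring levels.

```python
import torch
import torch.nn.functional as F

def max_filter_3d(score_maps, scale_window=3, spatial_window=3):
    """score_maps: list of (N, 1, H, W) classification score maps per level."""
    x = torch.stack(score_maps, dim=2)          # (N, 1, L, H, W)
    k = (scale_window, spatial_window, spatial_window)
    pad = tuple(s // 2 for s in k)
    local_max = F.max_pool3d(x, kernel_size=k, stride=1, padding=pad)
    # Keep a location only if it is the maximum within its 3D neighbourhood.
    keep = (x == local_max).float()
    return (x * keep).unbind(dim=2)             # back to one map per level

scores = [torch.rand(2, 1, 32, 32) for _ in range(5)]
filtered = max_filter_3d(scores)
print(len(filtered), filtered[0].shape)         # 5 torch.Size([2, 1, 32, 32])
```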


Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation

arXiv.org Artificial Intelligence

Cooperative multi-agent tasks require agents to deduce their own contributions from a shared global reward, known as the challenge of credit assignment. Policy-based multi-agent reinforcement learning methods generally address this challenge by introducing differentiated value functions or advantage functions for individual agents. In a multi-agent system, the policies of different agents need to be evaluated jointly. To update the policies synchronously, such value functions or advantage functions also require synchronous evaluation. However, in current methods, these functions rely on counterfactual joint actions that are evaluated asynchronously, and thus suffer from an inherent estimation bias. In this work, we propose approximatively synchronous advantage estimation. We first derive the marginal advantage function, an extension of the single-agent advantage function to multi-agent systems. Furthermore, we introduce a policy approximation for synchronous advantage estimation, and break the multi-agent policy optimization problem down into multiple sub-problems of single-agent policy optimization. Our method is compared with baseline algorithms on the StarCraft multi-agent challenges and shows the best performance on most of the tasks.
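
One natural way to write such a per-agent marginal advantage is sketched below (the paper's exact definition may differ): the joint advantage is marginalized over the other agents' policies so that agent i's own contribution is isolated.

```latex
% Assumed notation: joint action a = (a_i, a_{-i}), other agents' policies
% \pi_{-i}, centralized action-value Q and state-value V.
A_i(s, a_i) \;=\; \mathbb{E}_{a_{-i} \sim \pi_{-i}(\cdot \mid s)}
  \big[\, Q(s, a_i, a_{-i}) - V(s) \,\big]
```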


Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression

arXiv.org Machine Learning

Given $m$ $d$-dimensional responsors and $n$ $d$-dimensional predictors, sparse regression finds at most $k$ predictors for each responsor for linear approximation, $1\leq k \leq d-1$. The key problem in sparse regression is subset selection, which usually suffers from high computational cost. Here we consider sparse regression from the viewpoint of correlation, and propose the formula of conditional uncorrelation. We then propose an efficient non-approximate method of subset selection in which we do not need to calculate any linear coefficients for the candidate predictors. With the proposed method, the computational complexity is reduced from $O(\frac{1}{2}{k^3}+kd)$ to $O(\frac{1}{3}{k^3})$ for each candidate subset in sparse regression. Because the dimension $d$ is generally the number of observations or experiments and is large enough, the proposed method can significantly improve the efficiency of sparse regression.
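
To make the setting concrete, here is a generic correlation-driven baseline for subset selection (essentially orthogonal matching pursuit). It is NOT the paper's conditional-uncorrelation criterion, which precisely avoids computing the regression coefficients this baseline refits at every step.

```python
import numpy as np

def greedy_subset_selection(X, y, k):
    """X: (d, n) predictors as columns, y: (d,) responsor, k: subset size."""
    d, n = X.shape
    selected, residual = [], y.copy()
    for _ in range(k):
        # Pick the unused predictor most correlated with the current residual.
        corrs = np.abs(X.T @ residual)
        corrs[selected] = -np.inf
        j = int(np.argmax(corrs))
        selected.append(j)
        # Refit least squares on the chosen columns and update the residual.
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ beta
    return selected

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))           # d = 200 observations, n = 50
y = X[:, [3, 17]] @ np.array([2.0, -1.5]) + 0.01 * rng.standard_normal(200)
print(greedy_subset_selection(X, y, k=2))    # typically [3, 17]
```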


Traffic Agent Trajectory Prediction Using Social Convolution and Attention Mechanism

arXiv.org Artificial Intelligence

Trajectory prediction is significant for the decision-making of autonomous driving vehicles. In this paper, we propose a model to predict the trajectories of target agents around an autonomous vehicle. The main idea of our method is to consider both the historical trajectories of the target agent and the influence of surrounding agents on it. To this end, we encode the target agent's history as an attention mask and construct a social map to encode the interactive relationship between the target agent and its surrounding agents. Given a trajectory sequence, LSTM networks are first utilized to extract features for all agents, from which the attention mask and social map are formed. Then, the attention mask and social map are fused into a fusion feature map, which is processed by a social convolution to obtain a fusion feature representation. Finally, this fusion feature is taken as the input of a variable-length LSTM to predict the trajectory of the target agent. We note that the variable-length LSTM enables our model to handle the highly dynamic number of agents within the sensing scope in traffic scenes. To verify the effectiveness of our method, we compare it extensively with several methods on a public dataset, achieving a 20% error decrease. In addition, the model satisfies the real-time requirement, running at 32 fps.
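
A simplified sketch of such a pipeline, under several assumptions and not the authors' exact model: a shared LSTM encodes each agent's history, the neighbours' features are scattered into a grid ("social map") around the target, the map is fused with the target's feature by a convolution, and an LSTM decoder rolls out the predicted trajectory. The variable number of agents is handled here simply by scattering into the grid rather than by the authors' variable-length LSTM; all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    def __init__(self, hidden=32, grid=8, horizon=12):
        super().__init__()
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.social_conv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1)
        self.decoder = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)
        self.grid, self.horizon = grid, horizon

    def forward(self, target_hist, neighbor_hists, neighbor_cells):
        # target_hist: (1, T, 2); neighbor_hists: (A, T, 2);
        # neighbor_cells: (A, 2) integer grid coordinates of each neighbour.
        _, (h_t, _) = self.encoder(target_hist)         # (1, 1, H)
        _, (h_n, _) = self.encoder(neighbor_hists)      # (1, A, H)
        social = torch.zeros(1, h_t.size(-1), self.grid, self.grid)
        for feat, (gx, gy) in zip(h_n[0], neighbor_cells.tolist()):
            social[0, :, gy, gx] += feat                # scatter neighbours
        fused = self.social_conv(social).mean(dim=(2, 3)) + h_t[0]   # (1, H)
        steps = fused.unsqueeze(1).repeat(1, self.horizon, 1)
        out, _ = self.decoder(steps)
        return self.head(out)                           # (1, horizon, 2)

model = TrajectoryPredictor()
pred = model(torch.randn(1, 10, 2), torch.randn(4, 10, 2),
             torch.tensor([[1, 2], [3, 3], [6, 1], [7, 7]]))
print(pred.shape)   # torch.Size([1, 12, 2])
```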


Compositional Generalization by Learning Analytical Expressions

arXiv.org Artificial Intelligence

Compositional generalization is a basic but essential intellectual capability of human beings, which allows us to readily recombine known parts. However, existing neural network based models have been proven extremely deficient in such a capability. Inspired by work in cognition arguing that compositionality can be captured by variable slots with symbolic functions, we present a refreshing view that connects a memory-augmented neural model with analytical expressions to achieve compositional generalization. Our model consists of two cooperative neural modules, Composer and Solver, fitting well with the cognitive argument while still being trained end-to-end via a hierarchical reinforcement learning algorithm. Experiments on the well-known benchmark SCAN demonstrate that our model attains a strong ability of compositional generalization, solving all challenges addressed by previous works with 100% accuracy.


An encoding framework with brain inner state for natural image identification

arXiv.org Machine Learning

Neural encoding and decoding, which aim to characterize the relationship between stimuli and brain activities, have emerged as an important area in cognitive neuroscience. Traditional encoding models, which focus on feature extraction and mapping, consider the brain as an input-output mapper without inner states. In this work, inspired by the view that the human brain acts like a state machine, we propose a novel encoding framework that combines information from both the external world and the inner state to predict brain activity. The framework comprises two parts: a forward encoding model that deals with visual stimuli and an inner state model that captures the influence of intrinsic connections in the brain. The forward model can be any traditional encoding model, making the framework flexible. The inner state model is a linear model that exploits information in the prediction residuals of the forward model. The proposed encoding framework achieves much better performance on natural image identification from fMRI responses than forward-only models. The identification accuracy decreases slightly as the dataset size increases, but remains relatively stable across different identification methods. The results confirm that the new encoding framework is effective and robust when used for brain decoding.
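
A toy, assumption-heavy sketch of the two-part structure (not the authors' code; the ridge models, the voxel-wise residual regression, and all sizes are illustrative stand-ins): a "forward model" maps image features to voxel responses, and an "inner state model" predicts each voxel's residual from the residuals of the remaining voxels, standing in for intrinsic connections.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_feat, n_vox = 300, 40, 10
features = rng.standard_normal((n_train, n_feat))        # stimulus features
responses = features @ rng.standard_normal((n_feat, n_vox))
responses += 0.5 * rng.standard_normal((n_train, n_vox)) # measured fMRI

# Part 1: forward encoding model (features -> responses).
forward = Ridge(alpha=1.0).fit(features, responses)
residuals = responses - forward.predict(features)

# Part 2: inner state model, one linear model per voxel on the residuals.
inner = []
for v in range(n_vox):
    others = np.delete(residuals, v, axis=1)
    inner.append(Ridge(alpha=1.0).fit(others, residuals[:, v]))

def refined_prediction(feat_row, measured_row):
    """Forward prediction corrected by the residual-based inner state model."""
    base = forward.predict(feat_row[None])[0]
    resid = measured_row - base
    corr = np.array([inner[v].predict(np.delete(resid, v)[None])[0]
                     for v in range(n_vox)])
    return base + corr

print(refined_prediction(features[0], responses[0]).shape)   # (10,)
```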


Hindsight Trust Region Policy Optimization

arXiv.org Artificial Intelligence

As reinforcement learning continues to drive machine intelligence beyond its conventional boundary, ineffective practice in sparse-reward environments severely limits further applications in a broader range of advanced fields. Motivated by the demand for an effective deep reinforcement learning algorithm that accommodates sparse-reward environments, this paper presents Hindsight Trust Region Policy Optimization (Hindsight TRPO), a method that efficiently utilizes interactions under sparse rewards and maintains learning stability by restricting variance during the policy update process. First, the hindsight methodology is extended to TRPO, an advanced and efficient on-policy policy gradient method. Then, under the condition that the distributions are close, the KL-divergence is appropriately approximated by another $f$-divergence. This approximation reduces the variance of the KL-divergence estimate and alleviates instability during policy updates. Experimental results on both discrete and continuous benchmark tasks demonstrate that Hindsight TRPO converges steadily and significantly faster than previous policy gradient methods, achieving effective performance and high data efficiency when training policies in sparse-reward environments.
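
For close distributions, a standard second-order expansion relates the KL-divergence to the chi-squared divergence, which is itself an f-divergence; the paper's chosen surrogate may differ, but this illustrates why such an approximation is reasonable when the old and new policies are close.

```latex
% Second-order approximation, valid when p is close to q:
D_{\mathrm{KL}}(p \,\|\, q)
  \;=\; \sum_x p(x) \log \frac{p(x)}{q(x)}
  \;\approx\; \frac{1}{2} \sum_x \frac{\big(p(x) - q(x)\big)^2}{q(x)}
  \;=\; \tfrac{1}{2}\, \chi^2(p \,\|\, q)
```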