Gong, Minglun
Systematic Literature Review of Vision-Based Approaches to Outdoor Livestock Monitoring with Lessons from Wildlife Studies
Scott, Stacey D., Abbas, Zayn J., Ellid, Feerass, Dykhne, Eli-Henry, Islam, Muhammad Muhaiminul, Ayad, Weam, Kacmorova, Kristina, Tulpan, Dan, Gong, Minglun
Precision livestock farming (PLF) aims to improve the health and welfare of livestock animals and farming outcomes through the use of advanced technologies. Computer vision, combined with recent advances in machine learning and deep learning artificial intelligence approaches, offers a possible path to the PLF ideal of 24/7 livestock monitoring, which facilitates early detection of animal health and welfare issues. However, a significant number of livestock species are raised in large outdoor habitats that pose technological challenges for computer vision approaches. This review provides a comprehensive overview of computer vision methods and open challenges in outdoor animal monitoring. We include research from both the livestock and wildlife fields because of the similarities in appearance, behaviour, and habitat between many livestock and wildlife species. We focus on large terrestrial mammals, such as cattle, horses, deer, goats, sheep, koalas, giraffes, and elephants. We use an image processing pipeline to frame our discussion, highlighting the current capabilities and open technical challenges at each stage of the pipeline. The review found a clear trend towards deep learning approaches for animal detection, counting, and multi-species classification. We discuss in detail the applicability of current vision-based methods to PLF contexts and promising directions for future research.
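To make the organizing frame concrete, below is a minimal Python sketch of the kind of detection-then-counting pipeline the review uses to structure its discussion. The detector choice (torchvision's pretrained Faster R-CNN), the COCO-style labels, and the score threshold are illustrative assumptions, not methods endorsed by the paper.

```python
# Illustrative sketch of the image-processing pipeline framing the review:
# detect animals in a frame, then count the detections. Real outdoor
# monitoring must also handle occlusion, herd density, and varied lighting.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_animals(image, score_thresh=0.5):
    """Stage 1: detect candidate animals in an outdoor frame."""
    with torch.no_grad():
        out = detector([to_tensor(image)])[0]
    keep = out["scores"] > score_thresh
    return out["boxes"][keep], out["labels"][keep]

def count_animals(boxes):
    """Stage 2: naive count = number of detections."""
    return len(boxes)
```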
Neural Packing: from Visual Sensing to Reinforcement Learning
Xu, Juzhan, Gong, Minglun, Zhang, Hao, Huang, Hui, Hu, Ruizhen
We present a novel learning framework to solve the transport-and-packing (TAP) problem in 3D. It constitutes a full solution pipeline, from partial observation of the input objects via RGBD sensing and recognition, through robotic motion planning, to final box placement that achieves a compact packing in a target container. The technical core of our method is a neural network for TAP, trained via reinforcement learning (RL), to solve this NP-hard combinatorial optimization problem. Our network simultaneously selects an object to pack and determines its final packing location, based on a judicious encoding of the continuously evolving states of the partially observed source objects and the available spaces in the target container, using two separate attention-enabled encoders. The encoded feature vectors are used to compute matching scores and feasibility masks for the different pairings of box selection and available space configuration, which drive the packing strategy optimization. Extensive experiments, including ablation studies and physical packing execution by a real robot (Universal Robot UR5e), are conducted to evaluate our method in terms of its design choices, scalability, generalizability, and comparisons to baselines, including the most recent RL-based TAP solution. We also contribute the first benchmark for TAP, which covers a variety of input settings and difficulty levels.
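The following is a minimal PyTorch sketch, under stated assumptions, of the pairing-score idea described in the abstract: two attention-enabled encoders for objects and free spaces, a matching score per (object, space) pairing, and a feasibility mask applied before the policy samples a packing action. All layer sizes, names, and the dot-product scoring are illustrative; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class PairingPolicy(nn.Module):
    def __init__(self, d_obj=64, d_space=64, d=128, heads=4):
        super().__init__()
        # Separate attention-enabled encoders for objects and free spaces.
        self.obj_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, heads, batch_first=True), 2)
        self.space_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, heads, batch_first=True), 2)
        self.obj_in = nn.Linear(d_obj, d)
        self.space_in = nn.Linear(d_space, d)

    def forward(self, objs, spaces, feasible):
        # objs: (B, N, d_obj), spaces: (B, M, d_space)
        # feasible: (B, N, M) boolean mask of geometrically valid pairings
        o = self.obj_enc(self.obj_in(objs))        # (B, N, d)
        s = self.space_enc(self.space_in(spaces))  # (B, M, d)
        scores = torch.einsum("bnd,bmd->bnm", o, s)  # matching scores
        scores = scores.masked_fill(~feasible, float("-inf"))
        # Flatten to one categorical action over (object, space) pairs,
        # from which an RL algorithm can sample a packing step.
        return torch.softmax(scores.flatten(1), dim=-1)
```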
3D Pose Estimation and Future Motion Prediction from 2D Images
Yang, Ji, Ma, Youdong, Zuo, Xinxin, Wang, Sen, Gong, Minglun, Cheng, Li
In many recent efforts [1, 2, 3, 4], 3D human pose estimation has been decomposed into a two-stage process: first, the 2D keypoints corresponding to the body joints are detected in the 2D image, then the detected joints are lifted to obtain the 3D pose. This type of solution is elegant in the simplicity of its problem formulation; unfortunately, it suffers from an inherent ambiguity caused by projection: different 3D poses can share the same 2D pose projection under a specific viewpoint, i.e., the mapping between detected 2D joints and 3D poses is not bijective. To resolve this ambiguity of 3D pose estimation from a monocular image, video-based pose estimation has also been investigated in the literature [5, 6]. Existing video-based pose estimation methods, however, either need to observe a relatively long history (243 frames [5]) or can only handle a short video sequence (4-6 frames [6]) to achieve their best results.
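As a concrete illustration of the second ("lifting") stage, here is a minimal PyTorch sketch of a residual MLP that maps detected 2D joints to a 3D pose, in the spirit of the two-stage methods cited above; the layer sizes and joint count are assumptions. As noted, the mapping is not bijective, so such a single-frame lifter can regress only one plausible 3D pose per 2D input.

```python
import torch.nn as nn

class Lifter(nn.Module):
    """Lift detected 2D joints (x, y) to 3D joints (x, y, z)."""
    def __init__(self, n_joints=17, hidden=1024):
        super().__init__()
        self.inp = nn.Linear(n_joints * 2, hidden)
        self.block = nn.Sequential(
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Dropout(0.5),
        )
        self.out = nn.Linear(hidden, n_joints * 3)

    def forward(self, joints_2d):      # (B, n_joints * 2)
        h = self.inp(joints_2d)
        h = h + self.block(h)          # residual connection
        return self.out(h)             # (B, n_joints * 3)
```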
STNet: Scale Tree Network with Multi-level Auxiliator for Crowd Counting
Wang, Mingjie, Cai, Hao, Han, Xianfeng, Zhou, Jun, Gong, Minglun
Crowd counting remains a challenging task because drastic scale variation, density inconsistency, and complex backgrounds can seriously degrade counting accuracy. To combat this accuracy degradation, we propose a novel and powerful network called Scale Tree Network (STNet) for accurate crowd counting. STNet consists of two key components: a Scale-Tree Diversity Enhancer and a Semi-supervised Multi-level Auxiliator. Specifically, the Diversity Enhancer is designed to enrich scale diversity, alleviating the limitations of existing methods caused by an insufficient range of scales. A novel tree structure is adopted to hierarchically parse coarse-to-fine crowd regions. Furthermore, a simple yet effective Multi-level Auxiliator is presented to aid in exploiting generalisable shared characteristics at multiple levels, allowing more accurate pixel-wise background recognition. The overall STNet is trained in an end-to-end manner, without the need to manually tune loss weights between the main and auxiliary tasks. Extensive experiments on four challenging crowd datasets demonstrate the superiority of the proposed method.
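The abstract states that no manual tuning of loss weights between the main counting task and the multi-level auxiliary tasks is needed. One standard way to achieve this is homoscedastic-uncertainty weighting (Kendall et al., 2018), sketched below in PyTorch; this particular scheme is an assumption for illustration, not necessarily the mechanism STNet uses.

```python
import torch
import torch.nn as nn

class AutoWeightedLoss(nn.Module):
    """Balance a main loss and auxiliary losses without hand-tuned weights."""
    def __init__(self, n_tasks):
        super().__init__()
        # One learnable log-variance per task, optimized jointly with
        # the network, so no weight is tuned by hand.
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, losses):
        # losses: list of per-task scalar losses (main + auxiliaries)
        total = 0.0
        for loss, log_var in zip(losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total
```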
Multi-scale Convolution Aggregation and Stochastic Feature Reuse for DenseNets
Wang, Mingjie, Zhou, Jun, Mao, Wendong, Gong, Minglun
Recently, Convolutional Neural Networks (CNNs) have achieved great success in numerous vision tasks. In particular, DenseNets have demonstrated that feature reuse via dense skip connections can effectively alleviate the difficulty of training very deep networks, and that reusing features generated by the initial layers in all subsequent layers has a strong impact on performance. To feed even richer information into the network, a novel adaptive Multi-scale Convolution Aggregation module is presented in this paper. Composed of layers for multi-scale convolutions, trainable cross-scale aggregation, maxout, and concatenation, this module is highly non-linear and can boost the accuracy of DenseNet while using far fewer parameters. In addition, due to its high model complexity, a network with extremely dense feature reuse is prone to overfitting. To address this problem, a regularization method named Stochastic Feature Reuse is also presented. By randomly dropping a set of the feature maps to be reused for each mini-batch during the training phase, this regularization method reduces training costs and prevents co-adaptation. Experimental results on the CIFAR-10, CIFAR-100, and SVHN benchmarks demonstrate the effectiveness of the proposed methods.
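Below is a minimal PyTorch sketch of the Stochastic Feature Reuse idea as described: within a dense block, a random subset of the reused (concatenated) feature maps is dropped for each mini-batch during training. Dropping whole channels with a single per-batch mask, and the keep probability, are illustrative assumptions rather than the paper's exact scheme.

```python
import torch

def stochastic_feature_reuse(features, keep_prob=0.8, training=True):
    """features: (B, C, H, W) concatenation of reused feature maps.

    Randomly zeroes whole channels for the current mini-batch, so
    downstream layers cannot co-adapt to any fixed subset of reused
    features. At inference time, all features are kept.
    """
    if not training:
        return features
    _, C, _, _ = features.shape
    # One mask shared across the mini-batch: a set of maps is dropped
    # "for each mini-batch", as the abstract describes.
    mask = (torch.rand(1, C, 1, 1, device=features.device) < keep_prob).float()
    # Rescale so the expected activation magnitude is unchanged.
    return features * mask / keep_prob
```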