Flexbee: A Grasping and Perching UAV Based on Soft Vector-Propulsion Nozzle
Wang, Yue, Zhang, Lixian, Zhu, Yimin, Liu, Yangguang, Yang, Xuwei
Abstract--The aim of this paper is to design a new type of grasping and perching unmanned aerial vehicle (UAV), Flexbee, characterized by its soft vector-propulsion nozzle (SVPN). Compared to previous UAVs, Flexbee integrates flight, grasping, and perching functionalities into its four SVPNs, offering advantages such as decoupled position and attitude control, high structural reuse, and strong adaptability for grasping and perching. A dynamics model of Flexbee has been developed, and the nonlinear coupling issue of the moment has been resolved through linearization of the equivalent moment model. A hierarchical control strategy was employed to design the controllers for Flexbee's two operational modes. Finally, flight, grasping, and perching experiments were conducted to validate Flexbee's kinematic capabilities and the effectiveness of the control strategy.

Multi-rotor unmanned aerial vehicles (UAVs), with their three-dimensional maneuverability, have demonstrated remarkable effectiveness in environments that are difficult for humans to reach [1]-[5]. As requirements for UAV endurance and adaptability to complex environments grow, small UAVs, with their compact size, light weight, low cost, and high maneuverability, hold greater advantages over large UAVs in such environments [6]-[8].
So-ViT: Mind Visual Tokens for Vision Transformer
Xie, Jiangtao, Zeng, Ruiren, Wang, Qilong, Zhou, Ziqi, Li, Peihua
Recently, the vision transformer (ViT) architecture, whose backbone consists purely of self-attention mechanisms, has achieved very promising performance in visual classification. However, the high performance of the original ViT depends heavily on pretraining with ultra-large-scale datasets, and it significantly underperforms on ImageNet-1K if trained from scratch. This paper makes efforts toward addressing this problem by carefully considering the role of visual tokens. First, for the classification head, the existing ViT exploits only the class token while entirely neglecting the rich semantic information inherent in high-level visual tokens. Therefore, we propose a new classification paradigm in which the second-order, cross-covariance pooling of visual tokens is combined with the class token for final classification. Meanwhile, a fast singular value power normalization is proposed for improving the second-order pooling. Second, the original ViT employs a naive embedding of fixed-size image patches, lacking the ability to model translation equivariance and locality. To alleviate this problem, we develop a lightweight, hierarchical module based on off-the-shelf convolutions for visual token embedding. The proposed architecture, which we call So-ViT, is thoroughly evaluated on ImageNet-1K. The results show that our models, when trained from scratch, outperform competing ViT variants while being on par with or better than state-of-the-art CNN models. Code is available at https://github.com/jiangtaoxie/So-ViT
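The second-order pooling with singular value power normalization described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `so_pool` and the use of a plain SVD are assumptions for clarity, whereas So-ViT proposes a fast approximation to avoid the expensive SVD step.

```python
import numpy as np

def so_pool(tokens, p=0.5):
    """Second-order pooling of visual tokens (illustrative sketch).

    tokens: (N, d) array of visual-token embeddings (class token excluded).
    Returns a (d, d) matrix whose singular values are raised to the power p,
    i.e. singular value power normalization of the token covariance.
    """
    n, d = tokens.shape
    x = tokens - tokens.mean(axis=0, keepdims=True)   # center the tokens
    cov = x.T @ x / n                                 # second-order statistics, (d, d)
    u, s, vt = np.linalg.svd(cov)                     # cov is symmetric PSD
    return u @ np.diag(s ** p) @ vt                   # power-normalized pooling output

# Usage: pool 196 tokens of dimension 8 into one (8, 8) representation,
# which would then be flattened and combined with the class token.
rng = np.random.default_rng(0)
pooled = so_pool(rng.standard_normal((196, 8)))
print(pooled.shape)  # (8, 8)
```

In practice the pooled matrix is symmetric, so only its upper triangle needs to be kept before fusing with the class-token logits; the power `p` (commonly 0.5, i.e. a matrix square root) dampens dominant singular values so the representation is less burst-sensitive.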