 computer vision technique


An Image-Based Path Planning Algorithm Using a UAV Equipped with Stereo Vision

Iz, Selim Ahmet, Unel, Mustafa

arXiv.org Artificial Intelligence

This paper presents a novel image-based path planning algorithm that was developed using computer vision techniques, as well as its comparative analysis with well-known deterministic and probabilistic algorithms, namely A* and the Probabilistic Road Map (PRM) algorithm. The terrain depth has a significant impact on the safety of the calculated path, yet craters and hills on the surface cannot be distinguished in a two-dimensional image. The proposed method therefore uses a disparity map of the terrain that is generated using a UAV. Several computer vision techniques, including edge, line and corner detection methods, as well as the stereo depth reconstruction technique, are applied to the captured images, and the resulting disparity map is used to define candidate way-points of the trajectory. The initial and desired points are detected automatically using ArUco marker pose estimation and circle detection techniques. After presenting the mathematical model and vision techniques, the developed algorithm is compared with well-known algorithms on different virtual scenes created in the V-REP simulation program and on a physical setup created in a laboratory environment. Results are promising and demonstrate the effectiveness of the proposed algorithm.
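To illustrate why a disparity map helps where a flat image does not, here is a minimal sketch of the standard stereo relation Z = f * B / d (depth from focal length, baseline and disparity), followed by a simple filter that keeps only cells near ground level. This is not the authors' algorithm: the function names, the `ground_m` and `tol_m` parameters, and the idea of filtering by a depth tolerance are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Depth = List[List[Optional[float]]]


def disparity_to_depth(disparity: List[List[float]],
                       focal_px: float,
                       baseline_m: float) -> Depth:
    """Convert a disparity map (pixels) to depth (metres) with Z = f * B / d.

    Cells with non-positive disparity have no valid stereo match and
    are returned as None.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity
    ]


def flat_cells(depth: Depth, ground_m: float, tol_m: float) -> List[Tuple[int, int]]:
    """Return (row, col) of cells whose depth is within tol_m of ground level.

    Hypothetical helper: cells much closer to the camera (hills) or much
    farther from it (craters) are rejected as way-point candidates.
    """
    return [
        (r, c)
        for r, row in enumerate(depth)
        for c, z in enumerate(row)
        if z is not None and abs(z - ground_m) <= tol_m
    ]


if __name__ == "__main__":
    # Toy 2x2 disparity map; focal length and baseline are made-up values.
    depth_map = disparity_to_depth([[10.0, 0.0], [20.0, 5.0]],
                                   focal_px=500.0, baseline_m=0.1)
    print(flat_cells(depth_map, ground_m=5.0, tol_m=0.5))
```

In this toy map, the cell with disparity 20 is closer to the camera than the assumed ground plane (a hill) and the cell with disparity 5 is farther away (a crater), so only the cell at ground depth survives the filter.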


From classical techniques to convolution-based models: A review of object detection algorithms

Neha, Fnu, Bhati, Deepshikha, Shukla, Deepak Kumar, Amiruzzaman, Md

arXiv.org Artificial Intelligence

Object detection is a fundamental task in computer vision and image understanding, with the goal of identifying and localizing objects of interest within an image while assigning them corresponding class labels. Traditional methods, which relied on handcrafted features and shallow models, struggled with complex visual data and showed limited performance. These methods combined low-level features with contextual information and lacked the ability to capture high-level semantics. Deep learning, especially Convolutional Neural Networks (CNNs), addressed these limitations by automatically learning rich, hierarchical features directly from data. These features include both semantic and high-level representations essential for accurate object detection. This paper reviews object detection frameworks, starting with classical computer vision methods. We categorize object detection approaches into two groups: (1) classical computer vision techniques and (2) CNN-based detectors. We compare major CNN models, discussing their strengths and limitations. In conclusion, this review highlights the significant advancements in object detection through deep learning and identifies key areas for further research to improve performance.


Maximum Solar Energy Tracking Leverage High-DoF Robotics System with Deep Reinforcement Learning

Jiang, Anjie, Mo, Kangtong, Fujimoto, Satoshi, Taylor, Michael, Kumar, Sanjay, Dimitrios, Chiotis, Ruiz, Emilia

arXiv.org Artificial Intelligence

Solar trajectory monitoring is a pivotal challenge in solar energy systems, underpinning applications such as autonomous energy harvesting and environmental sensing. A prevalent failure mode in sustained solar tracking arises when the predictive algorithm diverges from the solar locus and erroneously anchors to extraneous celestial or terrestrial features. This phenomenon is attributable to an inadequate assimilation of solar-specific objectness attributes within the tracking paradigm. To mitigate this deficiency inherent in extant methodologies, we introduce an innovative objectness regularization framework that compels tracking points to remain confined within the delineated boundaries of the solar entity. By encapsulating solar objectness indicators during the training phase, our approach obviates the necessity for explicit solar mask computation during operational deployment. Furthermore, we integrate our method with a high-DoF robot arm to improve its robustness and flexibility in different outdoor environments.


Reviews: Unsupervised Video Object Segmentation for Deep Reinforcement Learning

Neural Information Processing Systems

In particular, this work uses SfM-Net [1], which learns to predict optical flow of a single image, to segment the objects in a state, and then uses this object-mask for reinforcement learning. MOREL is evaluated on all 59 Atari games, where it outperforms the baselines in several environments.


Computer Vision Approaches for Automated Bee Counting Application

Bilik, Simon, Janakova, Ilona, Ligocki, Adam, Ficek, Dominik, Horak, Karel

arXiv.org Artificial Intelligence

Many applications in bee colony health monitoring could be solved efficiently using computer vision techniques. One such challenge is finding an efficient way to count the number of incoming and outgoing bees, which could be used to further analyse many trends, such as the bee colony health state or blooming periods, or to investigate the effects of agricultural spraying. In this paper, we compare three methods for automated bee counting on two of our own datasets. The best-performing method is based on the ResNet-50 convolutional neural network classifier, which achieved an accuracy of 87% on the BUT1 dataset and 93% on the BUT2 dataset.


Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques

K., Abhinand, Nair, Abhiram B., C., Dhananjay, Hamza, Hanan, J., Mohammed Fawaz, K., Rahma Fahim, S, Anoop V.

arXiv.org Artificial Intelligence

Technological advances and innovations are improving daily life in every possible way, but a large section of society is deprived of these benefits because of physical disabilities. For the benefits to be real and accessible to the whole of society, these talented and gifted people should also be able to use such innovations without any hurdles. Many applications developed these days address these challenges, but localized communities and other constrained linguistic groups may find it difficult to use them. Malayalam, a Dravidian language spoken in the Indian state of Kerala, is one of the twenty-two scheduled languages in India. Recent years have witnessed a surge in the development of systems and tools in Malayalam, addressing the needs of Kerala, but many of them are not empathetically designed to cater to the needs of hearing-impaired people. One of the major challenges is the limited or non-existent availability of sign language data for the Malayalam language, and sufficient efforts have not been made in this direction. In this connection, this paper proposes an approach for sign language identification for the Malayalam language using advanced deep learning and computer vision techniques. We start by developing a labeled dataset for Malayalam letters, and for the identification we use advanced deep learning techniques such as YOLOv8 together with computer vision. Experimental results show that the identification accuracy is comparable to that of other sign language identification systems, and other researchers in sign language identification can use the model as a baseline to develop advanced models.


Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature Review

Bamigbade, Opeyemi, Sheppard, John, Scanlon, Mark

arXiv.org Artificial Intelligence

The task of multimedia geolocation is becoming an increasingly essential component of the digital forensics toolkit to effectively combat human trafficking, child sexual exploitation, and other illegal acts. Typically, metadata-based geolocation information is stripped when multimedia content is shared via instant messaging and social media. The intricacy of geolocating, geotagging, or finding geographical clues in this content is often overly burdensome for investigators. Recent research has shown that contemporary advancements in artificial intelligence, specifically computer vision and deep learning, show significant promise towards expediting the multimedia geolocation task. This systematic literature review thoroughly examines the state of the art in leveraging computer vision techniques for multimedia geolocation and assesses their potential to expedite human trafficking investigation. It provides a comprehensive overview of the application of computer vision-based approaches to multimedia geolocation, identifies their applicability in combating human trafficking, and highlights the potential implications of enhanced multimedia geolocation for prosecuting human trafficking. 123 articles inform this systematic literature review. The findings suggest numerous potential paths for future impactful research on the subject.


Long-term monitoring of bird flocks in the wild – interview with Kshitiz

AIHub

In work presented at the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), Kshitiz, Sonu Shreshtha, Ramy Mounir, Mayank Vatsa, Richa Singh, Saket Anand, Sudeep Sarkar and Sevaram Mali Parihar investigate using computer vision techniques to monitor large flocks of birds. In this interview, Kshitiz tells us more about this research. In our work, Long-term Monitoring of Bird Flocks in the Wild, published in IJCAI 2023, we delve into developing and applying computer vision techniques and datasets tailored for non-invasive monitoring and analysis of migratory bird flocks in their natural habitats. The aim is to understand the behavior and ecology of migratory birds through automated video analysis with minimal human intervention, thereby bolstering conservation initiatives. The core technical challenges associated with wildlife monitoring arise from the uncontrolled, outdoor nature of the imagery (both images and videos) capturing large flocks of migratory birds over several months.


Overview of Computer Vision Techniques in Robotized Wire Harness Assembly: Current State and Future Opportunities

Wang, Hao, Salunkhe, Omkar, Quadrini, Walter, Lämkull, Dan, Ore, Fredrik, Johansson, Björn, Stahre, Johan

arXiv.org Artificial Intelligence

Wire harnesses are essential hardware for electronic systems in modern automotive vehicles. With a shift in the automotive industry towards electrification and autonomous driving, more and more automotive electronics are responsible for energy transmission and safety-critical functions such as maneuvering, driver assistance, and safety systems. This paradigm shift places more demand on automotive wire harnesses from the safety perspective and stresses the greater importance of high-quality wire harness assembly in vehicles. However, most of the current operations of wire harness assembly are still performed manually by skilled workers, and some of the manual processes are problematic in terms of quality control and ergonomics. There is also a persistent demand in the industry to increase competitiveness and gain market share. Hence, assuring assembly quality while improving ergonomics and optimizing labor costs is desired. Robotized assembly, accomplished by robots or in human-robot collaboration, is a key enabler for fulfilling the increasingly demanding quality and safety requirements, as it enables more replicable, transparent, and comprehensible processes than completely manual operations. However, robotized assembly of wire harnesses is challenging in practical environments due to the flexibility of the deformable objects, though many preliminary automation solutions have been proposed under simplified industrial configurations. Previous research efforts have proposed the use of computer vision technology to facilitate robotized automation of wire harness assembly, enabling the robots to better perceive and manipulate the flexible wire harness. This article presents an overview of computer vision technology proposed for robotized wire harness assembly and derives research gaps that require further study to facilitate a more practical robotized assembly of wire harnesses.


The Analysis and Extraction of Structure from Organizational Charts

Manali, Nikhil, Doermann, David, Desai, Mahesh

arXiv.org Artificial Intelligence

Organizational charts, also known as org charts, are critical representations of an organization's structure and the hierarchical relationships between its components and positions. However, manually extracting information from org charts can be error-prone and time-consuming. To solve this, we present an automated and end-to-end approach that uses computer vision, deep learning, and natural language processing techniques. Additionally, we propose a metric to evaluate the completeness and hierarchical accuracy of the extracted information. This approach has the potential to improve organizational restructuring and resource utilization by providing a clear and concise representation of the organizational structure. Our study lays a foundation for further research on the topic of hierarchical chart analysis.