AITopics | image coordinate


image coordinate


DIJE: Dense Image Jacobian Estimation for Robust Robotic Self-Recognition and Visual Servoing

Toshimitsu, Yasunori, Kawaharazuka, Kento, Miki, Akihiro, Okada, Kei, Inaba, Masayuki

arXiv.org Artificial Intelligence, Jul-2-2025

For robots to move in the real world, they must first correctly understand the state of their own bodies and the tools that they hold. In this research, we propose DIJE, an algorithm to estimate the image Jacobian for every pixel. It is based on an optical flow calculation and a simplified Kalman filter that can be run efficiently on the whole image in real time. It relies on neither markers nor knowledge of the robot's structure. We use DIJE in a self-recognition process that robustly distinguishes between movement by the robot and movement by external entities, even when the motions overlap. We also propose a visual servoing controller based on DIJE, which can learn to control the robot's body to perform reaching movements or bimanual tool-tip control. The proposed algorithms were implemented on a physical musculoskeletal robot and their performance was verified. We believe that such global estimation of the visuomotor policy has the potential to be extended into a more general framework for manipulation.
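As a rough illustration of what estimating an image Jacobian for one pixel can look like, the sketch below refines a 2 x n matrix that maps joint displacements to optical flow. It uses a normalized least-mean-squares (Broyden-style) update as a stand-in; the abstract does not specify the paper's simplified Kalman filter, so the update rule, the function names, and the two-joint example are all assumptions made here.

```python
def update_pixel_jacobian(jacobian, dq, flow, gain=1.0, eps=1e-9):
    """Refine one pixel's 2 x n image Jacobian from a single observation.

    jacobian: two rows of n floats (du/dq and dv/dq for this pixel)
    dq:       n joint displacements since the previous frame
    flow:     observed optical flow (du, dv) at this pixel
    NOTE: normalized-LMS update, an assumed stand-in for the paper's filter.
    """
    norm_sq = sum(d * d for d in dq) + eps  # eps avoids division by zero
    updated = []
    for row, observed in zip(jacobian, flow):
        predicted = sum(j * d for j, d in zip(row, dq))
        residual = observed - predicted
        updated.append([j + gain * residual * d / norm_sq
                        for j, d in zip(row, dq)])
    return updated


def simulate_flow(true_jacobian, dq):
    """Synthetic, noise-free flow for a pixel with a known Jacobian."""
    return tuple(sum(j * d for j, d in zip(row, dq)) for row in true_jacobian)


# Hypothetical 2-joint example: recover a known Jacobian from joint motion.
TRUE_J = [[3.0, -1.0], [0.5, 2.0]]
estimate = [[0.0, 0.0], [0.0, 0.0]]
for dq in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    estimate = update_pixel_jacobian(estimate, dq, simulate_flow(TRUE_J, dq))
```

Running the same update independently at every pixel would give a dense Jacobian field. Pixels whose flow is poorly predicted by the joint motion are then candidates for externally caused movement, which is the intuition behind the self-recognition step described in the abstract.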

artificial intelligence, machine learning, robot, (17 more...)


doi: 10.1109/IROS47612.2022.9981868

2507.00446

Country: Asia > Japan (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)


Role of Uncertainty in Model Development and Control Design for a Manufacturing Process

Li, Rongfei, Assadian, Francis

arXiv.org Artificial Intelligence, Jun-17-2025

The use of robotic technology in manufacturing has increased drastically in the 21st century. By utilizing their sensory cues, however, humans still outperform machines, especially in micro-scale manufacturing, which requires high-precision robot manipulators. These sensory cues naturally compensate for the high level of uncertainty that exists in the manufacturing environment. Uncertainties in performing manufacturing tasks may come from measurement noise, model inaccuracy, joint compliance (e.g., elasticity), and other sources. Although the advanced metrology sensors and high-precision microprocessors used in today's robots have compensated for many structural and dynamic errors in robot positioning, a well-designed control algorithm still works as a comparable and cheaper alternative for reducing uncertainties in automated manufacturing. Our work illustrates that a multi-robot control system can substantially reduce various uncertainties.

artificial intelligence, controller, transfer function, (17 more...)


doi: 10.5772/intechopen.104780

2506.12273

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report (0.81)

Industry: Aerospace & Defense (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Workplace (0.66)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.46)


A Novel Feedforward Youla Parameterization Method for Avoiding Local Minima in Stereo Image Based Visual Servoing Control

Li, Rongfei, Assadian, Francis

arXiv.org Artificial Intelligence, Jun-13-2025

In robot navigation and manipulation, accurately determining the camera's pose relative to the environment is crucial for effective task execution. In this paper, we systematically prove that this problem corresponds to the Perspective-3-Point (P3P) formulation, where exactly three known 3D points and their corresponding 2D image projections are used to estimate the pose of a stereo camera. In image-based visual servoing (IBVS) control, the system becomes overdetermined, as the 6 degrees of freedom (DoF) of the stereo camera must align with 9 observed 2D features in the scene. When more constraints are imposed than available DoFs, global stability cannot be guaranteed, as the camera may become trapped in a local minimum far from the desired configuration during servoing. To address this issue, we propose a novel control strategy for accurately positioning a calibrated stereo camera. Our approach integrates a feedforward controller with a Youla parameterization-based feedback controller, ensuring robust servoing performance. Through simulations, we demonstrate that our method effectively avoids local minima and enables the camera to reach the desired pose accurately and efficiently.
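To make the overdetermination concrete, the classical IBVS law can be written as below. This is the textbook formulation, not the feedforward and Youla-based controller the paper proposes, and the symbols (feature vector s, target s*, interaction matrix L, camera velocity v, gain λ) are notation assumed here rather than taken from the abstract.

```latex
\begin{aligned}
e &= s - s^{*}, \qquad s \in \mathbb{R}^{9}, \\
\dot{s} &= L\,v, \qquad L \in \mathbb{R}^{9 \times 6},\; v \in \mathbb{R}^{6}, \\
v &= -\lambda\, L^{+} e, \qquad \lambda > 0, \\
\dot{e} &= -\lambda\, L L^{+} e \quad \text{(fixed target, } \dot{s}^{*} = 0\text{)}.
\end{aligned}
```

Because rank(L L⁺) ≤ 6 < 9, there exist nonzero errors e with L⁺e = 0 (equivalently Lᵀe = 0). For such an error the commanded velocity is zero even though the features have not reached their targets, which is the local-minimum behavior the abstract describes.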

artificial intelligence, image coordinate, machine learning, (18 more...)


doi: 10.3390/app15094991

2506.10252

Country: North America > United States > California (0.46)

Genre: Research Report (0.64)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Control Systems (0.94)
(2 more...)


Innovative Adaptive Imaged Based Visual Servoing Control of 6 DoFs Industrial Robot Manipulators

Li, Rongfei, Assadian, Francis

arXiv.org Artificial Intelligence, Jun-13-2025

Image-based visual servoing (IBVS) methods have been well developed and used in many applications, especially in pose (position and orientation) alignment. However, most research papers have focused on developing control solutions for cases in which 3D point features can be detected inside the field of view. This work proposes an innovative feedforward-feedback adaptive control algorithm structure with the Youla parameterization method. A designed feature estimation loop ensures stable and fast motion control when point features are outside the field of view. As 3D point features move inside the field of view, the IBVS feedback loop preserves the precision of the pose at the end of the control period. In addition, an adaptive controller is developed in the feedback loop to stabilize the system over the entire range of operations. The nonlinear camera and robot manipulator model is linearized and decoupled online by an adaptive algorithm. The adaptive controller is then computed based on the linearized model evaluated at the current linearization point. The proposed solution is robust and easy to implement in different industrial robotic systems. Various scenarios are used in simulations to validate the effectiveness and robust performance of the proposed controller.

artificial intelligence, controller, matrix, (16 more...)


doi: 10.5772/intechopen.1004857

2506.1024

Country:

Asia (0.68)
North America > United States > California > Yolo County > Davis (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Robots > Robots in the Workplace (0.34)


LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Li, Xiang, Mata, Cristina, Park, Jongwoo, Kahatapitiya, Kumara, Jang, Yoo Sung, Shang, Jinghuan, Ranasinghe, Kanchana, Burgert, Ryan, Cai, Mu, Lee, Yong Jae, Ryoo, Michael S.

arXiv.org Artificial Intelligence, Jun-28-2024

Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as conversations and provides improved responses when trained with auxiliary data that complements policy learning. LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process state information as visual-textual prompts and generate optimal policy decisions in text. To train such action policy VLMs, we first introduce an automated pipeline to generate diverse high-quality robotics instruction data from existing behavior cloning data. A VLM finetuned with the resulting collection of datasets, based on a conversation-style formulation tailored for robotics tasks, can generate meaningful robot action policy decisions. Our experiments across multiple simulated and real-world environments demonstrate the state-of-the-art performance of the proposed LLaRA framework. The code, datasets, and pretrained models are available at https://github.com/LostXine/LLaRA.

center distance, dataset, top left corner, (13 more...)


2406.20095

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Towards Learning Monocular 3D Object Localization From 2D Labels using the Physical Laws of Motion

Kienzle, Daniel, Lorenz, Julian, Ludwig, Katja, Lienhart, Rainer

arXiv.org Artificial Intelligence, Nov-29-2023

We present a novel method for precise 3D object localization in single images from a single calibrated camera using only 2D labels. No expensive 3D labels are needed. Thus, instead of using 3D labels, our model is trained with easy-to-annotate 2D labels along with the physical knowledge of the object's motion. Given this information, the model can infer the latent third dimension, even though it has never seen this information during training. Our method is evaluated on both synthetic and real-world datasets, and we are able to achieve a mean distance error of just 6 cm in our experiments on real data. The results indicate the method's potential as a step towards learning 3D object location estimation, where collecting 3D data for training is not feasible.
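One way to see why 2D labels plus known physics can pin down depth is sketched below with a pinhole projection and a ballistic trajectory. This is an illustrative toy and not the paper's training procedure: the camera convention (y-axis pointing down, so gravity adds to y), the focal length in pixels, and all function names are assumptions made here.

```python
def project(point, focal_px, cx, cy):
    """Pinhole projection of a camera-frame 3D point to image coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def ballistic_position(p0, v0, t, g=9.81):
    """Position at time t under gravity; camera y-axis is assumed to point down."""
    x0, y0, z0 = p0
    vx, vy, vz = v0
    return (x0 + vx * t, y0 + vy * t + 0.5 * g * t * t, z0 + vz * t)


def reprojection_error(p0, v0, times, labels_2d, focal_px, cx, cy):
    """Mean squared pixel error between a 3D hypothesis and the 2D labels."""
    total = 0.0
    for t, (u, v) in zip(times, labels_2d):
        pu, pv = project(ballistic_position(p0, v0, t), focal_px, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total / len(times)


# Hypothetical camera and trajectory used only to generate 2D labels.
FOCAL, CX, CY = 1000.0, 640.0, 360.0
TIMES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
TRUE_P0, TRUE_V0 = (0.0, -1.0, 5.0), (1.0, -2.0, 0.5)
LABELS = [project(ballistic_position(TRUE_P0, TRUE_V0, t), FOCAL, CX, CY)
          for t in TIMES]

# The same trajectory scaled by 2 projects identically at t = 0,
# but gravity is not scaled with it, so later frames disagree.
SCALED_P0, SCALED_V0 = (0.0, -2.0, 10.0), (2.0, -4.0, 1.0)
error_true = reprojection_error(TRUE_P0, TRUE_V0, TIMES, LABELS, FOCAL, CX, CY)
error_scaled = reprojection_error(SCALED_P0, SCALED_V0, TIMES, LABELS, FOCAL, CX, CY)
```

Without the gravity term, the scaled hypothesis would reproduce the 2D labels exactly. The known acceleration is what breaks the usual monocular scale ambiguity in this toy setting.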

camera location, dataset, video, (16 more...)


2310.17462

Country:

North America > Canada > Ontario > Hamilton (0.04)
Europe > Germany (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Sports (0.94)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)


Single Image Automatic Radial Distortion Compensation Using Deep Convolutional Network

Janos, Igor, Benesova, Wanda

arXiv.org Artificial Intelligence, Dec-14-2021

In many computer vision domains, the input images must conform with the pinhole camera model, where straight lines in the real world are projected as straight lines in the image. Performing computer vision tasks on live sports broadcast footage imposes challenging requirements: the algorithms cannot rely on a specific calibration pattern and must be able to cope with unknown and uncalibrated cameras, radial distortion originating from complex television lenses, few visual cues with which to compensate for distortion, and the need for real-time performance. We present a novel method for single-image automatic lens distortion compensation based on deep convolutional neural networks, capable of real-time performance and accuracy using two highest-order coefficients of the polynomial distortion model operating in the application domain of sports broadcast.

Keywords: Deep Convolutional Neural Network, Radial Distortion, Single Image Rectification
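For readers unfamiliar with polynomial radial distortion, the sketch below applies and inverts a common two-coefficient form in normalized image coordinates. The even-order polynomial with coefficients k1 and k2 is an assumption made here; the abstract does not spell out the paper's exact parameterization, and the fixed-point inversion is a generic technique rather than the paper's network.

```python
def distort(x, y, k1, k2):
    """Apply radial distortion to a normalized, centered image point."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale


def undistort(xd, yd, k1, k2, iterations=20):
    """Invert distort() by fixed-point iteration (suits mild distortion)."""
    x, y = xd, yd  # start from the distorted point
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return x, y


# Hypothetical coefficients for a mild barrel distortion.
K1, K2 = -0.2, 0.05
distorted = distort(0.3, 0.4, K1, K2)
recovered = undistort(distorted[0], distorted[1], K1, K2)
```

In a learned pipeline, a network would predict the coefficients from a single image, and a routine like `undistort` would then be applied per pixel to rectify it.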

distortion, neural network, straight line, (16 more...)


2112.08198

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)


Object Detection and 3D Estimation via an FMCW Radar Using a Fully Convolutional Network

Zhang, Guoqiang, Li, Haopeng, Wenger, Fabian

arXiv.org Machine Learning, Feb-4-2019

Typical sensors for object detection include cameras, radars, and LiDARs. In general, different sensors have their own unique sensing properties, which give each type of sensor an advantage over others when performing object detection. For instance, cameras are able to capture rich texture information of objects in normal light conditions, which makes it possible to identify and distinguish objects from the background. Radars attempt to detect objects by continuously transmitting microwaves and then analyzing the received signals reflected by the objects, which allows the sensors to work regardless of bad weather conditions or dark environments. In recent years, object detection based on cameras has made significant progress by using deep learning frameworks. The basic idea is to design and train a deep neural network (DNN) by feeding it a large number of annotated image samples. The training process enables the DNN to effectively capture informative image features of objects of interest via multiple neural layers [2]. As a result, the trained DNN is able to produce impressive performance for visual object detection and other similar tasks such as object classification and segmentation (e.g., Mask R-CNN [3], YOLO [4], and U-Net [5]). Research on exploiting DNNs for analyzing radar signals is still at an early stage.
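As background on how reflected FMCW radar signals are commonly analyzed, the sketch below builds a range-Doppler magnitude map with a two-stage discrete Fourier transform: one pass along the samples of each chirp (range) and one across chirps (Doppler). This is standard textbook processing shown under simplifying assumptions, not the paper's network input pipeline; the synthetic single-target frame and all function names are hypothetical.

```python
import cmath


def dft(sequence):
    """Naive discrete Fourier transform (fine for small illustrative sizes)."""
    n = len(sequence)
    return [sum(sequence[k] * cmath.exp(-2j * cmath.pi * i * k / n)
                for k in range(n))
            for i in range(n)]


def range_doppler_map(frame):
    """frame[m][n] is sample n of chirp m; returns map[doppler_bin][range_bin]."""
    num_chirps, num_samples = len(frame), len(frame[0])
    range_spectra = [dft(chirp) for chirp in frame]          # along each chirp
    doppler_spectra = [dft([range_spectra[m][r] for m in range(num_chirps)])
                       for r in range(num_samples)]          # across chirps
    return [[abs(doppler_spectra[r][d]) for r in range(num_samples)]
            for d in range(num_chirps)]


def synthesize_frame(num_chirps, num_samples, range_bin, doppler_bin):
    """Ideal noise-free beat signal for one target (hypothetical test input)."""
    return [[cmath.exp(2j * cmath.pi * (range_bin * n / num_samples
                                        + doppler_bin * m / num_chirps))
             for n in range(num_samples)]
            for m in range(num_chirps)]


def find_peak(rd_map):
    """Return (doppler_bin, range_bin) of the strongest cell."""
    return max(((d, r) for d in range(len(rd_map))
                for r in range(len(rd_map[0]))),
               key=lambda cell: rd_map[cell[0]][cell[1]])


frame = synthesize_frame(num_chirps=8, num_samples=16, range_bin=3, doppler_bin=2)
rd_map = range_doppler_map(frame)
peak = find_peak(rd_map)
```

A practical implementation would use a fast Fourier transform and windowing; the naive transform is used here only to keep the example dependency-free.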

radar signal, range-doppler spectrum, spectrum, (16 more...)


1902.05394

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Sweden (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Self-Supervised Aerial Images Analysis for Extracting Parking lot Structure

Seo, Young-Woo (Robotics Institute, Carnegie Mellon University) | Ratliff, Nathan (Robotics Institute, Carnegie Mellon University) | Urmson, Chris (Robotics Institute, Carnegie Mellon University)

AAAI Conferences, Jun-23-2009

Road network information simplifies autonomous driving by providing strong priors about environments. It informs a robotic vehicle of where it can drive, provides models of what can be expected, and supplies contextual cues that influence driving behaviors. Currently, however, road network information is manually generated using a combination of GPS surveys and aerial imagery. These manual techniques are labor intensive and error prone. To fully exploit the benefits of digital imagery, these processes should be automated. As a step toward this goal, we present an algorithm that extracts the structure of a parking lot visible in a given aerial image. To minimize human intervention in the use of aerial imagery, we devise a self-supervised learning algorithm that automatically generates a set of parking spot templates to learn the appearance of a parking lot and estimates the structure of the parking lot from the learned model. The data set extracted from a single image alone is too small to learn an accurate parking spot model. However, strong priors trained using large data sets collected across multiple images dramatically improve performance. Our self-supervised approach outperforms the prior alone by adapting the distribution of examples toward that found in the current image. A thorough empirical analysis compares leading state-of-the-art learning techniques on this problem.

hypothesis, parking lot, parking spot, (16 more...)


Twenty-First International Joint Conference on Artificial Intelligence

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.73)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)
(2 more...)
