AITopics | camera angle

Appendix

Neural Information Processing Systems | Jun-17-2026, 16:20:42 GMT

A.1 Details of Dimension Design

We argue that multi-dimensional evaluation is important for visual caption evaluation and is more comprehensive than previous work. How, then, should proper dimensions be chosen? We refer to existing VQA benchmarks [62, 63, 64, 65] and visual generation benchmarks [31, 32, 33]. VQA benchmarks usually design various types of questions to support multi-dimensional evaluation and analysis of MLLMs. For instance, MMBench [64] defines 20 ability dimensions, including attribute recognition, attribute comparison, action recognition, spatial relationship, physical property, OCR, object localization, image style, image scene, identity reasoning, etc. MVBench [64] covers 20 challenging video tasks, including action, object, position, count, scene, pose, attribute, character, cognition, etc. Owing to the flexible design of questions, VQA benchmarks can naturally be built with comprehensive dimensions. Unlike the VQA task, the visual caption task does not require specific questions but inspects the alignment of visual and textual information. Visual generation is the inverse task of visual captioning, as it requires models to generate specific visual content based on detailed textual descriptions. GenEval [31] designs 6 different tasks to evaluate text-to-image alignment: single object, two object, counting, colors, position, and attribute binding. VBench [32] comprises 16 dimensions, including subject consistency, background consistency, object class, human action, color, spatial relationship, scene, style, etc. We follow the dimensions they explore to design proper dimensions for visual captioning. Finally, we design 6 views, covering object, global, text, camera, temporal, and knowledge. The object-related view includes object category, object color, object number, and spatial relation; the global-related view includes scene and style; the text-related view evaluates the OCR capability of captions; the camera-related view covers camera angle and movement; the temporal-related view contains action and event; and the knowledge-related view evaluates the knowledge of MLLMs, i.e., character identification. We believe these dimensions contribute to comprehensive visual caption benchmarking.
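The six views and their dimensions listed above can be written down as a small lookup table. The following is only an illustrative sketch: the view and dimension names come from the text, while the dictionary layout and the names `CAPTION_VIEWS` and `all_dimensions` are assumptions made for this example, not part of the benchmark's released code.

```python
# Views and dimensions as enumerated in the text; the dict layout is an assumption.
CAPTION_VIEWS = {
    "object": ["object category", "object color", "object number", "spatial relation"],
    "global": ["scene", "style"],
    "text": ["OCR"],
    "camera": ["camera angle", "camera movement"],
    "temporal": ["action", "event"],
    "knowledge": ["character identification"],
}


def all_dimensions(views):
    """Flatten the view -> dimensions mapping into (view, dimension) pairs."""
    return [(view, dim) for view, dims in views.items() for dim in dims]


pairs = all_dimensions(CAPTION_VIEWS)
print(len(CAPTION_VIEWS), "views,", len(pairs), "dimensions")
```

Keeping the view as an explicit key makes it straightforward to report scores per view as well as per dimension.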

dimension, large language model, machine learning, (23 more...)

Industry: Media (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.54)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

5f2809607f692d79a01c05c43d702883-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Jun-17-2026, 16:20:38 GMT

Visual captioning benchmarks have become outdated with the emergence of modern multimodal large language models (MLLMs), as the brief ground-truth sentences and traditional metrics fail to assess ... benchmarks attempt to address ... centric evaluation, they remain limited ... incomplete visual element coverage.

dimension, large language model, machine learning, (21 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Media > Photography (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

A Deep Learning-Based CCTV System for Automatic Smoking Detection in Fire Exit Zones

Sadat, Sami, Hossain, Mohammad Irtiza, Sifat, Junaid Ahmed, Rafi, Suhail Haque, Alvi, Md. Waseq Alauddin, Rhaman, Md. Khalilur

arXiv.org Artificial Intelligence | Oct-28-2025

This research proposes a real-time, deep-learning-based smoking detection system for CCTV surveillance of fire exit areas, motivated by the critical safety requirements of these zones. The dataset contains 8,124 images from 20 different scenarios, along with images from 2,708 raw samples of low-light areas. We evaluated three advanced object detection models, YOLOv8, YOLOv11, and YOLOv12, and then developed a custom model derived from YOLOv8 with added structures for demanding surveillance contexts. The proposed model outperformed the other evaluated models, reaching a recall of 78.90% and an mAP@50 of 83.70% and delivering the best object identification and detection results across different environments. Inference performance was evaluated on multiple edge devices using multithreaded operations. The Jetson Xavier NX was the fastest, processing data in 52-97 ms, which establishes its suitability for time-sensitive operations. The study establishes that the proposed system delivers a fair and adjustable platform for monitoring public safety processes while enabling automatic regulatory compliance checks.

artificial intelligence, detection, machine learning, (16 more...)

2508.11696

Country: Asia > Bangladesh (0.16)

Genre: Research Report (0.64)

Industry: Commercial Services & Supplies > Security & Alarm Services (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Visio-Verbal Teleimpedance Interface: Enabling Semi-Autonomous Control of Physical Interaction via Eye Tracking and Speech

Jekel, Henk H. A., Rosales, Alejandro Díaz, Peternel, Luka

arXiv.org Artificial Intelligence | Aug-28-2025

The paper presents a visio-verbal teleimpedance interface for commanding 3D stiffness ellipsoids to the remote robot with a combination of the operator's gaze and verbal interaction. The gaze is detected by an eye-tracker, allowing the system to understand the context in terms of what the operator is currently looking at in the scene. Along with verbal interaction, a Visual Language Model (VLM) processes this information, enabling the operator to communicate their intended action or provide corrections. Based on these inputs, the interface can then generate appropriate stiffness matrices for different physical interaction actions. To validate the proposed visio-verbal teleimpedance interface, we conducted a series of experiments on a setup including a Force Dimension Sigma.7 haptic device to control the motion of the remote Kuka LBR iiwa robotic arm. The human operator's gaze is tracked by Tobii Pro Glasses 2, while human verbal commands are processed by a VLM using GPT-4o. The first experiment explored the optimal prompt configuration for the interface. The second and third experiments demonstrated different functionalities of the interface on a slide-in-the-groove task.

large language model, machine learning, natural language, (20 more...)

2508.20037

Country: Europe (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

Terrain-Aware Adaptation for Two-Dimensional UAV Path Planners

Karakontis, Kostas, Petsanis, Thanos, Kapoutsis, Athanasios Ch., Kapoutsis, Pavlos Ch., Kosmatopoulos, Elias B.

arXiv.org Artificial Intelligence | Jul-25-2025

Multi-UAV Coverage Path Planning (mCPP) algorithms in popular commercial software typically treat a Region of Interest (RoI) only as a 2D plane, ignoring important 3D structure characteristics. This leads to incomplete 3D reconstructions, especially around occluded or vertical surfaces. In this paper, we propose a modular algorithm that can extend commercial two-dimensional path planners to facilitate terrain-aware planning by adjusting altitude and camera orientations. To demonstrate it, we extend the well-known DARP (Divide Areas for Optimal Multi-Robot Coverage Path Planning) algorithm and produce DARP-3D. Compared to the baseline, our approach consistently captures improved 3D reconstructions, particularly in areas with significant vertical features. An open-source implementation of the algorithm is available here: https://github.com/konskara/T

algorithm, artificial intelligence, planning & scheduling, (16 more...)

doi: 10.1109/MED64031.2025.11073484

2507.17519

Country: Europe > Greece (0.28)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry:

Transportation > Air (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)

DJI Air 3S review: LiDAR and improved image quality make for a nearly faultless drone

Engadget | Oct-15-2024, 13:00:02 GMT

DJI just announced the dual-camera Air 3S drone and there's some all-new cutting-edge tech hiding in the nose. A LiDAR sensor is there to provide extra crash protection at night, a time that's often dangerous for drones. The Air 3S also has a new main camera with a larger sensor better suited for capturing video in low light. And it now comes with the company's ActiveTrack 360, which it first introduced in the Mini 4 Pro, allowing the device to zoom all around your subject while tracking and filming them. There are a bunch of other little improvements, from storage to the new panoramic photo mode, all at the same $1,099 price as the Air 3 was at launch.

activetrack 360, drone, sensor, (14 more...)

Country:

North America > United States (0.05)
Europe (0.05)

Industry: Media > Photography (0.35)

Technology: Information Technology > Artificial Intelligence (0.91)

Optimizing Parking Space Classification: Distilling Ensembles into Lightweight Classifiers

Alves, Paulo Luza, Hochuli, André, de Oliveira, Luiz Eduardo, de Almeida, Paulo Lisboa

arXiv.org Artificial Intelligence | Oct-7-2024

When deploying large-scale machine learning models for smart city applications, such as image-based parking lot monitoring, data often must be sent to a central server to perform classification tasks. This is challenging for the city's infrastructure, where image-based applications require transmitting large volumes of data, necessitating complex network and hardware infrastructures to process the data. To address this issue in image-based parking space classification, we propose creating a robust ensemble of classifiers to serve as Teacher models. These Teacher models are distilled into lightweight and specialized Student models that can be deployed directly on edge devices. The knowledge is distilled to the Student models through pseudo-labeled samples generated by the Teacher model, which are utilized to fine-tune the Student models on the target scenario. Our results show that the Student models, with 26 times fewer parameters than the Teacher models, achieved an average accuracy of 96.6% on the target test datasets, surpassing the Teacher models, which attained an average accuracy of 95.3%.
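The pseudo-labeling step described in this abstract can be illustrated with a toy sketch. The abstract does not say how the Teacher ensemble combines its predictions, so the majority vote below is an assumption, and the threshold "teachers", the scalar inputs, and the function name `ensemble_pseudo_labels` are hypothetical stand-ins for real image classifiers.

```python
from collections import Counter


def ensemble_pseudo_labels(teachers, samples):
    """Label each sample with the majority vote of the teacher ensemble (assumed rule)."""
    labels = []
    for x in samples:
        votes = Counter(teacher(x) for teacher in teachers)
        labels.append(votes.most_common(1)[0][0])
    return labels


# Hypothetical teachers: threshold rules on a single made-up "occupancy score".
teachers = [
    lambda x: "occupied" if x > 0.4 else "empty",
    lambda x: "occupied" if x > 0.5 else "empty",
    lambda x: "occupied" if x > 0.6 else "empty",
]

unlabeled = [0.1, 0.45, 0.55, 0.9]
pseudo = ensemble_pseudo_labels(teachers, unlabeled)

# The (sample, pseudo-label) pairs would then be used to fine-tune a small student.
student_training_set = list(zip(unlabeled, pseudo))
print(student_training_set)
```

An odd number of teachers avoids ties in a two-class vote; with an even number, a tie-breaking rule would need to be chosen explicitly.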

artificial intelligence, machine learning, student model, (15 more...)

2410.14705

Country:

South America > Brazil > Paraná > Curitiba (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report > New Finding (0.86)

Industry:

Education (1.00)
Transportation > Infrastructure & Services (0.87)
Transportation > Ground > Road (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Online Distribution Shift Detection via Recency Prediction

Luo, Rachel, Sinha, Rohan, Sun, Yixiao, Hindy, Ali, Zhao, Shengjia, Savarese, Silvio, Schmerling, Edward, Pavone, Marco

arXiv.org Artificial Intelligence | Sep-28-2023

When deploying modern machine learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical. However, most existing methods for detecting distribution shift are not well-suited to robotics settings, where data often arrives in a streaming fashion and may be very high-dimensional. In this work, we present an online method for detecting distribution shift with guarantees on the false positive rate - i.e., when there is no distribution shift, our system is very unlikely (with probability $< \epsilon$) to falsely issue an alert; any alerts that are issued should therefore be heeded. Our method is specifically designed for efficient detection even with high dimensional data, and it empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work while maintaining a low false negative rate in practice (whenever there is a distribution shift in our experiments, our method indeed emits an alert). We demonstrate our approach in both simulation and hardware for a visual servoing task, and show that our method indeed issues an alert before a failure occurs.
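To make the idea of an online alert with a bounded false positive rate concrete, here is a generic textbook-style sketch based on a test martingale and Ville's inequality. It is not the recency-prediction method of the paper. It assumes that each value in the stream lies in [0, 1] and that, when there is no shift, the values are independent and uniform on [0, 1]; the function name `first_alert_step` and the fixed `bet` parameter are hypothetical choices for this illustration.

```python
def first_alert_step(stream, epsilon=0.05, bet=1.0):
    """Return the 1-based step at which the wealth reaches 1/epsilon, or None.

    Assumption: with no shift, each u in `stream` is uniform on [0, 1], so every
    factor 1 + bet * (u - 0.5) has mean 1 and the wealth is a non-negative
    martingale starting at 1. Ville's inequality then bounds the probability of
    ever reaching 1/epsilon by epsilon.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must be in (0, 1)")
    if abs(bet) > 2.0:
        raise ValueError("|bet| must be at most 2 so the wealth stays non-negative")

    wealth = 1.0
    for step, u in enumerate(stream, start=1):
        wealth *= 1.0 + bet * (u - 0.5)
        if wealth >= 1.0 / epsilon:
            return step
    return None


# A stream stuck at the neutral value never moves the wealth, so no alert is raised.
print(first_alert_step([0.5] * 100))
# A stream of consistently large values grows the wealth by 1.5x per step.
print(first_alert_step([1.0] * 20))
```

A fixed bet only gains wealth when values drift upward; practical schemes adapt the bet over time, which this sketch leaves out.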

alert, distribution shift, martingale, (16 more...)

2211.09916

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Transportation (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

A View Independent Classification Framework for Yoga Postures

Chasmai, Mustafa, Das, Nirjhar, Bhardwaj, Aman, Garg, Rahul

arXiv.org Artificial Intelligence | Aug-14-2022

Yoga is a globally acclaimed and widely recommended practice for healthy living. Maintaining correct posture while performing a Yogasana is of utmost importance. In this work, we employ transfer learning from Human Pose Estimation models to extract 136 key-points spread all over the body and train a Random Forest classifier, which is used to estimate the Yogasanas. The results are evaluated on an extensive in-house yoga video database of 51 subjects recorded from 4 different camera angles. We propose a 3-step scheme for evaluating the generalizability of a Yoga classifier by testing it on 1) unseen frames, 2) unseen subjects, and 3) unseen camera angles. We argue that for most applications, validation accuracies on unseen subjects and unseen camera angles would be most important. We empirically analyze, over three public datasets, the advantage of transfer learning and the possibilities of target leakage. We further demonstrate that the classification accuracies critically depend on the cross-validation method employed and can often be misleading. To promote further research, we have made the key-points dataset and code publicly available.
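The difference between testing on unseen frames and testing on unseen subjects or camera angles comes down to how the data is split. The sketch below shows a group-wise hold-out on toy records. The record fields, the helper name `holdout_split`, and the toy sizes are assumptions for illustration, not the authors' released code.

```python
def holdout_split(frames, key, held_out):
    """Send every frame whose `key` value is in `held_out` to test, the rest to train."""
    train = [f for f in frames if f[key] not in held_out]
    test = [f for f in frames if f[key] in held_out]
    return train, test


# Hypothetical toy records: 3 subjects x 2 camera angles x 2 frames each.
frames = [
    {"subject": s, "camera": c, "frame": i, "label": "pose_a"}
    for s in ("s1", "s2", "s3")
    for c in ("front", "side")
    for i in range(2)
]

# Unseen-subject evaluation: no frame of subject s3 appears in training.
train_subj, test_subj = holdout_split(frames, "subject", {"s3"})

# Unseen-camera evaluation: no frame from the side camera appears in training.
train_cam, test_cam = holdout_split(frames, "camera", {"side"})

print(len(train_subj), len(test_subj), len(train_cam), len(test_cam))
```

A random split over individual frames would place near-identical neighbouring frames of the same video on both sides, which is one way validation accuracy can end up misleadingly high.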

camera angle, dataset, pose estimation, (14 more...)

2206.13577

Country: Asia > India (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera

McNally, William, Walters, Pascale, Vats, Kanav, Wong, Alexander, McPhee, John

arXiv.org Artificial Intelligence | May-20-2021

Existing multi-camera solutions for automatic scorekeeping in steel-tip darts are very expensive and thus inaccessible to most players. Motivated to develop a more accessible low-cost solution, we present a new approach to keypoint detection and apply it to predict dart scores from a single image taken from any camera angle. This problem involves detecting multiple keypoints that may be of the same class and positioned in close proximity to one another. The widely adopted framework for regressing keypoints using heatmaps is not well-suited for this task. To address this issue, we instead propose to model keypoints as objects. We develop a deep convolutional neural network around this idea and use it to predict dart locations and dartboard calibration points within an overall pipeline for automatic dart scoring, which we call DeepDarts. Additionally, we propose several task-specific data augmentation strategies to improve the generalization of our method. As a proof of concept, two datasets comprising 16k images originating from two different dartboard setups were manually collected and annotated to evaluate the system. In the primary dataset containing 15k images captured from a face-on view of the dartboard using a smartphone, DeepDarts predicted the total score correctly in 94.7% of the test images. In a second more challenging dataset containing limited training data (830 images) and various camera angles, we utilize transfer learning and extensive data augmentation to achieve a test accuracy of 84.0%. Because DeepDarts relies only on single images, it has the potential to be deployed on edge devices, giving anyone with a smartphone access to an automatic dart scoring system for steel-tip darts. The code and datasets are available.

dartboard, detection, keypoint, (15 more...)

2105.0988

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
