annotation tool
AI-Boosted Video Annotation: Assessing the Process Enhancement
Gutiérrez, Juan, Mora, Ángel, Regodón, Pablo, Rodriguez, Silvia, Blanco, José Luis
We explore the enhancement of Human-in-the-Loop video annotation by integrating automatic capabilities to ease the task for annotators and assess their performance. The research delves into the practical implications of the annotation processes, the integration of AI components, and the evaluation of its outcomes. We analyze their impact on efficiency, accuracy, and overall annotation quality. Focusing on the Human-in-the-Loop for video annotation tasks, we implemented a single-iteration scheme using Label Studio and AI-powered zero-shot pre-annotations. Using this framework, we designed a test based on the annotation of the UCF-Crime dataset to discriminate between normal and abnormal activities in video footage. Our results evidence how automatic AI-based pre-annotation can streamline the video annotation workflow, empowering human annotators and optimizing the overall pipeline. Using the pre-annotated data, we observed a 35% reduction in the annotation time for 70% of the annotators with similar quality annotations, compared to the traditional manual annotation task. Results are consistent with asset duration and complexity. We also observed that while annotators rapidly learned to use the tool, the produced annotations are more coherent among annotators and better match the natural clustering of the video frames.
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos
Giedemann, Patrick, von Däniken, Pius, Deriu, Jan, Rodrigo, Alvaro, Peñas, Anselmo, Cieliebak, Mark
The growing influence of video content as a medium for communication and misinformation underscores the urgent need for effective tools to analyze claims in multilingual and multi-topic settings. Existing efforts in misinformation detection largely focus on written text, leaving a significant gap in addressing the complexity of spoken text in video transcripts. We introduce ViClaim, a dataset of 1,798 annotated video transcripts across three languages (English, German, Spanish) and six topics. Each sentence in the transcripts is labeled with three claim-related categories: fact-check-worthy, fact-non-check-worthy, or opinion. We developed a custom annotation tool to facilitate the highly complex annotation process. Experiments with state-of-the-art multilingual language models demonstrate strong performance in cross-validation (macro F1 up to 0.896) but reveal challenges in generalization to unseen topics, particularly for distinct domains. Our findings highlight the complexity of claim detection in video transcripts. ViClaim offers a robust foundation for advancing misinformation detection in video-based communication, addressing a critical gap in multimodal analysis.
- Europe > United Kingdom (0.14)
- Europe > Ukraine (0.05)
- Europe > Russia (0.04)
- (12 more...)
- Media > News (0.91)
- Government > Voting & Elections (0.68)
- Government > Regional Government > North America Government > United States Government (0.47)
Supplementary Materials: Humans in Kitchens: A Dataset for Multi-Person Human Motion Forecasting with Scene Context
Figure 1: Sample scenes with 3d human poses projected onto camera views for each kitchen. A sample skeleton can be seen in Figure 2. frames: t; frame number in actual dataset time act: t 82; action annotations, where 1 determines an action and 0 its absence. On top of that, SMPL's shape parameter determines limb length ensuring that the body skeleton remains consistent across time. We bear all responsibility in case of violation of rights. Please note that the dataset can be used without the video data.
Annotate Rhetorical Relations with INCEpTION: A Comparison with Automatic Approaches
Automatically identifying rhetorical relations in discourse units is a challenging task in natural language processing (NLP) because it should be able to logically and semantically connect the discourse units. Although large language models (LLMs) shows po tential for application in many domains, including text classification tasks, their effectiveness in predicting rhetorical relations remains open for research. One of the major challenges in this domain is the lack of annotated data sets capturing differen t rhetorical relations, which would then make model training more difficult. In this research, we manually created the da-tasets from various cricket reports and then annotated the reports as discourse units. We used the INCEpTION annotation tools for annotation and then structured the dataset for the machine - learning model.
- North America > United States > California (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
EduCoder: An Open-Source Annotation System for Education Transcript Data
Pan, Guanzhong, Tan, Mei, Nam, Hyunji, Langlois, Lucía, Malamut, James, Deonizio, Liliana, Demszky, Dorottya
We introduce EduCoder, a domain-specialized tool designed to support utterance-level annotation of educational dialogue. While general-purpose text annotation tools for NLP and qualitative research abound, few address the complexities of coding education dialogue transcripts -- with diverse teacher-student and peer interactions. Common challenges include defining codebooks for complex pedagogical features, supporting both open-ended and categorical coding, and contextualizing utterances with external features, such as the lesson's purpose and the pedagogical value of the instruction. EduCoder is designed to address these challenges by providing a platform for researchers and domain experts to collaboratively define complex codebooks based on observed data. It incorporates both categorical and open-ended annotation types along with contextual materials. Additionally, it offers a side-by-side comparison of multiple annotators' responses, allowing comparison and calibration of annotations with others to improve data reliability. The system is open-source, with a demo video available.
- South America > Uruguay > Maldonado > Maldonado (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- (4 more...)
- Research Report (1.00)
- Instructional Material (0.93)
Enhancing Sports Strategy with Video Analytics and Data Mining: Automated Video-Based Analytics Framework for Tennis Doubles
We present a comprehensive video-based analytics framework for tennis doubles that addresses the lack of automated analysis tools for this strategically complex sport. Our approach introduces a standardised annotation methodology encompassing player positioning, shot types, court formations, and match outcomes, coupled with a specialised annotation tool designed to meet the unique requirements of tennis video labelling. The framework integrates advanced machine learning techniques including GroundingDINO for precise player localisation through natural language grounding and YOLO-Pose for robust pose estimation. This combination significantly reduces manual annotation effort whilst improving data consistency and quality. We evaluate our approach on doubles tennis match data and demonstrate that CNN-based models with transfer learning substantially outperform pose-based methods for predicting shot types, player positioning, and formations. The CNN models effectively capture complex visual and contextual features essential for doubles tennis analysis. Our integrated system bridges advanced analytical capabilities with the strategic complexities of tennis doubles, providing a foundation for automated tactical analysis, performance evaluation, and strategic modelling in professional tennis.
A Spatial Relationship Aware Dataset for Robotics
Wang, Peng, Pham, Minh Huy, Guo, Zhihao, Zhou, Wei
Robotic task planning in real-world environments requires not only object recognition but also a nuanced understanding of spatial relationships between objects. We present a spatial-relationship-aware dataset of nearly 1,000 robot-acquired indoor images, annotated with object attributes, positions, and detailed spatial relationships. Captured using a Boston Dynamics Spot robot and labelled with a custom annotation tool, the dataset reflects complex scenarios with similar or identical objects and intricate spatial arrangements. We benchmark six state-of-the-art scene-graph generation models on this dataset, analysing their inference speed and relational accuracy. Our results highlight significant differences in model performance and demonstrate that integrating explicit spatial relationships into foundation models, such as ChatGPT 4o, substantially improves their ability to generate executable, spatially-aware plans for robotics. The dataset and annotation tool are publicly available at https://github.com/PengPaulWang/SpatialAwareRobotDataset, supporting further research in spatial reasoning for robotics.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.06)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > Wales > Cardiff (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.74)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency
Wang, Yanbo, Chen, Yongtao, Cao, Chuan, Deng, Tianchen, Zhao, Wentao, Wang, Jingchuan, Chen, Weidong
We propose a flexible Semi-Automatic Labeling Tool (SALT) for general LiDAR point clouds with cross-scene adaptability and 4D consistency. Unlike recent approaches that rely on camera distillation, SALT operates directly on raw LiDAR data, automatically generating pre-segmentation results. To achieve this, we propose a novel zero-shot learning paradigm, termed data alignment, which transforms LiDAR data into pseudo-images by aligning with the training distribution of vision foundation models. Additionally, we design a 4D-consistent prompting strategy and 4D non-maximum suppression module to enhance SAM2, ensuring high-quality, temporally consistent presegmentation. SALT surpasses the latest zero-shot methods by 18.4% PQ on SemanticKITTI and achieves nearly 40-50% of human annotator performance on our newly collected low-resolution LiDAR data and on combined data from three LiDAR types, significantly boosting annotation efficiency. We anticipate that SALT's open-sourcing will catalyze substantial expansion of current LiDAR datasets and lay the groundwork for the future development of LiDAR foundation models. Code is available at https://github.com/Cavendish518/SALT.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > China > Jiangsu Province > Yancheng (0.04)
Have LLMs Made Active Learning Obsolete? Surveying the NLP Community
Romberg, Julia, Schröder, Christopher, Gonsior, Julius, Tomanek, Katrin, Olsson, Fredrik
Supervised learning relies on annotated data, which is expensive to obtain. A longstanding strategy to reduce annotation costs is active learning, an iterative process, in which a human annotates only data instances deemed informative by a model. Large language models (LLMs) have pushed the effectiveness of active learning, but have also improved methods such as few- or zero-shot learning, and text synthesis - thereby introducing potential alternatives. This raises the question: has active learning become obsolete? To answer this fully, we must look beyond literature to practical experiences. We conduct an online survey in the NLP community to collect previously intangible insights on the perceived relevance of data annotation, particularly focusing on active learning, including best practices, obstacles and expected future developments. Our findings show that annotated data remains a key factor, and active learning continues to be relevant. While the majority of active learning users find it effective, a comparison with a community survey from over a decade ago reveals persistent challenges: setup complexity, estimation of cost reduction, and tooling. We publish an anonymized version of the collected dataset
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (21 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Education (0.93)
- Information Technology > Security & Privacy (0.68)