 motorcycle


Learning Debiased and Disentangled Representations for Semantic Segmentation

Neural Information Processing Systems

Despite such phenomenal achievement, semantic segmentation approaches still suffer from the chronic limitations caused by class imbalance and stereotyped scene context in datasets.


Riding the Rockies on the Ducati XDiavel V4

Popular Science

The bike provides impressive ease of use to go with its high performance and visceral engine character. Ducati's new low-slung 168-horsepower muscle bike is meant to appeal to sport bike riders who have tired of the racer-crouch riding position but still want a sophisticated and powerful ride. Chinnock's not just the CEO; he's also put in his miles on the race-replica bikes that build Ducati's reputation for performance. The XDiavel V4's 1,158cc four-cylinder engine is the bike's centerpiece, both visually and technically.


Symbolic Graphics Programming with Large Language Models

Chen, Yamei, Zhang, Haoquan, Huang, Yangyi, Qiu, Zeju, Zhang, Kaipeng, Wen, Yandong, Liu, Weiyang

arXiv.org Artificial Intelligence

Large language models (LLMs) excel at program synthesis, yet their ability to produce symbolic graphics programs (SGPs) that render into precise visual content remains underexplored. We study symbolic graphics programming, where the goal is to generate an SGP from a natural-language description. This task also serves as a lens into how LLMs understand the visual world by prompting them to generate images rendered from SGPs. Among various SGPs, our paper sticks to scalable vector graphics (SVGs). We begin by examining the extent to which LLMs can generate SGPs. To this end, we introduce SGP-GenBench, a comprehensive benchmark covering object fidelity, scene fidelity, and compositionality (attribute binding, spatial relations, numeracy). On SGP-GenBench, we discover that frontier proprietary models substantially outperform open-source models, and performance correlates well with general coding capabilities. Motivated by this gap, we aim to improve LLMs' ability to generate SGPs. We propose a reinforcement learning (RL) with verifiable rewards approach, where a format-validity gate ensures renderable SVG, and a cross-modal reward aligns text and the rendered image via strong vision encoders (e.g., SigLIP for text-image and DINO for image-image). Applied to Qwen-2.5-7B, our method substantially improves SVG generation quality and semantics, achieving performance on par with frontier systems. We further analyze training dynamics, showing that RL induces (i) finer decomposition of objects into controllable primitives and (ii) contextual details that improve scene coherence. Our results demonstrate that symbolic graphics programming offers a precise and interpretable lens on cross-modal grounding.
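The gated reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `is_renderable_svg` is a crude stand-in for a real render check, and `similarity` is a hypothetical callable standing in for the vision-encoder score (the abstract names SigLIP and DINO, neither of which is called here).

```python
import xml.etree.ElementTree as ET
from typing import Callable


def is_renderable_svg(svg_text: str) -> bool:
    """Crude validity gate (assumption): text parses as XML with an <svg> root."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # Drop an XML namespace prefix such as "{http://www.w3.org/2000/svg}".
    return root.tag.rsplit("}", 1)[-1] == "svg"


def gated_reward(
    svg_text: str,
    prompt: str,
    similarity: Callable[[str, str], float],
) -> float:
    """Return 0.0 for invalid SVG, otherwise the cross-modal score.

    `similarity` is hypothetical: in practice it would render the SVG and
    compare the image with the prompt using a vision encoder.
    """
    if not is_renderable_svg(svg_text):
        return 0.0
    return similarity(prompt, svg_text)
```

The gate runs first so that malformed output never reaches the more expensive scorer.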


Vehicle detection from GSV imagery: Predicting travel behaviour for cycling and motorcycling using Computer Vision

Kokka, Kyriaki, Goel, Rahul, Abbas, Ali, Nice, Kerry A., Martial, Luca, Labib, SM, Ke, Rihuan, Schönlieb, Carola Bibiane, Woodcock, James

arXiv.org Artificial Intelligence

Transportation influences health by shaping exposure to physical activity, air pollution and injury risk. Comparative data on cycling and motorcycling behaviours is scarce, particularly at a global scale. Street view imagery, such as Google Street View (GSV), combined with computer vision, is a valuable resource for efficiently capturing travel behaviour data. This study demonstrates a novel approach using deep learning on street view images to estimate cycling and motorcycling levels across diverse cities worldwide. We utilized data from 185 global cities. The data on mode shares of cycling and motorcycling were estimated using travel surveys or censuses. We used GSV images to detect cycles and motorcycles in sampled locations, using 8000 images per city. The YOLOv4 model, fine-tuned using images from six cities, achieved a mean average precision of 89% for detecting cycles and motorcycles. A global prediction model was developed using beta regression with city-level mode shares as the outcome and log-transformed counts of GSV-detected images with cycles and motorcycles as explanatory variables, while controlling for population density. We found strong correlations between GSV motorcycle counts and motorcycle mode share (0.78) and moderate correlations between GSV cycle counts and cycling mode share (0.51). Beta regression models predicted mode shares with $R^2$ values of 0.614 for cycling and 0.612 for motorcycling, achieving median absolute errors (MDAE) of 1.3% and 1.4%, respectively. Scatterplots demonstrated consistent prediction accuracy, though cities like Utrecht and Cali were outliers. The model was applied to 60 cities globally for which we didn't have recent mode share data. We provided estimates for some cities in the Middle East, Latin America and East Asia. With computer vision, GSV images capture travel modes and activity, providing insights alongside traditional data sources.
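As a rough illustration of the prediction and evaluation steps in this abstract, the sketch below computes a mode share from a log-transformed count and scores predictions by median absolute error. Several things are assumptions not stated in the abstract: the logit link (a common default for beta regression), the use of `log1p` to tolerate zero counts, and the coefficients `b0`, `b1`, `b2`, which are hypothetical placeholders rather than fitted values.

```python
import math
from statistics import median


def predict_mode_share(count: float, density: float,
                       b0: float, b1: float, b2: float) -> float:
    """Mean of a beta regression under an assumed logit link.

    b0, b1, b2 are hypothetical coefficients, not values from the study.
    log1p is an assumption so that a count of zero stays finite.
    """
    eta = b0 + b1 * math.log1p(count) + b2 * density
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit keeps the share in (0, 1)


def median_absolute_error(observed, predicted) -> float:
    """Median of |observed - predicted| across cities (MDAE)."""
    return median(abs(o - p) for o, p in zip(observed, predicted))
```

The median is less sensitive than the mean to a few badly predicted cities, which matters when outliers are present.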



FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation

Jing, Liqiang, Lai, Viet, Yoon, Seunghyun, Bui, Trung, Du, Xinya

arXiv.org Artificial Intelligence

Video Multimodal Large Language Models (VideoMLLMs) have achieved remarkable progress in both Video-to-Text and Text-to-Video tasks. However, they often suffer from hallucinations, generating content that contradicts the visual input. Existing evaluation methods are limited to one task (e.g., V2T) and also fail to assess hallucinations in open-ended, free-form responses. To address this gap, we propose FIFA, a unified FaIthFulness evAluation framework that extracts comprehensive descriptive facts, models their semantic dependencies via a Spatio-Temporal Semantic Dependency Graph, and verifies them using VideoQA models. We further introduce Post-Correction, a tool-based correction framework that revises hallucinated content. Extensive experiments demonstrate that FIFA aligns more closely with human judgment than existing evaluation methods, and that Post-Correction effectively improves factual consistency in both text and video generation.


Ducati adds 50 tiny sensors to motorbikes to amp up its racing game

Popular Science

MotoGP is high-speed, high-tech motorcycle racing. The fastest riders in the world compete on specialized, purpose-built motorcycles from companies like Ducati, Honda, and Yamaha on the world stage in this series, which is considered the most prestigious in the game. Riders reach incredible speeds of up to 220 miles per hour on their machines, and races can go 350 turns with gravity-defying leaning that scrapes elbows and knees. This Grand Prix is for the toughest of the tough on the moto circuit.


CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis

Zhen, Hao, Yang, Jidong J.

arXiv.org Artificial Intelligence

Road crashes claim over 1.3 million lives annually worldwide and incur global economic losses exceeding \$1.8 trillion. Such profound societal and financial impacts underscore the urgent need for road safety research that uncovers crash mechanisms and delivers actionable insights. Conventional statistical models and tree ensemble approaches typically rely on structured crash data, overlooking contextual nuances and struggling to capture complex relationships and underlying semantics. Moreover, these approaches tend to incur significant information loss, particularly in narrative elements related to multi-vehicle interactions, crash progression, and rare event characteristics. This study presents CrashSage, a novel Large Language Model (LLM)-centered framework designed to advance crash analysis and modeling through four key innovations. First, we introduce a tabular-to-text transformation strategy paired with relational data integration schema, enabling the conversion of raw, heterogeneous crash data into enriched, structured textual narratives that retain essential structural and relational context. Second, we apply context-aware data augmentation using a base LLM model to improve narrative coherence while preserving factual integrity. Third, we fine-tune the LLaMA3-8B model for crash severity inference, demonstrating superior performance over baseline approaches, including zero-shot, zero-shot with chain-of-thought prompting, and few-shot learning, with multiple models (GPT-4o, GPT-4o-mini, LLaMA3-70B). Finally, we employ a gradient-based explainability technique to elucidate model decisions at both the individual crash level and across broader risk factor dimensions. This interpretability mechanism enhances transparency and enables targeted road safety interventions by providing deeper insights into the most influential factors.
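The tabular-to-text step mentioned in this abstract can be pictured with a small sketch. It only shows the general idea of turning one structured record into a sentence. The function name, the field names, and the sentence template are all hypothetical, and the relational data integration described in the abstract is not modeled.

```python
def record_to_narrative(record: dict) -> str:
    """Turn one crash record into a single descriptive sentence.

    Hypothetical template: fields that are None or empty are skipped, and
    underscores in field names become spaces.
    """
    parts = [
        f"{key.replace('_', ' ')} is {value}"
        for key, value in record.items()
        if value not in (None, "")
    ]
    return "Crash record: " + "; ".join(parts) + "."


# Hypothetical example row; real crash datasets use their own schemas.
example = {"weather": "rain", "num_vehicles": 2, "lighting": None}
narrative = record_to_narrative(example)
```

Skipping missing fields keeps the narrative from asserting values the source data never recorded.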


This flying motorcycle can take you from traffic to sky in minutes

FOX News

The Skyrider X1 combines land and air travel in one sleek design. The unveiling of the Skyrider X1, which claims to be the "world's first amphibious flying passenger motorcycle," has certainly stirred up excitement. This innovative vehicle promises to change how we think about personal mobility by combining land and air travel in one sleek design. Developed by Rictor, a sub-brand of the Chinese company Kuickwheel, the Skyrider X1 marks a big progression from Rictor's previous product, the K1 e-bike. Transitioning from an electric bicycle to a flying motorcycle is no small feat, and it shows Rictor's ambition to push the boundaries of eco-friendly and energy-efficient transportation.


Classification Drives Geographic Bias in Street Scene Segmentation

Nair, Rahul, Tseng, Gabriel, Rolf, Esther, Tokas, Bhanu, Kerner, Hannah

arXiv.org Artificial Intelligence

Previous studies showed that image datasets lacking geographic diversity can lead to biased performance in models trained on them. While earlier work studied general-purpose image datasets (e.g., ImageNet) and simple tasks like image recognition, we investigated geo-biases in real-world driving datasets on a more complex task: instance segmentation. We examined if instance segmentation models trained on European driving scenes (Eurocentric models) are geo-biased. Consistent with previous work, we found that Eurocentric models were geo-biased. Interestingly, we found that geo-biases came from classification errors rather than localization errors, with classification errors alone contributing 10-90% of the geo-biases in segmentation and 19-88% of the geo-biases in detection. This showed that while classification is geo-biased, localization (including detection and segmentation) is geographically robust. Our findings show that in region-specific models (e.g., Eurocentric models), geo-biases from classification errors can be significantly mitigated by using coarser classes (e.g., grouping car, bus, and truck as 4-wheeler).
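The coarser-class mitigation mentioned at the end of this abstract can be sketched as a label remapping applied before scoring. The grouping of car, bus, and truck as 4-wheeler comes from the abstract. The helper names and the toy labels are hypothetical, and plain label accuracy is used here only for illustration, not as the paper's metric.

```python
# Grouping taken from the abstract; any other label is left unchanged.
COARSE_CLASS = {"car": "4-wheeler", "bus": "4-wheeler", "truck": "4-wheeler"}


def coarsen(label: str) -> str:
    """Map a fine-grained label to its coarse group, if it has one."""
    return COARSE_CLASS.get(label, label)


def accuracy(true_labels, predicted_labels, coarse: bool = False) -> float:
    """Fraction of matching labels, optionally after coarsening both sides."""
    if coarse:
        true_labels = [coarsen(t) for t in true_labels]
        predicted_labels = [coarsen(p) for p in predicted_labels]
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Toy labels (hypothetical): car and truck are confused with each other.
truth = ["car", "bus", "truck", "motorcycle"]
preds = ["truck", "bus", "car", "motorcycle"]
fine_acc = accuracy(truth, preds)
coarse_acc = accuracy(truth, preds, coarse=True)
```

In this toy case the confusions fall inside one coarse group, so they stop counting as errors once labels are merged.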