

1cc70be9fb6a83bc46cf4ac21a91e0b0-Supplemental-Conference.pdf

Neural Information Processing Systems

In this section, we provide the class assignment of all datasets under different missing rates. The proposed setting is a new multi-task learning scenario, and its practical applications are not limited by the aforementioned assumption on the testing space. Table B.2: The observed classes of each task on Office-Caltech with different missing rates. Office-Home [9] contains images from four domains/tasks: Artistic, Clipart, Product, and Real-World. Skin-Lesion contains three skin lesion classification tasks: HAM10000 [8], Dermofit [2], and Derm7pt [5].


A First Ride With the Maeving RM2 Electric Motorcycle

WIRED

Oozing with flair, and now a little more practical--it's hard not to love Maeving's latest ride. All products featured on WIRED are independently selected by our editors. However, we may receive compensation from retailers and/or from purchases of products through these links. I test-ride electric kick scooters as a part of my job. They're fantastic for zipping around town, but they are not particularly comfortable.




Ducati adds 50 tiny sensors to motorbikes to amp up its racing game

Popular Science

Breakthroughs, discoveries, and DIY tips sent every weekday. MotoGP is high-speed, high-tech motorcycle racing. The fastest riders in the world compete on specialized, purpose-built motorcycles from companies like Ducati, Honda, and Yamaha on the world stage in this series, which is considered the most prestigious in the sport. Riders reach incredible speeds of up to 220 miles per hour, and races can run to 350 turns, with gravity-defying leans that scrape elbows and knees. This Grand Prix is for the toughest of the tough on the moto circuit.


How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Tu, Haoqin, Cui, Chenhang, Wang, Zijun, Zhou, Yiyang, Zhao, Bingchen, Han, Junlin, Zhou, Wangchunshu, Yao, Huaxiu, Xie, Cihang

arXiv.org Artificial Intelligence

This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness. For the OOD evaluation, we present two novel VQA datasets, each with one variant, designed to test model performance under challenging conditions. In exploring adversarial robustness, we propose a straightforward attack strategy for misleading VLLMs into producing visually unrelated responses. Moreover, we assess the efficacy of two jailbreaking strategies, targeting either the vision or language component of VLLMs. Our evaluation of 21 diverse models, ranging from open-source VLLMs to GPT-4V, yields interesting observations: 1) Current VLLMs struggle with OOD texts but not images, unless the visual information is limited; and 2) These VLLMs can be easily misled by deceiving the vision encoder alone, and their vision-language training often compromises safety protocols. We release this safety evaluation suite at https://github.com/UCSC-VLAA/vllm-safety-benchmark.


Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark

Fang, Jianwu, Li, Lei-Lei, Yang, Kuan, Zheng, Zhedong, Xue, Jianru, Chua, Tat-Seng

arXiv.org Artificial Intelligence

Traffic accident prediction in driving videos aims to provide an early warning of accident occurrence and supports the decision making of safe driving systems. Previous works usually concentrate on the spatial-temporal correlation of object-level context, but they do not fit the inherent long-tailed data distribution well and are vulnerable to severe environmental change. In this work, we propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition of text descriptions of the visual observation and the driver attention to facilitate model training. In particular, the text description provides dense semantic guidance for the primary context of the traffic scene, while the driver attention draws focus to the critical region closely correlated with safe driving. CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and a driver attention guided accident prediction module. We leverage the attention mechanism in these modules to explore the core semantic cues for accident prediction. In order to train CAP, we extend an existing self-collected DADA-2000 dataset (with annotated driver attention for each frame) with further factual text descriptions for the visual observations before the accidents. Besides, we construct a new large-scale benchmark (named CAP-DATA) consisting of 11,727 in-the-wild accident videos with over 2.19 million frames, together with labeled fact-effect-reason-introspection descriptions and temporal accident frame labels. Extensive experiments validate the superiority of CAP over state-of-the-art approaches. The code, CAP-DATA, and all results will be released in \url{https://github.com/JWFanggit/LOTVS-CAP}.


Vietnam's AI Leadership Status Is Blossoming

#artificialintelligence

Vietnam, a country in Southeast Asia with an area of 311,699 square kilometres (120,348 square miles), has a population of over 97 million. If you ever visit this beautiful country, you will soon appreciate that there are also over 65 million registered motorbikes, with many families having a motorbike for each family member. This article summarizes a number of research sources to give a sense of where Vietnam's leadership is in the field of AI, and all indications are that its status is evolving and blossoming. Where is Vietnam in terms of Information Communication and Technology (ICT) companies, and how are they evolving their AI leadership position? According to the Ministry of Information and Communications, the revenue of the ICT industry in 2021 was $136,153 million USD, a solid increase compared to $124,678 million USD in 2020. It is also estimated that the ratio of Vietnam's value in ICT revenue reached 24.65%, a significant increase compared to previous years.


Paparazzi: A Deep Dive into the Capabilities of Language and Vision Models for Grounding Viewpoint Descriptions

Voigt, Henrik, Hombeck, Jan, Meuschke, Monique, Lawonn, Kai, Zarrieß, Sina

arXiv.org Artificial Intelligence

Existing language and vision models achieve impressive performance in image-text understanding. Yet, it is an open question to what extent they can be used for language understanding in 3D environments and whether they implicitly acquire 3D object knowledge, e.g. about different views of an object. In this paper, we investigate whether a state-of-the-art language and vision model, CLIP, is able to ground perspective descriptions of a 3D object and identify canonical views of common objects based on text queries. We present an evaluation framework that uses a circling camera around a 3D object to generate images from different viewpoints and evaluate them in terms of their similarity to natural language descriptions. We find that a pre-trained CLIP model performs poorly on most canonical views and that fine-tuning using hard negative sampling and random contrasting yields good results even under conditions with little available training data.
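The evaluation the abstract describes, rendering a 3D object from a circling camera and ranking the viewpoints by their similarity to a text description, can be sketched as follows. This is a minimal illustration only: the azimuth values, toy embedding vectors, and function names are hypothetical stand-ins for the paper's actual CLIP image/text encoders and rendering setup.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_viewpoint(text_emb, view_embs, azimuths):
    """Score each rendered viewpoint against the text query and return
    the azimuth whose image embedding best matches the description."""
    scores = [cosine_sim(text_emb, v) for v in view_embs]
    return azimuths[int(np.argmax(scores))], scores

# Toy vectors standing in for CLIP outputs; in the real framework these
# would come from encoding renders of the circling camera and the query.
azimuths = [0, 90, 180, 270]            # camera positions in degrees
text_emb = np.array([1.0, 0.2, 0.0])    # e.g. "a side view of a chair"
view_embs = [np.array([0.1, 1.0, 0.3]),
             np.array([0.9, 0.3, 0.1]),  # constructed to align with the text
             np.array([0.0, 0.5, 1.0]),
             np.array([0.4, 0.4, 0.4])]

best, scores = best_viewpoint(text_emb, view_embs, azimuths)
print(best)  # the 90-degree view wins for these toy vectors
```

Hard negative sampling, as used in the paper's fine-tuning, would treat the lower-scoring viewpoints of the same object as negatives for the contrastive loss.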


How a jetpack design helped create a flying motorbike

#artificialintelligence

At around the age of 12, David Mayman tried to build a helicopter out of fence posts and an old lawn mower. Needless to say, it did not go well. His contraption didn't fly and he was made to fix the fence. "I was brought up in a way that I guess challenged me scientifically... I was always told that nothing's impossible," he says.