display image
Unified Optimal Transport Framework for Universal Domain Adaptation (Supplementary Material)
Recall measures the fraction of common samples that are retrieved as the correct common class, while specificity measures the fraction of private samples that are not retrieved. Fig. S1(b) shows the sensitivity of γ, where γ is the rough boundary for splitting positive and negative in adaptive filling. For the cosine similarity of two ℓ2-normalized features, the similarity value ranges from -1 to 1, where a higher value indicates higher similarity. Such self-supervised learning methods encourage consistency between two augmentations of one image. The display images for source prototypes are chosen by finding the nearest source instance of the prototype.
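The recall, specificity, and cosine-similarity definitions in the excerpt above can be illustrated with a minimal Python sketch. The sample representation (a tuple of `is_common`, `true_label`, `predicted_label`, with `None` meaning "not retrieved") and all function names are assumptions made for illustration; they do not come from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product of two l2-normalized vectors; always within [-1, 1]."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def recall_and_specificity(samples):
    """samples: iterable of (is_common, true_label, predicted_label).

    predicted_label is None when the sample is not retrieved
    (hypothetical encoding, assumed for this sketch).
    """
    common = [s for s in samples if s[0]]
    private = [s for s in samples if not s[0]]
    # Recall: common samples retrieved as the correct common class.
    recall = sum(1 for _, t, p in common if p == t) / len(common)
    # Specificity: private samples that are not retrieved.
    specificity = sum(1 for _, _, p in private if p is None) / len(private)
    return recall, specificity


samples = [
    (True, "a", "a"),     # common, retrieved correctly
    (True, "b", "a"),     # common, retrieved as the wrong class
    (False, None, None),  # private, correctly not retrieved
    (False, None, "a"),   # private, wrongly retrieved
]
recall, specificity = recall_and_specificity(samples)
```

Identical directions give a similarity of 1, opposite directions give -1, and orthogonal features give 0.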
CAD2DMD-SET: Synthetic Generation Tool of Digital Measurement Device CAD Model Datasets for fine-tuning Large Vision-Language Models
Valente, João, Dehban, Atabak, Ventura, Rodrigo
Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities across various multimodal tasks. They continue, however, to struggle with seemingly trivial tasks such as reading values from Digital Measurement Devices (DMDs), particularly in real-world conditions involving clutter, occlusions, extreme viewpoints, and motion blur, which are common with head-mounted cameras and in Augmented Reality (AR) applications. Motivated by these limitations, this work introduces CAD2DMD-SET, a synthetic data generation tool designed to support visual question answering (VQA) tasks involving DMDs. By leveraging 3D CAD models, advanced rendering, and high-fidelity image composition, our tool produces diverse, VQA-labelled synthetic DMD datasets suitable for fine-tuning LVLMs. Additionally, we present DMDBench, a curated validation set of 1,000 annotated real-world images designed to evaluate model performance under practical constraints. Benchmarking three state-of-the-art LVLMs using Average Normalised Levenshtein Similarity (ANLS) and then fine-tuning LoRA adapters for these models on the dataset generated by CAD2DMD-SET yielded substantial improvements, with InternVL showing a score increase of 200% without degrading on other tasks. This demonstrates that the CAD2DMD-SET training dataset substantially improves the robustness and performance of LVLMs when operating under the previously stated challenging conditions. The CAD2DMD-SET tool is expected to be released as open-source once the final version of this manuscript is prepared, allowing the community to add different measurement devices and generate their own datasets.
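The abstract above does not spell out how ANLS is computed. The sketch below follows the commonly used definition (one minus the length-normalised edit distance, zeroed when the normalised distance reaches a threshold, then averaged over questions). The threshold of 0.5, the lowercasing and whitespace stripping, and all function names are assumptions, not details confirmed by the paper.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def nls(pred, truth, threshold=0.5):
    """Normalised Levenshtein similarity for one prediction/answer pair."""
    # Case and whitespace normalisation is an assumption of this sketch.
    pred, truth = pred.strip().lower(), truth.strip().lower()
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0
    nl = levenshtein(pred, truth) / longest
    # Scores at or beyond the threshold distance count as zero.
    return 1.0 - nl if nl < threshold else 0.0


def anls(predictions, ground_truths, threshold=0.5):
    """Average over questions of the best score against any accepted answer."""
    scores = [max(nls(p, t, threshold) for t in truths)
              for p, truths in zip(predictions, ground_truths)]
    return sum(scores) / len(scores)


# Hypothetical device readings: one exact match and one complete miss.
score = anls(["12.5", "abc"], [["12.5"], ["xyz"]])
```

A near miss such as predicting "12.5" for a reading of "12.6" still earns partial credit (one edit over four characters), which is why this kind of metric suits display-reading tasks better than exact match.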
MToFNet: Object Anti-Spoofing with Mobile Time-of-Flight Data
Jeong, Yonghyun, Kim, Doyeon, Lee, Jaehyeon, Hong, Minki, Hwang, Solbi, Choi, Jongwon
In online markets, sellers can maliciously recapture others' images on display screens to use as spoof images, which can be difficult to distinguish with the human eye. To prevent such harm, we propose an anti-spoofing method using the paired RGB images and depth maps provided by a mobile camera with a Time-of-Flight sensor. When images are recaptured on display screens, screen-dependent patterns known as moiré patterns can also be captured in the spoof images. These patterns cause the anti-spoofing model to overfit, leaving it unable to detect spoof images recaptured on unseen media. To avoid this issue, we build a novel representation model composed of two embedding models, which can be trained without considering the recaptured images. We also introduce the mToF dataset, the largest and most diverse object anti-spoofing dataset, and the first to utilize ToF data. Experimental results confirm that our model achieves robust generalization even across unseen domains.
Facebook wants to send 'emotional' robots to explore and scan faces to 'help users make friends'
Facebook is considering building 'emotionally sensitive' robots that can explore the world, identify objects and people and enable users to make friends remotely. On-board sensors would allow the robots to spot people to engage with, judge their emotional state and listen to what they are saying, a patent filing revealed. At the same time, the robot could display images and video and speak with people, potentially letting users meet people and make new friends remotely. However, it is not known whether Facebook will follow through on the patent filing and make the rough robot designs a reality.