Synthetic-to-Real Pose Estimation with Geometric Reconstruction — Qiuxia Lin, Kerui Gu, Linlin Yang, Angela Yao

Neural Information Processing Systems

The warping estimation module W is an hourglass with five conv3×3-bn-relu-pool2×2 blocks in the encoder and five upsample2×2-conv3×3-bn-relu blocks in the decoder. For G, we use the Johnson architecture [3] with two down-sampling blocks, six residual blocks, and two up-sampling blocks; the design follows [7]. The inputs are the base image, the displacement field, and the inpainting map. G downsamples by 4× and then upsamples by 4× to produce the output, i.e., the reconstructed image.
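The hourglass structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the channel widths, the nearest-neighbor upsampling mode, and the two-channel displacement-field output are assumptions filled in for the sketch.

```python
import torch
import torch.nn as nn

def enc_block(c_in, c_out):
    # One encoder stage: conv3x3 - bn - relu - pool2x2.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

def dec_block(c_in, c_out):
    # One decoder stage: upsample2x2 - conv3x3 - bn - relu.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class WarpingHourglass(nn.Module):
    """Sketch of W: five encoder blocks, five decoder blocks."""

    def __init__(self, in_ch=3, out_ch=2):  # out_ch=2: assumed (dx, dy) field
        super().__init__()
        widths = [64, 128, 256, 256, 256]   # assumed channel widths
        self.encoder = nn.Sequential(*[
            enc_block(ci, co)
            for ci, co in zip([in_ch] + widths[:-1], widths)
        ])
        rev = widths[::-1]
        self.decoder = nn.Sequential(*[
            dec_block(ci, co)
            for ci, co in zip(rev, rev[1:] + [64])
        ])
        self.head = nn.Conv2d(64, out_ch, 3, padding=1)

    def forward(self, x):
        return self.head(self.decoder(self.encoder(x)))

x = torch.randn(1, 3, 256, 256)
y = WarpingHourglass()(x)
print(tuple(y.shape))  # (1, 2, 256, 256): full-resolution displacement field
```

Five pool/upsample stages change resolution by 2^5 in each direction, so the decoder restores the input resolution exactly.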


Synthetic-to-Real Pose Estimation with Geometric Reconstruction — Qiuxia Lin, Kerui Gu, Linlin Yang, Angela Yao

Neural Information Processing Systems

Pose estimation is remarkably successful under supervised learning, but obtaining annotations, especially for new deployments, is costly and time-consuming. This work tackles adapting models trained on synthetic data to real-world target domains with only unlabelled data. A common approach is model fine-tuning with pseudo-labels from the target domain, yet many pseudo-labelling strategies cannot provide sufficient high-quality pose labels. This work proposes a reconstruction-based strategy as a complement to pseudo-labelling for synthetic-to-real domain adaptation. We generate the driving image by geometrically transforming a base image according to the predicted keypoints and enforce a reconstruction loss to refine the predictions. This provides a novel solution for effectively correcting confident yet inaccurate keypoint locations through image reconstruction in domain adaptation. Our approach outperforms the previous state of the art by 8% in PCK on four large-scale hand and human real-world datasets. In particular, we excel on endpoints such as fingertips and the head, with 7.2% and 29.9% improvements in PCK.
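The reconstruction objective described above can be illustrated with a small sketch: backward-warp a base image with a displacement field and penalize the difference to the target image. This is a hedged illustration only; in the paper the displacement field comes from a learned warping module driven by the predicted keypoints, whereas here the field is simply given as an input tensor.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` (N,C,H,W) by a pixel-space flow (N,2,H,W)."""
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    # Absolute sampling positions = identity grid + flow, in pixel units.
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    # Normalize to [-1, 1] as expected by grid_sample (x, y order).
    gx = 2 * grid[:, 0] / (w - 1) - 1
    gy = 2 * grid[:, 1] / (h - 1) - 1
    return F.grid_sample(image, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def reconstruction_loss(base_img, target_img, flow):
    # L1 photometric loss between the warped base image and the target.
    return F.l1_loss(warp(base_img, flow), target_img)

# Sanity check: zero flow is the identity warp, so the loss vanishes
# when the target equals the base image.
base = torch.rand(1, 3, 64, 64)
loss = reconstruction_loss(base, base, torch.zeros(1, 2, 64, 64))
print(loss.item())
```

Minimizing such a loss with respect to the predicted keypoints (through the learned warping module) is what pushes confident but inaccurate keypoints toward locations that explain the target image.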






Blink Video Doorbell (2nd Gen) review: Impressive features, great price

PCWorld

When you purchase through links in our articles, we may earn a small commission. Amazon's entry-level video doorbell delivers essential features at a bargain price, though local storage options are limited (the included Sync Module Core doesn't support USB storage). The Blink Video Doorbell (2nd Gen) delivers clear video, wide coverage, reliable alerts, and long battery life at a remarkably low price. If you don't need advanced features like ultra-sharp resolution or full-duplex audio, this doorbell is a true bargain. Blink is Amazon's budget line of smart home products.


What video doorbells see (and what they don't): Here's what you can expect

PCWorld

Don't assume these gadgets will capture everything that happens on your porch; understand these critical specs and you'll avoid a disappointing purchase. With a camera at the front door and an app on their phone, many buyers jump to the conclusion that they'll capture faces on the sidewalk, license plates at the curb, and anybody cutting across the lawn. Most doorbell cameras deliver far more modest real-world performance. They have a tight field of view that sees what's directly in front of the lens; they're built to frame a visitor's face standing in front of the door, not the entire space around the door.


EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity

Neural Information Processing Systems

Research on egocentric tasks in computer vision has mostly focused on head-mounted cameras, such as fisheye cameras or embedded cameras inside immersive headsets. We argue that the increasing miniaturization of optical sensors will lead to the prolific integration of cameras into many more body-worn devices at various locations. This will bring fresh perspectives to established tasks in computer vision and benefit key areas such as human motion tracking, body pose estimation, or action recognition, particularly for the lower body, which is typically occluded. In this paper, we introduce EgoSim, a novel simulator of body-worn cameras that generates realistic egocentric renderings from multiple perspectives across a wearer's body. A key feature of EgoSim is its use of real motion capture data to render motion artifacts, which are especially noticeable with arm- or leg-worn cameras. In addition, we introduce MultiEgoView, a dataset of egocentric footage from six body-worn cameras and ground-truth full-body 3D poses during several activities: 119 hours of data are derived from AMASS motion sequences in four high-fidelity virtual environments, which we augment with 5 hours of real-world motion data from 13 participants using six GoPro cameras and 3D body pose references from an Xsens motion capture suit. We demonstrate EgoSim's effectiveness by training an end-to-end video-only 3D pose estimation network. Analyzing its domain gap, we show that our dataset and simulator substantially aid training for inference on real-world data. EgoSim code & MultiEgoView dataset: https://siplab.org/projects/EgoSim