Lv, Zhaoyang
Aria Everyday Activities Dataset
Lv, Zhaoyang, Charron, Nicholas, Moulon, Pierre, Gamino, Alexander, Peng, Cheng, Sweeney, Chris, Miller, Edward, Tang, Huixuan, Meissner, Jeff, Dong, Jing, Somasundaram, Kiran, Pesqueira, Luis, Schwesinger, Mark, Parkhi, Omkar, Gu, Qiao, De Nardi, Renzo, Cheng, Shangyi, Saarinen, Steve, Baiyya, Vijay, Zou, Yuyang, Newcombe, Richard, Engel, Jakob Julian, Pan, Xiaqing, Ren, Carl
We present the Aria Everyday Activities (AEA) dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each recording contains multimodal sensor data captured through the Project Aria glasses. In addition, AEA provides machine perception data, including high-frequency globally aligned 3D trajectories, scene point clouds, per-frame 3D eye gaze vectors, and time-aligned speech transcriptions. In this paper, we demonstrate a few exemplar research applications enabled by this dataset, including neural scene reconstruction and prompted segmentation. AEA is an open-source dataset that can be downloaded from https://www.projectaria.com/datasets/aea/. We also provide open-source implementations and examples of how to use the dataset in Project Aria Tools: https://github.com/facebookresearch/projectaria_tools.
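To make the data access path concrete, the sketch below shows one plausible way to open an AEA recording and read its RGB frames and eye gaze output with the projectaria_tools Python package. This is a minimal sketch, not the dataset's reference code: the file paths are hypothetical, and the function names are assumptions drawn from our reading of the projectaria_tools API and should be checked against the repository linked above.

```python
# Minimal sketch (assumed API): load an AEA recording and its eye gaze output
# with projectaria_tools. Paths are hypothetical; verify function names against
# https://github.com/facebookresearch/projectaria_tools before relying on them.
from projectaria_tools.core import data_provider, mps

vrs_path = "recording.vrs"                        # one AEA sequence (hypothetical path)
gaze_path = "mps/eye_gaze/general_eye_gaze.csv"   # eye gaze output (hypothetical path)

provider = data_provider.create_vrs_data_provider(vrs_path)

# Read the RGB camera stream frame by frame.
rgb_stream = provider.get_stream_id_from_label("camera-rgb")
for i in range(provider.get_num_data(rgb_stream)):
    image_data, record = provider.get_image_data_by_index(rgb_stream, i)
    frame = image_data.to_numpy_array()

# Load the time-aligned 3D eye gaze vectors.
eye_gazes = mps.read_eyegaze(gaze_path)
print(f"{provider.get_num_data(rgb_stream)} RGB frames, {len(eye_gazes)} gaze samples")
```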
Multi-class Classification without Multi-class Labels
Hsu, Yen-Chang, Lv, Zhaoyang, Schlosser, Joel, Odom, Phillip, Kira, Zsolt
This work presents a new strategy for multi-class classification that requires no class-specific labels, but instead leverages pairwise similarity between examples, which is a weaker form of annotation. The proposed method, meta classification learning, optimizes a binary classifier for pairwise similarity prediction and, through this process, learns a multi-class classifier as a submodule. We formulate this approach, present a probabilistic graphical model for it, and derive a surprisingly simple loss function that can be used to learn neural network-based models. We then demonstrate that this same framework generalizes to the supervised, unsupervised cross-task, and semi-supervised settings. Our method is evaluated against the state of the art in all three learning paradigms and shows superior or comparable accuracy, providing evidence that learning multi-class classification without multi-class labels is a viable learning option.
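As a concrete illustration of the loss, the sketch below is a minimal PyTorch implementation of the pairwise objective described in the abstract: the similarity of a pair is scored by the inner product of the two softmax outputs, so training the binary (similar / dissimilar) criterion updates the underlying multi-class classifier. Variable names and the toy backbone are our own choices for illustration, not taken from the authors' code.

```python
# Minimal sketch of the meta classification loss: a binary cross-entropy on
# pairwise similarity, where the pair score is the inner product of the two
# class-probability vectors produced by the multi-class classifier.
import torch
import torch.nn.functional as F

def meta_classification_loss(logits_a, logits_b, similar, eps=1e-7):
    """logits_a, logits_b: (N, C) logits for the two examples of each pair.
    similar: (N,) float tensor, 1.0 if the pair is similar, else 0.0."""
    p_a = F.softmax(logits_a, dim=1)
    p_b = F.softmax(logits_b, dim=1)
    s_hat = (p_a * p_b).sum(dim=1).clamp(eps, 1.0 - eps)  # predicted pair similarity
    return F.binary_cross_entropy(s_hat, similar)

# Toy usage: a multi-class network trained from pairwise labels only.
net = torch.nn.Linear(32, 10)            # stand-in classifier with 10 latent classes
x_a, x_b = torch.randn(64, 32), torch.randn(64, 32)
similar = torch.randint(0, 2, (64,)).float()
loss = meta_classification_loss(net(x_a), net(x_b), similar)
loss.backward()
```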
Taking a Deeper Look at the Inverse Compositional Algorithm
Lv, Zhaoyang, Dellaert, Frank, Rehg, James M., Geiger, Andreas
In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. We first discuss the assumptions made by this well-established technique, and subsequently propose to relax these assumptions by incorporating data-driven priors into this model. More specifically, we unroll a robust version of the inverse compositional algorithm and replace multiple components of this algorithm using more expressive models whose parameters we train in an end-to-end fashion from data. Our experiments on several challenging 3D rigid motion estimation tasks demonstrate the advantages of combining optimization with learning-based techniques, outperforming the classic inverse compositional algorithm as well as data-driven image-to-pose regression approaches.
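For readers unfamiliar with the underlying solver, the sketch below shows a generic robustified Gauss-Newton update of the kind the paper unrolls: residuals are reweighted by a robust function and the pose increment solves the normal equations (JᵀWJ)Δξ = -JᵀWr. The learned components proposed in the paper replace pieces of this classic pipeline; the sketch only covers the hand-crafted step, with a Huber weighting chosen here as an assumption for illustration.

```python
# Minimal sketch of one robust Gauss-Newton step as used in inverse-compositional
# alignment: solve (J^T W J) delta = -J^T W r with per-residual robust weights.
import numpy as np

def robust_gauss_newton_step(J, r, delta_huber=1.0):
    """J: (M, 6) Jacobian of residuals w.r.t. the 6-DoF pose increment.
    r: (M,) photometric/geometric residuals."""
    # Huber weights: quadratic near zero, linear in the tails.
    abs_r = np.abs(r)
    w = np.where(abs_r <= delta_huber, 1.0, delta_huber / (abs_r + 1e-12))
    JtW = J.T * w                     # J^T W, shape (6, M)
    H = JtW @ J                       # approximate Hessian
    g = JtW @ r
    return -np.linalg.solve(H + 1e-6 * np.eye(6), g)  # damped for stability

# Toy usage with random data standing in for image residuals and Jacobians.
J = np.random.randn(1000, 6)
r = np.random.randn(1000)
delta_xi = robust_gauss_newton_step(J, r)   # 6-DoF pose increment
```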
A probabilistic constrained clustering for transfer learning and image category discovery
Hsu, Yen-Chang, Lv, Zhaoyang, Schlosser, Joel, Odom, Phillip, Kira, Zsolt
Neural network-based clustering has recently gained popularity, and in particular a constrained clustering formulation has been proposed to perform transfer learning and image category discovery using deep learning. The core idea is to formulate a clustering objective with pairwise constraints that can be used to train a deep clustering network, so that the cluster assignments and their underlying feature representations are jointly optimized end-to-end. In this work, we provide a novel clustering formulation to address the scalability issues of previous work in terms of optimizing deeper networks and larger numbers of categories. The proposed objective directly minimizes the negative log-likelihood of the cluster assignments with respect to the pairwise constraints, has no hyper-parameters, and demonstrates improved scalability and performance on both supervised learning and unsupervised transfer learning.
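The objective here is closely related to the pairwise loss illustrated in the meta classification sketch above; what the snippet below adds is one plausible way the constraints might be formed, deriving dense all-pairs similarity targets from the class labels of a mini-batch (the supervised transfer case) and evaluating the pairwise negative log-likelihood of the cluster assignments. This is a hedged illustration under our own naming conventions, not the authors' implementation.

```python
# Minimal sketch: build all-pairs similarity constraints from mini-batch labels
# and evaluate the pairwise negative log-likelihood of the cluster assignments.
import torch
import torch.nn.functional as F

def pairwise_constraint_nll(logits, labels, eps=1e-7):
    """logits: (N, K) cluster-assignment logits; labels: (N,) class ids used
    only to derive pairwise constraints, never as direct K-way targets."""
    p = F.softmax(logits, dim=1)
    s_hat = (p @ p.t()).clamp(eps, 1.0 - eps)             # (N, N) pair similarities
    s = (labels[:, None] == labels[None, :]).float()      # (N, N) constraints
    nll = -(s * s_hat.log() + (1.0 - s) * (1.0 - s_hat).log())
    off_diag = ~torch.eye(len(labels), dtype=torch.bool)  # ignore self-pairs
    return nll[off_diag].mean()

# Toy usage: 5 latent clusters, constraints derived from 3 observed classes.
logits = torch.randn(16, 5, requires_grad=True)
labels = torch.randint(0, 3, (16,))
pairwise_constraint_nll(logits, labels).backward()
```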