AITopics | Wang, Robert

Collaborating Authors

Wang, Robert

emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation

Salter, Sasha, Warren, Richard, Schlager, Collin, Spurr, Adrian, Han, Shangchen, Bhasin, Rohin, Cai, Yujun, Walkington, Peter, Bolarinwa, Anuoluwapo, Wang, Robert, Danielson, Nathan, Merel, Josh, Pnevmatikakis, Eftychios, Marshall, Jesse

arXiv.org Artificial Intelligence, Dec-2-2024

Hands are the primary means through which humans interact with the world. Reliable and always-available hand pose inference could yield new and intuitive control schemes for human-computer interactions, particularly in virtual and augmented reality. Computer vision is effective but requires one or multiple cameras and can struggle with occlusions, limited field of view, and poor lighting. Wearable wrist-based surface electromyography (sEMG) presents a promising alternative as an always-available modality sensing muscle activities that drive hand motion. However, sEMG signals are strongly dependent on user anatomy and sensor placement, and existing sEMG models have required hundreds of users and device placements to effectively generalize. To facilitate progress on sEMG pose inference, we introduce the emg2pose benchmark, the largest publicly available dataset of high-quality hand pose labels and wrist sEMG recordings. emg2pose contains 2kHz, 16 channel sEMG and pose labels from a 26-camera motion capture rig for 193 users, 370 hours, and 29 stages with diverse gestures - a scale comparable to vision-based hand pose datasets. We provide competitive baselines and challenging tasks evaluating real-world generalization scenarios: held-out users, sensor placements, and stages. emg2pose provides the machine learning community a platform for exploring complex generalization problems, holding potential to significantly enhance the development of sEMG-based human-computer interactions.

artificial intelligence, human computer interaction, machine learning, (17 more...)

2412.02725

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.73)

HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos

Banerjee, Prithviraj, Shkodrani, Sindi, Moulon, Pierre, Hampali, Shreyas, Han, Shangchen, Zhang, Fan, Zhang, Linguang, Fountain, Jade, Miller, Edward, Basol, Selen, Newcombe, Richard, Wang, Robert, Engel, Jakob Julian, Hodan, Tomas

arXiv.org Artificial Intelligence, Nov-28-2024

We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground-truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios resembling typical actions in a kitchen, office, and living room environment. The dataset is recorded by two head-mounted devices from Meta: Project Aria, a research prototype of light-weight AR/AI glasses, and Quest 3, a production VR headset sold in millions of units. Ground-truth poses were obtained by a professional motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. In our experiments, we demonstrate the effectiveness of multi-view egocentric data for three popular tasks: 3D hand tracking, 6DoF object pose estimation, and 3D lifting of unknown in-hand objects. The evaluated multi-view methods, whose benchmarking is uniquely enabled by HOT3D, significantly outperform their single-view counterparts.

artificial intelligence, dataset, machine learning, (18 more...)

2411.19167

Country:

Asia (0.28)
North America > United States (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.71)

Experimental Design Using Interlacing Polynomials

Lau, Lap Chi, Wang, Robert, Zhou, Hong

arXiv.org Machine Learning, Oct-15-2024

Experimental design is a classical problem in statistics [Puk06], which recently found wide applications from machine learning (e.g., active learning, feature selection, data summarization) to numerical linear algebra (e.g., column subset selection, sparse least squares regression) to graph algorithms (e.g., total effective resistance minimization, algebraic connectivity maximization). We refer the reader to [SX20, AZLSW21, NST22, LZ22b, LZ22a, LWZ23] and the references therein for additional background and related applications.

artificial intelligence, machine learning, polynomial, (17 more...)

2410.1139

Country: Asia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)

Analysis of Corrected Graph Convolutions

Wang, Robert, Baranwal, Aseem, Fountoulakis, Kimon

arXiv.org Machine Learning, May-22-2024

Machine learning for node classification on graphs is a prominent area driven by applications such as recommendation systems. State-of-the-art models often use multiple graph convolutions on the data, as empirical evidence suggests they can enhance performance. However, it has been shown, both empirically and theoretically, that too many graph convolutions can degrade performance significantly, a phenomenon known as oversmoothing. In this paper, we provide a rigorous theoretical analysis, based on the contextual stochastic block model (CSBM), of the performance of vanilla graph convolution from which we remove the principal eigenvector to avoid oversmoothing. We perform a spectral analysis for $k$ rounds of corrected graph convolutions, and we provide results for partial and exact classification. For partial classification, we show that each round of convolution can reduce the misclassification error exponentially up to a saturation level, after which performance does not worsen. For exact classification, we show that the separability threshold can be improved exponentially up to $O({\log{n}}/{\log\log{n}})$ corrected convolutions.
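To make the idea of removing the principal eigenvector concrete, here is a minimal sketch and not the paper's implementation. It assumes the symmetric normalization M = D^{-1/2} A D^{-1/2} on an undirected graph without isolated nodes, for which the principal eigenvector is known in closed form (v_i proportional to the square root of the degree of node i). The function name `corrected_convolution` and the choice of normalization are assumptions made for illustration; the paper may define the convolution differently.

```python
import math


def corrected_convolution(adj, features, rounds):
    """Apply `rounds` of (M - v v^T) to a feature vector.

    Assumption for this sketch: M = D^{-1/2} A D^{-1/2} for a symmetric
    0/1 adjacency matrix `adj` (list of lists), and v is the principal
    eigenvector of M.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    if any(d == 0 for d in deg):
        raise ValueError("isolated nodes are not supported in this sketch")
    total = sum(deg)
    # Principal eigenvector of M (eigenvalue 1): v_i = sqrt(deg_i / sum(deg)).
    v = [math.sqrt(d / total) for d in deg]

    x = list(features)
    for _ in range(rounds):
        # Vanilla convolution: y = M x.
        y = [
            sum(adj[i][j] * x[j] / math.sqrt(deg[i] * deg[j]) for j in range(n))
            for i in range(n)
        ]
        # Correction: subtract the component along v, i.e. (v^T x) v.
        proj = sum(vi * xi for vi, xi in zip(v, x))
        x = [yi - proj * vi for yi, vi in zip(y, v)]
    return x
```

Under these assumptions the output of every round is orthogonal to v, so repeated rounds cannot collapse all node features onto the degree-weighted constant direction, which is the component associated with oversmoothing.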

convolution, data mining, machine learning, (17 more...)

2405.13987

Country: Europe (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos

Parger, Mathias, Tang, Chengcheng, Twigg, Christopher D., Keskin, Cem, Wang, Robert, Steinberger, Markus

arXiv.org Artificial Intelligence, Sep-2-2023

Convolutional neural network inference on video data requires powerful hardware for real-time processing. Given the inherent coherence across consecutive frames, large parts of a video typically change little. By skipping identical image regions and truncating insignificant pixel updates, computational redundancy can in theory be reduced significantly. However, these theoretical savings have been difficult to translate into practice, as sparse updates hamper computational consistency and memory access coherence, which are key for efficiency on real hardware. With DeltaCNN, we present a sparse convolutional neural network framework that enables sparse frame-by-frame updates to accelerate video inference in practice. We provide sparse implementations for all typical CNN layers and propagate sparse feature updates end-to-end, without accumulating errors over time. DeltaCNN is applicable to all convolutional neural networks without retraining. To the best of our knowledge, we are the first to significantly outperform the dense reference, cuDNN, in practical settings, achieving speedups of up to 7x with only marginal differences in accuracy.
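The reason frame differences suffice can be illustrated with a small sketch. It is not DeltaCNN's implementation or API: it covers a single bias-free 1D convolution in pure Python, and the names `conv1d` and `delta_update` are hypothetical. Because convolution is linear, the output for the current frame equals the cached output for the previous frame plus the convolution of the per-pixel difference, so only changed pixels need to be processed. The sketch does not model nonlinear layers, GPU memory access, or the bookkeeping of truncated updates that the abstract refers to.

```python
def conv1d(signal, kernel):
    """Dense 'valid' 1D correlation, used here as the reference result."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def delta_update(prev_output, prev_frame, curr_frame, kernel, threshold=0.0):
    """Update a cached convolution output using only the pixels that changed.

    Pixels whose absolute change is at most `threshold` are skipped.
    Returns the updated output and the number of pixels processed.
    """
    out = list(prev_output)
    k = len(kernel)
    updated = 0
    for p, (old, new) in enumerate(zip(prev_frame, curr_frame)):
        delta = new - old
        if abs(delta) <= threshold:
            continue  # unchanged or insignificant pixel: no work done
        updated += 1
        # Input pixel p feeds output i = p - j through kernel tap j.
        for j in range(k):
            i = p - j
            if 0 <= i < len(out):
                out[i] += delta * kernel[j]
    return out, updated
```

With `threshold=0.0` the update reproduces the dense result for the new frame; a positive threshold trades a small error for fewer processed pixels, and in this simplified form nothing prevents that error from accumulating across frames.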

artificial intelligence, deep learning, machine learning, (15 more...)

2203.03996

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

MotionDeltaCNN: Sparse CNN Inference of Frame Differences in Moving Camera Videos

Parger, Mathias, Tang, Chengcheng, Neff, Thomas, Twigg, Christopher D., Keskin, Cem, Wang, Robert, Steinberger, Markus

arXiv.org Artificial Intelligence, Aug-14-2023

Convolutional neural network inference on video input is computationally expensive and requires high memory bandwidth. Recently, DeltaCNN managed to reduce the cost by only processing pixels with significant updates over the previous frame. However, DeltaCNN relies on static camera input. Moving cameras add new challenges in how to fuse newly unveiled image regions with already processed regions efficiently to minimize the update rate - without increasing memory overhead and without knowing the camera extrinsics of future frames. In this work, we propose MotionDeltaCNN, a sparse CNN inference framework that supports moving cameras. We introduce spherical buffers and padded convolutions to enable seamless fusion of newly unveiled regions and previously processed regions -- without increasing memory footprint. Our evaluation shows that we outperform DeltaCNN by up to 90% for moving camera videos.

artificial intelligence, machine learning, motiondeltacnn, (16 more...)

2210.09887

Genre: Research Report (0.50)

Industry: Media (0.30)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Fast Algorithms for Directed Graph Partitioning Using Flows and Reweighted Eigenvalues

Lau, Lap Chi, Tung, Kam Chuen, Wang, Robert

arXiv.org Artificial Intelligence, Jun-15-2023

We consider a new semidefinite programming relaxation for directed edge expansion, which is obtained by adding triangle inequalities to the reweighted eigenvalue formulation. Applying the matrix multiplicative weight update method to this relaxation, we derive almost linear-time algorithms that achieve an $O(\sqrt{\log{n}})$-approximation and a Cheeger-type guarantee for directed edge expansion, as well as an improved cut-matching game for directed graphs. This provides a primal-dual flow-based framework to obtain the best known algorithms for directed graph partitioning. The same approach also works for vertex expansion and for hypergraphs, providing a simple and unified approach to achieve the best known results for different expansion problems and different algorithmic techniques.

algorithm, artificial intelligence, machine learning, (18 more...)

2306.09128

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Experimental Design for Any $p$-Norm

Lau, Lap Chi, Wang, Robert, Zhou, Hong

arXiv.org Artificial Intelligence, May-3-2023

We consider a general $p$-norm objective for experimental design problems that captures some well-studied objectives (D/A/E-design) as special cases. We prove that a randomized local search approach provides a unified algorithm to solve this problem for all $p$. This provides the first approximation algorithm for the general $p$-norm objective, and a nice interpolation of the best known bounds of the special cases.

algorithm, artificial intelligence, randomized exchange algorithm, (16 more...)

2305.01942

Country:

North America > United States (0.46)
Europe > Italy > Sicily (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.49)

Cheeger Inequalities for Directed Graphs and Hypergraphs Using Reweighted Eigenvalues

Lau, Lap Chi, Tung, Kam Chuen, Wang, Robert

arXiv.org Artificial Intelligence, Nov-17-2022

We derive Cheeger inequalities for directed graphs and hypergraphs using the reweighted eigenvalue approach that was recently developed for vertex expansion in undirected graphs [OZ22,KLT22,JPV22]. The goal is to develop a new spectral theory for directed graphs and an alternative spectral theory for hypergraphs. The first main result is a Cheeger inequality relating the vertex expansion $\vec{\psi}(G)$ of a directed graph $G$ to the vertex-capacitated maximum reweighted second eigenvalue $\vec{\lambda}_2^{v*}$: \[ \vec{\lambda}_2^{v*} \lesssim \vec{\psi}(G) \lesssim \sqrt{\vec{\lambda}_2^{v*} \cdot \log (\Delta/\vec{\lambda}_2^{v*})}. \] This provides a combinatorial characterization of the fastest mixing time of a directed graph by vertex expansion, and builds a new connection between reweighted eigenvalues, vertex expansion, and fastest mixing time for directed graphs. The second main result is a stronger Cheeger inequality relating the edge conductance $\vec{\phi}(G)$ of a directed graph $G$ to the edge-capacitated maximum reweighted second eigenvalue $\vec{\lambda}_2^{e*}$: \[ \vec{\lambda}_2^{e*} \lesssim \vec{\phi}(G) \lesssim \sqrt{\vec{\lambda}_2^{e*} \cdot \log (1/\vec{\lambda}_2^{e*})}. \] This provides a certificate for a directed graph to be an expander and a spectral algorithm to find a sparse cut in a directed graph, playing a similar role as Cheeger's inequality in certifying graph expansion and in the spectral partitioning algorithm for undirected graphs. We also use this reweighted eigenvalue approach to derive the improved Cheeger inequality for directed graphs, and furthermore to derive several Cheeger inequalities for hypergraphs that match and improve the existing results in [Lou15,CLTZ18]. These results support the view that the reweighted eigenvalue approach provides a unifying way to lift the spectral theory for undirected graphs to more general settings.

artificial intelligence, edge conductance, machine learning, (13 more...)

2211.09776

Country: North America (0.27)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Neural Correspondence Field for Object Pose Estimation

Huang, Lin, Hodan, Tomas, Ma, Lingni, Zhang, Linguang, Tran, Luan, Twigg, Christopher, Wu, Po-Chen, Yuan, Junsong, Keskin, Cem, Wang, Robert

arXiv.org Artificial Intelligence, Jul-29-2022

We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image. Unlike classical correspondence-based methods, which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum. The move from pixels to 3D points, which is inspired by recent PIFu-style methods for 3D reconstruction, enables reasoning about the whole object, including its (self-)occluded parts. For a 3D query point associated with a pixel-aligned image feature, we train a fully-connected neural network to predict: (i) the corresponding 3D object coordinates, and (ii) the signed distance to the object surface, with the first defined only for query points in the surface vicinity. We call the mapping realized by this network the Neural Correspondence Field. The object pose is then robustly estimated from the predicted 3D-3D correspondences by the Kabsch-RANSAC algorithm. The proposed method achieves state-of-the-art results on three BOP datasets and is shown to be superior especially in challenging cases with occlusion. The project website is at: linhuang17.github.io/NCF.

artificial intelligence, machine learning, pose estimation, (17 more...)

2208.00113

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.46)
