MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints

Gokhale, Shreyas

arXiv.org Artificial Intelligence

Learning high-quality, robust, efficient, and disentangled representations is a central challenge in artificial intelligence (AI). Deep metric learning frameworks tackle this challenge primarily using architectural and optimization constraints. Here, we introduce a third approach that instead relies on $\textit{functional}$ constraints. Specifically, we present MonoCon, a simple framework that uses a small monotonic multi-layer perceptron (MLP) head attached to any pre-trained encoder. Due to co-adaptation between encoder and head guided by contrastive loss and monotonicity constraints, MonoCon learns robust, disentangled, and highly compact embeddings at a practically negligible performance cost. On the CIFAR-100 image classification task, MonoCon yields representations that are nearly 9x more compact and 1.5x more robust than the fine-tuned encoder baseline, while retaining 99\% of the baseline's 5-NN classification accuracy. We also report a 3.4x more compact and 1.4x more robust representation on an SNLI sentence similarity task for a marginal reduction in the STSb score, establishing MonoCon as a general domain-agnostic framework. Crucially, these robust, ultra-compact representations learned via functional constraints offer a unified solution to critical challenges in disparate contexts ranging from edge computing to cloud-scale retrieval.
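The abstract describes a small monotonic MLP head attached to a pre-trained encoder. As a rough illustration of how a functional monotonicity constraint can be built into a network, here is a minimal pure-Python sketch of a monotonic MLP; it assumes the common construction of forcing weights non-negative (via softplus) and using monotone activations, which may differ from the paper's exact head.

```python
import math
import random

def softplus(x):
    # Numerically stable softplus; maps any raw weight to a value >= 0.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

class MonotonicMLP:
    """Tiny MLP whose every output is non-decreasing in every input.

    Monotonicity is enforced *functionally*: effective weights are
    softplus(raw_weight) >= 0, and tanh activations are monotone.
    Illustrative sketch only, not the paper's implementation.
    """
    def __init__(self, dims, seed=0):
        rng = random.Random(seed)
        self.layers = []
        for d_in, d_out in zip(dims, dims[1:]):
            w = [[rng.gauss(0.0, 0.5) for _ in range(d_in)] for _ in range(d_out)]
            b = [0.0] * d_out
            self.layers.append((w, b))

    def __call__(self, x):
        for i, (w, b) in enumerate(self.layers):
            x = [sum(softplus(wij) * xj for wij, xj in zip(wi, x)) + bi
                 for wi, bi in zip(w, b)]
            if i < len(self.layers) - 1:
                x = [math.tanh(v) for v in x]  # monotone nonlinearity
        return x

# Sanity check: increasing any input coordinate never decreases any output.
head = MonotonicMLP([4, 8, 2])
x = [0.1, -0.3, 0.7, 0.2]
y_before = head(x)
x[1] += 0.5
y_after = head(x)
assert all(b >= a for a, b in zip(y_before, y_after))
```

In a MonoCon-style setup such a head would sit on top of the encoder's embedding, with the contrastive loss and the weight constraint jointly shaping both modules during fine-tuning.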


Technique improves AI ability to understand 3D space using 2D images

#artificialintelligence

"We live in a 3D world, but when you take a picture, it records that world in a 2D image," says Tianfu Wu, corresponding author of a paper on the work and an assistant professor of electrical and computer engineering at North Carolina State University. "AI programs receive visual input from cameras. So if we want AI to interact with the world, we need to ensure that it is able to interpret what 2D images can tell it about 3D space. In this research, we are focused on one part of that challenge: how we can get AI to accurately recognize 3D objects -- such as people or cars -- in 2D images, and place those objects in space." While the work may be important for autonomous vehicles, it also has applications for manufacturing and robotics.


Technique Improves AI Ability to Understand 3D Space Using 2D Images

#artificialintelligence

A technique developed by researchers at North Carolina State University (NC State) uses two-dimensional (2D) images to improve the ability of artificial intelligence (AI) programs to identify three-dimensional (3D) objects. Called MonoCon, the technique could improve the navigation of autonomous vehicles relative to other vehicles using 2D images from onboard cameras, which are less expensive than LiDAR sensors. MonoCon places 3D objects identified in 2D images into a "bounding box," which indicates to the AI the outermost edges of each object. Said NC State's Tianfu Wu, "In addition to asking the AI to predict the camera-to-object distance and the dimensions of the bounding boxes, we also ask the AI to predict the locations of each of the box's eight points and its distance from the center of the bounding box in two dimensions," which "helps the AI more accurately identify and predict 3D objects based on 2D images."
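The auxiliary targets Wu describes, the eight corner points of the 3D box and their 2D offsets from the box center, can be sketched with a pinhole camera model. The intrinsics, box pose, and the axis-aligned (no-yaw) box below are illustrative assumptions, not values from the NC State work.

```python
import itertools

def project(point, f=700.0, cx=320.0, cy=240.0):
    # Pinhole projection; focal length f and principal point (cx, cy)
    # are made-up intrinsics for illustration.
    X, Y, Z = point
    return (f * X / Z + cx, f * Y / Z + cy)

def box_corners(center, dims):
    # Eight corners of an axis-aligned 3D box (yaw omitted for simplicity).
    cX, cY, cZ = center
    w, h, l = dims
    return [(cX + sx * w / 2, cY + sy * h / 2, cZ + sz * l / 2)
            for sx, sy, sz in itertools.product((-1, 1), repeat=3)]

center = (2.0, 0.5, 15.0)   # a car ~15 m ahead, slightly to the right
dims = (1.8, 1.5, 4.2)      # width, height, length in metres
u0, v0 = project(center)    # projected 3D box center in the image
offsets = [(u - u0, v - v0)
           for u, v in (project(c) for c in box_corners(center, dims))]
assert len(offsets) == 8    # one 2D offset per corner
```

In training, a detector supervised this way regresses these per-corner offsets (alongside distance and box dimensions) as auxiliary monocular contexts, which is the extra signal the quote credits for the accuracy gain.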