 surreal


MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective

arXiv.org Artificial Intelligence

Large Multimodal Models (LMMs) have demonstrated remarkable capabilities. While existing benchmarks for evaluating LMMs mainly focus on image comprehension, few works evaluate them from the image generation perspective. To address this issue, we propose a straightforward automated evaluation pipeline. Specifically, this pipeline requires LMMs to generate an image-prompt from a given input image. Subsequently, it employs text-to-image generative models to create a new image based on these generated prompts. Finally, we evaluate the performance of LMMs by comparing the original image with the generated one. Furthermore, we introduce MMGenBench-Test, a comprehensive benchmark developed to evaluate LMMs across 13 distinct image patterns, and MMGenBench-Domain, targeting the performance evaluation of LMMs within the generative image domain. A thorough evaluation involving over 50 popular LMMs demonstrates the effectiveness and reliability of both the pipeline and the benchmark. Our observations indicate that numerous LMMs excelling in existing benchmarks fail to adequately complete basic tasks related to image understanding and description. This finding highlights the substantial potential for performance improvement in current LMMs and suggests avenues for future model optimization. Concurrently, our pipeline facilitates the efficient assessment of LMM performance across diverse domains using only image inputs.
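
The three pipeline stages described above (image to prompt, prompt to image, comparison) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the callables `describe`, `generate`, and `embed` are hypothetical stand-ins for an LMM, a text-to-image model, and an image encoder, and scoring by cosine similarity of embeddings is an assumption, since the abstract does not name the comparison metric.

```python
from math import sqrt
from typing import Any, Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_lmm(
    image: Any,
    describe: Callable[[Any], str],   # hypothetical: LMM turns an image into an image-prompt
    generate: Callable[[str], Any],   # hypothetical: text-to-image model renders the prompt
    embed: Callable[[Any], Vector],   # hypothetical: image encoder used for comparison
) -> float:
    """Score one image: describe it, regenerate it, compare original and regenerated."""
    prompt = describe(image)
    regenerated = generate(prompt)
    return cosine_similarity(embed(image), embed(regenerated))


# Toy stand-ins: an "image" is a list of floats and the "LMM" describes it losslessly.
toy_image = [1.0, 2.0, 3.0]
toy_score = score_lmm(
    toy_image,
    describe=lambda img: ",".join(str(v) for v in img),
    generate=lambda prompt: [float(t) for t in prompt.split(",")],
    embed=lambda img: img,
)
```

With a lossless toy description the regenerated image matches the original, so the score is at its maximum; a weaker description would lower it, which is the signal the pipeline relies on.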


Co-domain Symmetry for Complex-Valued Deep Learning

arXiv.org Artificial Intelligence

We study complex-valued scaling as a type of symmetry natural and unique to complex-valued measurements and representations. Deep Complex Networks (DCN) extends real-valued algebra to the complex domain without addressing complex-valued scaling. SurReal takes a restrictive manifold view of complex numbers, adopting a distance metric to achieve complex-scaling invariance while losing rich complex-valued information. We analyze complex-valued scaling as a co-domain transformation and design novel equivariant and invariant neural network layer functions for this special transformation. We also propose novel complex-valued representations of RGB images, where complex-valued scaling indicates hue shift or correlated changes across color channels. Benchmarked on MSTAR, CIFAR10, CIFAR100, and SVHN, our co-domain symmetric (CDS) classifiers deliver higher accuracy, better generalization, robustness to co-domain transformations, and lower model bias and variance than DCN and SurReal with far fewer parameters.
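
To make the terms concrete, the sketch below contrasts a function that is equivariant to complex-valued scaling with one that is invariant to it. These are textbook toy examples chosen for illustration, not the layers proposed in the paper; the function names are hypothetical, and the invariant example assumes the reference entry is nonzero.

```python
import cmath


def linear_layer(weights, z):
    """Complex linear map W z. Equivariant to scaling: W (s z) = s (W z)."""
    return [sum(w * x for w, x in zip(row, z)) for row in weights]


def divide_by_reference(z, ref_index=0):
    """Divide every entry by a reference entry (assumed nonzero).

    Invariant to scaling: the common factor s cancels in (s z_k) / (s z_ref).
    """
    ref = z[ref_index]
    return [x / ref for x in z]


# A complex scale factor with magnitude 2 and phase 0.7 rad.
s = cmath.rect(2.0, 0.7)
z = [1 + 2j, -0.5 + 1j, 3 - 1j]
scaled = [s * x for x in z]
W = [[0.5 + 0.5j, 1j, -1.0], [2.0, 0.25 - 1j, 1j]]

# Equivariance: scaling the input scales the output by the same factor.
equivariance_gap = max(
    abs(a - s * b) for a, b in zip(linear_layer(W, scaled), linear_layer(W, z))
)
# Invariance: scaling the input leaves the output unchanged.
invariance_gap = max(
    abs(a - b) for a, b in zip(divide_by_reference(scaled), divide_by_reference(z))
)
```

Both gaps are zero up to floating-point rounding.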


C-SURE: Shrinkage Estimator and Prototype Classifier for Complex-Valued Deep Learning

arXiv.org Machine Learning

The James-Stein (JS) shrinkage estimator is a biased estimator that captures the mean of Gaussian random vectors. While it has a desirable statistical property of dominance over the maximum likelihood estimator (MLE) in terms of mean squared error (MSE), not much progress has been made on extending the estimator onto manifold-valued data. We propose C-SURE, a novel Stein's unbiased risk estimate (SURE) of the JS estimator on the manifold of complex-valued data with a theoretically proven optimum over MLE. Adapting the architecture of the complex-valued SurReal classifier, we further incorporate C-SURE into a prototype convolutional neural network (CNN) classifier. We compare C-SURE with SurReal and a real-valued baseline on complex-valued MSTAR and RadioML datasets. C-SURE is more accurate and robust than SurReal, and the shrinkage estimator is always better than MLE for the same prototype classifier. Like SurReal, C-SURE is much smaller, outperforming the real-valued baseline on MSTAR (RadioML) with less than 1 percent (3 percent) of the baseline size.
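
For background, the classical real-valued James-Stein estimator that the abstract starts from can be written in a few lines. The sketch assumes a single observation x drawn from N(θ, σ²I) with known variance σ² and dimension p ≥ 3; it is the Euclidean estimator only, not the manifold-valued C-SURE estimator, and the function name is hypothetical.

```python
def james_stein(x, sigma2=1.0):
    """Classical James-Stein estimate of the mean from one observation x.

    Shrinks x toward the origin by the factor 1 - (p - 2) * sigma2 / ||x||^2.
    The MLE in this setting is x itself.
    """
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein dominance over the MLE requires p >= 3")
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0:
        return list(x)  # nothing to shrink
    shrink = 1.0 - (p - 2) * sigma2 / norm_sq
    return [shrink * v for v in x]


# ||x||^2 = 25 and p = 3, so the shrinkage factor is 1 - 1/25 = 0.96.
estimate = james_stein([3.0, 4.0, 0.0])
```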


SURREAL

#artificialintelligence

Our goal is to make Deep Reinforcement Learning accessible to everyone. We introduce Surreal, an open-source, reproducible, and scalable distributed reinforcement learning framework. Surreal provides a high-level abstraction for building distributed reinforcement learning algorithms. We implement our distributed variants of PPO and DDPG in the current release.


Fei-Fei Li's Stanford Team Is Crowdsourcing Robot Training

#artificialintelligence

Sorting a bunch of differently coloured toy trucks and action figures seems like child's play, right? Unfortunately this remains a challenging task in the world of machine learning. So why not have humans simply show the machines how to do it? This is the inspiration behind a new research project led by Stanford Artificial Intelligence Lab Director Fei-Fei Li and her husband, Stanford Associate Professor Silvio Savarese. The project introduces two new global platforms -- RoboTurk and Surreal -- designed to provide high-quality task demonstration data to help researchers working in robotic manipulation.


Humans help robots learn tasks

#artificialintelligence

Bender is one of the robot arms that a team of Stanford researchers is using to test two frameworks that, together, could make it faster and easier to teach robots basic skills. The RoboTurk framework allows people to direct the robot arms in real time with a smartphone and a browser by showing the robot how to carry out tasks like picking up objects. SURREAL speeds the learning process by running multiple experiences at once, essentially allowing the robots to learn from many experiences simultaneously. "With RoboTurk and SURREAL, we can push the boundary of what robots can do by combining lots of data collected by humans and coupling that with large-scale reinforcement learning," said Mandlekar, a member of the team that developed the frameworks. The group will be presenting RoboTurk and SURREAL Oct. 29 at the Conference on Robot Learning in Zurich, Switzerland.


SURREAL: SUbgraph Robust REpresentAtion Learning

arXiv.org Machine Learning

The success of graph embeddings or node representation learning in a variety of downstream tasks, such as node classification, link prediction, and recommendation systems, has led to their popularity in recent years. Representation learning algorithms aim to preserve local and global network structure by identifying node neighborhood notions. However, many existing algorithms generate embeddings that fail to properly preserve the network structure, or lead to unstable representations due to random processes (e.g., random walks to generate context) and, thus, cannot generalize to multi-graph problems. In this paper, we propose SURREAL, a novel, stable graph embedding algorithmic framework based on connection subgraphs. SURREAL learns graph representations using connection subgraphs by employing the analogy of graphs with electrical circuits. It preserves both local and global connectivity patterns, and addresses the issue of high-degree nodes. Further, it exploits the strength of weak ties and meta-data that have been neglected by baselines. The experiments show that SURREAL outperforms state-of-the-art algorithms by up to 36.85% on the multi-label classification problem. Further, in contrast to baselines, SURREAL, being deterministic, is completely stable.
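
The circuit analogy mentioned above can be illustrated with a small deterministic computation. The sketch treats edge weights as conductances, holds a source node at 1 V and a sink node at 0 V, and relaxes every other node to the conductance-weighted average of its neighbors (Kirchhoff's current law). It is not the SURREAL algorithm; the function name and graph encoding are assumptions made for illustration.

```python
def node_voltages(graph, source, sink, sweeps=200):
    """Approximate node voltages with the source fixed at 1 V and the sink at 0 V.

    graph maps each node to a dict {neighbor: conductance}. Every free node is
    repeatedly set to the conductance-weighted mean of its neighbors' voltages
    (Gauss-Seidel relaxation), which involves no randomness.
    """
    voltage = {node: 0.0 for node in graph}
    voltage[source] = 1.0
    for _ in range(sweeps):
        for node, neighbors in graph.items():
            if node in (source, sink) or not neighbors:
                continue
            total = sum(neighbors.values())
            voltage[node] = sum(w * voltage[n] for n, w in neighbors.items()) / total
    return voltage


# Path graph a - b - c - d with unit conductances: voltage drops linearly.
path = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
volts = node_voltages(path, source="a", sink="d")
```

Because the relaxation is deterministic, repeated runs on the same graph give identical voltages, in contrast to context generated by random walks.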