frame model
Improving action classification with brain-inspired deep networks
Aglinskas, Aidas, Anzellotti, Stefano
Action recognition is key for applications ranging from robotics to healthcare monitoring. Action information can be extracted from the body pose and movements, as well as from the background scene. However, the extent to which deep neural networks (DNNs) make use of information about the body and information about the background remains unclear. Since these two sources of information may be correlated within a training dataset, DNNs might learn to rely predominantly on one of them without taking full advantage of the other. Unlike DNNs, humans have domain-specific brain regions selective for perceiving bodies and regions selective for perceiving scenes. The present work tests whether humans are therefore more effective at extracting information from both body and background, and whether building brain-inspired deep network architectures with separate domain-specific streams for body and scene perception endows them with more human-like performance. We first demonstrate that DNNs trained using the HAA500 dataset perform almost as accurately on versions of the stimuli that show both body and background as on versions from which the body was removed, but are at chance level for versions from which the background was removed. Conversely, human participants (N=28) can recognize the same set of actions accurately with all three versions of the stimuli, and perform significantly better on stimuli that show only the body than on stimuli that show only the background. Finally, we implement and test a novel architecture patterned after domain specificity in the brain, with separate streams to process body and background information. We show that 1) this architecture improves action recognition performance, and 2) its accuracy across different versions of the stimuli follows a pattern that more closely matches the pattern of accuracy observed in human participants.
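A minimal sketch of the two-stream idea, assuming each stream receives a masked copy of the input (body pixels only vs. background pixels only) and that the streams are fused by averaging their class probabilities. The class name `TwoStreamClassifier`, the per-pixel `body_mask`, the linear layers standing in for full DNN backbones, and the averaging fusion are all hypothetical choices for illustration, not the authors' architecture.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class TwoStreamClassifier:
    """Hypothetical two-stream model: one stream sees only body pixels,
    the other only background pixels. Each stream is a single linear
    layer standing in for a full DNN backbone."""

    def __init__(self, n_features: int, n_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_body = rng.normal(scale=0.01, size=(n_features, n_classes))
        self.w_scene = rng.normal(scale=0.01, size=(n_features, n_classes))

    def predict_proba(self, frames: np.ndarray, body_mask: np.ndarray) -> np.ndarray:
        # frames: (batch, n_features) flattened pixels
        # body_mask: same shape, 1.0 on body pixels and 0.0 elsewhere
        body_only = frames * body_mask
        scene_only = frames * (1.0 - body_mask)
        p_body = softmax(body_only @ self.w_body)
        p_scene = softmax(scene_only @ self.w_scene)
        # Late fusion (assumed): average the two streams' class probabilities
        return 0.5 * (p_body + p_scene)
```

Averaging probabilities is only one fusion option; concatenating the two streams' features before a shared classifier is another, and the abstract does not say which the proposed architecture uses.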
Developing a Multi-Modal Machine Learning Model For Predicting Performance of Automotive Hood Frames
Indupally, Abhishek, Ramnath, Satchit
Is there a way for a designer to evaluate the performance of a given hood frame geometry without spending significant time on simulation setup? This paper seeks to address this challenge by developing a multimodal machine-learning (MMML) architecture that learns from different modalities of the same data to predict performance metrics. It also aims to use the MMML architecture to make engineering design processes more efficient by reducing reliance on computationally expensive simulations. The proposed architecture accelerates design exploration, enabling rapid iteration while maintaining high performance standards, especially in the concept design phase. The study also shows that, by combining multiple data modalities, MMML outperforms traditional single-modality approaches. Two new frame geometries that are not part of the training dataset are also evaluated with the trained MMML model to demonstrate its ability to generalize to unseen frame designs. The findings underscore MMML's potential to supplement traditional simulation-based workflows, particularly in the conceptual design phase, and highlight its role in bridging the gap between machine learning and real-world engineering applications. This research paves the way for broader adoption of machine learning techniques in engineering design, with a focus on refining multimodal approaches to optimize structural development and accelerate the design cycle.
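To illustrate what combining modalities can mean in practice, here is a minimal early-fusion sketch: each modality's feature matrix is standardized separately, the results are concatenated, and a single ridge regressor predicts one performance metric. The class `EarlyFusionRidge`, the use of ridge regression as a linear stand-in for the MMML network, and the modality shapes are assumptions for illustration only.

```python
import numpy as np


class EarlyFusionRidge:
    """Hypothetical early-fusion regressor: per-modality standardization,
    concatenation, then closed-form ridge regression with an unpenalized bias."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def _fuse(self, modalities):
        # Standardize each modality with its training statistics, then concatenate
        zs = [(m - mu) / sd for m, mu, sd in zip(modalities, self.mus, self.sds)]
        fused = np.concatenate(zs, axis=1)
        return np.hstack([fused, np.ones((fused.shape[0], 1))])  # bias column

    def fit(self, modalities, y):
        self.mus = [m.mean(axis=0) for m in modalities]
        # Guard against constant columns so division by zero cannot occur
        self.sds = [np.where(m.std(axis=0) > 0, m.std(axis=0), 1.0) for m in modalities]
        X = self._fuse(modalities)
        reg = self.alpha * np.eye(X.shape[1])
        reg[-1, -1] = 0.0  # do not penalize the bias term
        self.coef = np.linalg.solve(X.T @ X + reg, X.T @ y)
        return self

    def predict(self, modalities):
        return self._fuse(modalities) @ self.coef
```

For example, `EarlyFusionRidge().fit([geometry_params, image_features], y)` fits one model on both (hypothetical) modalities, while passing a single-element list gives the corresponding single-modality baseline.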
FRAMED: An AutoML Approach for Structural Performance Prediction of Bicycle Frames
Regenwetter, Lyle, Weaver, Colin, Ahmed, Faez
This paper demonstrates how Automated Machine Learning (AutoML) methods can be used as effective surrogate models in engineering design problems. To do so, we consider the challenging problem of structurally performant bicycle frame design and demonstrate across-the-board dominance by AutoML in regression and classification surrogate modeling tasks. We also introduce FRAMED -- a parametric dataset of 4500 bicycle frames based on bicycles designed by practitioners and enthusiasts worldwide. Accompanying these frame designs, we provide ten structural performance values, such as weight, displacements under load, and safety factors, computed using finite element simulations for all the bicycle frame designs. We formulate two challenging test problems: a performance-prediction regression problem and a feasibility-prediction classification problem. We then systematically search for optimal surrogate models using Bayesian hyperparameter tuning and neural architecture search. Finally, we show how a state-of-the-art AutoML method can be effective for both regression and classification problems. We demonstrate that the proposed AutoML models outperform the strongest gradient boosting and neural network surrogates identified through Bayesian optimization, improving F1 score by 24\% for classification and reducing mean absolute error by 12.5\% for regression. Our work introduces a dataset for bicycle design practitioners, provides two benchmark problems for surrogate modeling researchers, and demonstrates the advantages of AutoML in machine learning tasks. The dataset and code are provided at \url{http://decode.mit.edu/projects/framed/}.
Learning FRAME Models Using CNN Filters
Lu, Yang (University of California, Los Angeles) | Zhu, Song-Chun (University of California, Los Angeles) | Wu, Ying Nian (University of California, Los Angeles)
The convolutional neural network (ConvNet or CNN) has proven to be very successful in many tasks such as those in computer vision. In this conceptual paper, we study the generative perspective of the discriminative CNN. In particular, we propose to learn the generative FRAME (Filters, Random field, And Maximum Entropy) model using the highly expressive filters pre-learned by the CNN at the convolutional layers. We show that the learning algorithm can generate realistic and rich object and texture patterns in natural scenes. We explain that each learned model corresponds to a new CNN unit at a layer above the layer of filters employed by the model. We further show that it is possible to learn a new layer of CNN units using a generative CNN model, which is a product of experts model, and the learning algorithm admits an EM interpretation with binary latent variables.
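For reference, FRAME models are usually written as an exponential tilting of a reference distribution. The sketch below assumes that form, with $F_k$ ($k = 1, \dots, K$) the pre-learned CNN filters, $[F_k * I](x)$ the response of filter $k$ at position $x$ of image $I$ on the lattice $\mathcal{D}$, weights $w_{k,x}$, and a Gaussian white-noise reference distribution $q(I)$; the exact parameterization in the paper may differ.

$$
p(I; w) = \frac{1}{Z(w)} \exp\Big( \sum_{k=1}^{K} \sum_{x \in \mathcal{D}} w_{k,x}\, [F_k * I](x) \Big)\, q(I)
$$

Because $\partial \log Z(w) / \partial w_{k,x}$ equals the model expectation of $[F_k * I](x)$, the gradient of the average log-likelihood $L(w)$ over training images $I_1, \dots, I_n$ is the gap between observed and model-expected filter responses:

$$
\frac{\partial L(w)}{\partial w_{k,x}} = \frac{1}{n} \sum_{i=1}^{n} [F_k * I_i](x) - \mathbb{E}_{p(I; w)}\big[ [F_k * I](x) \big]
$$

The model expectation is typically approximated with MCMC samples, for example from Langevin dynamics. Since $\log p(I; w)$ is, up to the $\log q(I)$ term and a constant, a weighted sum of filter responses, it can be read as a single new unit stacked on the layer of filters, consistent with the abstract's description of each learned model as a new CNN unit.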
Modeling Neural Population Spiking Activity with Gibbs Distributions
Wood, Frank, Roth, Stefan, Black, Michael J.
Probabilistic modeling of correlated neural population firing activity is central to understanding the neural code and building practical decoding algorithms. No parametric models currently exist for modeling multivariate correlated neural data, and the high-dimensional nature of the data makes fully nonparametric methods impractical. To address these problems, we propose an energy-based model in which the joint probability of neural activity is represented using learned functions of the 1D marginal histograms of the data. The parameters of the model are learned using contrastive divergence and an optimization procedure for finding appropriate marginal directions. We evaluate the method using real data recorded from a population of motor cortical neurons. In particular, we model the joint probability of population spiking times and 2D hand position and show that the likelihood of test data under our model is significantly higher than under other models. These results suggest that our model captures correlations in the firing activity. Our rich probabilistic model of neural population activity is a step towards both measuring the importance of correlations in neural coding and improving the decoding of population activity.
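A minimal sketch of the general recipe, under several assumptions: the projection directions `u_j` are fixed random unit vectors (the paper also optimizes them), each learned function is a table of per-bin energies over the histogram bin of the projection `u_j . x`, a Gaussian base term `0.5 * ||x||^2` is added so the density is normalizable on continuous toy data, and the contrastive-divergence (CD-k) samples come from a few Metropolis random-walk steps started at the data. `MarginalHistogramEBM` and all of its parameters are hypothetical names.

```python
import numpy as np


class MarginalHistogramEBM:
    """Sketch of a Gibbs distribution p(x) ∝ exp(-E(x)) with
    E(x) = sum_j theta[j, bin_j(x)] + 0.5 * ||x||^2, where bin_j(x) is the
    histogram bin of the 1D projection u_j . x (directions fixed here)."""

    def __init__(self, dim, n_dirs=8, n_bins=10, lo=-4.0, hi=4.0, seed=0):
        self.rng = np.random.default_rng(seed)
        u = self.rng.normal(size=(n_dirs, dim))
        self.u = u / np.linalg.norm(u, axis=1, keepdims=True)
        self.edges = np.linspace(lo, hi, n_bins + 1)
        self.theta = np.zeros((n_dirs, n_bins))
        self.n_bins = n_bins

    def features(self, x):
        # One-hot bin indicator per direction, shape (n, n_dirs, n_bins);
        # projections outside [lo, hi] fall into the edge bins
        proj = x @ self.u.T
        idx = np.clip(np.digitize(proj, self.edges) - 1, 0, self.n_bins - 1)
        return np.eye(self.n_bins)[idx]

    def energy(self, x):
        # Learned per-bin energies plus the assumed Gaussian base term
        learned = np.einsum("ndb,db->n", self.features(x), self.theta)
        return learned + 0.5 * (x ** 2).sum(axis=1)

    def mcmc(self, x, n_steps=5, step=0.3):
        # Metropolis random-walk steps, one independent chain per row of x
        for _ in range(n_steps):
            prop = x + step * self.rng.normal(size=x.shape)
            log_accept = self.energy(x) - self.energy(prop)
            accept = np.log(self.rng.random(len(x))) < log_accept
            x = np.where(accept[:, None], prop, x)
        return x

    def cd_step(self, data, lr=0.1, k=5):
        # CD-k: approximate the model expectation with k MCMC steps from the data
        samples = self.mcmc(data, n_steps=k)
        grad = self.features(samples).mean(axis=0) - self.features(data).mean(axis=0)
        self.theta += lr * grad  # ascent on the approximate log-likelihood
        return grad
```

Each update raises the energy of bins that the MCMC samples visit more often than the data and lowers it for bins the data over-populates, which is the contrastive-divergence approximation to the log-likelihood gradient.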