The category of visual stimuli has been reliably decoded from patterns of neural activity in extrastriate visual cortex. It remains unknown, however, whether object identity can be inferred from this activity. We present fMRI data measuring responses in human extrastriate cortex to a set of 12 distinct object images. We use a simple winner-take-all classifier, using half the data from each recording session as a training set, to evaluate encoding of object identity across fMRI voxels. Because this approach is sensitive to the inclusion of noisy voxels, we describe two methods for identifying subsets of voxels that best distinguish object identity. One method characterizes the reliability of each voxel within subsets of the data; the other estimates the mutual information of each voxel with the stimulus set. We find that both metrics can identify subsets of the data that reliably encode object identity, even when noisy measurements are artificially added to the data. The mutual information metric is less efficient at this task, likely due to constraints inherent in fMRI data.
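The winner-take-all scheme described above can be sketched as a nearest-centroid correlation classifier: average the training-half responses per object, then assign each test pattern to the object whose mean training pattern it correlates with most strongly. The sketch below uses synthetic data in place of real fMRI responses, and the noise levels, voxel counts, and split procedure are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

n_objects, n_voxels, n_reps = 12, 50, 8

# Synthetic stand-in for voxel responses: one underlying pattern per
# object plus measurement noise (purely illustrative parameters).
true_patterns = rng.normal(size=(n_objects, n_voxels))
data = true_patterns[:, None, :] + 0.5 * rng.normal(
    size=(n_objects, n_reps, n_voxels)
)

# Split each object's repetitions into training and test halves,
# averaging within each half to get one pattern per object.
train = data[:, : n_reps // 2].mean(axis=1)
test = data[:, n_reps // 2 :].mean(axis=1)

# Winner-take-all: each test pattern is assigned to the training
# pattern with which it is most strongly correlated.
corr = np.corrcoef(test, train)[:n_objects, n_objects:]
predicted = corr.argmax(axis=1)
accuracy = (predicted == np.arange(n_objects)).mean()
print(f"decoding accuracy: {accuracy:.2f} (chance = {1 / n_objects:.2f})")
```

With chance performance at 1/12, any accuracy well above that level indicates that the selected voxels carry information about object identity; the voxel-selection metrics in the abstract would be applied before this step to prune noisy voxels.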
With a lifetime of observing the world informing our perceptions, we're all pretty good at inferring the overall shape of something we only see from the side, or for a brief moment. Computers, however, are just plain bad at it. Fortunately, a clever shortcut created by a Berkeley AI researcher may seriously improve their performance. It's useful to be able to see something in 2D and accurately guess the volume it actually takes up; that would help with object tracking in AR and VR, creative workflows, and so on. But going up a dimension means there's a lot more data to reason about.
In many areas of research, scientists need to analyze three-dimensional (3D) data in a variety of forms, such as medical brain scans, the structures of cells and molecules, geological features on our own planet and others, or the hearts of exploding stars. The ability to visually analyze most 3D data still mainly relies on two-dimensional (2D) computer displays. This presents a limitation: although we have been representing information in two dimensions since the first cave paintings and probably earlier, our brains have evolved for stereoscopic vision, so we're hardwired for 3D. Moreover, when 3D images are rendered in 2D, even when you can spin them one way and another on a computer screen, it can be difficult to see or appreciate the detailed 3D relationships between objects, be it the way two atoms interact or the way two binary stars orbit each other. If, for example, you are designing a drug to fit cozily into a tiny gap in an enzyme, or trying to trace the connections between neurons in a complex network of brain circuitry, then the more you understand the 3D environment the better.
Digitally reconstructing 3D geometry from images is a core problem in computer vision. It has various applications, such as movie production, content generation for video games, virtual and augmented reality, 3D printing, and many more. The task discussed in this blog post is reconstructing high-quality 3D geometry from a single color image of an object, as shown in the figure below. Humans can effortlessly reason about the shapes of objects and scenes even from a single image. Note that while the binocular arrangement of our eyes allows us to perceive depth, it is not required to understand 3D geometry.
Learning from 3D data is a fascinating idea that is well explored in computer vision. It allows one to learn from sparse LiDAR data and point clouds, as well as from 3D objects represented as CAD models or surfaces. Most approaches to learning from such data are limited to uniform 3D volume occupancy grids or octree representations. A major challenge in learning from 3D data is that one must choose an appropriate resolution for the voxel grid, and this choice becomes a bottleneck for learning algorithms: fine resolution is essential for capturing key features of an object, yet the data becomes sparser as the resolution grows finer. Numerous computer vision applications therefore use a multi-resolution representation instead of a uniform grid to remain memory efficient. Though such representations are more difficult to learn from, they are much more efficient at representing 3D data. In this paper, we explore the challenges of learning from such data representations. In particular, we use a multi-level voxel representation in which a coarse voxel grid records the important voxels (boundary voxels), and multiple fine voxel grids correspond to each significant voxel of the coarse grid. A multi-level voxel representation can capture important features in the 3D data in a memory-efficient way compared with an octree representation. Consequently, learning from a high-resolution 3D object, which is paramount for feature recognition, is made efficient.
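The coarse-plus-fine decomposition described above can be sketched as follows: classify each coarse cell as empty, full, or mixed, and store a fine sub-grid only for the mixed (boundary) cells. This is a minimal illustrative sketch, not the paper's exact data structure; the grid sizes, labels, and `build_multilevel` helper are assumptions made for the example.

```python
import numpy as np

def build_multilevel(volume, factor):
    """Split a dense binary voxel volume into a coarse label grid plus
    fine sub-grids kept only for 'mixed' (boundary) coarse cells.
    Labels: 0 = empty, 1 = full, 2 = mixed (fine detail retained)."""
    c = volume.shape[0] // factor
    coarse = np.zeros((c, c, c), dtype=np.int8)
    fine = {}
    for i in range(c):
        for j in range(c):
            for k in range(c):
                block = volume[i * factor:(i + 1) * factor,
                               j * factor:(j + 1) * factor,
                               k * factor:(k + 1) * factor]
                if block.all():
                    coarse[i, j, k] = 1
                elif block.any():
                    coarse[i, j, k] = 2
                    fine[(i, j, k)] = block.copy()
    return coarse, fine

# Example: a solid sphere in a 32^3 grid, coarse factor 4.
n = 32
x, y, z = np.indices((n, n, n))
volume = (((x - 16) ** 2 + (y - 16) ** 2 + (z - 16) ** 2) < 12 ** 2)
volume = volume.astype(np.uint8)

coarse, fine = build_multilevel(volume, 4)
dense_count = volume.size
stored_count = coarse.size + len(fine) * 4 ** 3
print(f"dense: {dense_count} voxels, multi-level: {stored_count} stored values")
```

Because fine detail is kept only where the surface passes through a coarse cell, the stored count grows roughly with the object's surface area rather than its volume, which is the memory advantage the paragraph above appeals to.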