AITopics | Vision

Vision

"What exactly is computer vision then? Computer vision is a research field working to equip computers with the ability to process and understand visual data, as sighted humans can. Human brains process the gigabytes of data passing through our eyes every second and translate that data into sight - that is, into discrete objects and entities we can recognise or understand. Similarly, computer vision aims to give computers the ability to understand what they are seeing, and act intelligently on that knowledge."
– Computer vision: Cheat Sheet. ZDNet.com (December 6, 2011), by Natasha Lomas.

An Algorithmic Theory of Dependent Regularizers, Part 1: Submodular Structure

Koepke, Hoyt | Meila, Marina

arXiv.org Machine Learning, Dec-6-2013

We present an exploration of the rich theoretical connections between several classes of regularized models, network flows, and recent results in submodular function theory. This work unifies key aspects of these problems under a common theory, leading to novel methods for working with several important models of interest in statistics, machine learning and computer vision. In Part 1, we review the concepts of network flows and submodular function optimization theory foundational to our results. We then examine the connections between network flows and the minimum-norm algorithm from submodular optimization, extending and improving several current results. This leads to a concise representation of the structure of a large class of pairwise regularized models important in machine learning, statistics and computer vision. In Part 2, we describe the full regularization path of a class of penalized regression problems with dependent variables that includes the graph-guided LASSO and total variation constrained models. This description also motivates a practical algorithm. This allows us to efficiently find the regularization path of the discretized version of TV penalized models. Ultimately, our new algorithms scale up to high-dimensional problems with millions of variables.
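
The pairwise regularized models mentioned in this abstract can be illustrated with a generic total-variation-style objective on a graph. This is only a sketch under assumptions: the function name `tv_objective`, the squared-error data term, and the unweighted edge list are illustrative choices, not necessarily the formulation used in the paper.

```python
def tv_objective(x, y, edges, lam):
    """Evaluate 0.5 * sum_i (x_i - y_i)^2 + lam * sum_{(i,j)} |x_i - x_j|.

    x     -- candidate solution (list of floats)
    y     -- observed signal (list of floats)
    edges -- list of (i, j) index pairs coupling dependent variables (assumed unweighted)
    lam   -- regularization strength; sweeping it traces a regularization path
    """
    # Data-fit term: how far the candidate is from the observations.
    fit = 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    # Pairwise penalty: discourages differences across each edge.
    penalty = sum(abs(x[i] - x[j]) for i, j in edges)
    return fit + lam * penalty


if __name__ == "__main__":
    y = [1.0, 2.0, 4.0]
    chain = [(0, 1), (1, 2)]  # a 1D chain graph, as in discretized TV
    print(tv_objective([1.0, 2.0, 2.0], y, chain, lam=0.5))
```

Evaluating the objective for several values of `lam` gives a feel for how stronger regularization favors piecewise-constant solutions.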

algorithm, optimization problem, survey article

arXiv.org Machine Learning

1312.197

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Colorado (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.92)

Cross-Domain Sparse Coding

Wang, Jim Jing-Yan

arXiv.org Machine Learning, Nov-27-2013

Sparse coding has shown its power as an effective data representation method. However, up to now, sparse coding approaches have been limited to the single-domain learning problem. In this paper, we extend sparse coding to the cross-domain learning problem, which tries to transfer learning from a source domain to a target domain with a significantly different distribution. We impose the Maximum Mean Discrepancy (MMD) criterion to reduce the cross-domain distribution difference of sparse codes, and also regularize the sparse codes by the class labels of the samples from both domains to increase the discriminative ability. Encouraging experimental results of the proposed cross-domain sparse coding algorithm on two challenging tasks (image classification across photograph and oil painting domains, and multiple-user spam detection) show the advantage of the proposed method over other cross-domain data representation methods.
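
To make the MMD criterion concrete, here is a minimal sketch of the empirical squared MMD between two sets of code vectors. It assumes a linear kernel, in which case the statistic reduces to the squared distance between the two sample means; the abstract does not state which kernel the paper uses, and the function name `mmd_linear` is hypothetical.

```python
def mmd_linear(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2.

    source, target -- non-empty lists of equal-length feature vectors
    (for example, sparse codes from each domain).
    """
    dim = len(source[0])
    # Per-dimension sample means of each domain.
    mean_s = [sum(v[i] for v in source) / len(source) for i in range(dim)]
    mean_t = [sum(v[i] for v in target) / len(target) for i in range(dim)]
    # Squared Euclidean distance between the two means.
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))


if __name__ == "__main__":
    src = [[1.0, 0.0], [3.0, 0.0]]
    tgt = [[0.0, 0.0], [0.0, 2.0]]
    print(mmd_linear(src, tgt))
```

A value near zero suggests the two domains have similar mean representations, which is what a cross-domain regularizer of this kind would push toward.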

artificial intelligence, image understanding, target domain

arXiv.org Machine Learning

1311.708

Country: North America > United States > New York > Erie County > Buffalo (0.14)

Genre: Research Report (0.50)

Industry: Education > Focused Education > Special Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.35)

From Maxout to Channel-Out: Encoding Information on Sparse Pathways

Wang, Qi | JaJa, Joseph

arXiv.org Machine Learning, Nov-18-2013

Motivated by an important insight from neuroscience, we propose a new framework for understanding the success of the recently proposed "maxout" networks. The framework is based on encoding information on sparse pathways and recognizing the correct pathway at inference time. Elaborating further on this insight, we propose a novel deep network architecture, called the "channel-out" network, which takes much better advantage of sparse pathway encoding. In channel-out networks, pathways are not only formed a posteriori, but they are also actively selected according to the inference outputs from the lower layers. From a mathematical perspective, channel-out networks can represent a wider class of piece-wise continuous functions, thereby endowing the network with more expressive power than that of maxout networks. We test our channel-out networks on several well-known image classification benchmarks, setting new state-of-the-art performance on CIFAR-100 and STL-10, which represent some of the "harder" image classification benchmarks.
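
The difference between the two activation schemes can be sketched on a flat list of pre-activations. The `maxout` function follows the standard definition (one output per group, equal to the group maximum). The `channel_out` function is an assumption about the idea described above: it keeps the winning unit's value in its original position and blocks the rest, so the position of the winner selects which downstream pathway is active. Both function names and the grouping scheme are illustrative.

```python
def maxout(activations, group_size):
    """Standard maxout: each group of units collapses to its maximum."""
    return [
        max(activations[i:i + group_size])
        for i in range(0, len(activations), group_size)
    ]


def channel_out(activations, group_size):
    """Assumed channel-out behavior: keep only the winner of each group,
    in place, and set the other units in the group to zero."""
    out = [0.0] * len(activations)
    for start in range(0, len(activations), group_size):
        group = activations[start:start + group_size]
        # Index of the largest unit within this group.
        winner = max(range(len(group)), key=group.__getitem__)
        out[start + winner] = group[winner]
    return out


if __name__ == "__main__":
    acts = [1.0, 3.0, -2.0, -5.0]
    print(maxout(acts, 2))       # one value per group
    print(channel_out(acts, 2))  # same length as the input, sparse
```

Note that maxout discards which unit won, while the channel-out sketch preserves it, which is one way to read "pathways are actively selected".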

channel-out network, deep learning, neural network

arXiv.org Machine Learning

1312.1909

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Vision (0.87)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)

Towards a Neurocognitive Model of Visual Perception

Chakraborty, Arpan (North Carolina State University) | St. Amant, Robert (North Carolina State University)

AAAI Conferences, Nov-14-2013

Natural and artificial vision systems differ considerably in their underlying hardware and their method of information processing. Nevertheless, biological concepts are relevant, adaptable and useful in solving hard computer vision problems. This paper presents a biologically-inspired active vision framework that emulates early visual processing at the neuronal level to accomplish a range of visual tasks. Its emergent behavior is found to be qualitatively similar to humans in certain contexts, and performance is shown to be comparable to computer vision algorithms on a saliency detection task. A neurocognitive model of visual perception based on this framework is motivated.

neurocognitive model, visual perception

AAAI Conferences

2013 AAAI Fall Symposium Series

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Boosting OCR Accuracy Using Crowdsourcing

Wang, Shuo-Yang (Academia Sinica) | Wang, Ming-Hung (National Taiwan University) | Chen, Kuan-Ta (Academia Sinica)

AAAI Conferences, Nov-5-2013

Book digitization is important for preserving ancient heritage. However, digitizing books involves a series of labor-intensive tasks, one of which is verifying optical character recognition (OCR) output. In this paper, we propose a crowdsourceable OCR verification method. Using our method, content holders are able to leverage the power of crowds to complete verification tasks while avoiding content leakage. Our experimental results show that our method is more efficient and reliable than the traditional method.

crowdsourcing, ocr accuracy

AAAI Conferences

First AAAI Conference on Human Computation and Crowdsourcing

Technology: Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)

Visual-Semantic Scene Understanding by Sharing Labels in a Context Network

Chakraborty, Ishani | Elgammal, Ahmed

arXiv.org Machine Learning, Sep-15-2013

We consider the problem of naming objects in complex, natural scenes containing widely varying object appearance and subtly different names. Informed by cognitive research, we propose an approach based on sharing context-based object hypotheses between visual and lexical spaces. To this end, we present the Visual Semantic Integration Model (VSIM) that represents object labels as entities shared between semantic and visual contexts and infers a new image by updating labels through context switching. At the core of VSIM is a semantic Pachinko Allocation Model and a visual nearest neighbor Latent Dirichlet Allocation Model. For inference, we derive an iterative Data Augmentation algorithm that pools the label probabilities and maximizes the joint label posterior of an image. Our model surpasses the performance of state-of-the-art methods in several visual tasks on the challenging SUN09 dataset.

artificial intelligence, inference, text processing

arXiv.org Machine Learning

1309.3809

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Geodesic-based Salient Object Detection

Jiang, Richard M

arXiv.org Artificial Intelligence, Aug-23-2013

Saliency detection has been an intuitive way to provide useful cues for object detection and segmentation, as desired for many vision and graphics applications. In this paper, we provide a robust method for salient object detection and segmentation. Rather than using various pixel-level contrast definitions, we exploit global image structures and propose a new geodesic method dedicated to salient object detection. In the proposed approach, a new geodesic scheme, geodesic tunneling, is proposed to handle textures and local chaotic structures. With our new geodesic approach, a geodesic saliency map is estimated in correspondence to spatial structures in an image. Experimental evaluation on a salient object benchmark dataset validated that our algorithm consistently outperformed a number of state-of-the-art saliency methods, yielding higher precision and better recall rates. With the robust saliency estimation, we also present an unsupervised hierarchical salient object cut scheme simply using adaptive saliency thresholding, which attained the highest score in our F-measure test. We also applied our geodesic cut scheme to a number of image editing tasks, as demonstrated in additional experiments.

geodesic-based salient object detection

arXiv.org Artificial Intelligence

1302.6557

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Towards Adapting ImageNet to Reality: Scalable Domain Adaptation with Implicit Low-rank Transformations

Rodner, Erik | Hoffman, Judy | Donahue, Jeff | Darrell, Trevor | Saenko, Kate

arXiv.org Machine Learning, Aug-19-2013

Images seen during test time are often not from the same distribution as images used for learning. This problem, known as domain shift, occurs when training classifiers from object-centric internet image databases and trying to apply them directly to scene understanding tasks. The consequence is often severe performance degradation, which is one of the major barriers to the application of classifiers in real-world systems. In this paper, we show how to learn transform-based domain adaptation classifiers in a scalable manner. The key idea is to exploit an implicit rank constraint, originating from a max-margin domain adaptation formulation, to make optimization tractable. Experiments show that the transformation between domains can be very efficiently learned from data and easily applied to new categories. This begins to bridge the gap between large-scale internet image collections and object images captured in everyday life environments.

artificial intelligence, category, optimization problem

arXiv.org Machine Learning

1308.42

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Learning Features and their Transformations by Spatial and Temporal Spherical Clustering

Dutta, Jayanta K. | Banerjee, Bonny

arXiv.org Artificial Intelligence, Aug-10-2013

Learning features invariant to arbitrary transformations in the data is a requirement for any recognition system, biological or artificial. It is now widely accepted that simple cells in the primary visual cortex respond to features while the complex cells respond to features invariant to different transformations. We present a novel two-layered feedforward neural model that learns features in the first layer by spatial spherical clustering and invariance to transformations in the second layer by temporal spherical clustering. Learning occurs in an online and unsupervised manner following the Hebbian rule. When exposed to natural videos acquired by a camera mounted on a cat's head, the first and second layer neurons in our model develop simple and complex cell-like receptive field properties. The model can predict by learning lateral connections among the first layer neurons. A topographic map to their spatial features emerges by exponentially decaying the flow of activation with distance from one neuron to another in the first layer that fire in close temporal proximity, thereby minimizing the pooling length in an online manner simultaneously with feature learning.
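
The first-layer idea (online, unsupervised spherical clustering with a Hebbian-style update) can be sketched as a single winner-take-all step on unit-length vectors. This is an illustrative sketch only: the function names, the learning rate, and the exact update rule are assumptions, and the temporal clustering of the second layer and the lateral connections are not modeled here.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def hebbian_spherical_step(weights, x, lr=0.5):
    """One online update: the neuron most aligned with x moves toward x.

    weights -- list of unit-length weight vectors, updated in place
    x       -- one input sample
    lr      -- hypothetical learning rate
    Returns the index of the winning neuron.
    """
    x = normalize(x)
    # Cosine similarity reduces to a dot product on the unit sphere.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    winner = max(range(len(weights)), key=scores.__getitem__)
    # Hebbian-style step toward the input, then project back onto the sphere.
    moved = [wi + lr * xi for wi, xi in zip(weights[winner], x)]
    weights[winner] = normalize(moved)
    return winner


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    print(hebbian_spherical_step(w, [3.0, 4.0]), w)
```

Repeating this step over a stream of samples moves each weight vector toward the mean direction of the inputs it wins, which is the spherical analogue of online k-means.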

deep learning, neural network, neuron

arXiv.org Artificial Intelligence

1308.235

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Histogram of Oriented Displacements (HOD): Describing Trajectories of Human Joints for Action Recognition

Gowayyed, Mohammad Abdelaziz (Alexandria University) | Torki, Marwan (Alexandria University) | Hussein, Mohammed Elsayed (Alexandria University) | El-Saban, Motaz (Microsoft Research)

AAAI Conferences, Aug-3-2013

Creating descriptors for trajectories has many applications in robotics/human motion analysis and video copy detection. Here, we propose a novel descriptor for 2D trajectories: Histogram of Oriented Displacements (HOD). Each displacement in the trajectory votes with its length in a histogram of orientation angles. 3D trajectories are described by the HOD of their three projections. We use HOD to describe the 3D trajectories of body joints to recognize human actions, which is a challenging machine vision task, with applications in human-robot/machine interaction, interactive entertainment, multimedia information retrieval, and surveillance. The descriptor is fixed-length, scale-invariant and speed-invariant. Experiments on MSR-Action3D and HDM05 datasets show that the descriptor outperforms the state-of-the-art when using off-the-shelf classification tools.
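
The voting scheme described in this abstract can be sketched in a few lines. Each displacement between consecutive points adds its length to the histogram bin of its orientation angle. The bin count, the normalization to unit sum, and the concatenation order of the three 2D projections are assumptions made for illustration, and the names `hod` and `hod_3d` are hypothetical.

```python
import math


def hod(points, bins=8):
    """Histogram of oriented displacements for a 2D trajectory.

    points -- list of (x, y) tuples in temporal order
    bins   -- number of orientation bins over [0, 2*pi) (assumed value)
    """
    hist = [0.0] * bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # a stationary step has no orientation
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Clamp guards against the angle rounding up to exactly 2*pi.
        idx = min(int(angle / (2 * math.pi) * bins), bins - 1)
        hist[idx] += length  # the displacement votes with its length
    total = sum(hist)
    # Normalizing by total length is one way to obtain scale invariance.
    return [h / total for h in hist] if total > 0 else hist


def hod_3d(points, bins=8):
    """Describe a 3D trajectory by the HODs of its xy, xz and yz projections."""
    xy = [(x, y) for x, y, z in points]
    xz = [(x, z) for x, y, z in points]
    yz = [(y, z) for x, y, z in points]
    return hod(xy, bins) + hod(xz, bins) + hod(yz, bins)


if __name__ == "__main__":
    print(hod([(0, 0), (1, 1), (0, 2)], bins=4))
```

Because votes are weighted by length and then normalized, rescaling the trajectory or sampling the same path at a different speed leaves the descriptor unchanged in this sketch.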

action recognition, human joint, oriented displacement

AAAI Conferences

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.87)
