Vision
AI@NICTA
Barnes, Nick (NICTA) | Baumgartner, Peter (NICTA) | Caetano, Tiberio (NICTA) | Durrant-Whyte, Hugh (NICTA) | Klein, Gerwin (NICTA) | Sanderson, Penelope (University of Queensland) | Sattar, Abdul (Griffith University) | Stuckey, Peter (The University of Melbourne) | Thiebaux, Sylvie (The Australian National University) | Van Hentenryck, Pascal (The University of Melbourne) | Walsh, Toby (NICTA)
NICTA is Australia's Information and Communications Technology (ICT) Centre of Excellence. It is the largest organization in Australia dedicated to ICT research. While it has close links with local universities, it is in fact an independent, not-for-profit company in the business of doing research, commercializing that research, and training PhD students to do that research. Much of the work taking place at NICTA involves various topics in artificial intelligence. In this article, we survey some of the AI work being undertaken at NICTA.
Towards an Empathizing and Adaptive Storyteller System
Bae, Byung Chull (IT University of Copenhagen) | Brunete, Alberto (Carlos III University) | Malik, Usman (National University of Sciences and Technology) | Dimara, Evanthia (Université Paris-Sud) | Jermsurawong, Jermsak (New York University Abu Dhabi) | Mavridis, Nikolaos (New York University Abu Dhabi)
This paper describes our ongoing effort to build an empathizing and adaptive storyteller system. The system under development aims to deliver a story effectively by combining emotional expressions generated by an avatar or a humanoid robot with the listener's responses, which are monitored in real time. We conducted a pilot study and analyzed the results in two ways: first, through a survey questionnaire based on the participants' subjective ratings; second, through automated video analysis of the participants' emotional facial expressions and eye blinking. The questionnaire results show that male participants tend to empathize more with a story character when a virtual storyteller is present than with audio-only narration. The video analysis suggests that the participants' eye-blink rate varies inversely with their attention.
Examples of Artificial Perceptions in Optical Character Recognition and Iris Recognition
Noaica, Cristina M., Badea, Robert, Motoc, Iulia M., Ghica, Claudiu G., Rosoiu, Alin C., Popescu-Bodorin, Nicolaie
This paper assumes the hypothesis that human learning is perception based, and consequently, that the learning process and perceptions should not be represented and investigated independently or modeled in different simulation spaces. To preserve the analogy between artificial and human learning, the former is assumed here to be based on artificial perception. Hence, instead of applying or developing a Computational Theory of (human) Perceptions, we choose to mirror human perceptions in a numeric (computational) space as artificial perceptions and to analyze the interdependence between artificial learning and artificial perception in the same numeric space, using one of the simplest tools of Artificial Intelligence and Soft Computing, namely the perceptron. As practical applications, we work through two examples: Optical Character Recognition and Iris Recognition. In both cases a simple Turing test shows that artificial perceptions of the difference between two characters and between two irides are fuzzy, whereas the corresponding human perceptions are, in fact, crisp.
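To make the perceptron-based "artificial perception" concrete, here is a minimal sketch, not the authors' code: a single classic perceptron learns to separate two tiny character bitmaps. The 5x5 glyphs, learning rate, and epoch cap are all illustrative choices.

    import numpy as np

    def train_perceptron(X, y, epochs=100, lr=1.0):
        """Classic perceptron rule; labels y in {-1, +1}."""
        w = np.zeros(X.shape[1] + 1)                 # weights plus bias
        Xb = np.hstack([X, np.ones((len(X), 1))])    # append bias input
        for _ in range(epochs):
            errors = 0
            for xi, yi in zip(Xb, y):
                if yi * (w @ xi) <= 0:               # misclassified: update
                    w += lr * yi * xi
                    errors += 1
            if errors == 0:                          # converged (separable data)
                break
        return w

    # Toy 5x5 bitmaps for 'I' and 'O' (1 = ink), flattened to feature vectors.
    I = np.array([[0, 0, 1, 0, 0]] * 5, dtype=float).ravel()
    O = np.array([[1, 1, 1, 1, 1],
                  [1, 0, 0, 0, 1],
                  [1, 0, 0, 0, 1],
                  [1, 0, 0, 0, 1],
                  [1, 1, 1, 1, 1]], dtype=float).ravel()
    X, y = np.stack([I, O]), np.array([-1, 1])
    w = train_perceptron(X, y)
    print(np.sign(np.hstack([X, np.ones((2, 1))]) @ w))   # -> [-1.  1.]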
A Bayesian Nonparametric Approach to Image Super-resolution
Polatkan, Gungor, Zhou, Mingyuan, Carin, Lawrence, Blei, David, Daubechies, Ingrid
Super-resolution methods form high-resolution images from low-resolution images. In this paper, we develop a new Bayesian nonparametric model for super-resolution. Our method uses a beta-Bernoulli process to learn a set of recurring visual patterns, called dictionary elements, from the data. Because it is nonparametric, the number of elements found is also determined from the data. We evaluate the method on both benchmark and natural images, comparing with several other models from the research literature. We perform large-scale human evaluation experiments to assess the visual quality of the results. In a first implementation, we use Gibbs sampling to approximate the posterior. However, this algorithm is not feasible for large-scale data. To circumvent this, we then develop an online variational Bayes (VB) algorithm, which finds high-quality dictionaries in a fraction of the time needed by the Gibbs sampler.
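As a rough illustration of the generative step behind a beta-Bernoulli dictionary model (a hedged sketch, not the paper's model or its inference code), each patch below switches dictionary atoms on or off via Bernoulli draws whose probabilities come from a Beta prior, so rarely-used atoms can effectively drop out. All sizes and hyperparameters are invented.

    import numpy as np

    rng = np.random.default_rng(0)
    K, P, N = 32, 64, 10                  # candidate atoms, patch dim, patches
    D = rng.normal(size=(P, K))           # dictionary: one column per element
    pi = rng.beta(1.0 / K, 1.0, size=K)   # Beta prior -> sparse usage probs

    patches = []
    for _ in range(N):
        z = rng.random(K) < pi            # which atoms this patch switches on
        s = rng.normal(size=K)            # their weights
        x = D @ (z * s) + 0.01 * rng.normal(size=P)   # sparse combo + noise
        patches.append(x)

    # expected number of active atoms per patch given these probabilities
    print(f"expected active atoms per patch: {pi.sum():.2f}")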
Contextually Guided Semantic Labeling and Search for 3D Point Clouds
Anand, Abhishek, Koppula, Hema Swetha, Joachims, Thorsten, Saxena, Ashutosh
RGB-D cameras, which give an RGB image together with depths, are becoming increasingly popular for robotic perception. In this paper, we address the task of detecting commonly found objects in the 3D point clouds of indoor scenes obtained from such cameras. Our method uses a graphical model that captures various features and contextual relations, including local visual appearance and shape cues, object co-occurrence relationships, and geometric relationships. With a large number of object classes and relations, the model's parsimony becomes important, and we address that by using multiple types of edge potentials. We train the model using a maximum-margin learning approach. In our experiments over a total of 52 3D scenes of homes and offices (composed from about 550 views), we get a performance of 84.06% and 73.38% in labeling office and home scenes, respectively, for 17 object classes each. We also present a method for a robot to search for an object using the learned model and the contextual information available from the current labelings of the scene. We applied this algorithm successfully on a mobile robot for the task of finding 12 object classes in 10 different offices and achieved a precision of 97.56% with 78.43% recall.
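The following is a minimal sketch, under invented features and weights, of the node-plus-edge scoring that a pairwise graphical model over scene segments performs; the brute-force MAP search stands in for the paper's actual inference, and the two-label, three-segment example is hypothetical.

    import itertools
    import numpy as np

    def score(labels, node_feats, edges, w_node, w_edge):
        """Sum of node potentials plus edge potentials for one joint labeling."""
        s = sum(w_node[l] @ node_feats[i] for i, l in enumerate(labels))
        s += sum(w_edge[labels[i], labels[j]] for i, j in edges)
        return s

    def argmax_labeling(n, num_labels, *args):
        """Brute-force MAP for tiny graphs; real systems use smarter inference."""
        return max(itertools.product(range(num_labels), repeat=n),
                   key=lambda lab: score(lab, *args))

    rng = np.random.default_rng(1)
    node_feats = rng.normal(size=(3, 4))            # 3 segments, 4 features each
    edges = [(0, 1), (1, 2)]                        # which segments touch
    w_node = rng.normal(size=(2, 4))                # per-label feature weights
    w_edge = np.array([[0.5, -0.2], [-0.2, 0.3]])   # label co-occurrence affinities
    print(argmax_labeling(3, 2, node_feats, edges, w_node, w_edge))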
A Unified Approach for Modeling and Recognition of Individual Actions and Group Activities
Recognizing group activities is challenging due to the difficulties in isolating individual entities, finding the respective roles played by the individuals, and representing the complex interactions among the participants. Individual actions and group activities in videos can be represented in a common framework, as they share the following feature: both are composed of a set of low-level features describing motions, e.g., optical flow for each pixel or a trajectory for each feature point, according to a set of composition constraints in both the temporal and spatial dimensions. In this paper, we present a unified model to assess the similarity between two given individual or group activities. Our approach avoids explicitly extracting individual actors or identifying and representing inter-person interactions. With the proposed approach, retrieval from a video database can be performed through Query-by-Example, and activities can be recognized by querying with videos containing known activities. The suggested video matching process can be performed in an unsupervised manner. We demonstrate the performance of our approach by recognizing a set of human actions and football plays.
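One simple way to picture matching activities represented as sets of motion features (a hedged stand-in, not the paper's unified model) is a symmetric average nearest-neighbour distance between two sets of point trajectories:

    import numpy as np

    def traj_dist(a, b):
        """L2 distance between two equal-length trajectories (T x 2 arrays)."""
        return np.linalg.norm(a - b)

    def set_similarity(A, B):
        d_ab = np.mean([min(traj_dist(a, b) for b in B) for a in A])
        d_ba = np.mean([min(traj_dist(a, b) for a in A) for b in B])
        return -(d_ab + d_ba) / 2          # higher = more similar

    rng = np.random.default_rng(2)
    query = [rng.normal(size=(20, 2)) for _ in range(5)]       # 5 point tracks
    candidate = [t + 0.1 * rng.normal(size=t.shape) for t in query]
    other = [rng.normal(size=(20, 2)) for _ in range(5)]
    # the near-copy of the query scores higher than an unrelated set
    print(set_similarity(query, candidate) > set_similarity(query, other))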
Information-theoretic Dictionary Learning for Image Classification
Qiu, Qiang, Patel, Vishal M., Chellappa, Rama
We present a two-stage approach for learning dictionaries for object classification tasks based on the principle of information maximization. The proposed method seeks a dictionary that is compact, discriminative, and generative. In the first stage, dictionary atoms are selected from an initial dictionary by maximizing the mutual information measure on dictionary compactness, discrimination and reconstruction. In the second stage, the selected dictionary atoms are updated for improved reconstructive and discriminative power using a simple gradient ascent algorithm on mutual information. Experiments using real datasets demonstrate the effectiveness of our approach for image classification tasks.
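The skeleton below illustrates the greedy first stage in spirit: atoms are added one at a time according to the largest gain of a selection criterion. The criterion used here, drop in residual reconstruction error, is a stand-in for the paper's mutual-information measure, and all dictionary sizes are invented.

    import numpy as np

    def greedy_select(D, X, k):
        """Pick k columns of dictionary D to represent signals X (columns)."""
        chosen = []
        for _ in range(k):
            best, best_err = None, np.inf
            for j in range(D.shape[1]):
                if j in chosen:
                    continue
                S = D[:, chosen + [j]]
                # least-squares codes for this candidate sub-dictionary
                C, *_ = np.linalg.lstsq(S, X, rcond=None)
                err = np.linalg.norm(X - S @ C)
                if err < best_err:
                    best, best_err = j, err
            chosen.append(best)
        return chosen

    rng = np.random.default_rng(3)
    D = rng.normal(size=(16, 40))                     # initial dictionary
    X = D[:, [3, 7, 11]] @ rng.normal(size=(3, 50))   # data spanned by 3 atoms
    print(sorted(greedy_select(D, X, 3)))             # likely recovers [3, 7, 11]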
Multidimensional Membership Mixture Models
Jiang, Yun, Lim, Marcus, Saxena, Ashutosh
We present the multidimensional membership mixture (M3) models, where every dimension of the membership represents an independent mixture model and each data point is generated jointly from the selected mixture components. This is helpful when the data has a certain shared structure. For example, three unique means and three unique variances can effectively form a Gaussian mixture model with nine components, while requiring only six parameters to fully describe it. In this paper, we present three instantiations of M3 models (together with their learning and inference algorithms): infinite, finite, and hybrid, depending on whether the number of mixtures is fixed or not. They are built upon Dirichlet process mixture models, latent Dirichlet allocation, and a combination of the two, respectively. We then consider two applications: topic modeling and learning 3D object arrangements. Our experiments show that our M3 models achieve better performance using fewer topics than many classic topic models. We also observe that topics from the different dimensions of M3 models are meaningful and orthogonal to each other.
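The abstract's nine-components-from-six-parameters example can be worked directly; the sketch below (with illustrative means, variances, and uniform mixing weights) builds the nine-component 1-D Gaussian mixture from three shared means and three shared variances.

    import itertools
    import numpy as np

    means = np.array([-4.0, 0.0, 4.0])       # 3 parameters
    variances = np.array([0.25, 1.0, 4.0])   # 3 parameters

    # every (mean, variance) pair is a mixture component
    components = list(itertools.product(means, variances))
    print(len(components))                   # 9 components from 6 parameters

    rng = np.random.default_rng(4)
    def sample(n):
        """Draw from the mixture: pick a (mean, variance) pair jointly."""
        idx = rng.integers(len(components), size=n)
        mu = np.array([components[i][0] for i in idx])
        var = np.array([components[i][1] for i in idx])
        return rng.normal(mu, np.sqrt(var))

    print(sample(5))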
Learning Games from Videos Guided by Descriptive Complexity
Kaiser, Lukasz (LIAFA, CNRS and Universite Paris Diderot)
In recent years, several systems have been proposed that learn the rules of a simple card or board game solely from visual demonstration. These systems were constructed for specific games and rely on substantial background knowledge. We introduce a general system for learning board game rules from videos and demonstrate it on several well-known games. The presented algorithm requires only a few demonstrations and minimal background knowledge and, having learned the rules, automatically derives position evaluation functions and can play the learned games competitively. Our main technique is based on descriptive complexity, i.e., the logical means necessary to define a set of interest. We compute formulas defining allowed moves and final positions in a game in different logics and select the most adequate ones. We show that this method is well suited for board games and that there is strong theoretical evidence that it will generalize to other problems.
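As a toy picture of the formula-selection idea (plain Python predicates standing in for the logical formulas the real system searches over), one can enumerate candidate move rules from simplest to most complex and keep the first rule consistent with every observed legal and illegal move:

    def adjacent(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

    def same_row(a, b):
        return a[0] == b[0]

    # candidates ordered by "complexity" (simplest first)
    candidates = [("same_row", same_row), ("adjacent", adjacent)]

    # observations: (from_square, to_square, was_legal)
    observed = [((0, 0), (0, 1), True),
                ((0, 0), (1, 0), True),
                ((0, 0), (0, 3), False)]

    for name, rule in candidates:
        if all(rule(a, b) == legal for a, b, legal in observed):
            print("learned move rule:", name)   # -> adjacent
            break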
Visual Saliency Map from Tensor Analysis
Li, Bing (Chinese Academy of Sciences) | Xiong, Weihua (Omnivision Corporation) | Hu, Weiming (Chinese Academy of Sciences)
Modeling the visual saliency map of an image provides important information for image semantic understanding in many applications. Most existing computational visual saliency models follow a bottom-up framework that generates an independent saliency map in each selected visual feature space and then combines them in a proper way. Two big challenges that these methods must address explicitly are (1) which features should be extracted for all pixels of the input image, and (2) how to dynamically determine the importance of the saliency map generated in each feature space. To address these problems, we present a novel saliency map computational model based on tensor decomposition and reconstruction. Tensor representation and analysis not only explicitly represent an image's color values but also capture two important relationships inherent to a color image: one reflects the spatial correlations between pixels, and the other represents the interplay between color channels. Therefore, a saliency map generator based on the proposed model can adaptively find the most suitable features and their combination coefficients for each pixel. Experiments on a synthetic image set and a real image set show that our method is superior or comparable to other prevailing saliency map models.
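A hedged sketch of the general idea, not the paper's exact model: treat the image as a height x width x channel tensor, reconstruct it from the leading components of each mode unfolding (a truncated HOSVD), and read large per-pixel residuals as salient. The image, ranks, and patch location below are invented.

    import numpy as np

    def unfold(T, mode):
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def mode_mult(T, M, mode):
        """Multiply tensor T by matrix M along the given mode."""
        moved = np.moveaxis(T, mode, 0)
        return np.moveaxis(np.tensordot(M, moved, axes=1), 0, mode)

    rng = np.random.default_rng(5)
    y = np.linspace(0.0, 1.0, 32)
    img = np.stack([np.outer(y, y)] * 3, axis=2)                   # smooth background
    img[10:14, 10:14, :] += rng.normal(scale=0.5, size=(4, 4, 3))  # distinctive patch

    # Truncated HOSVD: keep leading singular vectors of each mode unfolding.
    ranks = (4, 4, 3)
    U = []
    for m in range(3):
        u, _, _ = np.linalg.svd(unfold(img, m), full_matrices=False)
        U.append(u[:, :ranks[m]])

    core = img
    for m in range(3):
        core = mode_mult(core, U[m].T, m)     # project onto each mode basis
    recon = core
    for m in range(3):
        recon = mode_mult(recon, U[m], m)     # map back to image space

    saliency = np.linalg.norm(img - recon, axis=2)    # per-pixel residual
    print(f"patch residual: {saliency[10:14, 10:14].mean():.4f}, "
          f"image average: {saliency.mean():.4f}")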