The Odd One Out test of intelligence consists of 3x3 matrix reasoning problems organized into 20 levels of difficulty. Solving problems on this test appears to require the integration of multiple cognitive abilities usually associated with creativity, including visual encoding, similarity assessment, pattern detection, and analogical transfer. We describe a novel fractal strategy for addressing visual analogy problems on the Odd One Out test. In our strategy, the relationship between images is encoded fractally, capturing important aspects of similarity as well as inherent self-similarity. The strategy starts with fractal representations encoded at a high level of resolution; if that is not sufficient to resolve ambiguity, it automatically adjusts to the level of resolution appropriate for the given problem. Similarly, the strategy starts by searching for fractally derived similarity between simpler relationships; if that is not sufficient to resolve ambiguity, it automatically shifts to searching for such similarity between higher-order relationships. We present preliminary results and initial analysis from applying the fractal technique to nearly 3,000 problems from the Odd One Out test.
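The ambiguity-driven control loop described above can be sketched as follows. Here `encode` and `similarity` are stand-in callables for the fractal encoding and the fractally derived similarity measure, and the candidate resolutions and decision margin are illustrative assumptions, not values from the paper.

```python
def odd_one_out(images, encode, similarity, resolutions=(32, 16, 8), margin=0.05):
    """Hedged sketch: score each candidate by its mean similarity to the
    others; the least similar is the odd one out. If the top two scores
    are too close to call, re-encode at the next resolution and retry."""
    for res in resolutions:
        codes = [encode(img, res) for img in images]
        scores = []
        for i, ci in enumerate(codes):
            others = [similarity(ci, cj) for j, cj in enumerate(codes) if j != i]
            scores.append(sum(others) / len(others))
        # Rank candidates from least to most similar to the rest.
        ranked = sorted(range(len(images)), key=lambda i: scores[i])
        if scores[ranked[1]] - scores[ranked[0]] >= margin:
            return ranked[0]          # unambiguous at this resolution
    return ranked[0]                  # best guess at the final resolution
```

The loop only pays for a finer (or alternative) encoding when the current one leaves the answer ambiguous, mirroring the strategy's automatic adjustment.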
We present a fractal technique for addressing geometric analogy problems from the Raven's Standard Progressive Matrices test of general intelligence. In this method, an image is represented fractally, capturing its inherent self-similarity. We apply these fractal representations to problems from the Raven's test and show how they afford a new method for solving complex geometric analogy problems. We present results from running the fractal algorithm on all 60 problems of the Standard Progressive Matrices version of the Raven's test.
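One standard way to represent an image fractally is partitioned iterated function system (PIFS) encoding, in which each small "range" block of the image is described by the best-matching larger "domain" block under an affine intensity map, so the code itself expresses the image's self-similarity. The sketch below illustrates that general idea, not the paper's exact encoding; block sizes and the grayscale-only assumption are illustrative.

```python
import numpy as np

def fractal_encode(img, range_size=4):
    """Hedged PIFS-style sketch: for each range block, find the domain
    block (averaged down to range size) and least-squares contrast s /
    brightness o minimizing ||s*domain + o - range||^2."""
    h, w = img.shape
    d = range_size * 2                       # domain blocks are 2x larger
    domains = []
    for y in range(0, h - d + 1, d):
        for x in range(0, w - d + 1, d):
            block = img[y:y+d, x:x+d].reshape(range_size, 2, range_size, 2)
            domains.append(((y, x), block.mean(axis=(1, 3))))  # 2x2 averaging
    code = []
    for y in range(0, h, range_size):
        for x in range(0, w, range_size):
            r = img[y:y+range_size, x:x+range_size].astype(float)
            best = None
            for pos, dom in domains:
                dm, rm = dom.mean(), r.mean()
                denom = ((dom - dm) ** 2).sum()
                s = ((dom - dm) * (r - rm)).sum() / denom if denom else 0.0
                o = rm - s * dm
                err = ((s * dom + o - r) ** 2).sum()
                if best is None or err < best[0]:
                    best = (err, pos, s, o)
            # One code entry per range block: (range pos, domain pos, s, o)
            code.append(((y, x), best[1], best[2], best[3]))
    return code
```

Because the code describes one image in terms of transformed pieces of another (or of itself), comparing such codes gives a natural handle on the similarity and analogy relations the method exploits.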
The Raven's Progressive Matrices (RPM) test is a commonly used test of general human intelligence. The RPM is unusual among general intelligence tests in that it focuses on visual problem solving, and in particular on visual similarity and analogy. We are developing a small set of methods for problem solving in the RPM that use propositional, imagistic, and multimodal representations, respectively, to investigate how different representations can contribute to visual problem solving and how the effects of their use might emerge in behavior.
We address two classes of skill-transfer problems encountered by an autonomous robot: within-domain adaptation and cross-domain transfer. Our aim is to provide skill representations that enable transfer in each class of problem, and we therefore explore two approaches to skill representation, each addressing one class. The first representation, based on mimicking, encodes the full demonstration and is well suited to within-domain adaptation. The second representation, based on imitation, encodes a set of key points along the trajectory that represent the goal points most relevant to the successful completion of the skill; this representation enables both within-domain and cross-domain transfer. A planner is then applied to these key-point constraints, generating a domain-specific trajectory that addresses the transfer task.
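The key-point representation can be illustrated with a purely geometric stand-in: recursive trajectory simplification (Ramer-Douglas-Peucker), which keeps only the points that matter for reproducing the path's shape. The paper's key points are task-relevant goal points rather than geometric extrema, so this is a sketch of the idea, not the actual extraction method.

```python
import math

def key_points(traj, tol=0.1):
    """Hedged sketch: reduce a demonstrated 2-D trajectory to key points
    via Ramer-Douglas-Peucker simplification, a geometric stand-in for
    task-relevant key-point extraction."""
    def dist(p, a, b):
        # Perpendicular distance of point p from segment a-b.
        (ax, ay), (bx, by), (px, py) = a, b, p
        dx, dy = bx - ax, by - ay
        norm = math.hypot(dx, dy)
        if norm == 0:
            return math.hypot(px - ax, py - ay)
        return abs(dx * (ay - py) - dy * (ax - px)) / norm
    if len(traj) < 3:
        return list(traj)
    # Farthest interior point from the endpoint chord.
    i, dmax = max(((i, dist(p, traj[0], traj[-1]))
                   for i, p in enumerate(traj[1:-1], 1)), key=lambda t: t[1])
    if dmax <= tol:
        return [traj[0], traj[-1]]    # segment is effectively straight
    return key_points(traj[:i+1], tol)[:-1] + key_points(traj[i:], tol)
```

A planner can then treat the retained points as waypoint constraints and generate a fresh, domain-specific trajectory between them, which is what makes the representation transferable across domains.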
We cast visual imitation as a visual correspondence problem. Our robotic agent is rewarded when its actions result in a better match between the relative spatial configurations of corresponding visual entities detected in its workspace and in the teacher's demonstration. We build upon recent advances in computer vision, such as human finger keypoint detectors, object detectors trained on the fly with synthetic augmentations, and point detectors supervised by viewpoint changes, and learn multiple visual entity detectors for each demonstration without human annotations or robot interactions. We empirically show that the proposed factorized visual representations of entities and their spatial arrangements drive successful imitation of a variety of manipulation skills within minutes, using a single demonstration and without any environment instrumentation. The method is robust to background clutter and effectively generalizes across environment variations between demonstrator and imitator, greatly outperforming the unstructured, non-factorized full-frame CNN encodings of previous works.
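The correspondence reward can be sketched as follows: given matched 2-D entity detections in the demonstration and in the agent's workspace, compare all pairwise offsets between entities and penalize the mismatch. This is an illustrative translation-invariant formulation under assumed inputs, not the paper's exact reward.

```python
import math

def configuration_reward(demo_pts, agent_pts):
    """Hedged sketch: negative mean discrepancy between the pairwise
    spatial offsets of corresponding entities in the demonstration and
    in the agent's workspace. Higher (closer to 0) means a better match."""
    assert len(demo_pts) == len(agent_pts)
    n = len(demo_pts)
    err, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            # Offset between entities i and j in each scene.
            ddx = demo_pts[j][0] - demo_pts[i][0]
            ddy = demo_pts[j][1] - demo_pts[i][1]
            adx = agent_pts[j][0] - agent_pts[i][0]
            ady = agent_pts[j][1] - agent_pts[i][1]
            err += math.hypot(ddx - adx, ddy - ady)
            pairs += 1
    return -err / pairs if pairs else 0.0
```

Because only relative offsets are compared, a workspace that is shifted as a whole relative to the demonstration still scores perfectly, which matches the intuition that imitation should depend on the arrangement of entities rather than their absolute positions.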