Problem Solving
A Knowledge Compilation Map for Conditional Preference Statements-based Languages
Fargier, Hélène, Mengin, Jérôme
Conditional preference statements have been used to compactly represent preferences over combinatorial domains. They are at the core of CP-nets and their generalizations, and of lexicographic preference trees. Several works have addressed the complexity of some queries (optimization and dominance in particular). In this paper we extend some of these results and study other queries that have not been addressed so far, such as equivalence, thereby contributing to a knowledge compilation map for languages based on conditional preference statements. We also introduce a new parameterised family of languages, which makes it possible to balance expressiveness against the complexity of some queries.
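As an illustration of the optimization query mentioned above, here is a minimal Python sketch of the standard forward sweep for an acyclic CP-net: variables are visited parents first, and each takes its most preferred value given the values already chosen. The toy `dinner` network, the `CPNet` type, and the function name `forward_sweep` are assumptions made for this example and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Each variable maps to (parents, CPT). The CPT maps a tuple of parent
# values to the variable's own values, listed most preferred first.
CPNet = Dict[str, Tuple[List[str], Dict[Tuple[str, ...], List[str]]]]


def forward_sweep(net: CPNet, order: List[str]) -> Dict[str, str]:
    """Return the optimal outcome of an acyclic CP-net.

    `order` must be a topological order of the dependency graph,
    i.e. every variable appears after all of its parents.
    """
    outcome: Dict[str, str] = {}
    for var in order:
        parents, cpt = net[var]
        # Look up the row of the CPT selected by the parents' chosen values.
        context = tuple(outcome[p] for p in parents)
        outcome[var] = cpt[context][0]
    return outcome


# Hypothetical toy network: the wine preference depends on the main course.
dinner: CPNet = {
    "main": ([], {(): ["fish", "meat"]}),
    "wine": (["main"], {("fish",): ["white", "red"],
                        ("meat",): ["red", "white"]}),
}

best = forward_sweep(dinner, ["main", "wine"])
print(best)
```

The sweep only applies when the dependency graph is acyclic. Other queries, such as dominance or equivalence, need different procedures and are not covered by this sketch.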
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Kim, Wonjae, Son, Bokyung, Kim, Ildoo
Vision-and-Language Pretraining (VLP) has improved performance on various joint vision-and-language downstream tasks. Current approaches for VLP heavily rely on image feature extraction processes, most of which involve region supervision (e.g., object detection) and a convolutional architecture (e.g., ResNet). Although disregarded in the literature, we find this problematic in terms of both (1) efficiency/speed, since simply extracting input features requires much more computation than the actual multimodal interaction steps; and (2) expressive power, as it is upper bounded by the expressive power of the visual encoder and its predefined visual vocabulary. In this paper, we present a minimal VLP model, Vision-and-Language Transformer (ViLT), monolithic in the sense that the processing of visual inputs is drastically simplified to the same convolution-free manner in which we process textual inputs. We show that ViLT is up to 60 times faster than previous VLP models, yet with competitive or better downstream task performance.
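To make "convolution-free" processing of visual inputs concrete, the sketch below cuts an image into fixed-size patches, flattens each patch, and applies one shared linear projection, so that each patch becomes a token in the same way a word does. This ViT-style patch projection is an assumption made for illustration; the abstract does not spell out the mechanism. The names `to_patches` and `project`, the 4x4 single-channel image, and the hand-picked weights are all hypothetical.

```python
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def to_patches(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles, each flattened."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[top + r][left + c]
                           for r in range(patch) for c in range(patch)])
    return tokens


def project(tokens: List[List[float]],
            weights: List[List[float]]) -> List[List[float]]:
    """Apply one shared linear map (one weight row per output dimension)."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weights]
            for tok in tokens]


# Hypothetical 4x4 image whose pixel at (r, c) has value 4*r + c.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = to_patches(image, 2)

# Illustrative weights: output 0 is the patch mean, output 1 the top-left pixel.
weights = [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]]
embedded = project(tokens, weights)
print(embedded)
```

In a trained model the projection weights would be learned and position information would be added to each token; both are omitted here to keep the sketch short.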
Finally, I can solve a Rubik's Cube
The Rubik's Cube has been around for decades. I toyed with the cube, probably in the very late '80s or early '90s, but never even imagined being able to solve one, from entirely shuffled to perfectly ordered. But wouldn't it be satisfying if I could? Fortunately, the internet makes solving what was originally an architecture puzzle doable for most of us. The world record for solving a cube has plummeted since 2000 from 20 seconds to under five, as pros and enthusiasts synthesized high-speed solutions and turn combinations (called algorithms) and shared them with the world.
Senic Friends of Hue Outdoor Smart Switch review: A versatile problem solver
Installing any kind of light switch where no wiring is already present is difficult to do on your own and expensive if you pay an electrician to do the work. If you're looking to control Philips Hue smart lighting, you might find Senic's Friends of Hue Outdoor Smart Switch to be pricey at $79, but you can install it yourself in less time than it takes to fetch a screwdriver. The secret behind the easy installation is that the Friends of Hue Outdoor doesn't connect to your home's electrical wiring--it doesn't even rely on batteries. Instead, the switch harvests the kinetic energy generated by pressing one of the switch's four buttons (one at the top and one at the bottom of each paddle). This generates enough energy to send a radio signal to a second-generation Philips Hue Bridge, using the Bridge's Zigbee mesh network.
Learning Abstract Representations through Lossy Compression of Multi-Modal Signals
Wilmot, Charles, Triesch, Jochen
A key competence for open-ended learning is the formation of increasingly abstract representations useful for driving complex behavior. Abstract representations ignore specific details and facilitate generalization. Here we consider the learning of abstract representations in a multi-modal setting with two or more input modalities. We treat the problem as a lossy compression problem and show that generic lossy compression of multimodal sensory input naturally extracts abstract representations that tend to strip away modality-specific details and preferentially retain information that is shared across the different modalities. Furthermore, we propose an architecture to learn abstract representations by identifying and retaining only the information that is shared across multiple modalities while discarding any modality-specific information.
This AI can explain how it solves Rubik's Cube--and that's a big deal
However, these AI algorithms cannot explain the thought processes behind their decisions. A computer that masters protein folding and also tells researchers more about the rules of biology is much more useful than a computer that folds proteins without explanation. Therefore, AI researchers like me are now turning our efforts toward developing AI algorithms that can explain themselves in a manner that humans can understand. If we can do this, I believe that AI will be able to uncover and teach people new facts about the world that have not yet been discovered, leading to new innovations. One field of AI, called reinforcement learning, studies how computers can learn from their own experiences.
HySTER: A Hybrid Spatio-Temporal Event Reasoner
Sautory, Theophile, Cingillioglu, Nuri, Russo, Alessandra
The task of Video Question Answering (VideoQA) consists of answering natural language questions about a video and serves as a proxy to evaluate the performance of a model in scene sequence understanding. Most methods designed for VideoQA to date are end-to-end deep learning architectures, which struggle with complex temporal and causal reasoning and provide limited transparency in their reasoning steps. We present HySTER, a Hybrid Spatio-Temporal Event Reasoner, to reason over physical events in videos. Our model combines the strength of deep learning methods in extracting information from video frames with the reasoning capabilities and explainability of symbolic artificial intelligence in an answer set programming framework. We define a method based on general temporal, causal and physics rules which can be transferred across tasks. We apply our model to the CLEVRER dataset and demonstrate state-of-the-art results in question answering accuracy. This work sets the foundations for the incorporation of inductive logic programming in the field of VideoQA.
Memetics and Neural Models of Conspiracy Theories
Conspiracy theories, or more generally seriously distorted beliefs, are widespread. How and why they are formed in the brain is still more a matter of speculation than of science. In this paper one plausible mechanism is investigated: rapid freezing of high neuroplasticity (RFHN). Emotional arousal increases neuroplasticity and leads to the creation of new pathways spreading neural activation. Using the language of neurodynamics, a meme is defined as a quasi-stable associative memory attractor state. Depending on the temporal characteristics of the incoming information and the plasticity of the network, memory may self-organize, creating memes with large attractor basins that link many unrelated input patterns. Memes with fake rich associations distort relations between memory states. Simulations of various neural network models trained with competitive Hebbian learning (CHL) on stationary and non-stationary data lead to the same conclusion: short learning with high plasticity followed by a rapid decrease of plasticity leads to memes with large attraction basins, distorting input pattern representations in associative memory. Such system-level models may be used to understand the creation of distorted beliefs and the formation of conspiracy memes, understood as strong attractor states of the neurodynamics.
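The idea of "high plasticity followed by a rapid decrease" can be sketched as a learning-rate schedule. The example below uses plain winner-take-all competitive learning with an exponentially decaying rate. It is a simplified stand-in and not necessarily the CHL variant or the network models simulated in the paper. The function names, the one-dimensional prototypes, and the parameters `rate0` and `decay` are hypothetical, and the sketch makes no attempt to reproduce the paper's results.

```python
import math
from typing import List, Sequence


def nearest(prototypes: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i], x)))


def train(prototypes: Sequence[Sequence[float]],
          samples: Sequence[Sequence[float]],
          rate0: float, decay: float) -> List[List[float]]:
    """Winner-take-all updates with plasticity rate0 * exp(-decay * step)."""
    protos = [list(p) for p in prototypes]
    for step, x in enumerate(samples):
        rate = rate0 * math.exp(-decay * step)  # plasticity at this step
        win = nearest(protos, x)
        # Only the winning prototype moves toward the input.
        protos[win] = [p + rate * (v - p) for p, v in zip(protos[win], x)]
    return protos


start = [[0.0], [10.0]]
samples = [[2.0], [4.0]]

# No decay: the winner keeps tracking each new input.
plastic = train(start, samples, rate0=1.0, decay=0.0)
# Rapid decay: the winner is effectively frozen after the first input.
frozen = train(start, samples, rate0=1.0, decay=50.0)
print(plastic, frozen)
```

With no decay the winning prototype follows the latest sample, whereas with rapid decay it stays at the first input it encountered, which is the "freezing" the schedule is meant to illustrate.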
Understanding in Artificial Intelligence
Maetschke, Stefan, Iraola, David Martinez, Barnard, Pieter, ShafieiBavani, Elaheh, Zhong, Peter, Xu, Ying, Yepes, Antonio Jimeno
However, this progress is largely driven by increased computational power, namely GPUs, and bigger data sets rather than by radically new algorithms or knowledge representations. Artificial Neural Networks and Stochastic Gradient Descent, popularized in the 1980s [3], remain the fundamental building blocks for most modern AI systems. While very successful for many applications, especially in vision, the purely deep-learning-based approach has significant weaknesses. For instance, CNNs struggle with same-different relations [4], fail when long-chained reasoning is needed [5], are non-decomposable, cannot easily incorporate symbolic knowledge, and are hampered by a lack of model interpretability. Many current methods essentially compute higher-order statistics over basic elements such as pixels, phonemes, letters or words to process inputs but do not explicitly model the building blocks and their relations in a (de)composable and interpretable way.