Since the early years of artificial intelligence, scientists have dreamed of creating computers that can "see" the world. As vision plays a key role in many things we do every day, cracking the code of computer vision seemed to be one of the major steps toward developing artificial general intelligence. But like many other goals in AI, computer vision has proven to be easier said than done. In 1966, scientists at MIT launched "The Summer Vision Project," a two-month effort to create a computer system that could identify objects and background areas in images. But it took much more than a summer break to achieve those goals. In fact, it wasn't until the early 2010s that image classifiers and object detectors were flexible and reliable enough to be used in mainstream applications.
Welcome to AI book reviews, a series of posts that explore the latest literature on artificial intelligence. Since the early years of artificial intelligence, scientists have dreamed of creating computers that can "see" the world. As vision plays a key role in many things we do every day, cracking the code of computer vision seemed to be one of the major steps toward developing artificial general intelligence. But like many other goals in AI, computer vision has proven to be easier said than done. In 1966, scientists at MIT launched "The Summer Vision Project," a two-month effort to create a computer system that could identify objects and background areas in images.
Affect recognition based on subjects' facial expressions has been a topic of major research in the attempt to generate machines that can understand the way subjects feel, act and react. In the past, due to the unavailability of large amounts of data captured in real-life situations, research has mainly focused on controlled environments. However, recently, social media and platforms have been widely used. Moreover, deep learning has emerged as a means to solve visual analysis and recognition problems. This paper exploits these advances and presents significant contributions for affect analysis and recognition in-the-wild. Affect analysis and recognition can be seen as a dual knowledge generation problem, involving: i) creation of new, large and rich in-the-wild databases and ii) design and training of novel deep neural architectures that are able to analyse affect over these databases and to successfully generalise their performance on other datasets. The paper focuses on large in-the-wild databases, i.e., Aff-Wild and Aff-Wild2 and presents the design of two classes of deep neural networks trained with these databases. The first class refers to uni-task affect recognition, focusing on prediction of the valence and arousal dimensional variables. The second class refers to estimation of all main behavior tasks, i.e. valence-arousal prediction; categorical emotion classification in seven basic facial expressions; facial Action Unit detection. A novel multi-task and holistic framework is presented which is able to jointly learn and effectively generalize and perform affect recognition over all existing in-the-wild databases. Large experimental studies illustrate the achieved performance improvement over the existing state-of-the-art in affect recognition. HIS paper presents recent developments and research directions in affective behavior analysis in-the-wild, which is a major targeted characteristic of human computer interaction systems in real life applications. Such systems, machines and robots, should be able to automatically sense and interpret facial and audio-visual signals relevant to emotions, appraisals and intentions; thus, being able to interact in a'human-centered' and engaging manner with people, as their digital assistants in the home, work, operational or industrial environment. Through human affect recognition, the reactions of the machine, or robot, will be consistent with people's expectations and emotions; their verbal and non-verbal interactions will be positively received by humans. Moreover, this interaction should not be dependent on the respective context, nor the human's age, sex, ethnicity, educational level, profession, or social position. As a consequence, the development of intelligent systems able to analyze human behavior in-the-wild can contribute to generation of trust, understanding and closeness between humans and machines in real life environments.
Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. For example, in the case of adversarial attacks, where adding small amounts of noise to an image, including an object, can lead to strong misclassification of that object. But for humans, the noise is often invisible. If vulnerability to adversarial noise cannot be fixed, DCNNs cannot be taken as serious models of human vision. Many studies have tried to add features of the human visual system to DCNNs to make them robust against adversarial attacks. However, it is not fully clear whether human vision inspired components increase robustness because performance evaluations of these novel components in DCNNs are often inconclusive. We propose a set of criteria for proper evaluation and analyze different models according to these criteria. We finally sketch future efforts to make DCCNs one step closer to the model of human vision.
In continual learning, a system learns from non-stationary data streams or batches without catastrophic forgetting. While this problem has been heavily studied in supervised image classification and reinforcement learning, continual learning in neural networks designed for abstract reasoning has not yet been studied. Here, we study continual learning of analogical reasoning. Analogical reasoning tests such as Raven's Progressive Matrices (RPMs) are commonly used to measure non-verbal abstract reasoning in humans, and recently offline neural networks for the RPM problem have been proposed. In this paper, we establish experimental baselines, protocols, and forward and backward transfer metrics to evaluate continual learners on RPMs. We employ experience replay to mitigate catastrophic forgetting. Prior work using replay for image classification tasks has found that selectively choosing the samples to replay offers little, if any, benefit over random selection. In contrast, we find that selective replay can significantly outperform random selection for the RPM task.
Rapid, accurate and robust detection of looming objects in cluttered moving backgrounds is a significant and challenging problem for robotic visual systems to perform collision detection and avoidance tasks. Inspired by the neural circuit of elementary motion vision in the mammalian retina, this paper proposes a bioinspired approach-sensitive neural network (ASNN) that contains three main contributions. Firstly, a direction-selective visual processing module is built based on the spatiotemporal energy framework, which can estimate motion direction accurately via only two mutually perpendicular spatiotemporal filtering channels. Secondly, a novel approach-sensitive neural network is modeled as a push-pull structure formed by ON and OFF pathways, which responds strongly to approaching motion while insensitivity to lateral motion. Finally, a method of directionally selective inhibition is introduced, which is able to suppress the translational backgrounds effectively. Extensive synthetic and real robotic experiments show that the proposed model is able to not only detect collision accurately and robustly in cluttered and dynamic backgrounds but also extract more collision information like position and direction, for guiding rapid decision making.
The study of attentional processing in vision has a long and deep history. Recently, several papers have presented insightful perspectives into how the coordination of multiple attentional functions in the brain might occur. These begin with experimental observations and the authors propose structures, processes, and computations that might explain those observations. Here, we consider a perspective that past works have not, as a complementary approach to the experimentally-grounded ones. We approach the same problem as past authors but from the other end of the computational spectrum, from the problem nature, as Marr's Computational Level would prescribe. What problem must the brain solve when orchestrating attentional processes in order to successfully complete one of the myriad possible visuospatial tasks at which we as humans excel? The hope, of course, is for the approaches to eventually meet and thus form a complete theory, but this is likely not soon. We make the first steps towards this by addressing the necessity of attentional control, examining the breadth and computational difficulty of the visuospatial and attentional tasks seen in human behavior, and suggesting a sketch of how attentional control might arise in the brain. The key conclusions of this paper are that an executive controller is necessary for human attentional function in vision, and that there is a 'first principles' computational approach to its understanding that is complementary to the previous approaches that focus on modelling or learning from experimental observations directly.
Artificial Intelligence (AI) is not just a buzzword, but a crucial part of the technology landscape. AI is changing every industry and business function, which results in increased interest in its applications, subdomains and related fields. This makes AI companies the top leaders driving the technology swift. AI helps us to optimise and automate crucial business processes, gather essential data and transform the world, one step at a time. From Google and Amazon to Apple and Microsoft, every major tech company is dedicating resources to breakthroughs in artificial intelligence. As big enterprises are busy acquiring or merging with other emerging inventions, small AI companies are also working hard to develop their own intelligent technology and services. By leveraging artificial intelligence, organizations get an innovative edge in the digital age. AI consults are also working to provide companies with expertise that can help them grow. In this digital era, AI is also a significant place for investment. AI companies are constantly developing the latest products to provide the simplest solutions. Henceforth, Analytics Insight brings you the list of top 100 AI companies that are leading the technology drive towards a better tomorrow. AEye develops advanced vision hardware, software, and algorithms that act as the eyes and visual cortex of autonomous vehicles. AEye is an artificial perception pioneer and creator of iDAR, a new form of intelligent data collection that acts as the eyes and visual cortex of autonomous vehicles. Since its demonstration of its solid state LiDAR scanner in 2013, AEye has pioneered breakthroughs in intelligent sensing. Their mission was to acquire the most information with the fewest ones and zeros. This would allow AEye to drive the automotive industry into the next realm of autonomy. Algorithmia invented the AI Layer.
In the last years, AI safety gained international recognition in the light of heterogeneous safety-critical and ethical issues that risk overshadowing the broad beneficial impacts of AI. In this context, the implementation of AI observatory endeavors represents one key research direction. This paper motivates the need for an inherently transdisciplinary AI observatory approach integrating diverse retrospective and counterfactual views. We delineate aims and limitations while providing hands-on-advice utilizing concrete practical examples. Distinguishing between unintentionally and intentionally triggered AI risks with diverse socio-psycho-technological impacts, we exemplify a retrospective descriptive analysis followed by a retrospective counterfactual risk analysis. Building on these AI observatory tools, we present near-term transdisciplinary guidelines for AI safety. As further contribution, we discuss differentiated and tailored long-term directions through the lens of two disparate modern AI safety paradigms. For simplicity, we refer to these two different paradigms with the terms artificial stupidity (AS) and eternal creativity (EC) respectively. While both AS and EC acknowledge the need for a hybrid cognitive-affective approach to AI safety and overlap with regard to many short-term considerations, they differ fundamentally in the nature of multiple envisaged long-term solution patterns. By compiling relevant underlying contradistinctions, we aim to provide future-oriented incentives for constructive dialectics in practical and theoretical AI safety research.
Neural networks today often recognize objects as well as people do, and thus might serve as models of the human recognition process. However, most such networks provide their answer after a fixed computational effort, whereas human reaction time varies, e.g. from 0.2 to 10 s, depending on the properties of stimulus and task. To model the effect of difficulty on human reaction time, we considered a classification network that uses early-exit classifiers to make anytime predictions. Comparing human and MSDNet accuracy in classifying CIFAR-10 images in added Gaussian noise, we find that the network equivalent input noise SD is 15 times higher than human, and that human efficiency is only 0.6\% that of the network. When appropriate amounts of noise are present to bring the two observers (human and network) into the same accuracy range, they show very similar dependence on duration or FLOPS, i.e. very similar speed-accuracy tradeoff. We conclude that Anytime classification (i.e. early exits) is a promising model for human reaction time in recognition tasks.