Shifted Chunk Transformer for Spatio-Temporal Representational Learning
Spatio-temporal representational learning has been widely adopted in various fields such as action recognition, video object segmentation, and action anticipation. Previous spatio-temporal representational learning approaches primarily employ ConvNets or sequential models, e.g., LSTM, to learn intra-frame and inter-frame features. Recently, Transformer models have successfully dominated the study of natural language processing (NLP), image classification, and related areas. However, pure-Transformer-based spatio-temporal learning can be prohibitively costly in memory and computation when extracting fine-grained features from a tiny patch. To tackle the training difficulty and enhance spatio-temporal learning, we construct a shifted chunk Transformer with pure self-attention blocks. Leveraging recent efficient Transformer designs in NLP, this shifted chunk Transformer can learn hierarchical spatio-temporal features from a local tiny patch to a global video clip. Our shifted self-attention can also effectively model complicated inter-frame variances. Furthermore, we build a clip encoder based on Transformer to model long-term temporal dependencies. We conduct thorough ablation studies to validate each component and hyper-parameter in our shifted chunk Transformer, and it outperforms previous state-of-the-art approaches on Kinetics-400, Kinetics-600, UCF101, and HMDB51.
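The abstract does not spell out the shifted chunk mechanism, so the following is only a rough numpy sketch of the general idea it names: restricting self-attention to fixed-size chunks of patch tokens, with a shift between layers so information mixes across chunk borders. All names, shapes, and the omission of learned query/key/value projections are my simplifications, not the authors' design.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def chunked_self_attention(x, chunk=4, shift=0):
    """Self-attention restricted to fixed-size chunks of tokens.

    x: (num_tokens, dim), with num_tokens divisible by `chunk`.
    A nonzero `shift` rolls the token sequence before chunking, so
    stacking layers with different shifts mixes across chunk borders.
    """
    n, d = x.shape
    x = np.roll(x, shift, axis=0)
    out = np.empty_like(x)
    for s in range(0, n, chunk):
        q = k = v = x[s:s + chunk]          # untrained sketch: no projections
        att = softmax(q @ k.T / np.sqrt(d)) # attention only within the chunk
        out[s:s + chunk] = att @ v
    return np.roll(out, -shift, axis=0)     # undo the shift

tokens = np.random.default_rng(0).standard_normal((16, 8))  # 16 patch tokens
y = chunked_self_attention(tokens, chunk=4, shift=2)
print(y.shape)  # (16, 8)
```

The point of chunking is cost: attention within chunks of size c is O(n·c·d) per layer rather than the O(n²·d) of full attention, which is the kind of saving the abstract's memory/computation concern points at.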
Duality of Bures and Shape Distances with Implications for Comparing Neural Representations
Harvey, Sarah E., Larsen, Brett W., Williams, Alex H.
A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape. Most of these measures fall into one of two categories. First, measures such as linear regression, canonical correlation analysis (CCA), and shape distances learn explicit mappings between neural units to quantify similarity while accounting for expected invariances. Second, measures such as representational similarity analysis (RSA), centered kernel alignment (CKA), and normalized Bures similarity (NBS) quantify similarity in summary statistics, such as stimulus-by-stimulus kernel matrices, that are already invariant to expected symmetries. Here, we take steps towards unifying these two broad categories of methods by observing that the cosine of the Riemannian shape distance (from category 1) is equal to NBS (from category 2). We explore how this connection leads to new interpretations of shape distances and NBS, and we contrast these measures with CKA, a popular similarity measure in the deep learning literature.
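The claimed identity can be checked numerically. In this sketch (function names and the random-data setup are mine, not the paper's), the cosine of the shape distance is computed via the orthogonal-Procrustes alignment, which reduces to the nuclear norm of X^T Y, and NBS is computed directly from the kernel matrices K_X = XX^T and K_Y = YY^T:

```python
import numpy as np

def psd_sqrt(K):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(K)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def nbs(KX, KY):
    """Normalized Bures similarity between two PSD kernel matrices."""
    r = psd_sqrt(KX)
    fid_root = np.trace(psd_sqrt(r @ KY @ r))
    return fid_root / np.sqrt(np.trace(KX) * np.trace(KY))

def cos_shape_distance(X, Y):
    """Cosine of the orthogonal-Procrustes (shape) distance:
    max_Q <X, Y Q>_F / (||X||_F ||Y||_F) = ||X^T Y||_* / (||X||_F ||Y||_F)."""
    return np.linalg.norm(X.T @ Y, ord="nuc") / (
        np.linalg.norm(X) * np.linalg.norm(Y))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 10))   # 6 stimuli x 10 neurons, system 1
Y = rng.standard_normal((6, 10))   # same stimuli, system 2
X -= X.mean(axis=0); Y -= Y.mean(axis=0)  # center over stimuli
lhs = cos_shape_distance(X, Y)
rhs = nbs(X @ X.T, Y @ Y.T)
assert np.isclose(lhs, rhs)        # cos(shape distance) == NBS
```

The two sides agree because K_X^{1/2} Y has the same nonzero singular values as X^T Y, so the Bures fidelity root equals the nuclear norm appearing in the Procrustes solution.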
Brain-like representational straightening of natural movies in robust feedforward neural networks
Toosi, Tahereh, Issa, Elias B.
Representational straightening refers to a decrease in the curvature of visual feature representations of a sequence of frames taken from natural movies. Prior work established straightening in neural representations of the primate primary visual cortex (V1) and perceptual straightening in human behavior as hallmarks of biological vision, in contrast to artificial feedforward neural networks, which did not demonstrate this phenomenon because they were not explicitly optimized to produce temporally predictable movie representations. Here, we show that robustness to noise in the input image can produce representational straightening in feedforward neural networks. Both adversarial training (AT) and the base classifiers for randomized smoothing (RS) induced remarkably straightened feature codes. Demonstrating their utility within the domain of natural movies, these codes could be inverted to generate intervening movie frames by linear interpolation in the feature space, even though the networks were not trained on these trajectories. Demonstrating their biological utility, we found that AT and RS training improved predictions of neural data in primate V1 over baseline models, providing a parsimonious, bio-plausible mechanism -- noise in the sensory input stages -- for generating representations in early visual cortex. Finally, we compared the geometric properties of frame representations in these networks to better understand how they produce representations that mimic the straightening phenomenon from biology. Overall, by elucidating emergent properties of robust neural networks, this work demonstrates that it is not necessary to use predictive objectives or to train directly on natural movie statistics to obtain models that support straightened movie representations similar to human perception and also predict V1 neural responses.
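Curvature in this literature is typically the average angle between successive displacement vectors along a frame trajectory in feature space; straightening means this angle drops from pixel space to feature space. A minimal sketch of that discrete measure (the function name and toy trajectories are mine):

```python
import numpy as np

def mean_curvature(Z):
    """Mean discrete curvature of a trajectory Z (num_frames x dim):
    the average angle, in degrees, between consecutive displacements."""
    V = np.diff(Z, axis=0)                              # frame-to-frame steps
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    cos = np.clip(np.sum(V[:-1] * V[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

t = np.linspace(0.0, 1.0, 20)[:, None]
straight = t * np.ones((1, 8))                # straight line in 8-D: ~0 degrees
rng = np.random.default_rng(1)
curved = rng.standard_normal((20, 8))         # jagged random path: large angles
low = mean_curvature(straight)
high = mean_curvature(curved)
```

A straightened feature code would move a movie's trajectory from the `curved` regime towards the `straight` one.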
What A Situated Language-Using Agent Must be Able to Do: A Top-Down Analysis
Even in our increasingly text-intensive times, the primary site of language use is situated, co-present interaction. It is primary ontogenetically and phylogenetically, and it is arguably also still primary in negotiating everyday social situations. Situated interaction is also the final frontier of Natural Language Processing, where, compared to the area of text processing, very little progress has been made in the past decade, and where a myriad of practical applications is waiting to be unlocked. While the usual approach in the field is to reach, bottom-up, for the ever next "adjacent possible", in this paper I attempt a top-down analysis of what the demands are that unrestricted situated interaction makes on the participating agent, and suggest ways in which this analysis can structure computational models and research on them. Specifically, I discuss representational demands (the building up and application of world model, language model, situation model, discourse model, and agent model) and what I call anchoring processes (incremental processing, incremental learning, conversational grounding, multimodal grounding) that bind the agent to the here, now, and us.
Efficient Feature Representations for Cricket Data Analysis using Deep Learning based Multi-Modal Fusion Model
Alaka, Souridas, Sreekumar, Rishikesh, Shalu, Hrithwik
Data analysis has become a necessity in the modern era of cricket. Everything from effective team management to match-win prediction uses some form of analytics, and meaningful data representations are necessary for efficient analysis. In this study we investigate the use of adaptive (learnable) embeddings to represent inter-related features (such as players and teams). The data used for this study were collected from a classic T20 tournament, the IPL (Indian Premier League). To naturally facilitate the learning of meaningful feature representations for accurate data analysis, we formulate a deep representation learning framework that jointly learns a custom set of embeddings (representing our features of interest) through the minimization of a contrastive loss. We base our objective on a set of classes obtained by hierarchical clustering on the overall run rate of an innings. Our assessment shows that the framework yields embeddings with greater generality; building on them, a task-based analysis of overall run-rate prediction demonstrates the framework's reliability.
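The core ingredient here, learnable embeddings trained with a contrastive loss against cluster labels, can be sketched as follows. This is a generic InfoNCE-style supervised contrastive loss of my choosing, not the paper's exact objective; all names and the toy "run-rate cluster" labels are illustrative.

```python
import numpy as np

def contrastive_loss(emb, labels, temperature=0.1):
    """InfoNCE-style supervised contrastive loss: embeddings sharing a
    cluster label are pulled together, all others pushed apart."""
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    return -logp[pos].mean()

rng = np.random.default_rng(0)
centers = np.eye(2)                         # one center per run-rate cluster
labels = np.repeat([0, 1], 8)
emb = centers[labels] + 0.05 * rng.standard_normal((16, 2))
well_separated = contrastive_loss(emb, labels)
mismatched = contrastive_loss(emb, np.tile([0, 1], 8))  # labels ignore geometry
```

Minimizing this loss over the embedding table drives same-cluster features (players, teams, and so on in the paper's setting) towards each other, which is why `well_separated` comes out far below `mismatched`.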
Cognitive Perspectives on Context-based Decisions and Explanations
Westberg, Marcus, Främling, Kary
In Psychology and Cognitive Science, there is a pervasive idea that humans employ mental representations in order to navigate the world and make predictions about outcomes of future actions. By understanding how these representational structures work, we not only understand more about human cognition but also gain a better understanding of how humans rationalise and explain decisions. This has an influencing effect on explainable AI (XAI), where the goal is to provide explanations of computer decision-making for a human audience. In this paper we argue that explanations, while not always perfectly accurate with regard to reality (due to the existence of hidden factors), are best structured around the same conceptual basis as decisions, and that when trying to understand an explanation we do so by simulating the decision-making process through the explanation provided. In other words, what makes a good explanation of an agent-based action is that it presents us with a reasoning structure that we can follow and relate to our own decision-making processes. It is thus imperative that the explanations provided by artificial agents, in the context of XAI, not only provide deliberations that we can follow, but more importantly provide them in a conceptual framework which facilitates retreading the deliberation and is context-sensitive. We show that the Contextual Importance and Utility method for XAI shares an overlap with the current new wave of action-oriented approaches.
Learning to Remember from a Multi-Task Teacher
Xiong, Yuwen, Ren, Mengye, Urtasun, Raquel
Recent studies on catastrophic forgetting during sequential learning typically focus on fixing the accuracy of the predictions for a previously learned task. In this paper we argue that the outputs of neural networks are subject to rapid changes when learning a new data distribution, and that networks which appear to "forget" everything still contain useful representations of previous tasks. Instead of enforcing that the output accuracy stay the same, we propose to reduce the effect of catastrophic forgetting at the representation level, as the output layer can be quickly recovered later with a small number of examples. Towards this goal, we propose an experimental setup that measures the amount of representational forgetting, and we develop a novel meta-learning algorithm to overcome this issue. The proposed meta-learner produces weight updates for a sequential learning network, mimicking a multi-task teacher network's representation. We show that our meta-learner can improve its learned representations on new tasks while maintaining a good representation for old tasks.
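One simple way to quantify forgetting at the representation level rather than the output level is to compare a layer's activations on old-task inputs before and after training on the new task. This is an illustration in the spirit of the paper's setup, not its exact protocol; linear CKA is my choice of similarity measure, and the "activations" are synthetic.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices (examples x features);
    1 = identical representations up to linear symmetries, ~0 = unrelated."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
before = rng.standard_normal((100, 32))        # layer activations on task-A inputs
after_mild = before + 0.1 * rng.standard_normal((100, 32))    # small drift
after_severe = rng.standard_normal((100, 32))  # representation overwritten
mild = linear_cka(before, after_mild)          # near 1: little representational forgetting
severe = linear_cka(before, after_severe)      # much lower: severe forgetting
```

Under a measure like this, a network whose task-A accuracy has collapsed can still score high, which is exactly the paper's point that the output head, not the representation, is often what was lost.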
College Rankings Revisited: What Might an Artificial Intelligence Think?
This post is from guest contributor Steve Lattanzio from MetaMetrics. While we do not tend to cover college rankings at e-Literate, we do care about transparency in usage of data as well as understanding opportunities where technology and data might inform students, faculty, administrators and the general educational community. The following post is an interesting exploration in the usage of the full set of College Scorecard data in a way that is understandable and usable. Ranking colleges has become a bit of a national pastime. There are many organizations that publish "overall" rankings for our institutions of higher education (such as Forbes, Niche, Times Higher Education, and US News & World Report), each with their own methodologies. We don't typically get the complete and precise picture of how these rankings are constructed.
The Mind at AI: Horseless Carriage to Clock
Commentators on AI converge on two goals they believe define the field: (1) to better understand the mind by specifying computational models and (2) to construct computer systems that perform actions traditionally regarded as mental. We should recognize that AI has a third, hidden, more basic aim; that the first two goals are special cases of the third; and that the actual technical substance of AI concerns only this more basic aim. This third aim is to establish new computation-based representational media, media in which human intellect can come to express itself with different clarity and force. This article articulates this proposal by showing how the intellectual activity we label AI can be likened in revealing ways to each of five familiar technologies. AI is not about building artificial intelligences, nor is it about understanding the human mind or any other kind of mind.
Book Review
The idea is that although an AI system without the frame problem might, say, read an echocardiogram and diagnose a heart defect, a really smart autonomous robot will arrive only if, like us humans, it can handle the frame problem. The highlight … is an entertaining go-round between two pugilists trading blows in civil but gloves-off style, reminiscent of a net discussion. We're still confronted by a difficult question: Is there a solution to the frame problem? If not, then R2D2 might forever be but a creature of fiction. If, however, the frame problem is solvable, we must confront yet another question: Is there a general solution, or is the best that can be mustered a so-called domain-dependent solution?