
Collaborating Authors

Siskind, Jeffrey Mark


Tricks from Deep Learning

arXiv.org Machine Learning

The deep learning community has devised a diverse set of methods to make practical the gradient-based optimization, over large datasets, of large and highly complex models with deeply cascaded nonlinearities. Taken as a whole, these methods constitute a breakthrough, allowing computational structures that are quite wide, very deep, and with an enormous number and variety of free parameters to be optimized effectively. The result now dominates much of practical machine learning, with applications in machine translation, computer vision, and speech recognition. Many of these methods, viewed through the lens of algorithmic differentiation (AD), can be seen as either addressing issues with the gradient itself or finding ways of achieving increased efficiency using tricks that are AD-related but not provided by current AD systems. The goal of this paper is to explain not just those methods of most relevance to AD, but also the technical constraints and mindset that led to their discovery. After explaining this context, we present a "laundry list" of methods developed by the deep learning community. Two of these are discussed in further mathematical detail: a way to dramatically reduce the size of the tape when performing reverse-mode AD on a (theoretically) time-reversible process such as an ODE integrator, and a new mathematical insight that allows for the implementation of a stochastic Newton's method.
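
The tape-reduction trick lends itself to a compact illustration. Below is a minimal sketch, not the paper's implementation: a leapfrog integrator is run forward with no tape of intermediate states, and the backward pass reconstructs each earlier state by running the integrator in reverse before applying a hand-written vector-Jacobian product for that step. The quadratic potential, step size, and all names here are illustrative assumptions.

```python
import numpy as np

def grad_U(q):
    # Gradient of a toy quadratic potential U(q) = 0.5 * ||q||^2.
    return q

def hess_U_vp(q, v):
    # Hessian-vector product of U; the identity map for the quadratic toy.
    return v

def leapfrog_step(q, p, h):
    p_half = p - 0.5 * h * grad_U(q)
    q_new = q + h * p_half
    p_new = p_half - 0.5 * h * grad_U(q_new)
    return q_new, p_new

def step_vjp(q, p, h, qbar_new, pbar_new):
    # Vector-Jacobian product of one leapfrog step, obtained by
    # transposing its three sub-steps in reverse order.
    p_half = p - 0.5 * h * grad_U(q)
    q_new = q + h * p_half
    g_qnew = qbar_new - 0.5 * h * hess_U_vp(q_new, pbar_new)
    g_phalf = pbar_new + h * g_qnew
    g_q = g_qnew - 0.5 * h * hess_U_vp(q, g_phalf)
    return g_q, g_phalf

def grad_via_reversal(q0, p0, h, n_steps):
    # Forward pass in O(1) memory: no tape of intermediate states.
    q, p = q0.copy(), p0.copy()
    for _ in range(n_steps):
        q, p = leapfrog_step(q, p, h)
    loss = 0.5 * float(q @ q)
    qbar, pbar = q.copy(), np.zeros_like(p)
    # Backward pass: rewind the (exactly reversible) integrator one step
    # at a time, and apply the step's VJP at each recovered state.
    for _ in range(n_steps):
        q, p = leapfrog_step(q, p, -h)   # running with -h inverts a step
        qbar, pbar = step_vjp(q, p, h, qbar, pbar)
    return loss, qbar, pbar

loss, dq0, dp0 = grad_via_reversal(np.array([1.0, 0.0]),
                                   np.array([0.0, 1.0]), 0.01, 1000)
print(loss, dq0, dp0)
```

Because the leapfrog step is exactly invertible (up to floating-point error), memory stays constant in the number of steps, at the cost of one extra integration pass during the gradient computation.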


Learning to Describe Video with Weak Supervision by Exploiting Negative Sentential Information

AAAI Conferences

Most previous work on video description trains individual parts of speech independently. It is more appealing, from a linguistic point of view, for word models for all parts of speech to be learned simultaneously from whole sentences, a hypothesis suggested by some linguists for child language acquisition. In this paper, we learn to describe video by discriminatively training positive sentential labels against negative ones in a weakly supervised fashion: the meaning representations (i.e., HMMs) of the individual words in these labels are learned from whole sentences without any correspondence annotation of what those words denote in the video. Textual descriptions are then generated for new video using the trained word models.
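
As a rough illustration of the training signal, here is a minimal sketch, not the paper's system: each word's meaning is a small discrete HMM over quantized track features, a sentence scores the whole feature sequence with all of its words' HMMs, and the objective pushes the positive sentential label above the negatives. The vocabulary, feature quantization, and random parameters are toy assumptions, and SciPy is assumed available for logsumexp.

```python
import numpy as np
from scipy.special import logsumexp

def forward_loglik(log_pi, log_A, log_B, obs):
    # Standard HMM forward algorithm in log space.
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return logsumexp(alpha)

def sentence_loglik(hmms, sentence, obs):
    # A sentence scores the same feature sequence with every word's HMM;
    # no annotation says which word denotes what in the video.
    return sum(forward_loglik(*hmms[w], obs) for w in sentence)

def contrastive_loss(hmms, obs, pos_sentence, neg_sentences):
    # Discriminative objective: the positive sentential label should
    # outscore the negative ones.
    pos = sentence_loglik(hmms, pos_sentence, obs)
    neg = logsumexp([sentence_loglik(hmms, s, obs) for s in neg_sentences])
    return neg - pos

def random_hmm(rng, n_states=3, n_symbols=8):
    # Random, properly normalized log-probability parameters.
    norm = lambda a, ax: a - logsumexp(a, axis=ax, keepdims=True)
    return (norm(rng.normal(size=n_states), 0),
            norm(rng.normal(size=(n_states, n_states)), 1),
            norm(rng.normal(size=(n_states, n_symbols)), 1))

rng = np.random.default_rng(0)
hmms = {w: random_hmm(rng) for w in
        ["person", "pick", "up", "put", "down", "ball"]}
obs = rng.integers(0, 8, size=20)        # quantized track features
print(contrastive_loss(hmms, obs,
                       pos_sentence=["person", "pick", "up", "ball"],
                       neg_sentences=[["person", "put", "down", "ball"]]))
```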


Conducting Neuroscience to Guide the Development of AI

AAAI Conferences

Study of the human brain through fMRI can potentially benefit the pursuit of artificial intelligence. Four examples are presented. First, fMRI decoding of the brain activity of subjects watching video clips yields higher accuracy than state-of-the-art computer-vision approaches to activity recognition. Second, novel methods are presented that decode aggregate representations of complex visual stimuli by decoding their independent constituents. Third, cross-modal studies demonstrate the ability to decode the brain activity induced in subjects watching video stimuli when trained on the brain activity induced in subjects seeing text or hearing speech stimuli and vice versa. Fourth, the time course of brain processing while watching video stimuli is probed with scanning that trades off the amount of the brain scanned for the frequency at which it is scanned. Techniques like these can be used to study how the human brain grounds language in visual perception and may motivate development of novel approaches in AI.
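
The third example, cross-modal decoding, can be illustrated in a few lines. The sketch below uses synthetic data, not fMRI recordings: a linear classifier is trained on simulated brain responses from one modality (video) and tested on responses from another (text) that share the same per-class signal, which is the generalization the study tests. scikit-learn and all dimensions here are assumptions of the toy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_voxels, n_per_class, n_classes = 200, 30, 4
prototypes = rng.normal(size=(n_classes, n_voxels))  # shared class signal

def simulate(modality_shift, noise=1.0):
    # Each modality adds its own systematic shift on top of the shared
    # per-class signal, plus trial-by-trial noise.
    X = np.vstack([p + modality_shift
                   + noise * rng.normal(size=(n_per_class, n_voxels))
                   for p in prototypes])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y

X_video, y_video = simulate(rng.normal(scale=0.3, size=n_voxels))
X_text,  y_text  = simulate(rng.normal(scale=0.3, size=n_voxels))

# Train on one modality, test on the other.
clf = LogisticRegression(max_iter=1000).fit(X_video, y_video)
print("cross-modal accuracy:", clf.score(X_text, y_text))
```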


Seeing What You're Told: Sentence-Guided Activity Recognition In Video

arXiv.org Artificial Intelligence

We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, thereby providing a medium not only for top-down and bottom-up integration but also for multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and the changing spatial relations between participants (prepositions), in the form of whole sentential descriptions mediated by a grammar, guide the activity-recognition process. Further, the utility and expressiveness of our framework are demonstrated by performing three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions of video, and query-based video search, simply by leveraging the framework in different ways.
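
A minimal sketch of sentence-guided focus of attention, under toy assumptions rather than the paper's models: each word contributes a score over the tracks filling its grammatical roles, and the sentence selects the track-to-role assignment that best satisfies all of its words jointly. The tracks, predicates, and the example sentence "the person approached the chair" are illustrative.

```python
from itertools import permutations
import numpy as np

# Toy tracks: a detector label plus an x-trajectory over four frames.
tracks = [
    {"label": "person", "x": np.array([0.0, 1.0, 2.0, 3.0])},
    {"label": "chair",  "x": np.array([5.0, 5.0, 5.0, 5.0])},
    {"label": "person", "x": np.array([9.0, 9.0, 9.0, 9.0])},
]

def noun_score(word, track):
    # A noun constrains the detector label of its role filler.
    return 0.0 if track["label"] == word else -10.0

def verb_score(verb, agent, patient):
    # "approach": the agent-patient distance should shrink over time.
    d = np.abs(agent["x"] - patient["x"])
    return float(d[0] - d[-1]) if verb == "approach" else 0.0

def sentence_score(assignment):
    # "The person approached the chair": all words score jointly.
    agent, patient = assignment
    return (noun_score("person", agent) + noun_score("chair", patient)
            + verb_score("approach", agent, patient))

# Sentence-guided focus of attention: choose which tracks fill the agent
# and patient roles so as to best satisfy the whole sentence.
agent, patient = max(permutations(tracks, 2), key=sentence_score)
print(agent["label"], "starting at x =", agent["x"][0], "->", patient["label"])
```

Of the two "person" tracks, the sentence picks out the one actually moving toward the chair, illustrating how the description focuses attention among otherwise indistinguishable detections.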


Simultaneous Object Detection, Tracking, and Event Recognition

arXiv.org Artificial Intelligence

The common internal structure and algorithmic organization of object detection, detection-based tracking, and event recognition facilitate a general approach to integrating these three components. This supports multidirectional information flow between the components, allowing object detection to influence tracking and event recognition, and event recognition to influence tracking and object detection. The performance of the combination can exceed the performance of the components in isolation, and this can be achieved with linear asymptotic complexity.
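
The kind of joint inference implied here can be sketched as a single dynamic program (toy scores, not the paper's models): per-frame candidate detections form a lattice, and one Viterbi pass jointly maximizes detection confidence, tracking smoothness, and agreement with an event model, in time linear in the number of frames. The event model below, a preference for motion of a given sign, and all weights are assumptions.

```python
import numpy as np

def joint_viterbi(det_scores, positions, event_velocity_sign, lam=1.0, mu=2.0):
    # det_scores[t][k]: detector confidence of candidate k in frame t.
    # positions[t][k]:  its (1-D, toy) image position.
    T, K = len(det_scores), len(det_scores[0])
    best = np.array(det_scores[0], dtype=float)
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cur = np.full(K, -np.inf)
        for k in range(K):
            for j in range(K):
                dx = positions[t][k] - positions[t - 1][j]
                s = (best[j] + det_scores[t][k]
                     - lam * dx * dx                     # tracking smoothness
                     + mu * event_velocity_sign * dx)    # event-model agreement
                if s > cur[k]:
                    cur[k], back[t, k] = s, j
        best = cur
    # Backtrace the jointly optimal interpretation; total cost is
    # linear in T (quadratic only in the per-frame candidate count K).
    k = int(np.argmax(best))
    path = [k]
    for t in range(T - 1, 0, -1):
        k = int(back[t, k])
        path.append(k)
    return path[::-1], float(best.max())

# Two equally confident candidates per frame: one moving right, one static.
# The event term (preferring rightward motion) pulls the track onto the
# moving candidate -- event recognition influencing tracking.
det = [[1.0, 1.0]] * 4
pos = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
print(joint_viterbi(det, pos, event_velocity_sign=+1))
```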


Seeing Unseeability to See the Unseeable

arXiv.org Artificial Intelligence

We present a framework that allows an observer to determine the occluded portions of a structure by finding the maximum-likelihood estimate of those occluded portions consistent with the visible image evidence and a consistency model. Doing this requires determining which portions of the structure are occluded in the first place. Since each process relies on the other, we solve both problems in tandem. We extend our framework to determine the confidence of one's assessment of which portions of an observed structure are occluded, and of the estimate of that occluded structure, by determining the sensitivity of that assessment to potential new observations. We further extend our framework to determine a robotic action whose execution would allow a new observation that would maximally increase one's confidence.
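
A minimal sketch of the tandem estimation on a 1-D toy, not the paper's formulation: the code alternates between fitting a smooth low-order consistency model to the samples currently believed visible (the maximum-likelihood estimate of the structure) and re-marking as occluded the samples that model cannot explain. The signal, the degree-5 polynomial model, and the 0.5 residual threshold are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
truth = np.sin(2 * np.pi * x)
obs = truth + 0.05 * rng.normal(size=x.size)
obs[20:30] = 5.0                       # an occluder replaces the signal here

occluded = np.zeros(x.size, dtype=bool)
for _ in range(5):
    # (1) ML estimate of the structure from currently visible evidence,
    #     under a smooth low-order consistency model (toy: degree-5 fit).
    coef = np.polyfit(x[~occluded], obs[~occluded], deg=5)
    model = np.polyval(coef, x)
    # (2) Re-estimate which portions are occluded: samples the
    #     consistency model cannot explain are marked occluded.
    occluded = np.abs(obs - model) > 0.5

estimate = np.where(occluded, model, obs)  # fill occluded span from the model
print("inferred occluded indices:", np.flatnonzero(occluded))
```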


Video In Sentences Out

arXiv.org Artificial Intelligence

We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. The action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.
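
The rendering step alone can be sketched compactly. Below is a minimal, hypothetical example, not the paper's generator: given an event recognizer's output (event class, track-to-role assignments, object properties, and a spatial relation), it assembles the sentence from templates following the mapping the abstract describes. All field names and values are illustrative.

```python
def noun_phrase(obj):
    # Object properties become adjectival modifiers in the noun phrase.
    adjs = " ".join(obj.get("adjectives", []))
    return "the " + (adjs + " " if adjs else "") + obj["noun"]

def render(event):
    # Verb from the event class, noun phrases from track labels, a
    # prepositional phrase from a spatial relation, and an adverb from
    # event characteristics.
    parts = [noun_phrase(event["agent"]), event["verb"],
             noun_phrase(event["patient"])]
    if "landmark" in event:
        parts.append(event["relation"] + " " + noun_phrase(event["landmark"]))
    if "adverb" in event:
        parts.append(event["adverb"])
    s = " ".join(parts) + "."
    return s[0].upper() + s[1:]

event = {
    "verb": "picked up",
    "agent": {"noun": "person"},
    "patient": {"noun": "ball", "adjectives": ["red"]},
    "relation": "to the left of",
    "landmark": {"noun": "chair"},
    "adverb": "quickly",
}
print(render(event))
# The person picked up the red ball to the left of the chair quickly.
```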