AITopics | Brown, Matthew

Collaborating Authors

Brown, Matthew

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects

Krishnan, Akshay, Kundu, Abhijit, Maninis, Kevis-Kokitsi, Hays, James, Brown, Matthew

arXiv.org Artificial IntelligenceJul-11-2024

We propose OmniNOCS, a large-scale monocular dataset with 3D Normalized Object Coordinate Space (NOCS) maps, object masks, and 3D bounding box annotations for indoor and outdoor scenes. OmniNOCS has 20 times more object classes and 200 times more instances than existing NOCS datasets (NOCS-Real275, Wild6D). We use OmniNOCS to train a novel, transformer-based monocular NOCS prediction model (NOCSformer) that can predict accurate NOCS, instance masks and poses from 2D object detections across diverse classes. It is the first NOCS model that can generalize to a broad range of classes when prompted with 2D boxes. We evaluate our model on the task of 3D oriented bounding box prediction, where it achieves comparable results to state-of-the-art 3D detection methods such as Cube R-CNN. Unlike other 3D detection methods, our model also provides detailed and accurate 3D object shape and segmentation. We propose a novel benchmark for the task of NOCS prediction based on OmniNOCS, which we hope will serve as a useful baseline for future work in this area. Our dataset and code will be at the project website: https://omninocs.github.io.

artificial intelligence, machine learning, object-oriented architecture, (18 more...)

arXiv.org Artificial Intelligence

2407.08711

Country: North America > Canada (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Module-wise Adaptive Distillation for Multimodality Foundation Models

Liang, Chen, Yu, Jiahui, Yang, Ming-Hsuan, Brown, Matthew, Cui, Yin, Zhao, Tuo, Gong, Boqing, Zhou, Tianyi

arXiv.org Artificial IntelligenceOct-6-2023

Pre-trained multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes. One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer. Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distillation each module and choose the module with a greater contribution to distill more frequently. Such an approach can be naturally formulated as a multi-armed bandit (MAB) problem, where modules and loss decrements are considered as arms and rewards, respectively. We then develop a modified-Thompson sampling algorithm named OPTIMA to address the nonstationarity of module contributions resulting from model updating. Specifically, we leverage the observed contributions in recent history to estimate the changing contribution of each module and select modules based on these estimations to maximize the cumulative contribution. We evaluate the effectiveness of OPTIMA through distillation experiments on various multimodal understanding and image captioning tasks, using the CoCa-Large model (Yu et al., 2022) as the teacher model.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2310.0455

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.50)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Add feedback

MoViNets: Mobile Video Networks for Efficient Video Recognition

Kondratyuk, Dan, Yuan, Liangzhe, Li, Yandong, Zhang, Li, Tan, Mingxing, Brown, Matthew, Gong, Boqing

arXiv.org Artificial IntelligenceMar-21-2021

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference. 3D convolutional neural networks (CNNs) are accurate at video recognition but require large computation and memory budgets and do not support online inference, making them difficult to work on mobile devices. We propose a three-step approach to improve computational efficiency while substantially reducing the peak memory usage of 3D CNNs. First, we design a video network search space and employ neural architecture search to generate efficient and diverse 3D CNN architectures. Second, we introduce the Stream Buffer technique that decouples memory from video clip duration, allowing 3D CNNs to embed arbitrary-length streaming video sequences for both training and inference with a small constant memory footprint. Third, we propose a simple ensembling technique to improve accuracy further without sacrificing efficiency. These three progressive techniques allow MoViNets to achieve state-of-the-art accuracy and efficiency on the Kinetics, Moments in Time, and Charades video action recognition datasets. For instance, MoViNet-A5-Stream achieves the same accuracy as X3D-XL on Kinetics 600 while requiring 80% fewer FLOPs and 65% less memory. Code will be made available at https://github.com/tensorflow/models/tree/master/official/vision.

accuracy, deep learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

2103.11511

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Federated Visual Classification with Real-World Data Distribution

Hsu, Tzu-Ming Harry, Qi, Hang, Brown, Matthew

arXiv.org Machine LearningMar-18-2020

Federated Learning enables visual models to be trained on-device, bringing advantages for user privacy (data need never leave the device), but challenges in terms of data diversity and quality. Whilst typical models in the datacenter are trained using data that are independent and identically distributed (IID), data at source are typically far from IID. Furthermore, differing quantities of data are typically available at each device (imbalance). In this work, we characterize the effect these real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm. To do so, we introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits that simulate real-world edge learning scenarios. We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training.

accuracy, artificial intelligence, neural network, (16 more...)

arXiv.org Machine Learning

2003.08082

Genre: Research Report (1.00)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

Hsu, Tzu-Ming Harry, Qi, Hang, Brown, Matthew

arXiv.org Machine LearningSep-13-2019

Federated Learning enables visual models to be trained in a privacy-preserving way using real-world data from mobile devices. Given their distributed nature, the statistics of the data across these devices is likely to differ significantly. In this work, we look at the effect such non-identical data distributions has on visual classification via Federated Learning. We propose a way to synthesize datasets with a continuous range of identicalness and provide performance measures for the Federated Averaging algorithm. We show that performance degrades as distributions differ more, and propose a mitigation strategy via server momentum. Experiments on CIFAR-10 demonstrate improved classification performance over a range of non-identicalness, with classification accuracy improved from 30.1% to 76.9% in the most skewed settings.

accuracy, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

1909.06335

Country: North America > United States (0.15)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Frame-Recurrent Video Super-Resolution

Sajjadi, Mehdi S. M., Vemulapalli, Raviteja, Brown, Matthew

arXiv.org Machine LearningMar-25-2018

Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate a single high-resolution (HR) frame and run this scheme in a sliding window fashion over the entire video, effectively treating the problem as a large number of separate multi-frame super-resolution tasks. This approach has two main weaknesses: 1) Each input frame is processed and warped multiple times, increasing the computational cost, and 2) each output frame is estimated independently conditioned on the input frames, limiting the system's ability to produce temporally consistent results. In this work, we propose an end-to-end trainable frame-recurrent video super-resolution framework that uses the previously inferred HR estimate to super-resolve the subsequent frame. This naturally encourages temporally consistent results and reduces the computational cost by warping only one image in each step. Furthermore, due to its recurrent nature, the proposed method has the ability to assimilate a large number of previous frames without increased computational demands. Extensive evaluations and comparisons with previous methods validate the strengths of our approach and demonstrate that the proposed framework is able to significantly outperform the current state of the art.

deep learning, neural network, video super-resolution, (20 more...)

arXiv.org Machine Learning

1801.0459

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

One Size Does Not Fit All: A Game-Theoretic Approach for Dynamically and Effectively Screening for Threats

Brown, Matthew (University of Southern California) | Sinha, Arunesh (University of Southern California) | Schlenker, Aaron (University of Southern California) | Tambe, Milind (University of Southern California)

AAAI ConferencesApr-19-2016

An effective way of preventing attacks in secure areas is to screen for threats (people, objects) before entry, e.g., screening of airport passengers. However, screening every entity at the same level may be both ineffective and undesirable. The challenge then is to find a dynamic approach for randomized screening, allowing for more effective use of limited screening resources, leading to improved security. We address this challenge with the following contributions: (1) a threat screening game (TSG) model for general screening domains; (2) an NP-hardness proof for computing the optimal strategy of TSGs; (3) a scheme for decomposing TSGs into subgames to improve scalability; (4) a novel algorithm that exploits a compact game representation to efficiently solve TSGs, providing the optimal solution under certain conditions; and (5) an empirical comparison of our proposed algorithm against the current state-of-the-art optimal approach for large-scale game-theoretic resource allocation problems.

air transportation, constraint, game theory, (22 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Transportation > Passenger (0.69)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Maximizing Flow as a Metacontrol in Angband

Mariusdottir, Thorey Maria (University of Alberta) | Bulitko, Vadim (University of Alberta) | Brown, Matthew (University of Alberta)

AAAI ConferencesNov-1-2015

Flow is a psychological state that is reported to improve people’s performance. Flow can emerge when the person’s skills and the challenges of their activity match. This paper applies this concept to artificial intelligence agents. We equip a decision-making agent with a metacontrol policy that guides the agent to activities where the agent’s skills match the activity difficulty. Consequently, we expect the agent’s performance to improve. We implement and evaluate this approach in the role-playing game of Angband.

agent, artificial intelligence, computer game, (19 more...)

AAAI Conferences

Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

STREETS: Game-Theoretic Traffic Patrolling with Exploration and Exploitation

Brown, Matthew (University of Southern California) | Saisubramanian, Sandhya (Singapore Management University) | Varakantham, Pradeep (Singapore Management University) | Tambe, Milind (University of Southern California)

AAAI ConferencesJul-14-2014

To dissuade reckless driving and mitigate accidents, cities deploy resources to patrol roads. In this paper, we present STREETS, an application developed for the city of Singapore, which models the problem of computing randomized traffic patrol strategies as a defender-attacker Stackelberg game. Previous work on Stackelberg security games has focused extensively on counter-terrorism settings. STREETS moves beyond counter-terrorism and represents the first use of Stackelberg games for traffic patrolling, in the process providing a novel algorithm for solving such games that addresses three major challenges in modeling and scale-up. First, there exists a high degree of unpredictability in travel times through road networks, which we capture using a Markov Decision Process for planning the patrols of the defender (the police) in the game. Second, modeling all possible police patrols and their interactions with a large number of adversaries (drivers) introduces a significant scalability challenge. To address this challenge we apply a compact game representation in a novel fashion combined with adversary and state sampling. Third, patrol strategies must balance exploitation (minimizing violations) with exploration (maximizing omnipresence), a tradeoff we model by solving a bi-objective optimization problem. We present experimental results using real-world traffic data from Singapore. This work is done in collaboration with the Singapore Ministry of Home Affairs and is currently being evaluated by the Singapore Police Force.

exploration and exploitation, game-theoretic traffic patrolling

AAAI Conferences

Twenty-Sixth IAAI Conference

Country: Asia > Singapore (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Energy > Oil & Gas > Upstream (0.40)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.53)

Add feedback