 computer vision dataset


How hard are computer vision datasets? Calibrating dataset difficulty to viewing time

Neural Information Processing Systems

Humans outperform object recognizers even though models perform well on current datasets, including those explicitly designed to challenge machines with debiased images or distribution shift. This problem persists, in part, because we have no guidance on the absolute difficulty of an image or dataset, which makes it hard to objectively assess progress toward human-level performance, to cover the range of human abilities, and to increase the challenge posed by a dataset. We develop a dataset difficulty metric, Minimum Viewing Time (MVT), that addresses these three problems. Subjects view an image that flashes on screen and then classify the object in the image. Images that require only brief flashes to recognize are easy; those that require seconds of viewing are hard. We compute the image difficulty distributions of ImageNet and ObjectNet and find that both significantly undersample hard images.
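
As a rough illustration of the idea, the sketch below assigns each image the shortest flash duration at which more than half of the subjects classified it correctly. The trial format, the 50% threshold, the function name `minimum_viewing_time`, and the sample data are all assumptions made for this example; the abstract does not specify the exact procedure.

```python
from collections import defaultdict


def minimum_viewing_time(trials, threshold=0.5):
    """Return {image_id: shortest exposure (ms) with accuracy above threshold}.

    `trials` is an iterable of (image_id, exposure_ms, correct) tuples.
    This record layout and the majority threshold are assumptions for
    illustration, not the procedure from the paper.
    """
    # tallies[image_id][exposure_ms] = [correct_count, total_count]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for image_id, exposure_ms, correct in trials:
        cell = tallies[image_id][exposure_ms]
        cell[0] += int(correct)
        cell[1] += 1

    mvt = {}
    for image_id, by_exposure in tallies.items():
        mvt[image_id] = None  # None: never recognized reliably at any duration
        for exposure_ms in sorted(by_exposure):
            hits, total = by_exposure[exposure_ms]
            if hits / total > threshold:
                mvt[image_id] = exposure_ms
                break
    return mvt


# Hypothetical trial data: one easy image and one hard image.
trials = [
    ("cat.jpg", 17, True), ("cat.jpg", 17, True), ("cat.jpg", 50, True),
    ("mug.jpg", 17, False), ("mug.jpg", 17, False),
    ("mug.jpg", 50, False), ("mug.jpg", 50, True),
    ("mug.jpg", 10000, True), ("mug.jpg", 10000, True),
]
result = minimum_viewing_time(trials)
print(result)
```

Under these assumptions, an image recognized at the shortest flash counts as easy, while one that needs a multi-second exposure counts as hard.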


Review for NeurIPS paper: Principal Neighbourhood Aggregation for Graph Nets

Neural Information Processing Systems

Weaknesses: Methodological: The work places importance on topology/structure. For example, the message scaling depends on node degree. The method is therefore apt for applications where structure is paramount; one such application mentioned is reasoning about social networks, where the degree of a node/user provides a lot of information about that node/user. Though useful in many domains, there are domains where GNNs are useful but topology is not important. This is reflected empirically on the regular grid graphs of the computer vision datasets, where PNA does not significantly improve over other methods.
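
To make the point about degree-dependent scaling concrete, here is a minimal sketch of one plausible logarithmic degree scaler. The formula, the function name `degree_scaler`, and the sample degrees are assumptions for illustration and are not taken from the review. It also shows why such scaling adds little on a regular grid: when every node has the same degree, every scaler equals 1.

```python
import math


def degree_scaler(degree, avg_log_degree, alpha):
    """Scale factor for an aggregated message at a node of the given degree.

    alpha = 1 amplifies high-degree nodes, alpha = -1 attenuates them,
    and alpha = 0 leaves the message unchanged. Assumes degree >= 1.
    (Hypothetical form chosen for this sketch.)
    """
    return (math.log(degree + 1) / avg_log_degree) ** alpha


# Normalizer: mean of log(d + 1) over some hypothetical training degrees.
degrees = [1, 3, 7]
delta = sum(math.log(d + 1) for d in degrees) / len(degrees)

amplified = [degree_scaler(d, delta, 1) for d in degrees]
attenuated = [degree_scaler(d, delta, -1) for d in degrees]

# On a regular grid every node has the same degree, so scaling is a no-op.
grid_degrees = [4, 4, 4]
grid_delta = sum(math.log(d + 1) for d in grid_degrees) / len(grid_degrees)
grid_scalers = [degree_scaler(d, grid_delta, 1) for d in grid_degrees]
```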


Master Data Integrity to Clean Your Computer Vision Datasets

#artificialintelligence

Data integrity has recently become one of the biggest concerns for companies and engineers. The amount of data we have to process and understand keeps growing, and manually looking at millions of samples is not sustainable. Thus, we need tools that can help us navigate our datasets. This tutorial will present how to clean, visualize and understand Computer Vision datasets, such as videos or images. We will be working on a video of the most precious thing in my house, my cat.


DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations

arXiv.org Artificial Intelligence

With the recent development of Deep Learning applied to Computer Vision, sport video understanding has gained a lot of attention, providing much richer information for both sport consumers and leagues. This paper introduces DeepSportradar-v1, a suite of computer vision tasks, datasets and benchmarks for automated sport understanding. The main purpose of this framework is to close the gap between academic research and real world settings. To this end, the datasets provide high-resolution raw images, camera parameters and high quality annotations. DeepSportradar currently supports four challenging tasks related to basketball: ball 3D localization, camera calibration, player instance segmentation and player re-identification. For each of the four tasks, a detailed description of the dataset, objective, performance metrics, and the proposed baseline method are provided. To encourage further research on advanced methods for sport understanding, a competition is organized as part of the MMSports workshop from the ACM Multimedia 2022 conference, where participants have to develop state-of-the-art methods to solve the above tasks. The four datasets, development kits and baselines are publicly available.


Why Adversarial Image Attacks Are No Joke

#artificialintelligence

Attacking image recognition systems with carefully-crafted adversarial images has been considered an amusing but trivial proof-of-concept over the last five years. However, new research from Australia suggests that the casual use of highly popular image datasets for commercial AI projects could create an enduring new security problem. For a couple of years now, a group of academics at the University of Adelaide have been trying to explain something really important about the future of AI-based image recognition systems. It's something that would be difficult (and very expensive) to fix right now, and which would be unconscionably costly to remedy once the current trends in image recognition research have been fully developed into commercialized and industrialized deployments in 5-10 years' time. Before we get into it, consider an example from one of the six videos that the team has published on the project page, in which a flower is classified as President Barack Obama: a facial recognition system that clearly knows how to recognize Barack Obama is fooled into 80% certainty that an anonymized man holding a crafted, printed adversarial image of a flower is also Barack Obama.


Model Rectification via Unknown Unknowns Extraction from Deployment Samples

arXiv.org Artificial Intelligence

Model deficiency that results from incomplete training data is a form of structural blindness that leads to costly errors, oftentimes with high confidence. During the training of classification tasks, underrepresented class-conditional distributions that a given hypothesis space can recognize result in a mismatch between the model and the target space. To mitigate the consequences of this discrepancy, we propose Random Test Sampling and Cross-Validation (RTSCV) as a general algorithmic framework that aims to perform a post-training model rectification at deployment time in a supervised way. RTSCV extracts unknown unknowns (u.u.s), i.e., examples from the class-conditional distributions that a classifier is oblivious to, and works in combination with a diverse family of modern prediction models. RTSCV augments the training set with a sample of the test set (or deployment data) and uses this redefined class layout to discover u.u.s via cross-validation, without relying on active learning or budgeted queries to an oracle. We contribute a theoretical analysis that establishes performance guarantees based on the design bases of modern classifiers. Our experimental evaluation demonstrates RTSCV's effectiveness, using 7 benchmark tabular and computer vision datasets, by reducing a performance gap as large as 41% from the respective pre-rectification models. Last, we show that RTSCV consistently outperforms state-of-the-art approaches.


Microsoft partners with Team Gleason to build a computer vision dataset for ALS

#artificialintelligence

Microsoft and Team Gleason, the nonprofit organization founded by NFL player Steve Gleason, today launched Project Insight to create an open dataset of facial imagery of people with amyotrophic lateral sclerosis (ALS). The organizations hope to foster innovation in computer vision and broaden the potential for connectivity and communication for people with accessibility challenges. Microsoft and Team Gleason assert that existing machine learning datasets don't represent the diversity of people with ALS, a condition that affects as many as 30,000 people in the U.S. Project Insight will investigate how to use data and AI with the front-facing camera already present in many assistive devices to predict where a person is looking on a screen. Team Gleason will work with Microsoft's Health Next Enable team to gather images of people with ALS looking at their computer so it can train AI models more inclusively. Participants will be given a brief medical history questionnaire and be prompted through an app to submit images of themselves using their computer.


VisualData: A Search Engine for Computer Vision Datasets

#artificialintelligence

Algorithms, computation and visual data are the three pillars of computer vision (CV). Researchers, institutions and open source communities have produced sophisticated algorithms and open-sourced code, while global tech giants' supercharged cloud platforms provide all the computational power CV researchers require. However, efficiently sourcing visual data -- particularly images with high-quality annotations -- remains a challenge. Building large datasets is a time-consuming and labor-intensive task that challenges entities with limited budgets. There are hundreds of open visual datasets out there, but searching across them and their millions of entries is not a simple task.


How to Develop and Demonstrate Competence With Deep Learning for Computer Vision

#artificialintelligence

Computer vision is perhaps the area that has been most impacted by developments in deep learning. It can be difficult both to develop and to demonstrate competence with deep learning for problems in the field of computer vision. It is not clear how to get started, what the most important techniques are, or which types of problems and projects can best highlight the value that deep learning can bring to the field. One approach is to systematically develop, and at the same time demonstrate competence with, data handling, modeling techniques, and application domains, and to present your results in a public portfolio of completed projects. This approach allows you to compound your skills from project to project.