Dataset Summarization by K Principal Concepts

Sep-29-2022–arXiv.org Artificial Intelligence

We propose the new task of K principal concept identification for dataset summarization. The objective is to find a set of K concepts that best explain the variation within the dataset. Concepts are high-level, human-interpretable terms such as "tiger", "kayaking" or "happy". The K concepts are selected from a (potentially long) input list of candidates, which we call the concept-bank. The concept-bank may be taken from a generic dictionary or constructed from task-specific prior knowledge. An image-language embedding method (e.g. CLIP) maps the images and the concept-bank into a shared feature space. To select the K concepts that best explain the data, we formulate the task as a K-uncapacitated facility location problem and solve it by local search, using an efficient optimization technique to scale the search to very large concept-banks. The output of our method is a set of K principal concepts that summarize the dataset. Compared with selecting K representative images, which are often ambiguous, our approach provides a more explicit summary. As a further application, the K principal concepts can be used to classify the dataset into K groups. Extensive experiments demonstrate the efficacy of our approach.
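The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes image and concept embeddings have already been computed (for instance with CLIP), uses cosine distance as the assignment cost, and runs a plain swap-based local search without the efficient optimization the abstract mentions. The names `assignment_cost` and `k_principal_concepts` are hypothetical.

```python
import numpy as np


def assignment_cost(dist, selected):
    """Sum over images of the distance to the closest selected concept."""
    return float(dist[:, selected].min(axis=1).sum())


def k_principal_concepts(image_emb, concept_emb, k, max_iters=100, seed=0):
    """Pick k concepts by naive swap-based local search (illustrative only).

    image_emb:   (n_images, d) array of precomputed image embeddings.
    concept_emb: (n_concepts, d) array of precomputed concept embeddings.
    Returns (selected concept indices, per-image group labels, final cost).
    """
    # Assumption: cosine distance in the shared space is the assignment cost.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    concept_emb = concept_emb / np.linalg.norm(concept_emb, axis=1, keepdims=True)
    dist = 1.0 - image_emb @ concept_emb.T  # shape (n_images, n_concepts)

    n_concepts = dist.shape[1]
    rng = np.random.default_rng(seed)
    selected = [int(j) for j in rng.choice(n_concepts, size=k, replace=False)]
    best = assignment_cost(dist, selected)

    for _ in range(max_iters):
        improved = False
        # Try replacing each selected concept with each unselected one.
        for i in range(k):
            for c in range(n_concepts):
                if c in selected:
                    continue
                trial = selected.copy()
                trial[i] = c
                cost = assignment_cost(dist, trial)
                if cost < best - 1e-12:
                    selected, best, improved = trial, cost, True
        if not improved:
            break  # local optimum: no single swap lowers the cost

    # Each image joins the group of its nearest selected concept.
    labels = dist[:, selected].argmin(axis=1)
    return selected, labels, best
```

Mapping `selected` back to the concept-bank strings gives the K-concept summary, and `labels` gives the grouping of the dataset into K groups. Each pass of this naive search re-evaluates the full cost for every candidate swap, which is the part a scalable method would need to avoid for very large concept-banks.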

artificial intelligence, machine learning, natural language

Country:
- North America > Canada
  - Newfoundland and Labrador > Newfoundland (0.04)
- Asia > Middle East
  - Israel > Jerusalem District > Jerusalem (0.04)

Genre:
- Research Report (0.50)

Industry:
- Media > Music (0.68)
- Leisure & Entertainment (0.68)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Machine Learning > Statistical Learning (1.00)
    - Representation & Reasoning
      - Optimization (1.00)
      - Search (0.87)
