specifier
SupplementaryMaterialfor HandMeThat: Human-RobotCommunication inPhysicalandSocialEnvironments
In Section B, we summarize the statistics of the dataset. A.1 ObjectSpace Recall that HandMeThat uses an object-centric representation for states. Object hierarchy.HandMeThat classifies all categories into 5classes: location, receptacle, food, tool,andthing. Each class (except for"location") iscomposed ofmultiple subclasses, and each subclass contains several object categories. Intotal, there are155 object categories.
Supplementary Material for HandMeThat: Human-Robot Communication in Physical and Social Environments Y anming Wan
In Section A, we provide the detailed information for HandMeThat data generation and its textual interface. In Section B, we summarize the statistics of the dataset. Recall that HandMeThat uses an object-centric representation for states. "Location" consists of all non-movable entities. Each class (except for "location") is composed of multiple subclasses, and each subclass contains In total, there are 155 object categories. Each object category is also associated with several attributes.
Exploring Language Patterns of Prompts in Text-to-Image Generation and Their Impact on Visual Diversity
Palmini, Maria-Teresa De Rosa, Cetinic, Eva
Following the initial excitement, Text-to-Image (TTI) models are now being examined more critically. While much of the discourse has focused on biases and stereotypes embedded in large-scale training datasets, the sociotechnical dynamics of user interactions with these models remain underexplored. This study examines the linguistic and semantic choices users make when crafting prompts and how these choices influence the diversity of generated outputs. Analyzing over six million prompts from the Civiverse dataset on the CivitAI platform across seven months, we categorize users into three groups based on their levels of linguistic experimentation: consistent repeaters, occasional repeaters, and non-repeaters. Our findings reveal that as user participation grows over time, prompt language becomes increasingly homogenized through the adoption of popular community tags and descriptors, with repeated prompts comprising 40-50% of submissions. At the same time, semantic similarity and topic preferences remain relatively stable, emphasizing common subjects and surface aesthetics. Using Vendi scores to quantify visual diversity, we demonstrate a clear correlation between lexical similarity in prompts and the visual similarity of generated images, showing that linguistic repetition reinforces less diverse representations. These findings highlight the significant role of user-driven factors in shaping AI-generated imagery, beyond inherent model biases, and underscore the need for tools and practices that encourage greater linguistic and thematic experimentation within TTI systems to foster more inclusive and diverse AI-generated content.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Media (1.00)
- Leisure & Entertainment (1.00)
Abductive Symbolic Solver on Abstraction and Reasoning Corpus
Lim, Mintaek, Lee, Seokki, Abitew, Liyew Woletemaryam, Kim, Sundong
This paper addresses the challenge of enhancing artificial intelligence reasoning capabilities, focusing on logicality within the Abstraction and Reasoning Corpus (ARC). Humans solve such visual reasoning tasks based on their observations and hypotheses, and they can explain their solutions with a proper reason. However, many previous approaches focused only on the grid transition and it is not enough for AI to provide reasonable and human-like solutions. By considering the human process of solving visual reasoning tasks, we have concluded that the thinking process is likely the abductive reasoning process. Thus, we propose a novel framework that symbolically represents the observed data into a knowledge graph and extracts core knowledge that can be used for solution generation. This information limits the solution search space and helps provide a reasonable mid-process. Our approach holds promise for improving AI performance on ARC tasks by effectively narrowing the solution space and providing logical solutions grounded in core knowledge extraction.
Civiverse: A Dataset for Analyzing User Engagement with Open-Source Text-to-Image Models
Palmini, Maria-Teresa De Rosa, Wagner, Laura, Cetinic, Eva
Text-to-image (TTI) systems, particularly those utilizing open-source frameworks, have become increasingly prevalent in the production of Artificial Intelligence (AI)-generated visuals. While existing literature has explored various problematic aspects of TTI technologies, such as bias in generated content, intellectual property concerns, and the reinforcement of harmful stereotypes, open-source TTI frameworks have not yet been systematically examined from a cultural perspective. This study addresses this gap by analyzing the CivitAI platform, a leading open-source platform dedicated to TTI AI. We introduce the Civiverse prompt dataset, encompassing millions of images and related metadata. We focus on prompt analysis, specifically examining the semantic characteristics of text prompts, as it is crucial for addressing societal issues related to generative technologies. This analysis provides insights into user intentions, preferences, and behaviors, which in turn shape the outputs of these models. Our findings reveal a predominant preference for generating explicit content, along with a focus on homogenization of semantic content. These insights underscore the need for further research into the perpetuation of misogyny, harmful stereotypes, and the uniformity of visual culture within these models.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia (0.04)
- Research Report > New Finding (0.48)
- Instructional Material > Online (0.40)
- Instructional Material > Course Syllabus & Notes (0.40)
- Law (0.88)
- Media (0.68)
- Leisure & Entertainment (0.68)
- Information Technology > Software (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
No Longer Trending on Artstation: Prompt Analysis of Generative AI Art
McCormack, Jon, Llano, Maria Teresa, Krol, Stephen James, Rajcic, Nina
Image generation using generative AI is rapidly becoming a major new source of visual media, with billions of AI generated images created using diffusion models such as Stable Diffusion and Midjourney over the last few years. In this paper we collect and analyse over 3 million prompts and the images they generate. Using natural language processing, topic analysis and visualisation methods we aim to understand collectively how people are using text prompts, the impact of these systems on artists, and more broadly on the visual cultures they promote. Our study shows that prompting focuses largely on surface aesthetics, reinforcing cultural norms, popular conventional representations and imagery. We also find that many users focus on popular topics (such as making colouring books, fantasy art, or Christmas cards), suggesting that the dominant use for the systems analysed is recreational rather than artistic.
- Oceania > Australia (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (7 more...)
Translating Universal Scene Descriptions into Knowledge Graphs for Robotic Environment
Nguyen, Giang Hoang, Bessler, Daniel, Stelter, Simon, Pomarlan, Mihai, Beetz, Michael
Robots performing human-scale manipulation tasks require an extensive amount of knowledge about their surroundings in order to perform their actions competently and human-like. In this work, we investigate the use of virtual reality technology as an implementation for robot environment modeling, and present a technique for translating scene graphs into knowledge bases. To this end, we take advantage of the Universal Scene Description (USD) format which is an emerging standard for the authoring, visualization and simulation of complex environments. We investigate the conversion of USD-based environment models into Knowledge Graph (KG) representations that facilitate semantic querying and integration with additional knowledge sources.
Geometric Models for (Temporally) Attributed Description Logics
Bourgaux, Camille, Ozaki, Ana, Pan, Jeff Z.
In the search for knowledge graph embeddings that could capture ontological knowledge, geometric models of existential rules have been recently introduced. It has been shown that convex geometric regions capture the so-called quasi-chained rules. Attributed description logics (DL) have been defined to bridge the gap between DL languages and knowledge graphs, whose facts often come with various kinds of annotations that may need to be taken into account for reasoning. In particular, temporally attributed DLs are enriched by specific attributes whose semantics allows for some temporal reasoning. Considering that geometric models and (temporally) attributed DLs are promising tools designed for knowledge graphs, this paper investigates their compatibility, focusing on the attributed version of a Horn dialect of the DL-Lite family. We first adapt the definition of geometric models to attributed DLs and show that every satisfiable ontology has a convex geometric model. Our second contribution is a study of the impact of temporal attributes. We show that a temporally attributed DL may not have a convex geometric model in general but we can recover geometric satisfiability by imposing some restrictions on the use of the temporal attributes.
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Europe > Norway > Western Norway > Vestland > Bergen (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Research Report (0.90)
- Overview (0.54)