Generative AI
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development
Chen, Daoyuan, Wang, Haibin, Huang, Yilun, Ge, Ce, Li, Yaliang, Ding, Bolin, Zhou, Jingren
The emergence of large-scale multi-modal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging because model-centric and data-centric development have historically followed isolated paths, leading to suboptimal outcomes and inefficient resource utilization. In response, we present a novel sandbox suite tailored for integrated data-model co-development. This sandbox provides a comprehensive experimental platform, enabling rapid iteration and insight-driven refinement of both data and models. Our proposed "Probe-Analyze-Refine" workflow, validated through applications on state-of-the-art LLaVA-like and DiT-based models, yields significant performance boosts, such as topping the VBench leaderboard. We also uncover fruitful insights gleaned from exhaustive benchmarks, shedding light on the critical interplay between data quality, diversity, and model behavior. To foster deeper understanding and future progress in multi-modal data and generative modeling, we maintain our code, datasets, and models at https://github.
The advent of multi-modal generative models has revolutionized artificial intelligence, pushing the boundaries of functionality and creativity across various domains (OpenAI, 2024a;b; Wang et al., 2024). Given the pivotal role of training data in shaping model performance, efforts to curate datasets of larger scale and higher quality are growing fast (Jakubik et al., 2024). However, the development trajectories of these models and datasets have historically diverged, guided more by intuition than by systematic co-development methodologies. Recent advances in enhancing multi-modal generative models tend to be either model-centric or data-centric, rarely bridging the two aspects cohesively.
For example, model-centric methods focus on algorithmic enhancements and architectural innovations under fixed data priors, while data-centric strategies usually concentrate on processing and cleaning datasets independently of specific model training contexts (Qin et al., 2024). Both approaches usually suffer from a lack of systematic guidance and cooperative synergy, relying heavily on heuristic exploration and single-perspective expertise. This fragmented landscape presents a significant barrier to achieving optimal model performance, as the interplay between data characteristics and model capabilities remains largely underexploited. Moreover, the practical implementation of multi-modal generative models is further complicated by infrastructure constraints, escalating computational costs, and the accelerating pace of development cycles (Xu et al., 2024b).
Imitation of human motion achieves natural head movements for humanoid robots in an active-speaker detection task
Ding, Bosong, Kirtay, Murat, Spigler, Giacomo
Head movements are crucial for social human-human interaction. They can transmit important cues (e.g., joint attention, speaker detection) that cannot be achieved with verbal interaction alone. This advantage also holds for human-robot interaction. Even though modeling human motions through generative AI models has become an active research area within robotics in recent years, the use of these methods for producing head movements in human-robot interaction remains underexplored. In this work, we employed a generative AI pipeline to produce human-like head movements for a Nao humanoid robot. In addition, we tested the system on a real-time active-speaker tracking task in a group conversation setting. Overall, the results show that the Nao robot successfully imitates human head movements in a natural manner while actively tracking the speakers during the conversation. Code and data from this study are available at https://github.com/dingdingding60/Humanoids2024HRI
Bringing AI Participation Down to Scale: A Comment on OpenAI's Democratic Inputs to AI Project
Moats, David, Ganguly, Chandrima
This commentary piece reviews the recent OpenAI Democratic Inputs programme, which funded 10 teams to design procedures for public participation in generative AI. While applauding the technical innovations in these projects, we identify several shared assumptions, including the generality of LLMs, extracting abstract values, soliciting solutions rather than problems, and equating participation with democracy. We call instead for AI participation that involves specific communities and use cases and solicits concrete problems to be remedied. We also find it important that these communities have a stake in the outcome, including ownership of data or models.
The Download: how AI affects creativity, and CRISPR babies
Generative AI models have made it simpler and quicker to produce everything from text passages and images to video clips and audio tracks. But while AI's output can certainly seem creative, do these models actually boost human creativity? That's what two researchers set out to explore by studying how people used OpenAI's large language model GPT-4 to write short stories. The model was helpful, but only to an extent. They found that while AI improved the output of less creative writers, it made little difference to the quality of the stories produced by writers who were already creative.
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Mueller, Phillip, Mikelsons, Lars
The synthesis of product design concepts stands at the crux of early-phase development processes for technical products, traditionally posing an intricate interdisciplinary challenge. The application of deep learning methods, particularly Deep Generative Models (DGMs), holds the promise of automating and streamlining manual iterations and therefore introducing heightened levels of innovation and efficiency. However, DGMs have yet to be widely adopted in the synthesis of product design concepts. This paper aims to explore the reasons behind this limited application and derive the requirements for successful integration of these technologies. We systematically analyze DGM families (VAE, GAN, Diffusion, Transformer, Radiance Field), assessing their strengths, weaknesses, and general applicability for product design conception. Our objective is to provide insights that simplify the decision-making process for engineers, helping them determine which method might be most effective for their specific challenges. Recognizing the rapid evolution of this field, we hope that our analysis contributes to a fundamental understanding and guides practitioners towards the most promising approaches. This work seeks not only to illuminate current challenges but also to propose potential solutions, thereby offering a clear roadmap for leveraging DGMs in the realm of product design conception.
Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias
The interplay between humans and Generative AI (Gen AI) draws an insightful parallel with the dynamic relationship between giraffes and acacias on the African Savannah. Just as giraffes navigate the acacia's thorny defenses to gain nourishment, humans engage with Gen AI, maneuvering through ethical and operational challenges to harness its benefits. This paper explores how, like young giraffes that are still mastering their environment, humans are in the early stages of adapting to and shaping Gen AI. It delves into the strategies humans are developing and refining to help mitigate risks such as bias, misinformation, and privacy breaches, strategies that in turn influence and shape Gen AI's evolution. While the giraffe-acacia analogy aptly frames human-AI relations, it contrasts nature's evolutionary perfection with the inherent flaws of human-made technology and the tendency of humans to misuse it, giving rise to many ethical dilemmas. Through the HHH framework, we identify pathways to embed values of helpfulness, honesty, and harmlessness in AI development, fostering safety-aligned agents that resonate with human values. This narrative presents a cautiously optimistic view of human resilience and adaptability, illustrating our capacity to harness technologies and implement safeguards effectively, without succumbing to their perils. It emphasises a symbiotic relationship where humans and AI continually shape each other for mutual benefit.
Conquering images and the basis of transformative action
Our rapid immersion into online life has made us all ill. Through the generation, personalization, and dissemination of enchanting imagery, artificial technologies commodify the minds and hearts of the masses with nauseating precision and scale. Online networks, artificial intelligence (AI), social media, and digital news feeds fine-tune our beliefs and pursuits by establishing narratives that subdivide and polarize our communities and identities. Meanwhile, those commanding these technologies conquer the final frontiers of our interior lives, social relations, earth, and cosmos. In the Attention Economy, our agency is restricted and our vitality is depleted for their narcissistic pursuits and pleasures. Generative AI empowers the forces that homogenize and eradicate life, not through some stupid "singularity" event, but through devaluing human creativity, labor, and social life. Using a fractured lens, we will examine how narratives and networks influence us on mental, social, and algorithmic levels. We will discuss how atomizing imagery (ideals and pursuits that alienate, rather than invigorate, the individual) hijacks people's agency to sustain the forces that destroy them. We will discover how empires build digital networks that optimize society and embolden narcissists to enforce social binaries that perpetuate the ceaseless expansion of consumption, exploitation, and hierarchy. Structural hierarchy in the world is reified through hierarchy in our beliefs and thinking. Only by seeing images as images and appreciating the similarity shared by opposing narratives can we facilitate transformative action and break away from the militaristic systems plaguing our lives.
Beyond Generative Artificial Intelligence: Roadmap for Natural Language Generation
Maestre, María Miró, Martínez-Murillo, Iván, Martin, Tania J., Navarro-Colorado, Borja, Ferrández, Antonio, Cueto, Armando Suárez, Lloret, Elena
Generative Artificial Intelligence has grown exponentially as a result of Large Language Models (LLMs). This has been possible because of the impressive performance of deep learning methods created within the field of Natural Language Processing (NLP) and its subfield Natural Language Generation (NLG), which is the focus of this paper. The growing LLM family includes popular models such as GPT-4 and Bard, and tools such as ChatGPT in particular have become a benchmark for other LLMs when solving most of the tasks involved in NLG research. This scenario poses new questions about the next steps for NLG and how the field can adapt and evolve to deal with new challenges in the era of LLMs. To address this, the present paper conducts a review of a representative sample of surveys recently published in NLG. By doing so, we aim to provide the scientific community with a research roadmap to identify which NLG aspects are still not suitably addressed by LLMs, as well as suggest future lines of research that should be addressed going forward.
US financial watchdog urged to investigate NDAs at OpenAI
OpenAI whistleblowers have urged the US financial watchdog to investigate non-disclosure agreements at the startup after claiming the contracts included restrictions such as requiring employees to seek permission before contacting regulators. Non-disclosure agreements (NDAs) typically bar an employee from sharing company information with outside parties, but a group of whistleblowers are arguing that OpenAI's agreements could have led to workers being punished for raising concerns about the company to federal authorities. San Francisco-based OpenAI is the developer of the ChatGPT chatbot and a key player in the artificial intelligence boom, which has been accompanied by expressions of concern from experts about the potentially dangerous capabilities of the technology. "Given the well-documented potential risks posed by the irresponsible deployment of AI, we urge the Commissioners to immediately approve an investigation into OpenAI's prior NDAs, and to review current efforts apparently being undertaken by the company to ensure full compliance with SEC rules," the letter to Gary Gensler, the chair of the US Securities and Exchange Commission (SEC), said. The letter from whistleblower representatives was sent on 1 July and published by the Washington Post on Saturday after the news organisation obtained it from the office of the US senator Chuck Grassley.
Melon Fruit Detection and Quality Assessment Using Generative AI-Based Image Data Augmentation
Yoon, Seungri, Cho, Yunseong, Ahn, Tae In
Monitoring and managing the growth and quality of fruits are very important tasks. To effectively train deep learning models like YOLO for real-time fruit detection, high-quality image datasets are essential. However, such datasets are often lacking in agriculture. Generative AI models can help create high-quality images. In this study, we used the Midjourney and Firefly tools to generate images of melon greenhouses and post-harvest fruits through text-to-image, pre-harvest image-to-image, and post-harvest image-to-image methods. We evaluated these AI-generated images using PSNR and SSIM metrics and tested the detection performance of the YOLOv9 model. We also assessed the net quality of real and generated fruits. Our results showed that generative AI could produce images very similar to real ones, especially for post-harvest fruits. The YOLOv9 model detected fruits in the generated images well, and the net quality was also measurable. This shows that generative AI can create realistic images useful for fruit detection and quality assessment. This study highlights the potential of AI-generated images for data augmentation in melon fruit detection and quality assessment and envisions a positive future for generative AI applications in agriculture.
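The abstract scores generated images against real ones with PSNR and SSIM. As a hedged illustration of the first metric only, the sketch below implements the standard definition PSNR = 10 · log10(MAX² / MSE) in pure Python. It assumes 8-bit grayscale images stored as equally sized nested lists (so MAX = 255); the function names `mse` and `psnr` are hypothetical and this is not the authors' evaluation code. SSIM, which compares local luminance, contrast, and structure over sliding windows, is not shown.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]  # assumption: grayscale image as rows of pixel values


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized grayscale images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr(a: Image, b: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means the images are closer.

    Returns math.inf for identical images, where the MSE is zero.
    """
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    real = [[100, 100], [100, 100]]
    generated = [[110, 90], [100, 100]]
    # MSE = (10^2 + 10^2 + 0 + 0) / 4 = 50, so PSNR = 10 * log10(65025 / 50)
    print(f"PSNR: {psnr(real, generated):.2f} dB")
```

In this toy example two of four pixels differ by 10 grey levels, giving an MSE of 50 and a PSNR of roughly 31 dB. In practice one would typically use a tested library implementation and apply it per channel or on a luminance channel, depending on the evaluation protocol.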