 Generative AI


AI Robots and Humanoid AI: Review, Perspectives and Directions

arXiv.org Artificial Intelligence

In the approximately century-long journey of robotics, humanoid robots made their debut around six decades ago. The rapid advancements in generative AI, large language models (LLMs), and large multimodal models (LMMs) have reignited interest in humanoids, steering them towards real-time, interactive, and multimodal designs and applications. This resurgence unveils boundless opportunities for AI robotics and novel applications, paving the way for automated, real-time and humane interactions with humanoid advisers, educators, medical professionals, caregivers, and receptionists. However, while current humanoid robots boast human-like appearances, they have yet to embody true humaneness, remaining distant from achieving human-like intelligence. In our comprehensive review, we delve into the intricate landscape of AI robotics and AI humanoid robots in particular, exploring the challenges, perspectives and directions in transitioning from human-looking to humane humanoids and fostering human-like robotics. This endeavour synergizes the advancements in LLMs, LMMs, generative AI, and human-level AI with humanoid robotics, omniverse, and decentralized AI, ushering in the era of AI humanoids and humanoid AI.


Can AI Outperform Human Experts in Creating Social Media Creatives?

arXiv.org Artificial Intelligence

Artificial Intelligence has outperformed human experts in functional tasks such as chess and baduk. How about creative tasks? This paper evaluates AI's capability in the creative domain compared to human experts, a question on which little research has been conducted so far. We propose a novel Prompt-for-Prompt to generate social media creatives via prompt augmentation by Large Language Models. We take the most popular Instagram posts (those with the most likes) in top brands' Instagram accounts to create social media creatives. We give GPT-4 several prompt instructions with text descriptions to generate the most effective prompts for cutting-edge text-to-image generators: Midjourney, DALL-E 3, and Stable Diffusion. LLM-augmented prompts can boost AI's abilities by adding objectives, engagement strategy, lighting and brand consistency for social media image creation. We conduct an extensive human evaluation experiment, and find that AI outperforms human experts, and that Midjourney is better than the other text-to-image generators. Surprisingly, contrary to conventional wisdom in the social media industry, prompt instructions including "eye-catching" show much poorer performance than those including "natural". Regarding the type of creatives, AI improves creatives with animals or products but less so those with real people. Also, AI improves creatives with short text descriptions more than those with long text descriptions, because there is more room for AI to augment prompts with shorter descriptions.


RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have showcased remarkable capabilities across various tasks in different domains. However, the emergence of biases and the potential for generating harmful content in LLMs, particularly under malicious inputs, pose significant challenges. Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently and effectively moderate harmful and unsafe inputs and outputs for LLMs. By employing a multi-faceted approach that includes energy-based training data augmentation through Langevin dynamics, optimizing a safe suffix for inputs via minimax optimization, and integrating a fusion-based model combining robust KNN with LLMs based on our data augmentation, RigorLLM offers a robust solution to harmful content moderation. Our experimental evaluations demonstrate that RigorLLM not only outperforms existing baselines like OpenAI API and Perspective API in detecting harmful content but also exhibits unparalleled resilience to jailbreaking attacks. The innovative use of constrained optimization and a fusion-based guardrail approach represents a significant step forward in developing more secure and reliable LLMs, setting a new standard for content moderation frameworks in the face of evolving digital threats.


Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference

arXiv.org Artificial Intelligence

The rapid advancement of Generative Artificial Intelligence (GenAI) across diverse sectors raises significant environmental concerns, notably the carbon emissions from its cloud and high performance computing (HPC) infrastructure. This paper presents Sprout, an innovative framework designed to address these concerns by reducing the carbon footprint of generative Large Language Model (LLM) inference services. Sprout leverages the innovative concept of "generation directives" to guide the autoregressive generation process, thereby enhancing carbon efficiency. Our proposed method meticulously balances the need for ecological sustainability with the demand for high-quality generation outcomes. Employing a directive optimizer for the strategic assignment of generation directives to user prompts and an original offline quality evaluator, Sprout demonstrates a reduction in carbon emissions of over 40% in real-world evaluations using the Llama2 LLM and global electricity grid data. This research marks a critical step toward aligning AI technology with sustainable practices, highlighting the potential for mitigating environmental impacts in the rapidly expanding domain of generative artificial intelligence.


Automated data processing and feature engineering for deep learning and big data applications: a survey

arXiv.org Artificial Intelligence

The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and has contributed significantly to the progress of AI, particularly in the sphere of supervised deep learning. It has also simplified the design of machine learning systems, as the learning process is highly automated. However, not all data processing tasks in conventional deep learning pipelines have been automated. In most cases data has to be manually collected, preprocessed and further extended through data augmentation before it can be effective for training. Recently, special techniques for automating these tasks have emerged. The automation of data processing tasks is driven by the need to utilize large volumes of complex, heterogeneous data for machine learning and big data applications. Today, end-to-end automated data processing systems based on automated machine learning (AutoML) techniques are capable of taking raw data and transforming it into useful features for big data tasks by automating all intermediate processing stages. In this work, we present a thorough review of approaches for automating data processing tasks in deep learning pipelines, including automated data preprocessing (e.g., data cleaning, labeling, missing data imputation, and categorical data encoding) as well as data augmentation (including synthetic data generation using generative AI methods) and feature engineering (specifically, automated feature extraction, feature construction and feature selection). In addition to automating specific data processing tasks, we discuss the use of AutoML methods and tools to simultaneously optimize all stages of the machine learning pipeline.


Diffusion Model for Data-Driven Black-Box Optimization

arXiv.org Artificial Intelligence

Generative AI has redefined artificial intelligence, enabling the creation of innovative content and customized solutions that drive business practices into a new era of efficiency and creativity. In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables. Consider the practical scenario where one wants to optimize some structured design in a high-dimensional space, based on massive unlabeled data (representing design variables) and a small labeled dataset. We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons. The goal is to generate new designs that are near-optimal and preserve the designed latent structures. Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models for modeling complex distributions. In particular, we propose a reward-directed conditional diffusion model, to be trained on the mixed data, for sampling a near-optimal solution conditioned on high predicted rewards. Theoretically, we establish sub-optimality error bounds for the generated designs. The sub-optimality gap nearly matches the optimal guarantee in off-policy bandits, demonstrating the efficiency of reward-directed diffusion models for black-box optimization. Moreover, when the data admits a low-dimensional latent subspace structure, our model efficiently generates high-fidelity designs that closely respect the latent structure. We provide empirical experiments validating our model in decision-making and content-creation tasks.


Elon Musk Just Added a Wrinkle to the AI Race

The Atlantic - Technology

Yesterday afternoon, Elon Musk fired the latest shot in his feud with OpenAI: His new AI venture, xAI, now allows anyone to download and use the computer code for its flagship software. No fees, no restrictions, just Grok, a large language model that Musk has positioned against OpenAI's GPT-4, the model powering the most advanced version of ChatGPT. Sharing Grok's code is a thinly veiled provocation. Musk was one of OpenAI's original backers. He left in 2018 and recently sued for breach of contract, arguing that the start-up and its CEO, Sam Altman, have betrayed the organization's founding principles in pursuit of profit, transforming a utopian vision of technology that "benefits all of humanity" into yet another opaque corporation.


The Next Tech Backlash Will Be About Hygiene

TIME - Tech

For centuries it was biology that made humans sick. Today, it is often stress. So argues Dr. Gabor Maté about the unrecognized toll that "normal" modern life has on your mental and physical health. Dr. Maté's research, which struck a chord in 2023, invites reflection on the rollout of generative AI into daily life in 2024. As half of British teens report feeling addicted to social media, and as the U.S. surgeon general offers a rare caution against its health risks, the infusion of generative AI into social media appears to threaten our basic hygiene, meaning "the conditions or practices conducive to maintaining health and preventing disease."


Of course Apple wants to bring Google's Gemini AI to iPhones

Engadget

Apple is reportedly in talks with Google to integrate its Gemini AI in iPhones, Bloomberg reports, a move that should help both companies compete with OpenAI and its (heavily invested) partner Microsoft. While it might seem like an admission that Apple is lagging behind on AI, the partnership fits if you think of generative AI models as an evolution of web searching, something Google already provides to all of Apple's devices. According to the report, Gemini could be the cloud-based generative AI engine for Siri and other iPhone apps, while Apple's models could be woven into the upcoming iOS 18 for on-device AI tasks. Bloomberg notes that Apple has also had discussions with OpenAI about using its own models, and it could still end up partnering with another AI outfit, like Anthropic. Apple could conceivably even work with multiple partners until its own generative models are up to snuff.


Using AI to spot edible mushrooms could kill you

Washington Post - Technology News

Despite the risks, budding foragers seem to increasingly turn to apps for help identifying mushroom species. According to Google Trends, three of the five top searches related to "mushroom identification" mention apps or software. A search for "mushroom" on OpenAI's GPT Store, where users find specialized chatbots, immediately surfaces suggestions such as Mushroom Guide, which claims to identify mushrooms from pictures and tell whether they're edible. On the Apple or Google app stores you'll find dozens of apps claiming to identify mushrooms, some with "AI" in the names or descriptions.