Generative AI
Industry 6.0: New Generation of Industry driven by Generative AI and Swarm of Heterogeneous Robots
Lykov, Artem, Cabrera, Miguel Altamirano, Konenkov, Mikhail, Serpiva, Valerii, Gbagbe, Koffivi Fid`ele, Alabbas, Ali, Fedoseev, Aleksey, Moreno, Luis, Khan, Muhammad Haris, Guo, Ziang, Tsetserukou, Dzmitry
This paper presents the concept of Industry 6.0, introducing the world's first fully automated production system that autonomously handles the entire product design and manufacturing process based on user-provided natural language descriptions. By leveraging generative AI, the system automates critical aspects of production, including product blueprint design, component manufacturing, logistics, and assembly. A heterogeneous swarm of robots, each equipped with individual AI through integration with Large Language Models (LLMs), orchestrates the production process. The robotic system includes manipulator arms, delivery drones, and 3D printers capable of generating assembly blueprints. The system was evaluated using commercial and open-source LLMs, functioning through APIs and local deployment. A user study demonstrated that the system reduces the average production time to 119.10 minutes, significantly outperforming a team of expert human developers, who averaged 528.64 minutes (an improvement factor of 4.4). Furthermore, in the product blueprinting stage, the system surpassed human CAD operators by an unprecedented factor of 47, completing the task in 0.5 minutes compared to 23.5 minutes. This breakthrough represents a major leap towards fully autonomous manufacturing.
Deep Learning for Economists
Deep learning provides powerful methods to impute structured information from large-scale, unstructured text and image datasets. For example, economists might wish to detect the presence of economic activity in satellite images, or to measure the topics or entities mentioned in social media, the congressional record, or firm filings. This review introduces deep neural networks, covering methods such as classifiers, regression models, generative AI, and embedding models. Applications include classification, document digitization, record linkage, and methods for data exploration in massive scale text and image corpora. When suitable methods are used, deep learning models can be cheap to tune and can scale affordably to problems involving millions or billions of data points.. The review is accompanied by a companion website, EconDL, with user-friendly demo notebooks, software resources, and a knowledge base that provides technical details and additional applications.
SFR-RAG: Towards Contextually Faithful LLMs
Nguyen, Xuan-Phi, Pandit, Shrey, Purushwalkam, Senthil, Xu, Austin, Chen, Hailin, Ming, Yifei, Ke, Zixuan, Savarese, Silvio, Xong, Caiming, Joty, Shafiq
Retrieval Augmented Generation (RAG), a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance, has emerged as a pivotal area in generative AI. The LLMs used in RAG applications are required to faithfully and completely comprehend the provided context and users' questions, avoid hallucination, handle unanswerable, counterfactual or otherwise low-quality and irrelevant contexts, perform complex multi-hop reasoning and produce reliable citations. In this paper, we introduce SFR-RAG, a small LLM that is instruction-tuned with an emphasis on context-grounded generation and hallucination minimization. We also present ContextualBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks, such as HotpotQA and TriviaQA, with consistent RAG settings to ensure reproducibility and consistency in model assessments. Experimental results demonstrate that our SFR-RAG-9B model outperforms leading baselines such as Command-R+ (104B) and GPT-4o, achieving state-of-the-art results in 3 out of 7 benchmarks in ContextualBench with significantly fewer parameters. The model is also shown to be resilient to alteration in the contextual information and behave appropriately when relevant context is removed. Additionally, the SFR-RAG model maintains competitive performance in general instruction-following tasks and function-calling capabilities.
GLEAN: Generative Learning for Eliminating Adversarial Noise
Kim, Justin Lyu, Woo, Kyoungwan
In the age of powerful diffusion models such as DALL-E and Stable Diffusion, many in the digital art community have suffered style mimicry attacks due to fine-tuning these models on their works. The ability to mimic an artist's style via text-to-image diffusion models raises serious ethical issues, especially without explicit consent. Glaze, a tool that applies various ranges of perturbations to digital art, has shown significant success in preventing style mimicry attacks, at the cost of artifacts ranging from imperceptible noise to severe quality degradation. The release of Glaze has sparked further discussions regarding the effectiveness of similar protection methods. In this paper, we propose GLEAN- applying I2I generative networks to strip perturbations from Glazed images, evaluating the performance of style mimicry attacks before and after GLEAN on the results of Glaze. GLEAN aims to support and enhance Glaze by highlighting its limitations and encouraging further development.
Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data
Gonzรกlez-Bustamante, Bastiรกn
This article benchmarked the ability of OpenAI's GPTs and a number of open-source LLMs to perform annotation tasks on political content. We used a novel protest event dataset comprising more than three million digital interactions and created a gold standard that includes ground-truth labels annotated by human coders about toxicity and incivility on social media. We included in our benchmark Google's Perspective algorithm, which, along with GPTs, was employed throughout their respective APIs while the open-source LLMs were deployed locally. The findings show that Perspective API using a laxer threshold, GPT-4o, and Nous Hermes 2 Mixtral outperform other LLM's zero-shot classification annotations. In addition, Nous Hermes 2 and Mistral OpenOrca, with a smaller number of parameters, are able to perform the task with high performance, being attractive options that could offer good trade-offs between performance, implementing costs and computing time. Ancillary findings using experiments setting different temperature levels show that although GPTs tend to show not only excellent computing time but also overall good levels of reliability, only open-source LLMs ensure full reproducibility in the annotation.
OpenAI is reportedly moving away from its complicated non-profit structure next year
Sam Altman has told OpenAI staff members during their weekly meeting that the company is changing its rather convoluted non-profit corporate structure next year, according to Fortune. The CEO said OpenAI will move away from being controlled by a non-profit entity and will transition into a more traditional for-profit organization. He didn't delve into the specifics of how the company will achieve that goal and what OpenAI's corporate structure will look like exactly. A spokesperson only told Fortune that it remains "focused on building AI that benefits everyone" and that non-profit is "core to [its] mission and will continue to exist." OpenAI started as a non-profit organization in 2015 that relied on money from donors.
Security News This Week: A Creative Trick Makes ChatGPT Spit Out Bomb-Making Instructions
After Apple's product launch event this week, WIRED did a deep dive on the company's new secure server environment, known as Private Cloud Compute, which attempts to replicate in the cloud the security and privacy of processing data locally on users' individual devices. The goal is to minimize possible exposure of data processed for Apple Intelligence, the company's new AI platform. In addition to hearing about PCC from Apple's senior vice president of software engineering, Craig Federighi, WIRED readers also received a first look at content generated by Apple Intelligence's "Image Playground" feature as part of crucial updates on the recent birthday of Federighi's dog Bailey. Turning to privacy protection of a very different kind in another new AI service, WIRED looked at how users of the social media platform X can keep their data from being slurped up by the "unhinged" generative AI tool from xAI known as Grok AI. And in other news about Apple products, researchers developed a technique for using eye tracking to discern passwords and PINs people typed using 3D Apple Vision Pro avatars--a sort of keylogger for mixed reality.
Enhancing LLM Problem Solving with REAP: Reflection, Explicit Problem Deconstruction, and Advanced Prompting
Lingo, Ryan, Arroyo, Martin, Chhajer, Rajeev
Large Language Models (LLMs) have transformed natural language processing, yet improving their problem-solving capabilities, particularly for complex, reasoning-intensive tasks, remains a persistent challenge. This paper introduces the REAP (Reflection, Explicit Problem Deconstruction, and Advanced Prompting) method, an innovative approach within the dynamic context generation framework. REAP guides LLMs through reflection on the query, deconstructing it into manageable components, and generating relevant context to enhance the solution process. We evaluated REAP using a dataset designed to expose LLM limitations, comparing zero-shot prompting with REAP-enhanced prompts across six state-of-the-art models: OpenAI's o1-preview, o1-mini, GPT-4o, GPT-4o-mini, Google's Gemini 1.5 Pro, and Claude 3.5 Sonnet. The results demonstrate notable performance gains, with o1-mini improving by 40.97%, GPT-4o by 66.26%, and GPT-4o-mini by 112.93%. Despite the already strong baseline performance of OpenAI's o1-preview, modest gains were observed. Beyond performance improvements, REAP offers a cost-effective solution; for example, GPT-4o-mini, which is approximately 100 times cheaper than o1-preview, delivered competitive results. REAP also improves the clarity of model outputs, making it easier for humans to understand the reasoning behind the results and simplifying the process of identifying and addressing any issues. These findings demonstrate REAP's potential to greatly improve the capabilities of LLMs, providing both better performance and increased cost-efficiency across a wide range of applications.
Keeping Humans in the Loop: Human-Centered Automated Annotation with Generative AI
Pangakis, Nicholas, Wolken, Samuel
Automated text annotation is a compelling use case for generative large language models (LLMs) in social media research. Recent work suggests that LLMs can achieve strong performance on annotation tasks; however, these studies evaluate LLMs on a small number of tasks and likely suffer from contamination due to a reliance on public benchmark datasets. Here, we test a human-centered framework for responsibly evaluating artificial intelligence tools used in automated annotation. We use GPT-4 to replicate 27 annotation tasks across 11 password-protected datasets from recently published computational social science articles in high-impact journals. For each task, we compare GPT-4 annotations against human-annotated ground-truth labels and against annotations from separate supervised classification models fine-tuned on human-generated labels. Although the quality of LLM labels is generally high, we find significant variation in LLM performance across tasks, even within datasets. Our findings underscore the importance of a human-centered workflow and careful evaluation standards: Automated annotations significantly diverge from human judgment in numerous scenarios, despite various optimization strategies such as prompt tuning. Grounding automated annotation in validation labels generated by humans is essential for responsible evaluation.
The Morning After: OpenAI made its latest model slower, on purpose
OpenAI has unveiled yet another artificial intelligence model. This one is called o1, and the company claims it can perform complex reasoning tasks more effectively than its predecessors. Apparently, o1 was trained to "spend more time thinking through problems before they respond." According to the company: "[the models] learn to refine their thinking process, try different strategies and recognize their mistakes." That more considered response means it's significantly slower at processing prompts than GPT-4o.