Generative AI
OpenAI Threatens Bans as Users Probe Its 'Strawberry' AI Models
OpenAI truly does not want you to know what its latest AI model is "thinking." Since the company launched its "Strawberry" AI model family last week, touting so-called reasoning abilities with o1-preview and o1-mini, OpenAI has been sending out warning emails and threats of bans to any user who tries to probe how the model works. Unlike previous AI models from OpenAI, such as GPT-4o, the company trained o1 specifically to work through a step-by-step problem-solving process before generating an answer. When users ask an "o1" model a question in ChatGPT, they have the option of seeing this chain-of-thought process written out in the ChatGPT interface. However, by design, OpenAI hides the raw chain of thought from users, instead presenting a filtered interpretation created by a second AI model.
Why Sam Altman Is Leaving OpenAI's Safety Committee
OpenAI's CEO Sam Altman is stepping down from the internal committee that the company created to advise its board on "critical safety and security" decisions amid the race to develop ever more powerful artificial intelligence technology. The committee, formed in May, had been evaluating OpenAI's processes and safeguards over a 90-day period. OpenAI published the committee's recommendations following the assessment on Sept. 16. As such, Altman, who, in addition to serving on OpenAI's board, oversees the company's business operations in his role as CEO, will no longer serve on the safety committee. In line with the committee's recommendations, OpenAI says the newly independent committee will be chaired by Zico Kolter, Director of the Machine Learning Department at Carnegie Mellon University, who joined OpenAI's board in August.
Here's how Google will start helping you figure out which images are AI generated
Google is trying to be more transparent about whether a piece of content was created or modified using generative AI (GAI) tools. After joining the Coalition for Content Provenance and Authenticity (C2PA) as a steering committee member earlier this year, Google has revealed how it will start implementing the group's digital watermarking standard. Alongside partners including Amazon, Meta, and OpenAI, Google has spent the past several months figuring out how to improve the tech used for watermarking GAI-created or modified content. The company says it helped to develop the latest version of Content Credentials, a technical standard used to protect metadata detailing how an asset was created, as well as information about what has been modified and how. Google says the current version of Content Credentials is more secure and tamperproof due to stricter validation methods.
The Download: OpenAI's latest model, and 4D printing's potential
Last week OpenAI released a new model called o1 (previously referred to under the code name "Strawberry" and, before that, Q*) that blows GPT-4o out of the water. Unlike previous models that are well suited for language tasks like writing and editing, OpenAI o1 is focused on multistep "reasoning," the type of process required for advanced mathematics, coding, or other STEM-based questions. The model is also trained to answer PhD-level questions in subjects ranging from astrophysics to organic chemistry. The bulk of LLM progress until now has been language-driven, but in addition to getting lots of facts wrong, such LLMs have failed to demonstrate the types of skills required to solve important problems in fields like drug discovery, materials science, coding, or physics. OpenAI's o1 is one of the first signs that LLMs might soon become genuinely helpful companions to human researchers in these fields.
OpenAI says the latest ChatGPT can 'think' – and I have thoughts
We are fast approaching two years of the generative AI revolution, sparked by the November 2022 release of ChatGPT by OpenAI. So far it's been a mixed bag. OpenAI recently announced it had crossed 200 million weekly active users – nothing to be sniffed at, but it got its first 100 million within two months of release. A recent YouGov study found that the inclusion of AI in a product is as likely to turn off a potential purchaser as it is to get them to hand over their cash. Nevertheless, money keeps flowing into the sector, and advances keep coming.
Why OpenAI's new model is such a big deal
I thought OpenAI's GPT-4o, its leading model at the time, would be perfectly suited to help. I asked it to create a short wedding-themed poem, with the constraint that each letter could only appear a certain number of times so we could make sure teams would be able to reproduce it with the provided set of tiles. The model repeatedly insisted that its poem worked within the constraints, even though it didn't. It would correctly count the letters only after the fact, while continuing to deliver poems that didn't fit the prompt. Without the time to meticulously craft the verses by hand, we ditched the poem idea and instead challenged guests to memorize a series of shapes made from colored tiles. However, last week OpenAI released a new model called o1 (previously referred to under the code name "Strawberry" and, before that, Q*) that blows GPT-4o out of the water for this type of purpose.
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Yang, Chao-Han Huck, Park, Taejin, Gong, Yuan, Li, Yuanchao, Chen, Zhehuai, Lin, Yen-Ting, Chen, Chen, Hu, Yuchen, Dhawan, Kunal, Żelasko, Piotr, Zhang, Chao, Chen, Yun-Nung, Tsao, Yu, Balam, Jagadeesh, Ginsburg, Boris, Siniscalchi, Sabato Marco, Chng, Eng Siong, Bell, Peter, Lai, Catherine, Watanabe, Shinji, Stolcke, Andreas
Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new capabilities in language modeling for speech processing, we introduce the generative speech transcription error correction (GenSEC) challenge. This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. These tasks aim to emulate future LLM-based agents handling voice-based interfaces while remaining accessible to a broad audience by utilizing open pretrained language models or agent-based APIs. We also discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
Constructive Apraxia: An Unexpected Limit of Instructible Vision-Language Models and Analog for Human Cognitive Disorders
Noever, David, Noever, Samantha E. Miller
This study reveals an unexpected parallel between instructible vision-language models (VLMs) and human cognitive disorders, specifically constructive apraxia. We tested 25 state-of-the-art VLMs, including GPT-4 Vision, DALL-E 3, and Midjourney v5, on their ability to generate images of the Ponzo illusion, a task that requires basic spatial reasoning and is often used in clinical assessments of constructive apraxia. Remarkably, 24 out of 25 models failed to correctly render two horizontal lines against a perspective background, mirroring the deficits seen in patients with parietal lobe damage. The models consistently misinterpreted spatial instructions, producing tilted or misaligned lines that followed the perspective of the background rather than remaining horizontal. This behavior is strikingly similar to how apraxia patients struggle to copy or construct simple figures despite intact visual perception and motor skills. Our findings suggest that current VLMs, despite their advanced capabilities in other domains, lack fundamental spatial reasoning abilities akin to those impaired in constructive apraxia. This limitation in AI systems provides a novel computational model for studying spatial cognition deficits and highlights a critical area for improvement in VLM architecture and training methodologies.
Unlocking NACE Classification Embeddings with OpenAI for Enhanced Analysis and Processing
Vidali, Andrea, Jean, Nicola, Pera, Giacomo Le
The Statistical Classification of Economic Activities in the European Community (NACE) is the standard classification system for the categorization of economic and industrial activities within the European Union. This paper proposes a novel approach to transform the NACE classification into low-dimensional embeddings, using state-of-the-art models and dimensionality reduction techniques. The primary challenge is the preservation of the hierarchical structure inherent within the original NACE classification while reducing the number of dimensions. To address this issue, we introduce custom metrics designed to quantify the retention of hierarchical relationships throughout the embedding and reduction processes. The evaluation of these metrics demonstrates the effectiveness of the proposed methodology in retaining the structural information essential for insightful analysis. This approach not only facilitates the visual exploration of economic activity relationships, but also increases the efficacy of downstream tasks, including clustering, classification, integration with other classifications, and others. Through experimental validation, the utility of our proposed framework in preserving hierarchical structures within the NACE classification is showcased, thereby providing a valuable tool for researchers and policymakers to understand and leverage any hierarchical data.
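To make the idea of a hierarchy-retention metric concrete, here is a minimal sketch of one possible check. It is not the paper's metric, which the abstract does not specify. It assumes NACE-style codes whose parent is the part before the dot, uses made-up three-dimensional vectors in place of real model embeddings, and stands in for dimensionality reduction by simply dropping the last coordinate. All names (`EMBEDDINGS`, `parent`, `sibling_neighbor_rate`, `drop_last_dimension`) are hypothetical.

```python
import math

# Hypothetical toy data: NACE-style codes with made-up 3-d vectors.
# Real embeddings would come from an embedding model; these are illustrative only.
EMBEDDINGS = {
    "A01.1": [0.9, 0.1, 0.0],
    "A01.2": [0.8, 0.2, 0.1],
    "C10.1": [0.1, 0.9, 0.0],
    "C10.2": [0.2, 0.8, 0.1],
}


def parent(code):
    """Assumed hierarchy rule: the parent is the part of the code before the dot."""
    return code.split(".")[0]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def sibling_neighbor_rate(embeddings):
    """Fraction of codes whose nearest neighbor (by cosine) shares their parent."""
    hits = 0
    for code, vec in embeddings.items():
        others = [other for other in embeddings if other != code]
        nearest = max(others, key=lambda other: cosine(vec, embeddings[other]))
        if parent(nearest) == parent(code):
            hits += 1
    return hits / len(embeddings)


def drop_last_dimension(embeddings):
    """Crude stand-in for a reduction step: keep all but the last coordinate."""
    return {code: vec[:-1] for code, vec in embeddings.items()}


rate_full = sibling_neighbor_rate(EMBEDDINGS)
rate_reduced = sibling_neighbor_rate(drop_last_dimension(EMBEDDINGS))
print(f"full: {rate_full:.2f}, reduced: {rate_reduced:.2f}")
```

Comparing the rate before and after reduction gives a rough sense of how much sibling structure survives; a real evaluation would presumably use an actual reduction method and consider more than one level of the hierarchy.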
Sparks of Artificial General Intelligence(AGI) in Semiconductor Material Science: Early Explorations into the Next Frontier of Generative AI-Assisted Electron Micrograph Analysis
Srinivas, Sakhinana Sagar, Sannidhi, Geethan, Gangasani, Sreeja, Ravuru, Chidaksh, Runkana, Venkataramana
Characterizing materials with electron micrographs poses significant challenges for automated labeling due to the complex nature of nanomaterial structures. To address this, we introduce a fully automated, end-to-end pipeline that leverages recent advances in Generative AI. It is designed for analyzing and understanding the microstructures of semiconductor materials with effectiveness comparable to that of human experts, contributing to the pursuit of Artificial General Intelligence (AGI) in nanomaterial identification. Our approach utilizes Large MultiModal Models (LMMs) such as GPT-4V, alongside text-to-image models like DALLE-3. We integrate a GPT-4 guided Visual Question Answering (VQA) method to analyze nanomaterial images, generate synthetic nanomaterial images via DALLE-3, and employ in-context learning with few-shot prompting in GPT-4V for accurate nanomaterial identification. Our method surpasses traditional techniques by enhancing the precision of nanomaterial identification and optimizing the process for high-throughput screening.