Generative AI
OpenAI published more of Elon Musk's emails if that's something you want to read
OpenAI published receipts, in the form of a long timeline of emails, texts and legal filings, illustrating that Elon Musk's injunction to prevent OpenAI from converting into a for-profit company runs counter to what he wanted in 2017. Essentially, OpenAI is providing even more evidence to the fact that its former co-founder wanted the AI startup to become a for-profit company and make him CEO. You should read the whole blog to get all of the details (and get a sense for how billionaires email) but the gist is that in 2017, Musk and OpenAI came to an understanding that the then non-profit needed to become a for-profit to "advance its mission" and seemingly capitalize on the public interest earned from its AI beating professional Dota 2 players in one-on-one matches. According to OpenAI, Musk proposed a new board structure where he "would unequivocally have initial control of the company," which OpenAI was opposed to. That led to the disagreements between Musk and OpenAI leadership, and him ultimately leaving the nonprofit's board in 2018.
The Guardian view on AI's power, limits, and risks: it may require rethinking the technology
More than 300 million people use OpenAI's ChatGPT each week, a testament to the technology's appeal. This month, the company unveiled a "pro mode" for its new "o1" AI system, offering human-level reasoning -- for 10 times the current 20 monthly subscription fee. One of its advanced behaviours appears to be self-preservation. In testing, when the system was led to believe it would be shut down, it attempted to disable an oversight mechanism. When "o1" found memos about its replacement, it tried copying itself and overwriting its core code.
Envisioning National Resources for Artificial Intelligence Research: NSF Workshop Report
Workshop Goals This workshop aimed to identify initial challenges and opportunities for national resources for AI research (e.g., compute, data, models, etc.) and to facilitate planning for the envisioned National AI Research Resource (NAIRR). Participants included AI and cyberinfrastructure (CI) experts. Significant Findings 1. AI researchers confront unprecedented scale that goes well beyond generative AI 2. National investments in AI research resources have been insufficient 3. The suboptimal usability of current resources is compromising AI investigation topics 4. The cadence and intensity of AI conference publications is unlike other research areas 5. Better practices for managing local resources are needed 6. Access to AI research resources is very uneven for different institutions 7. There is an opportunity for greater alignment between CI and AI efforts 8. AI research needs warrant unique approaches to CI and to national shared resources Critical Needs Participants identified ten prototypical AI workflows in two major areas with an immediate need for large-scale resources.
Benchmarking large language models for materials synthesis: the case of atomic layer deposition
Yanguas-Gil, Angel, Dearing, Matthew T., Elam, Jeffrey W., Jones, Jessica C., Kim, Sungjoon, Mohammad, Adnan, Nguyen, Chi Thang, Sengupta, Bratin
In this work we introduce an open-ended question benchmark, ALDbench, to evaluate the performance of large language models (LLMs) in materials synthesis, and in particular in the field of atomic layer deposition, a thin film growth technique used in energy applications and microelectronics. Our benchmark comprises questions with a level of difficulty ranging from graduate level to domain expert current with the state of the art in the field. Human experts reviewed the questions along the criteria of difficulty and specificity, and the model responses along four different criteria: overall quality, specificity, relevance, and accuracy. We ran this benchmark on an instance of OpenAI's GPT-4o. The responses from the model received a composite quality score of 3.7 on a 1 to 5 scale, consistent with a passing grade. However, 36% of the questions received at least one below average score. An in-depth analysis of the responses identified at least five instances of suspected hallucination. Finally, we observed statistically significant correlations between the difficulty of the question and the quality of the response, the difficulty of the question and the relevance of the response, and the specificity of the question and the accuracy of the response as graded by the human experts. This emphasizes the need to evaluate LLMs across multiple criteria beyond difficulty or accuracy.
Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT
Thomas, Danielle R., Borchers, Conrad, Kakarla, Sanjit, Lin, Jionghao, Bhushan, Shambhavi, Guo, Boyuan, Gatz, Erin, Koedinger, Kenneth R.
The role of multiple-choice questions (MCQs) as effective learning tools has been debated in past research. While MCQs are widely used due to their ease in grading, open response questions are increasingly used for instruction, given advances in large language models (LLMs) for automated grading. This study evaluates MCQs effectiveness relative to open-response questions, both individually and in combination, on learning. These activities are embedded within six tutor lessons on advocacy. Using a posttest-only randomized control design, we compare the performance of 234 tutors (790 lesson completions) across three conditions: MCQ only, open response only, and a combination of both. We find no significant learning differences across conditions at posttest, but tutors in the MCQ condition took significantly less time to complete instruction. These findings suggest that MCQs are as effective, and more efficient, than open response tasks for learning when practice time is limited. To further enhance efficiency, we autograded open responses using GPT-4o and GPT-4-turbo. GPT models demonstrate proficiency for purposes of low-stakes assessment, though further research is needed for broader use. This study contributes a dataset of lesson log data, human annotation rubrics, and LLM prompts to promote transparency and reproducibility.
NetOrchLLM: Mastering Wireless Network Orchestration with Large Language Models
Abdallah, Asmaa, Albaseer, Abdullatif, Celik, Abdulkadir, Abdallah, Mohamed, Eltawil, Ahmed M.
The transition to 6G networks promises unprecedented advancements in wireless communication, with increased data rates, ultra-low latency, and enhanced capacity. However, the complexity of managing and optimizing these next-generation networks presents significant challenges. The advent of large language models (LLMs) has revolutionized various domains by leveraging their sophisticated natural language understanding capabilities. However, the practical application of LLMs in wireless network orchestration and management remains largely unexplored. Existing literature predominantly offers visionary perspectives without concrete implementations, leaving a significant gap in the field. To address this gap, this paper presents NETORCHLLM, a wireless NETwork ORCHestrator LLM framework that uses LLMs to seamlessly orchestrate diverse wireless-specific models from wireless communication communities using their language understanding and generation capabilities. A comprehensive framework is introduced, demonstrating the practical viability of our approach and showcasing how LLMs can be effectively harnessed to optimize dense network operations, manage dynamic environments, and improve overall network performance. NETORCHLLM bridges the theoretical aspirations of prior research with practical, actionable solutions, paving the way for future advancements in integrating generative AI technologies within the wireless communications sector.
DNN Task Assignment in UAV Networks: A Generative AI Enhanced Multi-Agent Reinforcement Learning Approach
Tang, Xin, Chen, Qian, Weng, Wenjie, Liao, Binhan, Wang, Jiacheng, Cao, Xianbin, Li, Xiaohuan
Unmanned Aerial Vehicles (UAVs) possess high mobility and flexible deployment capabilities, prompting the development of UAVs for various application scenarios within the Internet of Things (IoT). The unique capabilities of UAVs give rise to increasingly critical and complex tasks in uncertain and potentially harsh environments. The substantial amount of data generated from these applications necessitates processing and analysis through deep neural networks (DNNs). However, UAVs encounter challenges due to their limited computing resources when managing DNN models. This paper presents a joint approach that combines multiple-agent reinforcement learning (MARL) and generative diffusion models (GDM) for assigning DNN tasks to a UAV swarm, aimed at reducing latency from task capture to result output. To address these challenges, we first consider the task size of the target area to be inspected and the shortest flying path as optimization constraints, employing a greedy algorithm to resolve the subproblem with a focus on minimizing the UAV's flying path and the overall system cost. In the second stage, we introduce a novel DNN task assignment algorithm, termed GDM-MADDPG, which utilizes the reverse denoising process of GDM to replace the actor network in multi-agent deep deterministic policy gradient (MADDPG). This approach generates specific DNN task assignment actions based on agents' observations in a dynamic environment. Simulation results indicate that our algorithm performs favorably compared to benchmarks in terms of path planning, Age of Information (AoI), energy consumption, and task load balancing.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Li, Yucheng, Jiang, Huiqiang, Wu, Qianhui, Luo, Xufang, Ahn, Surin, Zhang, Chengruidong, Abdi, Amir H., Li, Dongsheng, Gao, Jianfeng, Yang, Yuqing, Qiu, Lili
Long-context LLMs have enabled numerous downstream applications but also introduced significant challenges related to computational and memory efficiency. To address these challenges, optimizations for long-context inference have been developed, centered around the KV cache. However, existing benchmarks often evaluate in single-request, neglecting the full lifecycle of the KV cache in real-world use. This oversight is particularly critical, as KV cache reuse has become widely adopted in LLMs inference frameworks, such as vLLM and SGLang, as well as by LLM providers, including OpenAI, Microsoft, Google, and Anthropic. To address this gap, we introduce SCBench(SharedContextBench), a comprehensive benchmark for evaluating long-context methods from a KV cachecentric perspective: 1) KV cache generation, 2) KV cache compression, 3) KV cache retrieval, 4) KV cache loading. Specifically, SCBench uses test examples with shared context, ranging 12 tasks with two shared context modes, covering four categories of long-context capabilities: string retrieval, semantic retrieval, global information, and multi-task. With it, we provide an extensive KV cache-centric analysis of eight categories long-context solutions, including Gated Linear RNNs, Mamba-Attention hybrids, and efficient methods such as sparse attention, KV cache dropping, quantization, retrieval, loading, and prompt compression. The evaluation is conducted on 8 long-context LLMs. Our findings show that sub-O(n) memory methods suffer in multi-turn scenarios, while sparse encoding with O(n) memory and sub-O(n^2) pre-filling computation perform robustly. Dynamic sparsity yields more expressive KV caches than static patterns, and layer-level sparsity in hybrid architectures reduces memory usage with strong performance. Additionally, we identify attention distribution shift issues in long-generation scenarios. https://aka.ms/SCBench.
A Generative AI-driven Metadata Modelling Approach
Since decades, the modelling of metadata has been core to the functioning of any academic library. Its importance has only enhanced with the increasing pervasiveness of Generative Artificial Intelligence (AI)-driven information activities and services which constitute a library's outreach. However, with the rising importance of metadata, there arose several outstanding problems with the process of designing a library metadata model impacting its reusability, crosswalk and interoperability with other metadata models. This paper posits that the above problems stem from an underlying thesis that there should only be a few core metadata models which would be necessary and sufficient for any information service using them, irrespective of the heterogeneity of intra-domain or inter-domain settings. To that end, this paper advances a contrary view of the above thesis and substantiates its argument in three key steps. First, it introduces a novel way of thinking about a library metadata model as an ontology-driven composition of five functionally interlinked representation levels from perception to its intensional definition via properties. Second, it introduces the representational manifoldness implicit in each of the five levels which cumulatively contributes to a conceptually entangled library metadata model. Finally, and most importantly, it proposes a Generative AI-driven Human-Large Language Model (LLM) collaboration based metadata modelling approach to disentangle the entanglement inherent in each representation level leading to the generation of a conceptually disentangled metadata model. Throughout the paper, the arguments are exemplified by motivating scenarios and examples from representative libraries handling cancer information.
Which AI Companies Are the Safest--and Least Safe?
As companies race to build more powerful AI, safety measures are being left behind. A report published Wednesday takes a closer look at how companies including OpenAI and Google DeepMind are grappling with the potential harms of their technology. It paints a worrying picture: flagship models from all the developers in the report were found to have vulnerabilities, and some companies have taken steps to enhance safety, others lag dangerously behind. The report was published by the Future of Life Institute, a nonprofit that aims to reduce global catastrophic risks. The organization's 2023 open letter calling for a pause on large-scale AI model training drew unprecedented support from 30,000 signatories, including some of technology's most prominent voices.