Goto

Collaborating Authors

 Generative AI


The Download: following DeepSeek's lead, and OpenAI's new research agent

MIT Technology Review

When the Chinese firm DeepSeek dropped a large language model called R1 two weeks ago, it sent shock waves through the US tech industry. Not only did R1 match the best of the homegrown competition, it was built for a fraction of the cost--and given away for free. DeepSeek has now suddenly become the company to beat. What exactly did it do to rattle the tech world so fully? And what can we learn from the buzz about what's coming next?


OpenAI's new agent can compile detailed reports on practically any topic

MIT Technology Review

OpenAI claims the tool represents a significant step toward its overarching goal of developing artificial general intelligence (AGI) that matches (or surpasses) human performance. It says that what takes the tool "tens of minutes" would take a human many hours. In response to a single query, such as "Draw me up a competitive analysis between streaming platforms," Deep Research will search the web, analyze the information it encounters, and compile a detailed report that cites its sources. It's also able to draw from files uploaded by users. OpenAI developed Deep Research using the same "chain of thought" reinforcement-learning methods it used to create its o1 multistep reasoning model. But while o1 was designed to focus primarily on mathematics, coding, or other STEM-based tasks, Deep Research can tackle a far broader range of subjects.


SoftBank forms joint venture with OpenAI in enterprise play

The Japan Times

SoftBank Group will spend 3 billion a year to adopt and deploy OpenAI technology throughout its operations, while the two companies have agreed to form a joint venture to market the artificial intelligence as an enterprise solution. "This initiative will not only transform the way SoftBank Group operates but also revolutionize the way companies work in Japan and around the globe," SoftBank CEO Masayoshi Son said in a statement Monday. The technology, which the company describes as an advanced enterprise AI called Cristal intelligence, will be used at all companies under the SoftBank group, including Arm, Line and PayPay, to improve productivity and drive innovation. For instance, SoftBank's telecom unit plans to make more than 100 million workflows automated, the company said in the press release.


ChatGPT's Deep Research tool can create reports from hundreds of online sources

Engadget

Two days after releasing o3-mini to the world, the company made a surprise announcement on Sunday evening, revealing Deep Research. The new feature allows ChatGPT to find, analyze and synthesize hundreds of websites and online sources to create reports "at the level of a research analyst." The chatbot will then take "anywhere from 5 to 30 minutes" to compile an answer, a side panel documenting the agent's progress and citations as it works. "It accomplishes in tens of minutes what would take a human many hours," OpenAI says of the new feature. "Our ultimate aspiration is a model that can uncover and discover new knowledge for itself," said Mark Chen, chief research officer at OpenAI, during the company's reveal livestream.


s1: Simple test-time scaling

arXiv.org Artificial Intelligence

Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model's thinking process or lengthening it by appending "Wait" multiple times to the model's generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1-32B with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24. Our model, data, and code are open-source at https://github.com/simplescaling/s1


VILP: Imitation Learning with Latent Video Planning

arXiv.org Artificial Intelligence

In the era of generative AI, integrating video generation models into robotics opens new possibilities for the general-purpose robot agent. This paper introduces imitation learning with latent video planning (VILP). We propose a latent video diffusion model to generate predictive robot videos that adhere to temporal consistency to a good degree. Our method is able to generate highly time-aligned videos from multiple views, which is crucial for robot policy learning. Our video generation model is highly time-efficient. For example, it can generate videos from two distinct perspectives, each consisting of six frames with a resolution of 96x160 pixels, at a rate of 5 Hz. In the experiments, we demonstrate that VILP outperforms the existing video generation robot policy across several metrics: training costs, inference speed, temporal consistency of generated videos, and the performance of the policy. We also compared our method with other imitation learning methods. Our findings indicate that VILP can rely less on extensive high-quality task-specific robot action data while still maintaining robust performance. In addition, VILP possesses robust capabilities in representing multi-modal action distributions. Our paper provides a practical example of how to effectively integrate video generation models into robot policies, potentially offering insights for related fields and directions. For more details, please refer to our open-source repository https://github.com/ZhengtongXu/VILP.


Towards Safer Chatbots: A Framework for Policy Compliance Evaluation of Custom GPTs

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have gained unprecedented prominence, achieving widespread adoption across diverse domains and integrating deeply into society. The capability to fine-tune general-purpose LLMs, such as Generative Pre-trained Transformers (GPT), for specific tasks has facilitated the emergence of numerous Custom GPTs. These tailored models are increasingly made available through dedicated marketplaces, such as OpenAI's GPT Store. However, their black-box nature introduces significant safety and compliance risks. In this work, we present a scalable framework for the automated evaluation of Custom GPTs against OpenAI's usage policies, which define the permissible behaviors of these systems. Our framework integrates three core components: (1) automated discovery and data collection of models from the GPT store, (2) a red-teaming prompt generator tailored to specific policy categories and the characteristics of each target GPT, and (3) an LLM-as-a-judge technique to analyze each prompt-response pair for potential policy violations. We validate our framework with a manually annotated ground truth, and evaluate it through a large-scale study with 782 Custom GPTs across three categories: Romantic, Cybersecurity, and Academic GPTs. Our manual annotation process achieved an F1 score of 0.975 in identifying policy violations, confirming the reliability of the framework's assessments. The results reveal that 58.7% of the analyzed models exhibit indications of non-compliance, exposing weaknesses in the GPT store's review and approval processes. Furthermore, our findings indicate that a model's popularity does not correlate with compliance, and non-compliance issues largely stem from behaviors inherited from base models rather than user-driven customizations. We believe this approach is extendable to other chatbot platforms and policy domains, improving LLM-based systems safety.


Dance recalibration for dance coherency with recurrent convolution block

arXiv.org Artificial Intelligence

With the recent advancements in generative AI such as GAN, Diffusion, and VAE, the use of generative AI for dance generation has seen significant progress and received considerable interest. In this study, We propose R-Lodge, an enhanced version of Lodge. R-Lodge incorporates Recurrent Sequential Representation Learning named Dance Recalibration to original coarse-to-fine long dance generation model. R-Lodge utilizes Dance Recalibration method using $N$ Dance Recalibration Block to address the lack of consistency in the coarse dance representation of the Lodge model. By utilizing this method, each generated dance motion incorporates a bit of information from the previous dance motions. We evaluate R-Lodge on FineDance dataset and the results show that R-Lodge enhances the consistency of the whole generated dance motions.


Single-neuron deep generative model uncovers underlying physics of neuronal activity in Ca imaging data

arXiv.org Artificial Intelligence

Calcium imaging has become a powerful alternative to electrophysiology for studying neuronal activity, offering spatial resolution and the ability to measure large populations of neurons in a minimally invasive manner. This technique has broad applications in neuroscience, neuroengineering, and medicine, enabling researchers to explore the relationship between neuron location and activity. Recent advancements in deep generative models (DGMs) have facilitated the modeling of neuronal population dynamics, uncovering latent representations that provide insights into behavior prediction and neuronal variance. However, these models often rely on spike inference algorithms and primarily focus on population-level dynamics, limiting their applicability for single-neuron analyses. To address this gap, we propose a novel framework for single-neuron representation learning using autoregressive variational autoencoders (AVAEs). Our approach embeds individual neurons' spatiotemporal signals into a reduced-dimensional space without the need for spike inference algorithms. The AVAE excels over traditional linear methods by generating more informative and discriminative latent representations, improving tasks such as visualization, clustering, and the understanding of neuronal activity. Additionally, the reconstruction performance of the AVAE outperforms the state of the art, demonstrating its ability to accurately recover the original fluorescence signal from the learned representation. Using realistic simulations, we show that our model captures underlying physical properties and connectivity patterns, enabling it to distinguish between different firing and connectivity types. These findings position the AVAE as a versatile and powerful tool for advancing single-neuron analysis and lays the groundwork for future integration of multimodal single-cell datasets in neuroscience.


Standardizing Intelligence: Aligning Generative AI for Regulatory and Operational Compliance

arXiv.org Artificial Intelligence

Technical standards, or simply standards, are established documented guidelines and rules that facilitate the interoperability, quality, and accuracy of systems and processes. In recent years, we have witnessed an emerging paradigm shift where the adoption of generative AI (GenAI) models has increased tremendously, spreading implementation interests across standard-driven industries, including engineering, legal, healthcare, and education. In this paper, we assess the criticality levels of different standards across domains and sectors and complement them by grading the current compliance capabilities of state-of-the-art GenAI models. To support the discussion, we outline possible challenges and opportunities with integrating GenAI for standard compliance tasks while also providing actionable recommendations for entities involved with developing and using standards. Overall, we argue that aligning GenAI with standards through computational methods can help strengthen regulatory and operational compliance. We anticipate this area of research will play a central role in the management, oversight, and trustworthiness of larger, more powerful GenAI-based systems in the near future.