Generative AI


Estimating Committor Functions via Deep Adaptive Sampling on Rare Transition Paths

arXiv.org Machine Learning

Committor functions are central to investigating rare but important events in molecular simulations. It is known that computing the committor function suffers from the curse of dimensionality. Recently, using neural networks to estimate the committor function has gained attention due to its potential for high-dimensional problems. Training such networks requires transition data sampled from direct simulations of rare events, which is very inefficient. The scarcity of transition data makes it challenging to approximate the committor function. To address this problem, we propose an efficient framework that generates data points in the transition state region to help train neural networks to approximate the committor function. We design a Deep Adaptive Sampling method for TRansition paths (DASTR), where deep generative models are employed to generate samples that capture the information of transitions effectively. In particular, we treat a non-negative function in the integrand of the loss functional as an unnormalized probability density function and approximate it with the deep generative model. Most new samples drawn from the deep generative model lie in the transition state region, with fewer samples elsewhere. This distribution provides effective samples for approximating the committor function and significantly improves the accuracy. We demonstrate the effectiveness of the proposed method through both simulations and realistic examples.
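
The idea of treating a non-negative loss integrand as an unnormalized density can be illustrated on a toy problem. The sketch below is not the paper's method: it replaces the deep generative model with plain importance resampling from a uniform candidate pool, and it assumes a 1D double-well potential, overdamped Langevin dynamics, and an illustrative inverse temperature. The names `potential`, `loss_integrand`, and `adaptive_samples` are hypothetical. In this 1D setting the exact committor has a derivative proportional to exp(BETA * V(x)) between the wells, so the integrand |q'(x)|^2 exp(-BETA * V(x)) peaks at the barrier.

```python
import numpy as np

BETA = 5.0  # inverse temperature; illustrative value, not from the paper


def potential(x):
    """Double-well potential with minima at x = -1 (state A) and x = 1 (state B)."""
    return (x**2 - 1.0) ** 2


def loss_integrand(x):
    """Unnormalized density |q'(x)|^2 * exp(-BETA * V(x)).

    Assumes overdamped Langevin dynamics in 1D, where the exact committor has
    q'(x) proportional to exp(BETA * V(x)) between the wells and q'(x) = 0 outside.
    """
    v = potential(x)
    dq = np.where(np.abs(x) < 1.0, np.exp(BETA * v), 0.0)
    return dq**2 * np.exp(-BETA * v)


def adaptive_samples(n, rng, pool_size=20000):
    """Resample a uniform candidate pool with probability proportional to the integrand.

    This stands in for sampling from a trained deep generative model.
    """
    candidates = rng.uniform(-2.0, 2.0, size=pool_size)
    weights = loss_integrand(candidates)
    idx = rng.choice(pool_size, size=n, p=weights / weights.sum())
    return candidates[idx]


rng = np.random.default_rng(0)
samples = adaptive_samples(1000, rng)
print("fraction with |x| < 0.5:", np.mean(np.abs(samples) < 0.5))
```

Under these assumptions the resampled points cluster around the barrier at x = 0 rather than in the wells, which is the qualitative behavior the abstract describes for the transition state region.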


How Chinese AI Startup DeepSeek Made a Model that Rivals OpenAI

WIRED

On January 20, DeepSeek, a relatively unknown AI research lab from China, released an open source model that's quickly become the talk of the town in Silicon Valley. According to a paper authored by the company, DeepSeek-R1 beats the industry's leading models like OpenAI o1 on several math and reasoning benchmarks. In fact, on many metrics that matter--capability, cost, openness--DeepSeek is giving Western AI giants a run for their money. US export controls have severely curtailed the ability of Chinese tech firms to compete on AI in the Western way--that is, infinitely scaling up by buying more chips and training for a longer period of time. As a result, most Chinese companies have focused on downstream applications rather than building their own models.


Review for NeurIPS paper: Further Analysis of Outlier Detection with Deep Generative Models

Neural Information Processing Systems

Summary and Contributions: ----- Update ----- I have read the author response as well as the other reviews. I agree with some of the concerns raised by the other reviewers, but do not find them significant enough to question the overall value and insights of this paper. I would still vote to accept, but lower my score to 7. ------------------ This work further analyzes the recently observed issue that Deep Generative Models (DGMs) regularly assign higher likelihood to out-of-distribution (OOD) samples/outliers. Based on the phenomenon that typical sets (regions of largest probability mass, where samples are likely to fall) need not coincide with density level sets (high-density/likelihood regions) in high dimensions, a novel white noise test for outlier detection is proposed. This test shows a marked improvement in detection performance over previous tests *using the same models* on common benchmarks (CIFAR-10, SVHN, CelebA, TinyImageNet in-/out-of-distribution combinations), thereby suggesting that DGMs are not necessarily uncalibrated, but rather that existing likelihood-based tests might be improperly formulated/applied.
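
The gap between typical sets and high-density regions that the review refers to can be seen with a standard Gaussian, without any trained model. The following is a minimal sketch under that assumption; it does not implement the paper's white noise test, and the dimension and sample count are arbitrary illustrative choices.

```python
import numpy as np


def log_density(x):
    """Log density of a standard Gaussian, evaluated row-wise."""
    d = x.shape[-1]
    return -0.5 * np.sum(x**2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)


dim = 1000
rng = np.random.default_rng(0)
samples = rng.standard_normal((2000, dim))
norms = np.linalg.norm(samples, axis=1)
mode = np.zeros((1, dim))  # the single highest-density point

print("sqrt(dim):", np.sqrt(dim))
print("mean sample norm:", norms.mean())
print("smallest sample norm:", norms.min())
print("log density at mode:", log_density(mode)[0])
print("highest log density among samples:", log_density(samples).max())
```

The samples concentrate in a thin shell of radius close to sqrt(dim), far from the mode, even though the mode has strictly higher density than every sample. A test that only thresholds likelihood would therefore rate the mode as highly in-distribution although no sample ever lands near it.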


Review for NeurIPS paper: Further Analysis of Outlier Detection with Deep Generative Models

Neural Information Processing Systems

The paper investigates the out-of-distribution behavior of deep generative models, specifically the counterintuitive results reported in prior work where deep generative models were shown to assign higher likelihood to out-of-distribution inputs. The authors propose a new white noise test (WN test), theoretically motivate the proposed test, and show that it outperforms likelihood and likelihood ratios. The reviewers raised concerns about the experimental setup (other datasets and models), the WN assumption, and connections to other related methods such as the typicality test. This was a borderline paper. During the discussion, a majority of the reviewers agreed that the author rebuttal addresses their major concerns, except for R2.


Regulating Multifunctionality

arXiv.org Artificial Intelligence

Forthcoming in Philipp Hacker, Andreas Engel, Sarah Hammer and Brent Mittelstadt (eds) The Oxford Handbook on the Foundations and Regulation of Generative AI (Oxford University Press)

Abstract: Foundation models and generative artificial intelligence (AI) exacerbate a core regulatory challenge associated with AI: its heterogeneity. By their very nature, foundation models and generative AI can perform multiple functions for their users, thus presenting a vast array of different risks. This multifunctionality means that prescriptive, one-size-fits-all regulation will not be a viable option. Even performance standards and ex post liability--regulatory approaches that usually afford flexibility--are unlikely to be strong candidates for responding to multifunctional AI's risks, given challenges in monitoring and enforcement. Regulators will do well instead to promote proactive risk management on the part of developers and users by using management-based regulation, an approach that has proven effective in other contexts of heterogeneity. Regulators will also need to maintain ongoing vigilance and agility. More than in other contexts, regulators of multifunctional AI will need sufficient resources, top human talent and leadership, and organizational cultures committed to regulatory excellence.

Consider one of humanity's most primal of tools: the knife [30]. The knife is not a singular tool; rather, it comes in many different varieties that serve many functions, each of which can generate value for society. Knives are used in the kitchen to prepare delicious meals, and then they are used by diners to consume those same meals. Knives carve objects, cut rope, and open packages. They clear paths through forests and jungles, and they help in harvesting seasonal crops. Knives can be used, of course, to injure or kill people. But in the hands of surgeons, knives are routinely used to save lives.
And even though knives take many different forms and are often designed for many different purposes--think of, for example, the many types and sizes of surgical scalpels, woodcarver's chisels, and kitchen implements, among others--knives designed for one purpose also can be adapted for different uses, as anyone who has used a dinner knife to open a postal letter can attest. Many knives, though, are deliberately intended to serve multiple functions, as is the case with a simple pocketknife or, even more emblematically, the classic Swiss army knife, some models of which boast a combination of more than 30 different tools in one. The proliferation of functions performed by different knives has led over the years to different forms and sources of rules governing their manufacture, sale, and deployment.


Exploring the Collaborative Co-Creation Process with AI: A Case Study in Novice Music Production

arXiv.org Artificial Intelligence

Artificial intelligence is reshaping creative domains, yet its co-creative processes, especially in group settings with novice users, remain underexplored. To bridge this gap, we conducted a case study in a college-level course where nine undergraduate students were tasked with creating three original music tracks using AI tools over 10 weeks. The study spanned the entire creative journey from ideation to releasing these songs on Spotify. Participants leveraged AI for music and lyric production, cover art, and distribution. Our findings highlight how AI transforms creative workflows: accelerating ideation but compressing the traditional preparation stage, and requiring novices to navigate a challenging idea selection and validation phase. We also identified a new "collaging and refinement" stage, where participants creatively combined diverse AI-generated outputs into cohesive works. Furthermore, AI influenced group social dynamics and role division among human creators. Based on these insights, we propose the Human-AI Co-Creation Stage Model and the Human-AI Agency Model, offering new perspectives on collaborative co-creation with AI.


Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study

arXiv.org Artificial Intelligence

The use of chatbots in language learning has evolved significantly since the 1960s, with chatbots becoming more sophisticated platforms as generative AI emerged. These tools now simulate natural conversations, adapting to individual learners' needs, including those studying Chinese. Our study explores how learners can use specific prompts to engage Large Language Models (LLMs) as personalized chatbots, aiming to target their language level based on the Common European Framework of Reference for Languages (CEFR) and the European Benchmarking Chinese Language (EBCL) project. Focusing on A1, A1+ and A2 levels, we examine the teaching of Chinese, which presents unique challenges due to its logographic writing system. Our goal is to develop prompts that integrate oral and written skills, using high-frequency character lists and controlling oral lexical productions. These tools, powered by generative AI, aim to enhance language practice by crossing lexical and sinographic recurrence. While generative AI shows potential as a personalized tutor, further evaluation is needed to assess its effectiveness. We conducted a systematic series of experiments using ChatGPT models to evaluate their adherence to constraints specified in the prompts. The results indicate that incorporating level A1 and A1+ characters, along with the associated reference list, significantly enhances compliance with the EBCL character set. Properly prompted, LLMs can increase exposure to the target language and offer interactive exchanges to develop language skills.
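
One simple way to quantify adherence to a character list is the share of Chinese characters in a model reply that appear on the list. The sketch below is an assumption about how such a check could look, not the study's evaluation procedure: the function name `character_compliance` is hypothetical, the tiny `allowed` set is a placeholder rather than the EBCL reference list, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is counted.

```python
def character_compliance(text, allowed):
    """Return the fraction of Han characters in `text` that are in `allowed`.

    Characters outside the basic CJK Unified Ideographs block (punctuation,
    Latin letters, digits) are ignored. Returns 1.0 when there is nothing to check.
    """
    han = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    if not han:
        return 1.0
    return sum(ch in allowed for ch in han) / len(han)


# Placeholder list for illustration only; not the EBCL A1 character list.
allowed = set("我是学生你好")

print(character_compliance("你好，我是学生。", allowed))
print(character_compliance("我是老师", allowed))
```

A score below 1.0 flags replies that use characters beyond the target level, and could be averaged over many replies to compare prompt variants.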


Review for NeurIPS paper: Offline Imitation Learning with a Misspecified Simulator

Neural Information Processing Systems

Summary and Contributions: The authors propose an improvement on existing approaches for imitation learning of policies for embodied agents. The approach is a hybrid between sim-to-real RL approaches (which require a simulator closely matching the real world) and real-world imitation learning approaches such as GAIL. The general idea of the paper is that there is a simulator which, however, is allowed to have different dynamics than the "real world". In particular, the assumption is that two policies can reach the same goal state from the same starting point within H steps in the real world. The algorithm is tested on the OpenAI Gym environment, where both the real world and the simulator environment are simulations (with different parametrizations).


The Download: OpenAI's agent, and what to expect from robotics

MIT Technology Review

What's new: After weeks of buzz, OpenAI has released Operator, its first AI agent. Operator is a web app that can carry out simple online tasks in a browser, such as booking concert tickets or filling an online grocery order. The app is powered by a new model called Computer-Using Agent--CUA for short--built on top of OpenAI's multimodal large language model GPT-4o. Why it matters: OpenAI claims that Operator outperforms similar rival tools, including Anthropic's Computer Use and Google DeepMind's Mariner. The fact that three of the world's top AI firms have converged on the same vision of what agent-based models could be makes one thing clear.


OpenAI's Approach to External Red Teaming for AI Models and Systems

arXiv.org Artificial Intelligence

Red teaming has emerged as a critical practice in assessing the possible risks of AI models and systems. It aids in the discovery of novel risks, stress testing possible gaps in existing mitigations, enriching existing quantitative safety metrics, facilitating the creation of new safety measurements, and enhancing public trust and the legitimacy of AI risk assessments. This white paper describes OpenAI's work to date in external red teaming and draws some more general conclusions from this work. We describe the design considerations underpinning external red teaming, which include: selecting the composition of the red team, deciding on access levels, and providing the guidance required to conduct red teaming. Additionally, we show outcomes that red teaming can enable, such as input into risk assessment and automated evaluations. We also describe the limitations of external red teaming, and how it can fit into a broader range of AI model and system evaluations. Through these contributions, we hope that AI developers and deployers, evaluation creators, and policymakers will be able to better design red teaming campaigns and get a deeper look into how external red teaming can fit into model deployment and evaluation processes. These methods are evolving and the value of different methods continues to shift as the ecosystem around red teaming matures and models themselves improve as tools for red teaming.