generation
ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image
We present a novel method for reconstructing 3D objects from a single RGB image. Our method leverages the latest image generation models to infer the hidden 3D structure while remaining faithful to the input image. While existing methods obtain impressive results in generating 3D models from text prompts, they do not provide an easy approach for conditioning on input RGB data. Naive extensions of these methods often lead to improper alignment in appearance between the input image and the 3D reconstructions. We address these challenges by introducing Image Constrained Radiance Fields (ConRad), a novel variant of neural radiance fields. ConRad is an efficient 3D representation that explicitly captures the appearance of an input image in one viewpoint. We propose a training algorithm that leverages the single RGB image in conjunction with pretrained Diffusion Models to optimize the parameters of a ConRad representation. Extensive experiments show that ConRad representations can simplify preservation of image details while producing a realistic 3D reconstruction. Compared to existing state-of-the-art baselines, we show that our 3D reconstructions remain more faithful to the input and produce more consistent 3D models while demonstrating significantly improved quantitative performance on a ShapeNet object benchmark.
Roundtables: Trump's Impact on the Next Generation of Innovators
Watch a subscriber-only conversation on how researchers and entrepreneurs are faring under the new administration. Every year, MIT Technology Review recognizes dozens of young researchers on our Innovators Under 35 list. We checked back in with recent honorees to see how they're faring amid sweeping changes to science and technology policy within the US. Learn about the complex realities of what life has been like for those aiming to build their labs and companies in today's political climate. How Trump's policies are affecting early-career scientists--in their own words It's surprisingly easy to stumble into a relationship with an AI chatbot Rhiannon Williams Therapists are secretly using ChatGPT. How these two brothers became go-to experts on America's "mystery drone" invasion Matthew Phelan It's surprisingly easy to stumble into a relationship with an AI chatbot Therapists are secretly using ChatGPT.
Tetrahedron Splatting for 3D Generation
As a flexible representation, NeRF has been first adopted for 3D representation. With density-based volumetric rendering, it however suffers both intensive computational overhead and inaccurate mesh extraction. Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh extraction and real-time rendering but is limited in handling large topological changes in meshes, leading to optimization challenges. Alternatively, 3D Gaussian Splatting (3DGS) is favored in both training and rendering efficiency while falling short in mesh extraction. In this work, we introduce a novel 3D representation, Tetrahedron Splatting (TeT-Splatting), that supports easy convergence during optimization, precise mesh extraction, and real-time rendering simultaneously.
Wyckoff Transformer: Generation of Symmetric Crystals
Kazeev, Nikita, Nong, Wei, Romanov, Ignat, Zhu, Ruiming, Ustyuzhanin, Andrey, Yamazaki, Shuya, Hippalgaonkar, Kedar
Symmetry rules that atoms obey when they bond together to form an ordered crystal play a fundamental role in determining their physical, chemical, and electronic properties such as electrical and thermal conductivity, optical and polarization behavior, and mechanical strength. Almost all known crystalline materials have internal symmetry. Consistently generating stable crystal structures is still an open challenge, specifically because such symmetry rules are not accounted for. To address this issue, we propose WyFormer, a generative model for materials conditioned on space group symmetry. We use Wyckoff positions as the basis for an elegant, compressed, and discrete structure representation. To model the distribution, we develop a permutation-invariant autoregressive model based on the Transformer and an absence of positional encoding. WyFormer has a unique and powerful synergy of attributes, proven by extensive experimentation: best-in-class symmetry-conditioned generation, physics-motivated inductive bias, competitive stability of the generated structures, competitive material property prediction quality, and unparalleled inference speed.
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
Zhou, Zhenglin, Xia, Xiaobo, Ma, Fan, Fan, Hehe, Yang, Yi, Chua, Tat-Seng
Text-to-3D generation automates 3D content creation from textual descriptions, which offers transformative potential across various fields. However, existing methods often struggle to align generated content with human preferences, limiting their applicability and flexibility. To address these limitations, in this paper, we propose DreamDPO, an optimization-based framework that integrates human preferences into the 3D generation process, through direct preference optimization. Practically, DreamDPO first constructs pairwise examples, then compare their alignment with human preferences using reward or large multimodal models, and lastly optimizes the 3D representation with a preference-driven loss function. By leveraging pairwise comparison to reflect preferences, DreamDPO reduces reliance on precise pointwise quality evaluations while enabling fine-grained controllability through preference-guided optimization. Experiments demonstrate that DreamDPO achieves competitive results, and provides higher-quality and more controllable 3D content compared to existing methods. The code and models will be open-sourced.
A Survey on Generative Diffusion Model
Cao, Hanqun, Tan, Cheng, Gao, Zhangyang, Xu, Yilun, Chen, Guangyong, Heng, Pheng-Ann, Li, Stan Z.
Deep generative models have unlocked another profound realm of human creativity. By capturing and generalizing patterns within data, we have entered the epoch of all-encompassing Artificial Intelligence for General Creativity (AIGC). Notably, diffusion models, recognized as one of the paramount generative models, materialize human ideation into tangible instances across diverse domains, encompassing imagery, text, speech, biology, and healthcare. To provide advanced and comprehensive insights into diffusion, this survey comprehensively elucidates its developmental trajectory and future directions from three distinct angles: the fundamental formulation of diffusion, algorithmic enhancements, and the manifold applications of diffusion. Each layer is meticulously explored to offer a profound comprehension of its evolution. Structured and summarized approaches are presented in https://github.com/chq1155/A-Survey-on-Generative-Diffusion-Model.
How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench
Ye, Qinyuan, Fu, Harvey Yiyun, Ren, Xiang, Jia, Robin
We investigate the predictability of large language model (LLM) capabilities: given records of past experiments using different model families, numbers of parameters, tasks, and numbers of in-context examples, can we accurately predict LLM performance on new experiment configurations? Answering this question has practical implications for LLM users (e.g., deciding which models to try), developers (e.g., prioritizing evaluation on representative tasks), and the research community (e.g., identifying hard-to-predict capabilities that warrant further investigation). We study the performance prediction problem on experiment records from BIG-bench. On a random train-test split, an MLP-based predictor achieves an $R^2$ score greater than 95%, indicating the presence of learnable patterns within the experiment records. We then formulate the problem of searching for "small-bench," an informative subset of BIG-bench tasks from which the performance on the full set can be maximally recovered. We find a subset as informative as BIG-bench Hard for evaluating new model families, while being $3\times$ smaller. Additionally, we find competitive subsets by clustering task representations learned by our MLP-based predictor and selecting tasks close to cluster centroids, highlighting the importance of task diversity in constructing "small-bench."
Fine-tuned Language Models are Continual Learners
Scialom, Thomas, Chakrabarty, Tuhin, Muresan, Smaranda
Recent work on large language models relies on the intuition that most natural language processing tasks can be described via natural language instructions. Language models trained on these instructions show strong zero-shot performance on several standard datasets. However, these models even though impressive still perform poorly on a wide range of tasks outside of their respective training and evaluation sets. To address this limitation, we argue that a model should be able to keep extending its knowledge and abilities, without forgetting previous skills. In spite of the limited success of Continual Learning we show that Language Models can be continual learners. We empirically investigate the reason for this success and conclude that Continual Learning emerges from self-supervision pre-training. Our resulting model Continual-T0 (CT0) is able to learn diverse new tasks, while still maintaining good performance on previous tasks, spanning remarkably through 70 datasets in total. Finally, we show that CT0 is able to combine instructions in ways it was never trained for, demonstrating some compositionality.
The Next Generation of Threat Detection Will Require Both Human and Machine Expertise
There is a debate in the world of cybersecurity about whether to use human or machine expertise. However, this is a false dichotomy: Truly effective threat detection and response need both kinds of expertise working in tandem. It will be years before machines completely replace the humans who perform typical detection and response tasks. What we predict for the meantime is a symbiotic relationship between humans and machines. The combination means that detection of and response to threats can be faster and more intelligent.