generation
ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image
We present a novel method for reconstructing 3D objects from a single RGB image. Our method leverages the latest image generation models to infer the hidden 3D structure while remaining faithful to the input image. While existing methods obtain impressive results in generating 3D models from text prompts, they do not provide an easy approach for conditioning on input RGB data. Naive extensions of these methods often lead to improper alignment in appearance between the input image and the 3D reconstructions. We address these challenges by introducing Image Constrained Radiance Fields (ConRad), a novel variant of neural radiance fields. ConRad is an efficient 3D representation that explicitly captures the appearance of an input image in one viewpoint. We propose a training algorithm that leverages the single RGB image in conjunction with pretrained Diffusion Models to optimize the parameters of a ConRad representation. Extensive experiments show that ConRad representations can simplify preservation of image details while producing a realistic 3D reconstruction. Compared to existing state-of-the-art baselines, we show that our 3D reconstructions remain more faithful to the input and produce more consistent 3D models while demonstrating significantly improved quantitative performance on a ShapeNet object benchmark.
Roundtables: Trump's Impact on the Next Generation of Innovators
Watch a subscriber-only conversation on how researchers and entrepreneurs are faring under the new administration. Every year, MIT Technology Review recognizes dozens of young researchers on our Innovators Under 35 list. We checked back in with recent honorees to see how they're faring amid sweeping changes to science and technology policy within the US. Learn about the complex realities of what life has been like for those aiming to build their labs and companies in today's political climate. How Trump's policies are affecting early-career scientists--in their own words It's surprisingly easy to stumble into a relationship with an AI chatbot Rhiannon Williams Therapists are secretly using ChatGPT. How these two brothers became go-to experts on America's "mystery drone" invasion Matthew Phelan It's surprisingly easy to stumble into a relationship with an AI chatbot Therapists are secretly using ChatGPT.
Tetrahedron Splatting for 3D Generation
As a flexible representation, NeRF has been first adopted for 3D representation. With density-based volumetric rendering, it however suffers both intensive computational overhead and inaccurate mesh extraction. Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh extraction and real-time rendering but is limited in handling large topological changes in meshes, leading to optimization challenges. Alternatively, 3D Gaussian Splatting (3DGS) is favored in both training and rendering efficiency while falling short in mesh extraction. In this work, we introduce a novel 3D representation, Tetrahedron Splatting (TeT-Splatting), that supports easy convergence during optimization, precise mesh extraction, and real-time rendering simultaneously.
GPT's Devastated and LLaMA's Content: Emotion Representation Alignment in LLMs for Keyword-based Generation
Choudhury, Shadab, Kumar, Asha, Martin, Lara J.
In controlled text generation using large language models (LLMs), gaps arise between the language model's interpretation and human expectations. We look at the problem of controlling emotions in keyword-based sentence generation for both GPT-4 and LLaMA-3. We selected four emotion representations: Words, Valence-Arousal-Dominance (VAD) dimensions expressed in both Lexical and Numeric forms, and Emojis. Our human evaluation looked at the Human-LLM alignment for each representation, as well as the accuracy and realism of the generated sentences. While representations like VAD break emotions into easy-to-compute components, our findings show that people agree more with how LLMs generate when conditioned on English words (e.g., "angry") rather than VAD scales. This difference is especially visible when comparing Numeric VAD to words. However, we found that converting the originally-numeric VAD scales to Lexical scales (e.g., +4.0 becomes "High") dramatically improved agreement. Furthermore, the perception of how much a generated sentence conveys an emotion is highly dependent on the LLM, representation type, and which emotion it is.
- North America > United States > Maryland (0.28)
- Asia > China (0.28)
- Oceania > Australia (0.14)
- (10 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Ext2Gen: Alignment through Unified Extraction and Generation for Robust Retrieval-Augmented Generation
Song, Hwanjun, Choi, Jeonghwan, Kim, Minseok
RAG has proven its effectiveness in reducing hallucinations We go beyond accurate retrieval to emphasize in LLMs, when their knowledge is incomplete, robust generation that remains resilient to forgetting outdated, or lacks sufficient detail to accurately and distraction by the two challenges. Our key address specific queries (Gao et al., 2023b; idea for enhancing robustness is an extract-thengenerate Fan et al., 2024). A critical aspect of RAG is the approach, Ext2Gen, where the model "retrieval" process, which involves identifying and first extracts query-relevant sentences from the retrieved selecting relevant text chunks. The quality of these chunks and then refine the information to retrieved chunks plays a pivotal role in the overall generate a precise answer. The extraction step here performance of RAG, as they form the basis serves as a chain-of-thought (CoT) process (Wei for generating factual and contextually relevant answers et al., 2022; Chu et al., 2023), where the model provides aligned with the query intent (Asai et al., the evidence first before generating the final 2024; Wang et al., 2023; Zhang et al., 2024).
RMG: Real-Time Expressive Motion Generation with Self-collision Avoidance for 6-DOF Companion Robotic Arms
Li, Jiansheng, Song, Haotian, Zhou, Jinni, Nie, Qiang, Cai, Yi
The six-degree-of-freedom (6-DOF) robotic arm has gained widespread application in human-coexisting environments. While previous research has predominantly focused on functional motion generation, the critical aspect of expressive motion in human-robot interaction remains largely unexplored. This paper presents a novel real-time motion generation planner that enhances interactivity by creating expressive robotic motions between arbitrary start and end states within predefined time constraints. Our approach involves three key contributions: first, we develop a mapping algorithm to construct an expressive motion dataset derived from human dance movements; second, we train motion generation models in both Cartesian and joint spaces using this dataset; third, we introduce an optimization algorithm that guarantees smooth, collision-free motion while maintaining the intended expressive style. Experimental results demonstrate the effectiveness of our method, which can generate expressive and generalized motions in under 0.5 seconds while satisfying all specified constraints.
- Health & Medicine (0.46)
- Transportation (0.41)
Graph of AI Ideas: Leveraging Knowledge Graphs and LLMs for AI Research Idea Generation
Gao, Xian, Zhang, Zongyun, Xie, Mingye, Liu, Ting, Fu, Yuzhuo
Reading relevant scientific papers and analyzing research development trends is a critical step in generating new scientific ideas. However, the rapid increase in the volume of research literature and the complex citation relationships make it difficult for researchers to quickly analyze and derive meaningful research trends. The development of large language models (LLMs) has provided a novel approach for automatically summarizing papers and generating innovative research ideas. However, existing paper-based idea generation methods either simply input papers into LLMs via prompts or form logical chains of creative development based on citation relationships, without fully exploiting the semantic information embedded in these citations. Inspired by knowledge graphs and human cognitive processes, we propose a framework called the Graph of AI Ideas (GoAI) for the AI research field, which is dominated by open-access papers. This framework organizes relevant literature into entities within a knowledge graph and summarizes the semantic information contained in citations into relations within the graph. This organization effectively reflects the relationships between two academic papers and the advancement of the AI research field. Such organization aids LLMs in capturing the current progress of research, thereby enhancing their creativity. Experimental results demonstrate the effectiveness of our approach in generating novel, clear, and effective research ideas.
- North America (0.14)
- Europe > United Kingdom > England (0.14)
- Asia > Thailand (0.14)
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations
Jin, Jiho, Kang, Woosung, Myung, Junho, Oh, Alice
Measuring social bias in large language models (LLMs) is crucial, but existing bias evaluation methods struggle to assess bias in long-form generation. We propose a Bias Benchmark for Generation (BBG), an adaptation of the Bias Benchmark for QA (BBQ), designed to evaluate social bias in long-form generation by having LLMs generate continuations of story prompts. Building our benchmark in English and Korean, we measure the probability of neutral and biased generations across ten LLMs. We also compare our long-form story generation evaluation results with multiple-choice BBQ evaluation, showing that the two approaches produce inconsistent results.
- North America > United States (0.28)
- Asia > Thailand (0.14)
- North America > Canada (0.14)
- (3 more...)
Infinite Leagues Under the Sea: Photorealistic 3D Underwater Terrain Generation by Latent Fractal Diffusion Models
Zhang, Tianyi, Zhi, Weiming, Mangelson, Joshua, Johnson-Roberson, Matthew
This paper tackles the problem of generating representations of underwater 3D terrain. Off-the-shelf generative models, trained on Internet-scale data but not on specialized underwater images, exhibit downgraded realism, as images of the seafloor are relatively uncommon. To this end, we introduce DreamSea, a generative model to generate hyper-realistic underwater scenes. DreamSea is trained on real-world image databases collected from underwater robot surveys. Images from these surveys contain massive real seafloor observations and covering large areas, but are prone to noise and artifacts from the real world. We extract 3D geometry and semantics from the data with visual foundation models, and train a diffusion model that generates realistic seafloor images in RGBD channels, conditioned on novel fractal distribution-based latent embeddings. We then fuse the generated images into a 3D map, building a 3DGS model supervised by 2D diffusion priors which allows photorealistic novel view rendering. DreamSea is rigorously evaluated, demonstrating the ability to robustly generate large-scale underwater scenes that are consistent, diverse, and photorealistic. Our work drives impact in multiple domains, spanning filming, gaming, and robot simulation.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
VACT: A Video Automatic Causal Testing System and a Benchmark
Yang, Haotong, Zheng, Qingyuan, Gao, Yunjian, Yang, Yongkun, He, Yangbo, Lin, Zhouchen, Zhang, Muhan
With the rapid advancement of text-conditioned Video Generation Models (VGMs), the quality of generated videos has significantly improved, bringing these models closer to functioning as ``*world simulators*'' and making real-world-level video generation more accessible and cost-effective. However, the generated videos often contain factual inaccuracies and lack understanding of fundamental physical laws. While some previous studies have highlighted this issue in limited domains through manual analysis, a comprehensive solution has not yet been established, primarily due to the absence of a generalized, automated approach for modeling and assessing the causal reasoning of these models across diverse scenarios. To address this gap, we propose VACT: an **automated** framework for modeling, evaluating, and measuring the causal understanding of VGMs in real-world scenarios. By combining causal analysis techniques with a carefully designed large language model assistant, our system can assess the causal behavior of models in various contexts without human annotation, which offers strong generalization and scalability. Additionally, we introduce multi-level causal evaluation metrics to provide a detailed analysis of the causal performance of VGMs. As a demonstration, we use our framework to benchmark several prevailing VGMs, offering insight into their causal reasoning capabilities. Our work lays the foundation for systematically addressing the causal understanding deficiencies in VGMs and contributes to advancing their reliability and real-world applicability.
- Research Report > New Finding (0.45)
- Research Report > Experimental Study (0.45)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)