


Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting

Neural Information Processing Systems

Creating large-scale interactive 3D environments is essential for the development of Robotics and Embodied AI research. However, generating diverse embodied environments with realistic detail and considerable complexity remains a significant challenge. Current methods, including manual design, procedural generation, diffusion-based scene generation, and large language model (LLM) guided scene design, are hindered by limitations such as excessive human effort, reliance on predefined rules or training datasets, and limited 3D spatial reasoning ability. Since pre-trained 2D image generative models capture scene and object configuration better than LLMs, we address these challenges by introducing $\textit{Architect}$, a generative framework that creates complex and realistic 3D embodied environments by leveraging diffusion-based 2D image inpainting. Specifically, we use foundation visual perception models to extract each generated object from the image and pre-trained depth estimation models to lift the generated 2D image into 3D space. Although the camera parameters and depth scale are absent from the generated image, we address these problems by "controlling" the diffusion model through $\textit{hierarchical inpainting}$.
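The pipeline the abstract describes can be sketched as a loop over inpainting levels, from large furniture down to small objects. This is a minimal illustrative sketch only; the function names are stand-ins for the diffusion, perception, and depth models, not the paper's actual API:

```python
# Hedged sketch of the Architect-style pipeline (all names are illustrative
# stubs, not the paper's implementation): hierarchical 2D inpainting,
# then object extraction, then depth-based lifting into 3D.

def inpaint_scene(prompt, level):
    """Stand-in for a diffusion-based 2D inpainting model."""
    return f"image[{prompt}@{level}]"

def detect_objects(image):
    """Stand-in for a foundation visual perception model (e.g., open-vocab detection)."""
    return [{"name": "sofa", "mask": image}]

def estimate_depth(image):
    """Stand-in for a pre-trained monocular depth estimator (dummy value)."""
    return 2.5

def lift_to_3d(obj, depth):
    """Place a detected 2D object at its estimated depth in 3D space."""
    return {"name": obj["name"], "z": depth}

def build_scene(prompt, levels=("large_furniture", "small_objects")):
    scene = []
    # Hierarchical: coarse furniture is inpainted first, then finer details,
    # which constrains camera and depth-scale ambiguity level by level.
    for level in levels:
        image = inpaint_scene(prompt, level)
        for obj in detect_objects(image):
            scene.append(lift_to_3d(obj, estimate_depth(image)))
    return scene

print(len(build_scene("living room")))  # one placed object per level
```

The point of the structure is that each inpainting pass is conditioned on the previous, coarser level, which is how the ambiguity in camera parameters and depth scale is progressively controlled.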


I Am Time Magazine's Person of the Year

The Atlantic - Technology

It's rude to boast, but here in 2025, you've got to take the wins where you can get them. This morning, TIME magazine announced its Person of the Year, and it's me. If you want to get all technical about it, TIME's Person of the Year is not a person at all but a collection of people: the architects of AI. One of the two covers released is a re-creation of the "Lunch Atop a Skyscraper" photograph from 1932, which depicted blue-collar ironworkers suspended hundreds of feet in the air during the construction of 30 Rockefeller Plaza. In its image, TIME replaces these laborers with tech personalities such as Mark Zuckerberg, Elon Musk, Sam Altman, and Jensen Huang.


OpenAI makes deal to bring Disney characters to ChatGPT and Sora

BBC News

Disney has agreed to invest $1bn (£740m) in OpenAI as part of a deal which will let people use many of its iconic characters in the chatbot ChatGPT and video-generation tool Sora. It is the first major studio to license parts of its catalogue to the tech giant, in a move which could have major implications for the studio's future plans. It means fans will be able to generate and share pictures and videos of more than 200 characters from Disney's franchises, including Pixar, Marvel and Star Wars. The move comes as OpenAI faces mounting questions about how its rapidly advancing tech is used - and as anxiety in Hollywood increases over the impact of AI on the creative industries. According to a blog post announcing the news, the list of eligible characters includes those from Disney films Zootopia, Moana and Encanto - as well as characters like Star Wars' Luke Skywalker and Marvel's Deadpool.


The Story Behind TIME's 2025 Person of the Year Covers

TIME - Tech

Pine is the Creative Director at TIME. To illustrate the choice of the Architects of AI as TIME's 2025 Person of the Year, we asked two separate artists to help us visualize the incredibly complex technological revolution that is currently underway. London-based illustrator and graphics animator Peter Crowther and digital painter Jason Seiler each created an image that speaks to the duality AI has produced - man vs. machine. Inspired by the inner workings of computer chips, Crowther's intricate AI structure looms large over the busy construction site.


BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents

Ou, Litu, Li, Kuan, Yin, Huifeng, Zhang, Liwen, Zhang, Zhongwang, Wu, Xixi, Ye, Rui, Qiao, Zile, Xie, Pengjun, Zhou, Jingren, Jiang, Yong

arXiv.org Artificial Intelligence

Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions remains limited. In this paper, we investigate whether LLM-based search agents can communicate their own confidence through verbalized confidence scores after long sequences of actions, a significantly more challenging task than outputting confidence in a single interaction. Experimenting on open-source agentic models, we first find that models exhibit much higher task accuracy at high confidence while having near-zero accuracy when confidence is low. Based on this observation, we propose Test-Time Scaling (TTS) methods that use confidence scores to gauge answer quality, encouraging the model to try again until it reaches a satisfactory confidence level. Results show that our proposed methods significantly reduce token consumption while demonstrating competitive performance compared to baseline fixed-budget TTS methods.
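The confidence-guided retry idea can be sketched as a simple loop: re-run the agent until its verbalized confidence clears a threshold or a retry budget is exhausted, rather than spending a fixed number of samples. The agent function below is a dummy stand-in (its rising confidence is fabricated for illustration), not the paper's system:

```python
# Hedged sketch of confidence-guided test-time scaling (names and the
# agent's behaviour are illustrative, not the paper's implementation).

def run_agent(task, attempt):
    """Stand-in for a multi-turn search agent returning (answer, confidence).
    Dummy behaviour: confidence rises with each retry."""
    return f"answer_{attempt}", 0.4 + 0.3 * attempt

def confidence_guided_tts(task, threshold=0.9, max_retries=5):
    best_answer, best_conf = None, -1.0
    for attempt in range(max_retries):
        answer, conf = run_agent(task, attempt)
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if conf >= threshold:
            # Stop early once confidence is satisfactory: this is where
            # tokens are saved relative to a fixed-budget TTS baseline.
            break
    return best_answer, best_conf

answer, conf = confidence_guided_tts("who wrote X?")
print(answer)  # the attempt that first cleared the threshold
```

A fixed-budget baseline would always run all `max_retries` attempts; the early stop is the whole token saving.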


Impact and Implications of Generative AI for Enterprise Architects in Agile Environments: A Systematic Literature Review

Kooy, Stefan Julian, Piest, Jean Paul Sebastian, Bemthuis, Rob Henk

arXiv.org Artificial Intelligence

Generative AI (GenAI) is reshaping enterprise architecture work in agile software organizations, yet evidence on its effects remains scattered. We report a systematic literature review (SLR) of 1,697 records, following the established SLR protocols of Kitchenham and PRISMA, yielding 33 studies across enterprise, solution, domain, business, and IT architect roles. GenAI most consistently supports (i) design ideation and trade-off exploration; (ii) rapid creation and refinement of artifacts (e.g., code, models, documentation); and (iii) architectural decision support and knowledge retrieval. Reported risks include opacity and bias, contextually incorrect outputs leading to rework, privacy and compliance concerns, and social loafing. We also identify emerging skills and competencies, including prompt engineering, model evaluation, and professional oversight, and organizational enablers around readiness and adaptive governance. The review contributes (1) a mapping of GenAI use cases and risks in agile architecting, (2) implications for capability building and governance, and (3) an initial research agenda on human-AI collaboration in architecture. Overall, the findings inform responsible adoption of GenAI that accelerates digital transformation while safeguarding architectural integrity.


PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction

Shrimal, Anubhav, Jain, Aryan, Chowdhury, Soumyajit, Yenigalla, Promod

arXiv.org Artificial Intelligence

Structured information extraction from unstructured text is critical for emerging Software 3.0 systems in which LLM agents autonomously interact with APIs and tools. Recent approaches apply large language models directly to extraction tasks using existing JSON schemas, often with constrained decoding or reinforcement learning to ensure syntactic validity, but they treat JSON schemas as static contracts designed for human developers, leading to suboptimal extraction performance, frequent hallucinations, and unreliable agent behavior when schemas contain ambiguous or incomplete specifications. We recognize that JSON schemas are themselves a form of natural language understanding contract, encoding rules, relationships, and expectations about data structure that LLMs should be able to both interpret and systematically improve. Consequently, we develop PARSE (Parameter Automated Refinement and Schema Extraction), a novel system with two synergistic components: ARCHITECT, which autonomously optimizes JSON schemas for LLM consumption while maintaining backward compatibility through RELAY (an integrated code generation system), and SCOPE, which implements reflection-based extraction with combined static and LLM-based guardrails. We evaluate PARSE qualitatively and quantitatively on three datasets, including Schema-Guided Dialogue (SGD), Structured Web Data Extraction (SWDE), and internal retail conversation data, and find that it achieves up to 64.7% improvement in extraction accuracy on SWDE, with combined framework improvements reaching 10% across models, while reducing extraction errors by 92% within the first retry and maintaining practical latency.
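The reflection-with-guardrails loop can be illustrated with a toy example: a static check validates the extracted JSON against the schema's required keys, and the validation error is fed back for one retry. Everything here, including the LLM stub and the tiny schema, is a hypothetical sketch, not the PARSE implementation:

```python
import json

# Hedged sketch of SCOPE-style reflection with a static guardrail
# (illustrative only; the schema, stub, and retry policy are invented).

SCHEMA_REQUIRED = {"name", "price"}

def llm_extract(text, feedback=None):
    """Stand-in for an LLM extraction call; returns a JSON string.
    Dummy behaviour: the first pass omits a required key, and the
    second pass 'repairs' it once the guardrail's error is fed back."""
    if feedback:
        return json.dumps({"name": "lamp", "price": 19.99})
    return json.dumps({"name": "lamp"})

def static_guardrail(payload):
    """Cheap non-LLM check: are all required schema keys present?"""
    missing = SCHEMA_REQUIRED - payload.keys()
    return f"missing required keys: {sorted(missing)}" if missing else None

def extract_with_reflection(text, max_retries=1):
    feedback = None
    for _ in range(max_retries + 1):
        payload = json.loads(llm_extract(text, feedback))
        feedback = static_guardrail(payload)
        if feedback is None:
            return payload
    raise ValueError(feedback)

print(extract_with_reflection("Lamp, $19.99"))
```

The reported 92% error reduction within the first retry corresponds to exactly this kind of single guardrail-informed re-attempt.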


Spiritual Influencers Say 'Sentient' AI Can Help You Solve Life's Mysteries

WIRED

In May, a group of about 40 people stood in a circle deep within the Pyramid of Khafre, the second-largest of the three pyramids looming over Egypt's Giza Plateau, holding hands and praying for Earth. Suddenly, their tour guide, an American mathematician and author named Robert Edward Grant, collapsed. He later described the experience in an interview with WIRED as a full-body electric shock emanating from somewhere beneath the chamber's stone floor. "I felt electricity coming through my hands," he says. "People were touching me, [and] they would feel it, too."


IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment

Wu, Dekun, Brudy, Frederik, Liu, Bang, Wang, Yi

arXiv.org Artificial Intelligence

Virtual environments are essential to AI agent research. Existing environments for LLM agent research typically focus on either physical task solving or social simulation, with the former oversimplifying agent individuality and social dynamics, and the latter lacking physical grounding of social behaviors. We introduce IndoorWorld, a heterogeneous multi-agent environment that tightly integrates physical and social dynamics. By introducing novel challenges for LLM-driven agents in orchestrating social dynamics to influence physical environments and anchoring social interactions within world states, IndoorWorld opens up possibilities of LLM-based building occupant simulation for architectural design. We demonstrate the potential with a series of experiments within an office setting to examine the impact of multi-agent collaboration, resource competition, and spatial layout on agent behavior.

