Composability


Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

Neural Information Processing Systems

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challenges in understanding such complex texts and generating the corresponding images. Therefore, there is a growing need to enable more control modes beyond text description. In this paper, we introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls (e.g., edge maps, depth maps, segmentation masks) and global controls (e.g., CLIP image embeddings) in a flexible and composable manner within one single model. Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models, eliminating the huge cost of training from scratch. Moreover, thanks to some dedicated adapter designs, Uni-ControlNet only necessitates a constant number (i.e., 2) of adapters, regardless of the number of local or global controls used. This not only reduces the fine-tuning costs and model size, making it more suitable for real-world deployment, but also facilitates the composability of different conditions. Through both quantitative and qualitative comparisons, Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability.
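
The constant-adapter design described above can be illustrated with a minimal sketch: local control maps are concatenated along the channel axis so one adapter handles any subset of them, while global controls are appended to the text-token sequence so they reuse the cross-attention path. This is a conceptual simplification, not the paper's actual architecture; all function names and shapes here are hypothetical.

```python
import numpy as np

def combine_local_controls(control_maps):
    """Concatenate heterogeneous local control maps (edge, depth,
    segmentation, ...) along the channel axis so a single local
    adapter can consume any subset of conditions."""
    # Each map has shape (C_i, H, W); absent conditions are omitted.
    return np.concatenate(control_maps, axis=0)

def inject_global_control(text_tokens, image_embedding, scale=1.0):
    """Append a (hypothetically projected) global image embedding to
    the text-token sequence as one extra token."""
    global_token = scale * image_embedding[np.newaxis, :]
    return np.concatenate([text_tokens, global_token], axis=0)

# Toy usage: two local conditions plus one global condition.
edge = np.random.rand(1, 64, 64)
depth = np.random.rand(1, 64, 64)
cond = combine_local_controls([edge, depth])
print(cond.shape)  # (2, 64, 64)

tokens = np.random.rand(77, 768)
img_emb = np.random.rand(768)
print(inject_global_control(tokens, img_emb).shape)  # (78, 768)
```

Because the adapter input is a channel concatenation rather than one branch per condition, the number of adapters stays fixed at two however many conditions are composed.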


ContraGAN (R1, R2, R3, R4), the novelty of the proposed 2C loss (R1, R2, R4), composability with modern

Neural Information Processing Systems

We thank the reviewers for the constructive comments. Every experiment and explanation in this rebuttal will be included in the paper. We will introduce the concept of data-to-data relations carefully. Our 2C loss can take advantage of the strengths of both losses. Compared with the Eq. 7 loss, the 2C loss considers cosine similarity. We conduct experiments to compare the 2C loss with other losses.
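
A conditional contrastive loss in the spirit of the 2C loss discussed above can be sketched as follows: each sample's positives are its class embedding plus same-class batch mates (the data-to-data relations), scored by cosine similarity. This is a hedged reconstruction from the rebuttal's description, not the authors' exact formulation; `two_c_loss` and its arguments are illustrative names.

```python
import numpy as np

def cosine(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def two_c_loss(emb, labels, proxies, temperature=0.5):
    """Conditional contrastive loss sketch: the numerator collects the
    class proxy and same-class batch samples (data-to-data relations);
    the denominator adds all remaining batch samples as negatives."""
    s_dc = cosine(emb, proxies) / temperature   # data-to-class terms
    s_dd = cosine(emb, emb) / temperature       # data-to-data terms
    n = emb.shape[0]
    total = 0.0
    for i in range(n):
        pos = np.exp(s_dc[i, labels[i]])
        same = sum(np.exp(s_dd[i, j]) for j in range(n)
                   if j != i and labels[j] == labels[i])
        rest = sum(np.exp(s_dd[i, j]) for j in range(n) if j != i)
        total += -np.log((pos + same) / (pos + rest))
    return total / n

rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))
labels = np.array([0, 0, 1, 1, 2, 2])
proxies = rng.normal(size=(3, 8))
loss = two_c_loss(emb, labels, proxies)
```

Since the positive terms are a subset of the denominator terms, the loss is always non-negative and is zero only when all negatives are infinitely dissimilar.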


LEGO-Compiler: Enhancing Neural Compilation Through Translation Composability

Zhang, Shuoming, Zhao, Jiacheng, Xia, Chunwei, Wang, Zheng, Chen, Yunji, Feng, Xiaobing, Cui, Huimin

arXiv.org Artificial Intelligence

Large language models (LLMs) have the potential to revolutionize how we design and implement compilers and code translation tools. However, existing LLMs struggle to handle long and complex programs. We introduce LEGO-Compiler, a novel neural compilation system that leverages LLMs to translate high-level languages into assembly code. Our approach centers on three key innovations: LEGO translation, which decomposes the input program into manageable blocks; a verifiable LLM workflow that breaks the complex compilation process into smaller, simpler steps checked by external tests; and a feedback mechanism for self-correction. Supported by formal proofs of translation composability, LEGO-Compiler demonstrates high accuracy on multiple datasets, including over 99% on ExeBench and 97.9% on industrial-grade AnsiBench. Additionally, LEGO-Compiler has also achieved a near one-order-of-magnitude improvement in compilable code size scalability. This work opens new avenues for applying LLMs to system-level tasks, complementing traditional compiler technologies.
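
The translate-verify-feedback workflow described above can be sketched as a small driver loop. The `translate` and `verify` callables stand in for the LLM call and the external test harness; the toy stubs and the assembly strings are hypothetical, chosen only to exercise the self-correction path.

```python
def lego_compile(source_blocks, translate, verify, max_retries=2):
    """Translate each block independently, verify it with an external
    test, and feed failure reports back for self-correction. Verified
    blocks concatenate into the final program (translation
    composability)."""
    asm_blocks = []
    for block in source_blocks:
        candidate = translate(block, feedback=None)
        for _ in range(max_retries):
            ok, report = verify(block, candidate)
            if ok:
                break
            candidate = translate(block, feedback=report)
        asm_blocks.append(candidate)
    return "\n".join(asm_blocks)

# Toy stand-ins for the LLM and the test harness.
CORRECT = {"x = 2": "mov r0, 2", "y = x + 1": "add r0, r0, 1"}

def toy_translate(block, feedback=None):
    # Emits a deliberate first-pass bug until feedback arrives.
    if feedback is None and block == "y = x + 1":
        return "sub r0, r0, 1"
    return CORRECT[block]

def toy_verify(block, asm):
    ok = asm == CORRECT[block]
    return ok, (None if ok else "wrong opcode")

program = lego_compile(["x = 2", "y = x + 1"], toy_translate, toy_verify)
print(program)
```

The key property is that each block is verified in isolation, so correctness of the whole program reduces to correctness of the blocks plus the composition proof.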


Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

Yang, Sohee, Kassner, Nora, Gribovskaya, Elena, Riedel, Sebastian, Geva, Mor

arXiv.org Artificial Intelligence

We evaluate how well Large Language Models (LLMs) latently recall and compose facts to answer multi-hop queries like "In the year Scarlett Johansson was born, the Summer Olympics were hosted in the country of". One major challenge in evaluating this ability is that LLMs may have developed shortcuts through encountering the head entity "Scarlett Johansson" and the answer entity "United States" together in the same training sequences, or may merely guess the answer based on frequency-based priors. To prevent shortcuts, we exclude test queries where the head and answer entities co-appear in pretraining corpora. Through careful selection of relations and facts and systematic removal of cases where models might guess answers or exploit partial matches, we construct an evaluation dataset SOCRATES (ShOrtCut-fRee lATent rEaSoning). We observe that LLMs demonstrate promising latent multi-hop reasoning abilities without exploiting shortcuts, but only for certain types of queries. For queries requiring latent recall of countries as the intermediate answer, the best models achieve 80% latent composability, but this drops to just 5% for the recall of years. Comparisons with Chain-of-Thought composability highlight a significant gap between the ability of models to reason latently versus explicitly. Analysis reveals that latent representations of the intermediate answer are constructed more often in queries with higher latent composability, and shows the emergence of latent multi-hop reasoning during pretraining.
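
The shortcut-exclusion step can be sketched as a simple filter over candidate queries: any query whose head and answer entities co-occur in some pretraining sequence is dropped. The `cooccurs` oracle and the toy corpus index are hypothetical stand-ins for the paper's corpus-wide co-occurrence check.

```python
def shortcut_free(queries, cooccurs):
    """Keep only multi-hop queries whose head entity and answer entity
    never co-appear in the pretraining corpus, so a correct answer
    cannot come from a memorized single-hop shortcut."""
    return [q for q in queries if not cooccurs(q["head"], q["answer"])]

# Toy corpus index: pairs observed together in pretraining sequences.
pretraining_pairs = {("Scarlett Johansson", "United States")}

def cooccurs(head, answer):
    return (head, answer) in pretraining_pairs

queries = [
    {"head": "Scarlett Johansson", "answer": "United States"},  # excluded
    {"head": "Entity A", "answer": "Entity B"},                 # kept
]
kept = shortcut_free(queries, cooccurs)
print(len(kept))  # 1
```

Only queries surviving this filter can credit the model with genuine latent two-hop composition rather than one-hop recall.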



A theory of understanding for artificial intelligence: composability, catalysts, and learning

Zhang, Zijian, Aronowitz, Sara, Aspuru-Guzik, Alán

arXiv.org Artificial Intelligence

Understanding is a crucial yet elusive concept in artificial intelligence (AI). This work proposes a framework for analyzing understanding based on the notion of composability. Given any subject (e.g., a person or an AI), we suggest characterizing its understanding of an object in terms of its ability to process (compose) relevant inputs into satisfactory outputs from the perspective of a verifier. This highly universal framework can readily apply to non-human subjects, such as AIs, non-human animals, and institutions. Further, we propose methods for analyzing the inputs that enhance output quality in compositions, which we call catalysts. We show how the structure of a subject can be revealed by analyzing its components that act as catalysts and argue that a subject's learning ability can be regarded as its ability to compose inputs into its inner catalysts. Finally, we examine the importance of learning ability for AIs to attain general intelligence. Our analysis indicates that models capable of generating outputs that can function as their own catalysts, such as language models, establish a foundation for potentially overcoming existing limitations in AI understanding.
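
The subject-verifier-catalyst framework above lends itself to a tiny operational sketch: understanding is the fraction of inputs a subject composes into verifier-accepted outputs, and a catalyst is an extra input that flips a rejected composition into an accepted one. This is an illustrative formalization under our own assumptions, not the authors' definitions; all names here are hypothetical.

```python
def understanding_score(subject, inputs, verifier):
    """Fraction of relevant inputs the subject composes into outputs
    the verifier accepts."""
    accepted = sum(1 for x in inputs if verifier(x, subject(x)))
    return accepted / len(inputs)

def is_catalyst(subject, x, extra, verifier):
    """An extra input acts as a catalyst for x if adding it turns a
    rejected composition into an accepted one."""
    return (not verifier(x, subject(x))) and verifier(x, subject(x, extra))

# Toy subject: composes correctly only when given a hint (the catalyst).
def toy_subject(x, extra=None):
    return x * 2 if extra == "hint" else x

def verifier(x, out):
    return out == x * 2

print(understanding_score(toy_subject, [1, 2, 3], verifier))  # 0.0
print(is_catalyst(toy_subject, 5, "hint", verifier))          # True
```

In this toy, learning would correspond to the subject internalizing the hint so that future compositions succeed without it.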


Composable Interventions for Language Models

Kolbeinsson, Arinbjorn, O'Brien, Kyle, Huang, Tianjin, Gao, Shanghua, Liu, Shiwei, Schwarz, Jonathan Richard, Vaidya, Anurag, Mahmood, Faisal, Zitnik, Marinka, Chen, Tianlong, Hartvigsen, Thomas

arXiv.org Artificial Intelligence

Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions have largely been developed independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventions interact. We fill this gap by introducing composable interventions, a framework to study the effects of using multiple interventions on the same language models, featuring new metrics and a unified codebase. Using our framework, we conduct extensive experiments and compose popular methods from three emerging intervention categories -- Knowledge Editing, Model Compression, and Machine Unlearning. Our results from 310 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. Taken together, our findings showcase clear gaps in composability, suggesting a need for new multi-objective interventions. All of our code is public: https://github.com/hartvigsen-group/composable-interventions.
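
The order-of-application effect reported above can be demonstrated with a minimal sketch: interventions are functions from model to model, and composing them in different orders yields different results. The toy "model" (a parameter dict), the quantizing `compress`, and the value-setting `edit` are hypothetical stand-ins for real compression and knowledge-editing methods.

```python
def compose(model, interventions):
    """Apply interventions left to right; each maps model -> model."""
    for f in interventions:
        model = f(model)
    return model

def edit(model):
    # Toy knowledge edit: write a precise value into one parameter.
    m = dict(model)
    m["capital_of_france"] = 0.37
    return m

def compress(model):
    # Toy compression: quantize every parameter to one decimal place.
    return {k: round(v, 1) for k, v in model.items()}

base = {"capital_of_france": 0.0}
edit_then_compress = compose(base, [edit, compress])
compress_then_edit = compose(base, [compress, edit])
print(edit_then_compress["capital_of_france"])  # 0.4  (edit degraded)
print(compress_then_edit["capital_of_france"])  # 0.37 (edit preserved)
```

Compressing after editing coarsens the edited value, mirroring the finding that compression hinders editing and that composition order matters.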


Why composability is key to scaling digital twins

#artificialintelligence

Digital twins enable enterprises to model and simulate buildings, products, manufacturing lines, facilities and processes. This can improve performance, quickly flag quality errors and support better decision-making. Today, most digital twin projects are one-off efforts.


Composability set to be next big thing for digital experience platforms

#artificialintelligence

The pressure on organisations to offer excellent customer experience to each unique individual is growing. Today's consumers have much lower levels of patience for businesses that fail to meet their exacting expectations, whether around service, availability, delivery or choice. Rising inflation and cost of living are driving a decrease in spending, with consumers becoming more selective about what they buy and where they buy it from. According to the latest EY Future Consumer Index, 42% of people will only buy from brands that align with their values, and 36% will only visit stores that offer great experiences. Against this backdrop, a digital experience platform (DXP) could prove a vital investment for organisations.


The Top Artificial Intelligence Prediction for 2022: Composable AI

#artificialintelligence

It's the key to nimbly adapting to the sometimes seismic shifts in business climates that unexpectedly arise. But according to Indico Data CEO Tom Wilde, it's something altogether else that could very well be of even more importance to firms today. "All organizations, it doesn't matter what industry you're in, recognize that their unique ability to codify the work that they do is a competitive advantage," Wilde explained. "That codification comes from the kind of investments they made in technology and the employee experience and customer experience." Investments in composable AI solutions enable the sort of codification Wilde referenced while allowing firms to build applications, workflows, and business processes with a modular approach that's rapidly interchangeable to suit the particularities of any use case--or business condition--that arises.