NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models

Bi, Xiaohan, Qi, Binhang, Sun, Hailong, Gao, Xiang, Yu, Yue, Liang, Xiaojun

arXiv.org Artificial Intelligence

With the growing incorporation of deep neural network (DNN) models into modern software systems, the prohibitive construction costs have become a significant challenge. Model reuse has been widely applied to reduce training costs, but indiscriminately reusing entire models may incur significant inference overhead. Consequently, DNN modularization has gained attention, enabling module reuse by decomposing DNN models. The emerging modularizing-while-training (MwT) paradigm, which incorporates modularization into training, outperforms modularizing-after-training approaches. However, existing MwT methods focus on small-scale CNN models at the convolutional kernel level and struggle with diverse DNNs and large-scale models, particularly Transformer-based models. To address these limitations, we propose NeMo, a scalable and generalizable MwT approach. NeMo operates at the neuron level, a fundamental component common to all DNNs, ensuring applicability to Transformers and various other architectures. We design a contrastive learning-based modular training method with an effective composite loss function, enabling scalability to large-scale models. Comprehensive experiments on two Transformer-based models and four CNN models across two classification datasets demonstrate NeMo's superiority over state-of-the-art MwT methods. Results show average gains of 1.72% in module classification accuracy and a 58.10% reduction in module size, demonstrating efficacy across both CNN and large-scale Transformer-based models. A case study on open-source projects shows NeMo's potential benefits in practical scenarios, offering a promising approach for scalable and generalizable DNN modularization.


Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models

Neural Information Processing Systems

Diffusion models (DMs) produce very detailed and high-quality images, but they can also memorize individual training samples and reproduce them at inference time. Prior efforts prevent this issue by either changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or removing the memorized data from training altogether. While those are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they hold the risk of adversaries circumventing the safeguards and are not effective when the DM itself is publicly released. To solve the problem, we introduce NeMo, the first method to localize memorization of individual data samples down to the level of neurons in DMs' cross-attention layers. Through our experiments, we make the intriguing finding that in many cases, single neurons are responsible for memorizing particular training samples.


HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing

Chen, Jing, Zhu, Xinyu, Yang, Cheng, Shi, Chufan, Xi, Yadong, Zhang, Yuxiang, Wang, Junjie, Pu, Jiashu, Zhang, Rongsheng, Yang, Yujiu, Feng, Tian

arXiv.org Artificial Intelligence

Generative AI has demonstrated unprecedented creativity in the field of computer vision, yet such phenomena have not been observed in natural language processing. In particular, large language models (LLMs) can hardly produce written works at the level of human experts due to the extremely high complexity of literature writing. In this paper, we present HoLLMwood, an automated framework for unleashing the creativity of LLMs and exploring their potential in screenwriting, which is a highly demanding task. Mimicking the human creative process, we assign LLMs to different roles involved in the real-world scenario. In addition to the common practice of treating LLMs as Writer, we also apply LLMs as Editor, who is responsible for providing feedback and revision advice to Writer. Besides, to enrich the characters and deepen the plots, we introduce a role-playing mechanism and adopt LLMs as Actors that can communicate and interact with each other. Evaluations on automatically generated screenplays show that HoLLMwood substantially outperforms strong baselines in terms of coherence, relevance, interestingness and overall quality.


Neural Elevation Models for Terrain Mapping and Path Planning

Dai, Adam, Gupta, Shubh, Gao, Grace

arXiv.org Artificial Intelligence

This work introduces Neural Elevation Models (NEMos), which adapt Neural Radiance Fields to a 2.5D continuous and differentiable terrain model. In contrast to traditional terrain representations such as digital elevation models, NEMos can be readily generated from imagery, a low-cost data source, and provide a lightweight representation of terrain through an implicit continuous and differentiable height field. We propose a novel method for jointly training a height field and radiance field within a NeRF framework, leveraging quantile regression. Additionally, we introduce a path planning algorithm that performs gradient-based optimization of a continuous cost function for minimizing distance, slope changes, and control effort, enabled by differentiability of the height field. We perform experiments on simulated and real-world terrain imagery, demonstrating NEMos' ability to generate high-quality reconstructions and produce smoother paths compared to discrete path planning methods. Future work will explore the incorporation of features and semantics into the height field, creating a generalized terrain model.
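The gradient-based planning idea can be sketched in miniature (this is an illustrative toy, not the paper's method or code): a hand-written differentiable height field stands in for a trained NEMo, and the interior waypoints of a straight-line path are nudged by finite-difference gradient descent on a simplified cost combining path length and terrain height. All function names and parameters here are assumptions made for illustration.

```python
import numpy as np

def height(p):
    """Toy differentiable height field standing in for a trained NEMo: one Gaussian hill."""
    return float(np.exp(-((p[0] - 0.6) ** 2 + (p[1] - 0.4) ** 2) / 0.05))

def cost(path):
    """Simplified planning objective: total path length plus a penalty for high terrain."""
    segments = np.diff(path, axis=0)
    length = np.sum(np.linalg.norm(segments, axis=1))
    terrain = sum(height(p) for p in path)
    return length + terrain

# Straight-line initialization between a fixed start and goal.
path = np.linspace([0.0, 0.0], [1.0, 1.0], 12)
initial_cost = cost(path)

lr, eps = 0.01, 1e-5
for _ in range(300):
    grad = np.zeros_like(path)
    for i in range(1, len(path) - 1):          # endpoints stay fixed
        for d in range(2):
            bumped = path.copy()
            bumped[i, d] += eps
            # Forward-difference estimate of the cost gradient at waypoint i.
            grad[i, d] = (cost(bumped) - cost(path)) / eps
    path -= lr * grad

final_cost = cost(path)
```

In the paper's setting the height field is an implicit neural network, so the gradient comes from automatic differentiation rather than finite differences; the optimization structure is the same.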


Computation with Sequences in a Model of the Brain

Dabagia, Max, Papadimitriou, Christos H., Vempala, Santosh S.

arXiv.org Artificial Intelligence

Even as machine learning exceeds human-level performance on many applications, the generality, robustness, and rapidity of the brain's learning capabilities remain unmatched. How cognition arises from neural activity is a central open question in neuroscience, inextricable from the study of intelligence itself. A simple formal model of neural activity was proposed in Papadimitriou [2020] and has been subsequently shown, through both mathematical proofs and simulations, to be capable of implementing certain simple cognitive operations via the creation and manipulation of assemblies of neurons. However, many intelligent behaviors rely on the ability to recognize, store, and manipulate temporal sequences of stimuli (planning, language, navigation, to name a few). Here we show that, in the same model, time can be captured naturally as precedence through synaptic weights and plasticity, and, as a result, a range of computations on sequences of assemblies can be carried out. In particular, repeated presentation of a sequence of stimuli leads to the memorization of the sequence through corresponding neural assemblies: upon future presentation of any stimulus in the sequence, the corresponding assembly and its subsequent ones will be activated, one after the other, until the end of the sequence. Finally, we show that any finite state machine can be learned in a similar way, through the presentation of appropriate patterns of sequences. Through an extension of this mechanism, the model can be shown to be capable of universal computation. We support our analysis with a number of experiments probing the limits of learning in this model in key ways. Taken together, these results provide a concrete hypothesis for the basis of the brain's remarkable abilities to compute and learn, with sequences playing a vital role.
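The precedence-through-weights idea can be caricatured in a few lines (a toy sketch, not the assembly-calculus simulations used in the paper): repeated presentations strengthen the connection from each "assembly" to its successor, so that presenting any stimulus replays the remainder of the sequence. The assembly count, weights, and threshold below are illustrative assumptions.

```python
n = 5                                  # length of the stimulus sequence
W = [[0.0] * n for _ in range(n)]      # "synaptic" weights between assemblies

# Hebbian-style training: each repeated presentation of the sequence
# strengthens the connection from assembly i to its successor i+1,
# encoding temporal precedence as synaptic weight.
for _ in range(10):
    for i in range(n - 1):
        W[i][i + 1] += 1.0

def replay(start, threshold=5.0):
    """Present one stimulus; assemblies fire in order while the outgoing
    weight exceeds the firing threshold, replaying the stored sequence."""
    order = [start]
    cur = start
    while True:
        nxt = max(range(n), key=lambda j: W[cur][j])
        if W[cur][nxt] <= threshold:   # no sufficiently strong successor
            break
        order.append(nxt)
        cur = nxt
    return order

print(replay(2))  # → [2, 3, 4]: the tail of the sequence from stimulus 2
```

The model in the paper operates on random sparse projections with winner-take-all dynamics rather than a dense weight matrix; this sketch only shows why precedence stored in weights suffices to replay a sequence from any entry point.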


Nemo: First Glimpse of a New Rule Engine

Ivliev, Alex, Ellmauthaler, Stefan, Gerlach, Lukas, Marx, Maximilian, Meißner, Matthias, Meusel, Simon, Krötzsch, Markus

arXiv.org Artificial Intelligence

This system demonstration presents Nemo, a new logic programming engine with a focus on reliability and performance. Nemo is built for data-centric analytic computations, modelled in a fully declarative Datalog dialect. Its scalability for these tasks matches or exceeds that of leading Datalog systems. We demonstrate uses in reasoning with knowledge graphs and ontologies with 10^5 to 10^8 input facts, all on a laptop. Nemo is written in Rust and available as a free and open source tool.
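To make the style of computation concrete, here is a minimal naive bottom-up evaluation of the classic Datalog reachability program, written in Python. It illustrates the fixpoint semantics a Datalog engine computes; it is not Nemo's syntax, dialect, or implementation (Nemo uses semi-naive evaluation in Rust at far larger scale).

```python
# Naive bottom-up evaluation of the Datalog program:
#   reach(x, y) :- edge(x, y).
#   reach(x, z) :- reach(x, y), edge(y, z).
edges = {("a", "b"), ("b", "c"), ("c", "d")}

reach = set(edges)                 # first rule: every edge is reachable
while True:
    # Apply the recursive rule to all currently known facts.
    derived = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2}
    if derived <= reach:           # fixpoint: no new facts were derived
        break
    reach |= derived

print(sorted(reach))               # all transitively reachable pairs
```

A production engine avoids re-deriving known facts each round (semi-naive evaluation) and uses indexed, columnar storage, which is where Nemo's reported scalability to 10^8 facts comes from.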


The Architecture of a Biologically Plausible Language Organ

Mitropolsky, Daniel, Papadimitriou, Christos H.

arXiv.org Artificial Intelligence

We present a simulated biologically plausible language organ, made up of stylized but realistic neurons, synapses, brain areas, plasticity, and a simplified model of sensory perception. We show through experiments that this model succeeds in an important early step in language acquisition: the learning of nouns, verbs, and their meanings, from the grounded input of only a modest number of sentences. Learning in this system is achieved through Hebbian plasticity, and without backpropagation. Our model goes beyond a parser previously designed in a similar environment, with the critical addition of a biologically plausible account for how language can be acquired in the infant's brain, not just processed by a mature brain.


NVIDIA unveils AI Foundations, its customizable Gen-AI cloud service

Engadget

The age of enterprise AI has come crashing down upon us in recent months. Public infatuation with ChatGPT since its release last November has opened the floodgates of corporate interest and set off an industry-wide land grab, with every major tech entity vying to stake its claim in this burgeoning market by incorporating generative AI features into its existing products. Heavyweights including Google, Microsoft, Meta, and Baidu are already jockeying their Large Language Models (LLMs) for market dominance, while everybody else, from Adobe and AT&T to BMW and BYD, scrambles to find uses for the revolutionary technology. NVIDIA's newest cloud services offering, AI Foundations, will allow businesses lacking the time and money to develop their own models from scratch "to build, refine and operate custom large language models and generative AI models that are trained with their own proprietary data and created for their unique domain-specific tasks." These models include NeMo, NVIDIA's framework for building and customizing large language models; BioNeMo, a drug and molecule discovery-focused variant of NeMo built for the medical research community; and Picasso, a text-to-image service and DALL-E 2 competitor capable of generating images, video and "3D applications… to supercharge productivity for creativity, design and digital simulation," according to Tuesday's release.


10 interesting Deep learning libraries to checkout

#artificialintelligence

It has 10 tasks like retrieval, captioning, visual question answering, multimodal classification, Natural Language Visual Reasoning, Visual Dialogue, Video/Image-text Retrieval etc. It also contains 20 datasets and 30 pre-trained SOTA models for foundation language-vision models. NeMo: NVIDIA's NeMo is a conversational AI toolkit for working on automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and natural language processing (NLP). NeMo's main goal is to assist researchers from industry and academia in reusing previous work (code and pretrained models) and to facilitate the development of new conversational AI models. Various model architectures are available for Object Detection, Instance Segmentation, Panoptic Segmentation, Contrastive Learning and Distillation. One can use existing or new datasets/models, and also customize them for one's own problems.