A Former Apple Luminary Sets Out to Create the Ultimate GPU Software
Demand for AI chips is booming--and so is the need for software to run them. Chris Lattner's startup Modular just raised $250 million to build the best developer tools for AI hardware. At a certain point between building Apple's developer tools, leading a core part of Google's AI infrastructure team, and clashing with Elon Musk during a stint as Tesla's Autopilot chief, Chris Lattner's vision for his life's work started to come into focus. AI was taking over the world, and demand was growing for the chips that powered it. But the software stack for those chips was dominated by just a few big companies.
How Modular Should Neural Module Networks Be for Systematic Generalization?
Neural Module Networks (NMNs) tackle Visual Question Answering (VQA) by composing modules that each solve a sub-task. NMNs are a promising strategy for achieving systematic generalization, i.e., overcoming biasing factors in the training distribution. However, the aspects of NMNs that facilitate systematic generalization are not fully understood. In this paper, we demonstrate that the degree of modularity of the NMN has a large influence on systematic generalization. In a series of experiments on three VQA datasets (VQA-MNIST, SQOOP, and CLEVR-CoGenT), our results reveal that tuning the degree of modularity, especially at the image encoder stage, yields substantially higher systematic generalization.
OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions
Zhang, Yi-Kai, Zhong, Xu-Xiang, Lu, Shiyin, Chen, Qing-Guo, Zhan, De-Chuan, Ye, Han-Jia
The rapid advancements in Large Language Models (LLMs) have significantly expanded their applications, ranging from multilingual support to domain-specific tasks and multimodal integration. In this paper, we present OmniEvalKit, a novel benchmarking toolbox designed to evaluate LLMs and their omni-extensions across multilingual, multidomain, and multimodal capabilities. Unlike existing benchmarks that often focus on a single aspect, OmniEvalKit provides a modular, lightweight, and automated evaluation system. It is structured with a modular architecture comprising a Static Builder and Dynamic Data Flow, promoting the seamless integration of new models and datasets. OmniEvalKit supports over 100 LLMs and 50 evaluation datasets, covering comprehensive evaluations across thousands of model-dataset combinations. OmniEvalKit is dedicated to creating an ultra-lightweight and fast-deployable evaluation framework, making downstream applications more convenient and versatile for the AI community.
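The modular architecture the abstract describes, where new models and datasets can be integrated without touching the evaluation loop, is commonly implemented with a registry pattern. The sketch below is purely illustrative: the names (`register_model`, `EchoModel`, `ToyQA`) and the API are our own invention, not OmniEvalKit's actual code.

```python
# Hypothetical registry pattern: models and datasets self-register,
# so the evaluation loop never changes when new ones are added.
MODEL_REGISTRY = {}
DATASET_REGISTRY = {}

def register_model(name):
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

def register_dataset(name):
    def wrap(cls):
        DATASET_REGISTRY[name] = cls
        return cls
    return wrap

@register_model("echo-llm")
class EchoModel:
    def generate(self, prompt):
        return prompt  # stand-in for a real LLM call

@register_dataset("toy-qa")
class ToyQA:
    samples = [("2+2?", "2+2?")]  # (prompt, expected) pairs

def evaluate(model_name, dataset_name):
    # Look up both components by name: any registered pair works.
    model = MODEL_REGISTRY[model_name]()
    data = DATASET_REGISTRY[dataset_name]
    correct = sum(model.generate(p) == t for p, t in data.samples)
    return correct / len(data.samples)

print(evaluate("echo-llm", "toy-qa"))  # 1.0
```

Decoupling registration from evaluation is one way a toolbox can scale to thousands of model-dataset combinations without per-pair glue code.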
From Modular to End-to-End Speaker Diarization
Speaker diarization is usually described as the task of determining ``who spoke when'' in a recording. Until a few years ago, all competitive approaches were modular. Systems based on this framework reached state-of-the-art performance in most scenarios but had major difficulties dealing with overlapped speech. More recently, the advent of end-to-end models, capable of handling all aspects of speaker diarization with a single model and performing better on overlapped speech, has attracted considerable attention. This thesis is framed during a period of co-existence of these two trends. We describe a system based on a Bayesian hidden Markov model used to cluster x-vectors (speaker embeddings obtained with a neural network), known as VBx, which has shown remarkable performance on different datasets and challenges. We comment on its advantages and limitations and evaluate results on different relevant corpora. Then, we move towards end-to-end neural diarization (EEND) methods. Given the need for large training sets for these models and the lack of sufficient manually annotated diarization data, a common compromise is to generate training data artificially. We describe an approach for generating synthetic data that resembles real conversations in terms of speaker turns and overlaps. We show how this method for generating ``simulated conversations'' allows for better performance than a previously proposed method for creating ``simulated mixtures'' when training the popular EEND with encoder-decoder attractors (EEND-EDA). We also propose a new EEND-based model, which we call DiaPer, and show that it can perform better than EEND-EDA, especially when dealing with many speakers and handling overlapped speech. Finally, we compare both VBx-based and DiaPer systems on a wide variety of corpora and comment on the advantages of each technique.
Language Models Need Inductive Biases to Count Inductively
Chang, Yingshan, Bisk, Yonatan
Counting is a fundamental example of generalization, whether viewed through the mathematical lens of Peano's axioms defining the natural numbers or the cognitive science literature on children learning to count. In both cases, the argument holds that learning to count means learning to count infinitely. While few papers have tried to distill transformer "reasoning" to the simplest case of counting, investigations of length generalization occur throughout the literature. In the "train short, test long" paradigm of NLP, length refers to the training sentence length. In formal language recognition, length refers to the input sequence length, or the maximum stack size induced by a pushdown automaton. In general problem solving, length refers to the number of hops in a deductive reasoning chain or the recursion depth. In all cases, counting is central to task success. And crucially, generalizing counting inductively is central to success on OOD instances. This work provides extensive empirical results on training language models to count. We experiment with architectures including RNNs, Transformers, State-Space Models, and RWKV. We present carefully designed task formats, auxiliary tasks, and positional embeddings to avoid limitations in generalization with OOD positions and OOD vocabulary. We find that while traditional RNNs trivially achieve inductive counting, Transformers have to rely on positional embeddings to count out-of-domain. As counting is the basis for many arguments concerning the expressivity of Transformers, our finding calls for the community to reexamine the application scope of primitive functions defined in formal characterizations. Finally, modern RNNs also largely underperform traditional RNNs in generalizing counting inductively. We discuss how design choices that enable parallelized training of modern RNNs cause them to lose the merits of a recurrent nature.
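The "train short, test long" setup and the reason recurrence makes inductive counting trivial can be made concrete with a toy task. The task format below is our own simplification, not the paper's exact setup: the model maps a token sequence to its length.

```python
# Toy counting task in the "train short, test long" paradigm:
# train on lengths 1..10, test on out-of-distribution lengths 50..60.
def make_example(length, token="a"):
    return [token] * length, length

train = [make_example(n) for n in range(1, 11)]
test = [make_example(n) for n in range(50, 61)]  # OOD lengths

# A recurrent counter generalizes trivially: the hidden state is the
# count, updated by one increment per input step, regardless of length.
def rnn_count(seq):
    state = 0
    for _ in seq:
        state += 1
    return state

assert all(rnn_count(s) == y for s, y in train + test)
```

A Transformer has no such per-step state; without positional embeddings each position is processed identically, which is why, per the abstract, positional information becomes the crutch for counting out-of-domain.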
Revamping Python for an AI World
Python is one of the most popular programming languages in existence. Easy to learn and easy to use, it has been around for years, so there is a large community of Python developers to support each other, and it has built up an ecosystem of libraries that allow users to drop in the functionalities they need. It does, however, come with downsides: its programs tend to run slowly, and because it is inefficient at running processes in parallel, it is not well suited to some of the latest artificial intelligence (AI) programming. Hoping to overcome those difficulties, computer scientist Chris Lattner set out to create a new language, Mojo, which offers the ease of use of Python, but the performance of more complex languages such as C or Rust. He teamed up with Tim Davis, whom he had met when they both worked for Google, to form Modular in January 2022.
Modularity Trumps Invariance for Compositional Robustness
Mason, Ian, Sarkar, Anirban, Sasaki, Tomotake, Boix, Xavier
By default neural networks are not robust to changes in data distribution. This has been demonstrated with simple image corruptions, such as blurring or adding noise, degrading image classification performance. Many methods have been proposed to mitigate these issues but for the most part models are evaluated on single corruptions. In reality, visual space is compositional in nature, that is, that as well as robustness to elemental corruptions, robustness to compositions of corruptions is also needed. In this work we develop a compositional image classification task where, given a few elemental corruptions, models are asked to generalize to compositions of these corruptions. That is, to achieve compositional robustness. We experimentally compare empirical risk minimization with an invariance building pairwise contrastive loss and, counter to common intuitions in domain generalization, achieve only marginal improvements in compositional robustness by encouraging invariance. To move beyond invariance, following previously proposed inductive biases that model architectures should reflect data structure, we introduce a modular architecture whose structure replicates the compositional nature of the task. We then show that this modular approach consistently achieves better compositional robustness than non-modular approaches. We additionally find empirical evidence that the degree of invariance between representations of 'in-distribution' elemental corruptions fails to correlate with robustness to 'out-of-distribution' compositions of corruptions.
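The inductive bias that the architecture should mirror the compositional structure of the data can be illustrated with a deliberately tiny construction. This is our own toy analogue, not the paper's architecture: one module per elemental corruption, composed in the same order the corruptions compose.

```python
# Toy modular "un-corruption": each module handles one elemental
# corruption, and composing modules mirrors composing corruptions.
def blur_module(x):
    return x.replace("BLUR", "")   # stand-in for undoing blur

def noise_module(x):
    return x.replace("NOISE", "")  # stand-in for undoing noise

def compositional_model(x, corruptions):
    modules = {"blur": blur_module, "noise": noise_module}
    for c in corruptions:          # architecture replicates task structure
        x = modules[c](x)
    return x

corrupted = "NOISEBLURcat"  # image "cat" under noise-then-blur, as a string
print(compositional_model(corrupted, ["noise", "blur"]))  # cat
```

Because each module is trained (here, written) against a single elemental corruption, unseen compositions are handled for free by re-composing the modules, whereas a monolithic model must learn every composition it will face.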
A Modular, Adaptive, Deep-Learning-Based Brain-VR Interface
Brain-Computer Interfaces (BCIs) may open up new possibilities for Virtual Reality (VR) applications: BCIs may be used for active brain control of VR avatars, or to make VR content passively-adaptive based on information decoded from ongoing brain activity. Application domains for such Brain-VR Interfaces (BVRI) include medical and healthcare, entertainment, and education. Conversely, VR technology also opens up new possibilities for BCI research and development: E.g., gamified immersive BCI paradigms may improve subject engagement and long-term motivation, helping to study learning and adaptivity in the BCI-control context. Previously, we have demonstrated a first adaptive, deep-learning-based online BCI for the control of robotic assistants. Here, we describe the extension of this setup to a modular, extensible, VR-compatible online BCI setup.
Modular closes $30M seed round to simplify the process of developing AI systems – TechCrunch
But if you ask the co-founders of Modular, a startup emerging from stealth today, the software used to develop AI is "monolithic," fractured into silos piled with layers of complexity. Big Tech companies have made helpful contributions, like TensorFlow and PyTorch -- AI development frameworks maintained by Google and Facebook, respectively. Modular aims to change that. Founded by former Apple and Google engineers and execs, the company today closed a large ($30 million) seed round led by GV (formerly Google Ventures), with participation from Greylock, The Factory and SV Angel to realize its vision of a streamlined, platform-agnostic AI system development platform. "The industry is struggling to maintain and scale fragmented, custom toolchains that differ across research and production, training and deployment, server and edge," Modular CEO Chris Lattner told TechCrunch in an email interview.
Modular, self-healing robot swarms are definitely a great idea
Robots are going to have to work together if they want to destroy us, their soft, fallible masters. But the current paradigm of having a Skynet-like (or rather, Zerglike) overmind control a set of semi-autonomous drones is too easy to beat -- take out the brain and the rest fail, right? Not if they're all the brain, which is the idea demonstrated in a wonderful new paper, "Mergeable nervous systems for robots." The admiration of the authors for our shining, pitiless destroyers is evident from the get-go. Robots have the potential to display a higher degree of lifetime morphological adaptation than natural organisms.