 sch


Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models

Liang, Jingcong, Wang, Siyuan, Tian, Miren, Li, Yitong, Tang, Duyu, Wei, Zhongyu

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) enables efficient scaling of large language models (LLMs) with sparsely activated experts during inference. To effectively deploy large MoE models on memory-constrained devices, many systems introduce *expert offloading* that caches a subset of experts in fast memory, leaving others on slow memory to run on CPU or load on demand. While some research has exploited the locality of expert activations, where consecutive tokens activate similar experts, the degree of this **local routing consistency** varies across models and remains understudied. In this paper, we propose two metrics to measure local routing consistency of MoE models: (1) **Segment Routing Best Performance (SRP)**, which evaluates how well a fixed group of experts can cover the needs of a segment of tokens, and (2) **Segment Cache Best Hit Rate (SCH)**, which measures the hit rate of an expert cache utilizing a length of future information under a cache limit. We analyze 20 MoE LLMs with diverse sizes and architectures and use toy models to verify key factors related to local routing consistency. We find a strong trade-off between local routing consistency and *local* load balance, while showing that *global* load balance can coexist with local routing consistency. Meanwhile, settings like shared experts that decrease expert combination space can lead to low local routing consistency. We further reveal that domain-specialized experts contribute more to routing consistency than vocabulary-specialized ones, and that most models balance between cache effectiveness and efficiency with cache sizes approximately twice the active experts. These findings pave the way for memory-efficient MoE design and deployment without compromising inference speed. We publish the code for replicating experiments at https://github.com/ljcleo/moe-lrc .
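To make the link between routing consistency and cache behavior concrete, here is a minimal sketch that replays a routing trace through a plain LRU expert cache and reports the hit rate. It is not the paper's SCH metric, which additionally uses future information to find the best achievable hit rate. The function name `lru_hit_rate` and both toy traces are assumptions made up for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(trace, cache_size):
    """Return the hit rate of an LRU expert cache over a routing trace.

    trace: list of per-token lists of activated expert ids (hypothetical data).
    cache_size: maximum number of experts kept in fast memory.
    """
    cache = OrderedDict()  # expert id -> None, ordered least to most recently used
    hits = total = 0
    for experts in trace:
        for expert in experts:
            total += 1
            if expert in cache:
                hits += 1
                cache.move_to_end(expert)  # mark as most recently used
            else:
                cache[expert] = None
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used expert
    return hits / total if total else 0.0


# Toy traces with 2 active experts per token (made-up numbers, not measured data).
consistent = [[0, 1], [0, 1], [1, 0], [0, 1]]  # consecutive tokens reuse experts
scattered = [[0, 1], [2, 3], [4, 5], [6, 7]]   # every token activates new experts

print(lru_hit_rate(consistent, cache_size=4))  # 0.75
print(lru_hit_rate(scattered, cache_size=4))   # 0.0
```

With the same cache size, the trace whose consecutive tokens reuse experts only misses on the first token, while the scattered trace never hits.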


A Appendix

Neural Information Processing Systems

All CPU experiments are conducted on AWS C5.9xlarge instances with Intel Xeon Platinum 8124M ... Take TensorCore GPUs as an example. MetaSchedule makes an orthogonal contribution as it is a probabilistic language for composable search space construction rather than speeding up tuning. From frontend frameworks, for example, TensorFlow, PyTorch, or JAX, the tensor program to be optimized is generated from their computational graph.

A.7 Available Transformations Primitives

| Transformation | Explanation |
| --- | --- |
| split | Split a loop into a sequence of consecutive loops |
| fuse | Fuse a sequence of consecutive loops into one |
| reorder | Reorder a sequence of loops |
| parallel | Parallelize a loop across CPU cores |
| vectorize | Vectorize a loop with SIMD |
| unroll | Unroll a loop |
| bind | Bind a loop to a GPU thread |
| cache-read | Create a block that reads a buffer region into a read cache |
| cache-write | Create a block that writes a buffer region into a write cache |
| compute-at | Move a producer block under the specific loop |
| compute-inline | Inline a block into its consumer(s) |
| rfactor | Factorize an associative reduction block by the specified loop |
| storage-align | Set alignment requirement for specific dimension of a buffer |
| set-scope | Set the storage scope of a buffer |
| add-unit-loop | Create a new unit loop on top of the specific block |
| re-index | |
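The split and fuse primitives listed above can be illustrated in plain Python without any compiler API. The sketch below only shows how loop indices are remapped. The function names `split_loop` and `fuse_loops` are hypothetical and are not part of MetaSchedule or TVM.

```python
def split_loop(n, factor):
    """Iterate i in [0, n) as an outer/inner loop pair and return the visit order."""
    order = []
    outer_extent = (n + factor - 1) // factor  # ceiling division
    for outer in range(outer_extent):
        for inner in range(factor):
            i = outer * factor + inner
            if i < n:  # guard for the case where factor does not divide n
                order.append(i)
    return order


def fuse_loops(n_outer, n_inner):
    """Iterate a 2-level loop nest as one fused loop and recover both indices."""
    order = []
    for fused in range(n_outer * n_inner):
        order.append((fused // n_inner, fused % n_inner))
    return order


# Both transformations preserve the original iteration order.
print(split_loop(10, 4) == list(range(10)))
print(fuse_loops(2, 3) == [(i, j) for i in range(2) for j in range(3)])
```

Because each transformation preserves the set and order of iterations, the choice of `factor` changes performance characteristics but not the result, which is what makes it a tunable parameter.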



AI-Driven Strategies for Reducing Student Withdrawal -- A Study of EMU Student Stopout

Zhao, Yan, Otteson, Amy

arXiv.org Artificial Intelligence

Not everyone who enrolls in college will leave with a certificate or degree, but the number of people who drop out or take a break is much higher than experts previously believed. In December 2013, there were 29 million people with some college education but no degree. That number jumped to 36 million by December 2018, according to a new report from the National Student Clearinghouse Research Center[1]. It is imperative to understand the underlying factors contributing to student withdrawal and to help decision-makers identify effective strategies to prevent it. By analyzing the characteristics and educational pathways of the stopout student population, we aim to provide actionable insights that can benefit institutions facing similar challenges. Eastern Michigan University (EMU) faces significant challenges in student retention, with approximately 55% of its undergraduate students not completing their degrees within six years. As an institution committed to student success, EMU conducted a comprehensive study of student withdrawals to understand the influencing factors. The study revealed a high correlation between certain factors and withdrawals, even in the early stages of university attendance. Based on these findings, we developed a predictive model that employs artificial intelligence techniques to assess the risk that students abandon their studies. Such models enable universities to implement early intervention strategies, support at-risk students, and improve overall higher education success.


Tensor Program Optimization with Probabilistic Programs

Shao, Junru, Zhou, Xiyou, Feng, Siyuan, Hou, Bohan, Lai, Ruihang, Jin, Hongyi, Lin, Wuwei, Masuda, Masahiro, Yu, Cody Hao, Chen, Tianqi

arXiv.org Artificial Intelligence

Automatic optimization for tensor programs becomes increasingly important as we deploy deep learning in various environments, and efficient optimization relies on a rich search space and effective search. Most existing efforts adopt a search space that domain experts cannot efficiently grow. This paper introduces MetaSchedule, a domain-specific probabilistic programming language abstraction to construct a rich search space of tensor programs. Our abstraction allows domain experts to analyze the program and easily propose stochastic choices in a modular way to compose program transformations accordingly. We also build an end-to-end learning-driven framework to find an optimized program for a given search space. Experimental results show that MetaSchedule can cover the search space used in the state-of-the-art tensor program optimization frameworks in a modular way. Additionally, it empowers domain experts to conveniently grow the search space and modularly enhance the system, which brings a 48% speedup on end-to-end deep learning workloads.
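As a rough illustration of composing a search space from stochastic choices, the sketch below samples candidate schedules by drawing each transformation parameter at random. It uses only the Python standard library. The helper names (`tile_candidates`, `sample_tiling`, `sample_schedule`) and the dictionary layout are assumptions for illustration and do not reflect MetaSchedule's actual language or API.

```python
import random


def tile_candidates(extent):
    """All (outer, inner) pairs with outer * inner == extent."""
    return [(extent // f, f) for f in range(1, extent + 1) if extent % f == 0]


def sample_tiling(extent, rng):
    """One stochastic choice: how to split a single loop of the given extent."""
    return rng.choice(tile_candidates(extent))


def sample_schedule(extents, rng):
    """Compose independent stochastic choices into one candidate schedule."""
    return {
        "tiles": [sample_tiling(e, rng) for e in extents],
        "vectorize_innermost": rng.choice([True, False]),
    }


rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(3):
    print(sample_schedule([64, 128], rng))
```

Each small sampling function is an independent module, so adding a new choice extends the space without touching the existing ones. A search procedure would then evaluate sampled candidates and keep the fastest.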


Multi-Agent Path Finding Based on Subdimensional Expansion with Bypass

Liu, Qingzhou, Wu, Feng

arXiv.org Artificial Intelligence

Multi-agent path finding (MAPF) is an active area in artificial intelligence with many real-world applications, such as warehouse management, traffic control, and robotics. Recently, M* and its variants have greatly improved the ability to solve the MAPF problem. Although the subdimensional expansion used in those approaches significantly decreases the dimensionality of the joint search space and reduces the branching factor, these approaches do not make full use of the possible non-uniqueness of each agent's optimal path. As a result, updating the collision sets may incur a large amount of redundant computation. In this paper, the idea of bypass is introduced into subdimensional expansion to reduce the redundant computation. Specifically, we propose the BPM* algorithm, which is an implementation of subdimensional expansion with bypass in M*. In the experiments, we show that BPM* outperforms the state-of-the-art in solving several MAPF benchmark problems.
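For readers unfamiliar with collision sets, the sketch below finds which agents collide when each follows its own individually planned path. It is not BPM* or M*, only the conflict check that such algorithms build on. The function name `colliding_agents`, the grid-cell tuples, and the assumption that an agent waits at its goal after arriving are all illustrative choices.

```python
def colliding_agents(paths):
    """Return the set of agent indices involved in a vertex or swap conflict.

    paths: list of paths, one per agent; each path is a non-empty list of cells.
    An agent that has reached the end of its path is assumed to wait there.
    """
    if not paths:
        return set()

    def at(path, t):
        return path[min(t, len(path) - 1)]

    horizon = max(len(p) for p in paths)
    collided = set()
    for t in range(horizon):
        for a in range(len(paths)):
            for b in range(a + 1, len(paths)):
                # Vertex conflict: both agents occupy the same cell at time t.
                same_cell = at(paths[a], t) == at(paths[b], t)
                # Swap conflict: the agents exchange cells between t-1 and t.
                swap = (
                    t > 0
                    and at(paths[a], t) == at(paths[b], t - 1)
                    and at(paths[b], t) == at(paths[a], t - 1)
                )
                if same_cell or swap:
                    collided.update((a, b))
    return collided


paths = [
    [(0, 0), (0, 1), (0, 2)],  # agent 0 moves right
    [(0, 2), (0, 1), (0, 0)],  # agent 1 moves left, meets agent 0 at (0, 1)
    [(5, 5)],                  # agent 2 stays put far away
]
print(colliding_agents(paths))  # {0, 1}
```

Only the agents in the returned set need to be planned jointly, which is the intuition behind keeping the joint search space small.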


Extending Sticky-Datalog+/- via Finite-Position Selection Functions: Tractability, Algorithms, and Optimization

Bertossi, Leopoldo, Milani, Mostafa

arXiv.org Artificial Intelligence

Weakly-Sticky (WS) Datalog+/- is an expressive member of the family of Datalog+/- program classes that is defined on the basis of the conditions of stickiness and weak-acyclicity. Conjunctive query answering (QA) over WS programs has been investigated, and its tractability in data complexity has been established. However, the design and implementation of practical QA algorithms and their optimizations have remained open. To fill this gap, we first study Sticky and WS programs from the point of view of the behavior of the chase procedure. We extend the stickiness property of the chase to that of generalized stickiness of the chase (GSCh) modulo an oracle that selects (and provides) the predicate positions where finitely many values appear during the chase. Stickiness modulo a selection function S that provides only a subset of those positions defines sch(S), a semantic subclass of GSCh. Program classes with selection functions include Sticky and WS, and another syntactic class that we introduce and characterize, namely JWS, of jointly-weakly-sticky programs, which contains WS. The selection functions for these last three classes are computable, and no external, possibly non-computable oracle is needed. We propose a bottom-up QA algorithm for programs in the class sch(S), for a general selection function S. As a particular case, we obtain a polynomial-time QA algorithm for JWS and weakly-sticky programs. Unlike WS, JWS turns out to be closed under magic-sets query optimization. As a consequence, both the generic polynomial-time QA algorithm and its magic-set optimization can be particularized and applied to WS.


Query Expressibility and Verification in Ontology-Based Data Access

Lutz, Carsten, Marti, Johannes, Sabellek, Leif

arXiv.org Artificial Intelligence

In ontology-based data access, multiple data sources are integrated using an ontology and mappings. In practice, this is often achieved by a bootstrapping process, that is, the ontology and mappings are first designed to support only the most important queries over the sources and then gradually extended to enable additional queries. In this paper, we study two reasoning problems that support such an approach. The expressibility problem asks whether a given source query $q_s$ is expressible as a target query (that is, over the ontology's vocabulary) and the verification problem asks, additionally given a candidate target query $q_t$, whether $q_t$ expresses $q_s$. We consider (U)CQs as source and target queries and GAV mappings, showing that both problems are $\Pi^p_2$-complete in DL-Lite, coNExpTime-complete between EL and ELHI when source queries are rooted, and 2ExpTime-complete for unrestricted source queries.


#275: Presented work at IROS 2018 (Part 2 of 3), with Robert Lösch, Ali Marjovi and Sophia Sakr

Robohub

In this episode, Audrow Nash interviews Robert Lösch, Ali Marjovi, and Sophia Sakr about the work they presented at the 2018 International Conference on Intelligent Robots and Systems (IROS) in Madrid, Spain. Robert Lösch is a PhD student at Technische Universität Bergakademie Freiberg (TU Freiberg) in Germany, and he speaks on an approach to have robots navigate mining environments. Ali Marjovi is a postdoc at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, and he speaks about how robots could be used to localize odors, which could be useful for finding explosives or for search-and-rescue. Marjovi discusses how odor localization works, his experimental setup, the challenges of odor localization, and giving robots a sense of smell. Sophia Sakr, from Institut des Systèmes Intelligents et de Robotique (ISIR) in France, speaks about a haptic pair of tweezers (designed by Thomas Daunizeau).


skychain (SCH) - ICO rating and details

#artificialintelligence

Skychain is an infrastructure blockchain project that aims to let market participants host, train, and use artificial neural networks (ANNs). The first years of Skychain development will be devoted solely to medicine, helping doctors and patients obtain accurate diagnoses with the system. Skychain is a "sharing economy" project: each member of the Skychain ecosystem provides their resources and thereby helps create a product that is ahead of any competitors. In turn, the system will reward each participant with high benefits. Skychain is a project that will "uberize" artificial neural networks, but with developers of individual ANNs instead of taxi drivers, consumers of ANNs (doctors and patients) instead of passengers, and miners' computers and servers instead of cars.