cutset
LayerPipe2: Multistage Pipelining and Weight Recompute via Improved Exponential Moving Average for Training Neural Networks
Unnikrishnan, Nanda K., Parhi, Keshab K.
In our prior work, LayerPipe, we had introduced an approach to accelerate training of convolutional, fully connected, and spiking neural networks by overlapping forward and backward computation. However, despite empirical success, a principled understanding of how much gradient delay needs to be introduced at each layer to achieve desired level of pipelining was not addressed. This paper, LayerPipe2, fills that gap by formally deriving LayerPipe using variable delayed gradient adaptation and retiming. We identify where delays may be legally inserted and show that the required amount of delay follows directly from the network structure where inner layers require fewer delays and outer layers require longer delays. When pipelining is applied at every layer, the amount of delay depends only on the number of remaining downstream stages. When layers are pipelined in groups, all layers in the group share the same assignment of delays. These insights not only explain previously observed scheduling patterns but also expose an often overlooked challenge that pipelining implicitly requires storage of historical weights. We overcome this storage bottleneck by developing a pipeline--aware moving average that reconstructs the required past states rather than storing them explicitly. This reduces memory cost without sacrificing the accuracy guarantees that makes pipelined learning viable. The result is a principled framework that illustrates how to construct LayerPipe architectures, predicts their delay requirements, and mitigates their storage burden, thereby enabling scalable pipelined training with controlled communication computation tradeoffs.
First-Order Decomposition Trees
Exact lifted inference methods, like their propositional counterparts, work by recursively decomposing the model and the problem. In the propositional case, there exist formal structures, such as decomposition trees (dtrees), that represent such a decomposition and allow us to determine the complexity of inference a priori. However, there is currently no equivalent structure nor analogous complexity results for lifted inference. In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the first-order level. We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and make a theoretical analysis of the complexity of lifted inference in terms of the novel notion of lifted width for the tree.
Cutsets and EF1 Fair Division of Graphs
Chen, Jiehua, Zwicker, William S.
In fair division of a connected graph $G = (V, E)$, each of $n$ agents receives a share of $G$'s vertex set $V$. These shares partition $V$, with each share required to induce a connected subgraph. Agents use their own valuation functions to determine the non-negative numerical values of the shares, which determine whether the allocation is fair in some specified sense. We introduce forbidden substructures called graph cutsets, which block divisions that are fair in the EF1 (envy-free up to one item) sense by cutting the graph into "too many pieces". Two parameters - gap and valence - determine blocked values of $n$. If $G$ guarantees connected EF1 allocations for $n$ agents with valuations that are CA (common and additive), then $G$ contains no elementary cutset of gap $k \ge 2$ and valence in the interval $\[n - k + 1, n - 1\]$. If $G$ guarantees connected EF1 allocations for $n$ agents with valuations in the broader CM (common and monotone) class, then $G$ contains no cutset of gap $k \ge 2$ and valence in the interval $\[n - k + 1, n - 1\]$. These results rule out the existence of connected EF1 allocations in a variety of situations. For some graphs $G$ we can, with help from some new positive results, pin down $G$'s spectrum - the list of exactly which values of $n$ do/do not guarantee connected EF1 allocations. Examples suggest a conjectured common spectral pattern for all graphs. Further, we show that it is NP-hard to determine whether a graph admits a cutset. We also provide an example of a (non-traceable) graph on eight vertices that has no cutsets of gap $\ge 2$ at all, yet fails to guarantee connected EF1 allocations for three agents with CA preferences.
Improving the filtering of Branch-And-Bound MDD solver (extended)
Gillard, Xavier, Coppรฉ, Vianney, Schaus, Pierre, Cire, Andrรฉ Augusto
This paper presents and evaluates two pruning techniques to reinforce the efficiency of constraint optimization solvers based on multi-valued decision-diagrams (MDD). It adopts the branch-and-bound framework proposed by Bergman et al. in 2016 to solve dynamic programs to optimality. In particular, our paper presents and evaluates the effectiveness of the local-bound (LocB) and rough upper-bound pruning (RUB). LocB is a new and effective rule that leverages the approximate MDD structure to avoid the exploration of non-interesting nodes. RUB is a rule to reduce the search space during the development of bounded-width-MDDs. The experimental study we conducted on the Maximum Independent Set Problem (MISP), Maximum Cut Problem (MCP), Maximum 2 Satisfiability (MAX2SAT) and the Traveling Salesman Problem with Time Windows (TSPTW) shows evidence indicating that rough-upper-bound and local-bound pruning have a high impact on optimization solvers based on branch-and-bound with MDDs. In particular, it shows that RUB delivers excellent results but requires some effort when defining the model. Also, it shows that LocB provides a significant improvement automatically; without necessitating any user-supplied information. Finally, it also shows that rough-upper-bound and local-bound pruning are not mutually exclusive, and their combined benefit supersedes the individual benefit of using each technique.
Bounded Conditioning: Flexible Inference for Decisions under Scarce Resources
Horvitz, Eric J., Suermondt, Jaap, Cooper, Gregory F.
We introduce a graceful approach to probabilistic inference called bounded conditioning. Bounded conditioning monotonically refines the bounds on posterior probabilities in a belief network with computation, and converges on final probabilities of interest with the allocation of a complete resource fraction. The approach allows a reasoner to exchange arbitrary quantities of computational resource for incremental gains in inference quality. As such, bounded conditioning holds promise as a useful inference technique for reasoning under the general conditions of uncertain and varying reasoning resources. The algorithm solves a probabilistic bounding problem in complex belief networks by breaking the problem into a set of mutually exclusive, tractable subproblems and ordering their solution by the expected effect that each subproblem will have on the final answer. We introduce the algorithm, discuss its characterization, and present its performance on several belief networks, including a complex model for reasoning about problems in intensive-care medicine.
On Heuristics for Finding Loop Cutsets in Multiply-Connected Belief Networks
We introduce a new heuristic algorithm for the problem of finding minimum size loop cutsets in multiply connected belief networks. We compare this algorithm to that proposed in [Suemmondt and Cooper, 1988]. We provide lower bounds on the performance of these algorithms with respect to one another and with respect to optimal. We demonstrate that no heuristic algorithm for this problem cam be guaranteed to produce loop cutsets within a constant difference from optimal. We discuss experimental results based on randomly generated networks, and discuss future work and open questions.
An Evaluation of Structural Parameters for Probabilistic Reasoning: Results on Benchmark Circuits
Fattah, Yousri El, Dechter, Rina
Many algorithms for processing probabilistic networks are dependent on the topological properties of the problem's structure. Such algorithms (e.g., clustering, conditioning) are effective only if the problem has a sparse graph captured by parameters such as tree width and cycle-cut set size. In this paper we initiate a study to determine the potential of structure-based algorithms in real-life applications. We analyze empirically the structural properties of problems coming from the circuit diagnosis domain. Specifically, we locate those properties that capture the effectiveness of clustering and conditioning as well as of a family of conditioning+clustering algorithms designed to gradually trade space for time. We perform our analysis on 11 benchmark circuits widely used in the testing community. We also report on the effect of ordering heuristics on tree-clustering and show that, on our benchmarks, the well-known max-cardinality ordering is substantially inferior to an ordering called min-degree.
Context-Specific Independence in Bayesian Networks
Boutilier, Craig, Friedman, Nir, Goldszmidt, Moises, Koller, Daphne
Bayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution. This allows a natural and compact representation of the distribution, eases knowledge acquisition, and supports effective inference algorithms. It is well-known, however, that there are certain independencies that we cannot capture qualitatively within the Bayesian network structure: independencies that hold only in certain contexts, i.e., given a specific assignment of values to certain variables. In this paper, we propose a formal notion of context-specific independence (CSI), based on regularities in the conditional probability tables (CPTs) at a node. We present a technique, analogous to (and based on) d-separation, for determining when such independence holds in a given network. We then focus on a particular qualitative representation scheme - tree-structured CPTs - for capturing CSI. We suggest ways in which this representation can be used to support effective inference algorithms. In particular, we present a structural decomposition of the resulting network which can improve the performance of clustering algorithms, and an alternative algorithm based on cutset conditioning.
New Advances in Inference by Recursive Conditioning
Recursive Conditioning (RC) was introduced recently as the first any-space algorithm for inference in Bayesian networks which can trade time for space by varying the size of its cache at the increment needed to store a floating point number. Under full caching, RC has an asymptotic time and space complexity which is comparable to mainstream algorithms based on variable elimination and clustering (exponential in the network treewidth and linear in its size). We show two main results about RC in this paper. First, we show that its actual space requirements under full caching are much more modest than those needed by mainstream methods and study the implications of this finding. Second, we show that RC can effectively deal with determinism in Bayesian networks by employing standard logical techniques, such as unit resolution, allowing a significant reduction in its time requirements in certain cases. We illustrate our results using a number of benchmark networks, including the very challenging ones that arise in genetic linkage analysis.
An Empirical Study of w-Cutset Sampling for Bayesian Networks
Bidyuk, Bozhena, Dechter, Rina
The paper studies empirically the time-space trade-off between sampling and inference in a sl cutset sampling algorithm. The algorithm samples over a subset of nodes in a Bayesian network and applies exact inference over the rest. Consequently, while the size of the sampling space decreases, requiring less samples for convergence, the time for generating each single sample increases. The w-cutset sampling selects a sampling set such that the induced-width of the network when the sampling set is observed is bounded by w, thus requiring inference whose complexity is exponential in w. In this paper, we investigate performance of w-cutset sampling over a range of w values and measure the accuracy of w-cutset sampling as a function of w. Our experiments demonstrate that the cutset sampling idea is quite powerful showing that an optimal balance between inference and sampling benefits substantially from restricting the cutset size, even at the cost of more complex inference.