Reviews: The streaming rollout of deep networks - towards fully model-parallel execution

Neural Information Processing Systems 

A main motivation of the paper is to increase the efficiency (e.g., response time) of a network during training and inference. A rollout is a graph that captures the functional dependencies of network nodes over time. The authors argue that different rollouts of the same network are possible and differ in quality (e.g., response time), introduce mathematical definitions to describe rollouts (e.g., validity, model-parallelizability), and analyze rollouts theoretically and experimentally. In my understanding, the actual conclusion of the paper is that the streaming rollout ("R \equiv 1") is the best; e.g., the theorem in L192 states that the streaming rollout achieves the lowest response time over the entire graph. The experiments seem to support that conclusion. Note that the authors do not clearly state how to obtain the streaming rollout, although the theorem in L192 seems to suggest a working rule for obtaining it.

Pro:

Originality/Significance:
- I'm not aware of earlier work that analyzes this low-level implementation issue, but it is worthwhile to analyze it for optimization purposes.