Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional Reasoning Approach