Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes