Reasoning about Actions over Visual and Linguistic Modalities: A Survey