Learning the Effects of Physical Actions in a Multi-modal Environment