An environment representation (ER) is a substantial part of every autonomous system. It introduces a common interface between perception and other system components, such as decision making, and allows downstream algorithms to deal with abstracted data without knowledge of the sensor used. In this work, we propose and evaluate a novel architecture that generates an egocentric, grid-based, predictive, and semantically-interpretable ER. In particular, we provide a proof of concept for the spatio-temporal fusion of multiple camera sequences and short-term prediction in such an ER. Our design utilizes a strong semantic segmentation network together with depth and egomotion estimates to first extract semantic information from multiple camera streams and then transform these separately into egocentric, temporally-aligned bird's-eye view grids. A deep encoder-decoder network is trained to fuse a stack of these grids into a unified semantic grid representation and to predict the dynamics of its surroundings. We evaluate this representation on real-world sequences of the Cityscapes dataset and show that our architecture can make accurate predictions in complex sensor fusion scenarios and significantly outperforms a model-driven baseline in a category-based evaluation.
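The transformation from a segmented camera image to a bird's-eye view grid can be illustrated with a minimal sketch. It is not the architecture's actual implementation: it assumes a simple pinhole camera, depth measured in metres along the optical axis, and ignores height, egomotion compensation, and multi-camera fusion. The function name `semantic_to_bev` and the parameters `fx`, `cx`, `grid_size`, `cell_m`, and `unknown` are hypothetical names chosen for illustration.

```python
import numpy as np

def semantic_to_bev(labels, depth, fx, cx, grid_size=(64, 64), cell_m=0.5, unknown=-1):
    """Project per-pixel class ids into an egocentric bird's-eye view grid.

    labels: (H, W) integer class ids from a segmentation network.
    depth:  (H, W) finite forward distances in metres (assumed input).
    fx, cx: horizontal focal length and principal point in pixels (assumed pinhole model).
    """
    h, w = labels.shape
    u = np.tile(np.arange(w), (h, 1))        # pixel column index for every pixel
    z = depth                                # forward distance from the camera
    x = (u - cx) * z / fx                    # lateral offset, positive to the right

    rows = np.floor(z / cell_m).astype(int)                       # cells ahead of the camera
    cols = np.floor(x / cell_m).astype(int) + grid_size[1] // 2   # camera sits at the centre column

    # Keep only pixels with positive depth that land inside the grid.
    valid = (z > 0) & (rows >= 0) & (rows < grid_size[0]) & (cols >= 0) & (cols < grid_size[1])

    grid = np.full(grid_size, unknown, dtype=int)
    # If several pixels fall into one cell, the last assignment wins in this sketch.
    grid[rows[valid], cols[valid]] = labels[valid]
    return grid

# Two pixels, both 1 m ahead, one slightly left and one slightly right of the optical axis.
demo = semantic_to_bev(np.array([[1, 2]]), np.array([[1.0, 1.0]]),
                       fx=1.0, cx=0.5, grid_size=(4, 8), cell_m=0.5)
```

A stack of such grids, one per camera and time step, would then serve as the input to a fusion network of the kind described above.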
The Bayesian paradigm provides a natural and effective means of exploiting prior knowledge concerning the time-frequency structure of sound signals such as speech and music, something which has often been overlooked in traditional audio signal processing approaches. Here, after constructing a Bayesian model and prior distributions capable of taking into account the time-frequency characteristics of typical audio waveforms, we apply Markov chain Monte Carlo methods in order to sample from the resultant posterior distribution of interest. We present speech enhancement results which compare favourably in objective terms with standard time-varying filtering techniques (and in several cases yield superior performance, both objectively and subjectively); moreover, in contrast to such methods, our results are obtained without an assumption of prior knowledge of the noise power.
These methods have generally been successful at processing simple declarative sentences, but are less suited to processing other kinds of sentence structure, or more complex ones. Among the main problems are the difficulty of coping with different word orders and the fact that the burden of untangling complex structures falls on individual semantic information, rather than general syntactic rules, thus increasing the amount of specification needed for the vocabulary of a given domain. Thus, other authors, such as Heidorn (1972), Sowa and Way (1986), and Boguraev and Sparck Jones (1983), have preferred to add a semantic component to a syntactic parser. However, purely semantic parsers have other advantages, notably the potential for greater robustness, which justify their further study. This paper presents a new method for semantic parsing called "dual frames", which attempts to solve or lessen the above problems.
In many signal processing problems, it may be fruitful to represent the signal under study in a frame. If a probabilistic approach is adopted, it then becomes necessary to estimate the hyper-parameters characterizing the probability distribution of the frame coefficients. This problem is difficult since in general the frame synthesis operator is not bijective. Consequently, the frame coefficients are not directly observable. This paper introduces a hierarchical Bayesian model for frame representation. The posterior distribution of the frame coefficients and model hyper-parameters is derived. Hybrid Markov Chain Monte Carlo algorithms are subsequently proposed to sample from this posterior distribution. The generated samples are then exploited to estimate the hyper-parameters and the frame coefficients of the target signal. Validation experiments show that the proposed algorithms provide an accurate estimation of the frame coefficients and hyper-parameters. Application to practical problems of image denoising shows the impact of the resulting Bayesian estimation on the recovered signal quality.
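A small numerical sketch may help show why a non-bijective synthesis operator makes the frame coefficients unobservable. It uses a hypothetical toy frame (three unit vectors in the plane, 120 degrees apart) that is not taken from the paper; the names `F`, `synthesize`, `x1`, and `x2` are illustrative assumptions.

```python
import numpy as np

# Synthesis operator of a redundant frame: three unit vectors in R^2,
# spaced 120 degrees apart, stored as the columns of a 2 x 3 matrix.
angles = np.deg2rad([90.0, 210.0, 330.0])
F = np.vstack([np.cos(angles), np.sin(angles)])

def synthesize(coeffs):
    """Map frame coefficients (length 3) to the signal they represent (length 2)."""
    return F @ coeffs

x1 = np.array([1.0, 0.0, 0.0])
# The three frame vectors sum to zero, so adding a constant to every
# coefficient changes the coefficients but not the synthesized signal.
x2 = x1 + 0.7 * np.ones(3)

same_signal = np.allclose(synthesize(x1), synthesize(x2))
same_coeffs = np.allclose(x1, x2)
```

Here `same_signal` is true while `same_coeffs` is false: the observed signal alone cannot tell the two coefficient vectors apart, which is why a prior on the coefficients and a sampling scheme over them become useful.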
Real-world planners must be able to temporally project their external actions internally. Typically, this has been done entirely at an abstract representational level. We investigate an alternative approach, performing projections on a concrete, property-based representation, and rederiving the abstract level from it. We show that this approach can greatly alleviate the frame problem in certain types of spatial domains, while maintaining the advantages of planning at an abstract level. We present a comparison of the approaches in various object-manipulation domains, including the results of a re-implementation of the Robo-Soar system [Laird et al., 1989].