sequential data
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
the four main areas of criticism below (reviewers referred to as R1-5)
We first thank the reviewers for their insightful comments, which we have taken into careful consideration. If our work were to be evaluated using only performance metrics, this criticism would be fair. Learning paradigms for networks of 'convex layers' have been shown to be effective (e.g. The key advance over standard SCNs is that we show how to perform non-linear computations in these systems. Standard SCNs such as in Boerlin et al. (2013) are restricted to linear computations. It may seem surprising, but such layers are actually not well understood!
Appendix 1
$\min_{P} \sum_{i,j} P_{i,j} C_{i,j} - \gamma H(P)$ subject to $P \in \mathbb{R}_{+}^{t \times t}$, $P^{T} 1_t = 1_t$, $P 1_t = 1_t$, (6) where $P_{i,j}$ is the transport plan and $C_{i,j}$ is the ground metric that measures the distance between point $i$ in the source and point $j$ in the target. This will induce some smoothness and wiggle room in the solution of our objective. To increase the diversity of the observed trajectories, we inject Gaussian noise ($\sigma = 0.05$) into the trajectories by perturbing the initial velocities. Since two-body systems are non-chaotic, we split the data so that the training set has $[m_{\min}, m_{\max}] = [0.8, 1.2]$ while the test set has $[m_{\min}, m_{\max}] = [0.9, 1.3]$, which creates a domain distribution shift. The initial velocities of all bodies are derived from their initial positions by rotating them by 90° and scaling them by $r^{1.5}$.
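An entropy-regularized transport problem of the kind in Eq. (6) is commonly solved with Sinkhorn iterations. The sketch below is one possible illustration, not the authors' implementation. It assumes the objective is the transport cost minus gamma times the entropy H(P) = -sum P log P, so that the plan has the form diag(u) K diag(v) with K = exp(-C / gamma). The function name `sinkhorn_plan` and the defaults for `gamma` and `n_iters` are hypothetical.

```python
import math


def sinkhorn_plan(cost, gamma=1.0, n_iters=500):
    """Approximate an entropy-regularized transport plan for a square cost matrix.

    Both marginals are fixed to the all-ones vector, matching the constraints
    P 1_t = 1_t and P^T 1_t = 1_t. The defaults are illustrative assumptions.
    """
    t = len(cost)
    # Gibbs kernel K = exp(-C / gamma); a larger gamma gives a smoother plan.
    kernel = [[math.exp(-cost[i][j] / gamma) for j in range(t)] for i in range(t)]
    u = [1.0] * t
    v = [1.0] * t
    for _ in range(n_iters):
        # Rescale rows so each row of diag(u) K diag(v) sums to 1.
        u = [1.0 / sum(kernel[i][j] * v[j] for j in range(t)) for i in range(t)]
        # Rescale columns so each column of diag(u) K diag(v) sums to 1.
        v = [1.0 / sum(kernel[i][j] * u[i] for i in range(t)) for j in range(t)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(t)] for i in range(t)]


if __name__ == "__main__":
    # Toy ground metric: squared distance between the points 0, 1, 2 on a line.
    toy_cost = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
    for row in sinkhorn_plan(toy_cost, gamma=2.0):
        print([round(p, 4) for p in row])
```

Very small values of `gamma` make the kernel entries underflow in this plain form; a log-domain variant is the usual remedy in that regime.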
PCF-GAN: generating sequential data via the characteristic function of measures on the path space
Generating high-fidelity time series data using generative adversarial networks (GANs) remains a challenging task, as it is difficult to capture the temporal dependence of joint probability distributions induced by time-series data. Towards this goal, a key step is the development of an effective discriminator to distinguish between time series distributions. We propose the so-called PCF-GAN, a novel GAN that incorporates the path characteristic function (PCF) as the principled representation of time series distribution into the discriminator to enhance its generative performance. On the one hand, we establish theoretical foundations of the PCF distance by proving its characteristicity, boundedness, differentiability with respect to generator parameters, and weak continuity, which ensure the stability and feasibility of training the PCF-GAN. On the other hand, we design efficient initialisation and optimisation schemes for PCFs to strengthen the discriminative power and accelerate training efficiency. To further boost the capabilities of complex time series generation, we integrate the auto-encoder structure via sequential embedding into the PCF-GAN, which provides additional reconstruction functionality. Extensive numerical experiments on various datasets demonstrate the consistently superior performance of PCF-GAN over state-of-the-art baselines, in both generation and reconstruction quality.
Multi-Resolution Weak Supervision for Sequential Data
Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in weak supervision is estimating the unknown accuracies and correlations of these sources without using labeled data. Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales in the length of the sequence. We propose Dugong, the first framework to model multi-resolution weak supervision sources with complex correlations to assign probabilistic labels to training data. Theoretically, we prove that Dugong, under mild conditions, can uniquely recover the unobserved accuracy and correlation parameters and use parameter sharing to improve sample complexity. Our method assigns clinician-validated labels to population-scale biomedical video repositories, helping outperform traditional supervision by 36.8 F1 points and addressing a key use case where machine learning has been severely limited by the lack of expert labeled data. On average, Dugong improves over traditional supervision by 16.0 F1 points and existing weak supervision approaches by 24.2 F1 points across several video and sensor classification tasks.