Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation
In speech separation, time-domain approaches have successfully replaced time-frequency representations with latent sequence features obtained from a learnable encoder. Conventionally, these features are separated into speaker-specific features only at the final stage of the network. Instead, we propose a more intuitive strategy that separates features earlier by expanding the feature sequence with the number of speakers as an extra dimension. To achieve this, an asymmetric design is presented in which the encoder and decoder are partitioned to perform distinct processing in the separation task: the encoder analyzes the features, and its output is split into the number of speakers to be separated.
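As a rough, hypothetical sketch of this early-split idea (not the authors' implementation; the module names, dimensions, and the use of a plain 1-D convolutional encoder are assumptions), the encoder can emit one channel block per speaker and reshape its output so that the speaker count becomes an explicit extra dimension:

```python
import torch
import torch.nn as nn

class EarlySplitEncoder(nn.Module):
    """Hypothetical sketch: a learnable 1-D conv encoder whose output is
    reshaped so that the number of speakers becomes an extra dimension,
    i.e. features are separated before the main processing blocks."""
    def __init__(self, n_speakers=2, feat_dim=256, kernel=16, stride=8):
        super().__init__()
        self.n_speakers = n_speakers
        self.feat_dim = feat_dim
        # The encoder emits n_speakers * feat_dim channels per time frame.
        self.encoder = nn.Conv1d(1, n_speakers * feat_dim, kernel, stride=stride)

    def forward(self, mix):                      # mix: (batch, samples)
        z = self.encoder(mix.unsqueeze(1))       # (batch, S*F, frames)
        b, _, t = z.shape
        # Split the channel axis into an explicit speaker dimension.
        return z.view(b, self.n_speakers, self.feat_dim, t)

x = torch.randn(4, 16000)                        # four one-second mixtures at 16 kHz
print(EarlySplitEncoder()(x).shape)              # torch.Size([4, 2, 256, 1999])
```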
Outlier-Robust Distributionally Robust Optimization via Unbalanced Optimal Transport
Distributionally Robust Optimization (DRO) accounts for uncertainty in data distributions by optimizing the model performance against the worst possible distribution within an ambiguity set. In this paper, we propose a DRO framework that relies on a new distance inspired by Unbalanced Optimal Transport (UOT). The proposed UOT distance employs a soft penalization term instead of hard constraints, enabling the construction of an ambiguity set that is more resilient to outliers. Under smoothness conditions, we establish strong duality of the proposed DRO problem. Moreover, we introduce a computationally efficient Lagrangian penalty formulation for which we show that strong duality also holds. Finally, we provide empirical results that demonstrate that our method offers improved robustness to outliers and is computationally less demanding for regression and classification tasks.
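For orientation only (the notation below is generic and not taken from the paper; the cost c, the KL marginal penalties with weight τ, and the penalty weight λ are assumed placeholders), a soft-penalized, UOT-style DRO objective can be sketched as follows:

```latex
% Hedged sketch of a UOT-based DRO objective (notation assumed, not the paper's).
% Classical OT-based DRO enforces a hard ball constraint:
%   \min_\theta \; \sup_{Q \,:\, W_c(Q, \widehat{P}_n) \le \rho} \; \mathbb{E}_{Q}[\ell_\theta(Z)].
% An unbalanced-OT distance relaxes the marginal constraints on the transport
% plan \pi with soft divergence penalties (here KL), so outlying mass in the
% empirical distribution \widehat{P}_n can be discounted rather than matched exactly:
\[
  \mathrm{UOT}_{c,\tau}\!\left(Q,\widehat{P}_n\right)
  \;=\;
  \inf_{\pi \ge 0}\;
  \int c(z,z')\,\mathrm{d}\pi(z,z')
  \;+\;
  \tau\,\mathrm{KL}\!\left(\pi_1 \,\middle\|\, Q\right)
  \;+\;
  \tau\,\mathrm{KL}\!\left(\pi_2 \,\middle\|\, \widehat{P}_n\right),
\]
\[
  \min_\theta \;\; \sup_{Q}\;
  \Big\{\, \mathbb{E}_{Q}\!\left[\ell_\theta(Z)\right]
  \;-\;\lambda\,\mathrm{UOT}_{c,\tau}\!\left(Q,\widehat{P}_n\right) \Big\}
  \qquad \text{(Lagrangian-penalty form).}
\]
```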
A Proofs
Although we do not allow bias in the output neuron, the additive term B can be implemented by adding a hidden neuron with fan-in 0 and bias 1 that is connected to the output neuron with weight B. Note that E
We will prove the following two lemmas: Lemma A.1.
Then, combining Lemmas A.1 and A.2 with Eqs. 1 and 2, we have ˆN f
A.1.1 Proof of Lemma A.1
We start with an intuitive explanation, and then turn to the formal proof. We show that for each step, w.h.p., the change in N
Since there are only poly(d) intervals and the intervals with large derivatives are small, by using the fact that µ has an almost-bounded conditional density we are able to show that w.h.p. the interval between x
We show that w.h.p. we obtain g
Proof of Lemma A.2
The network ˆN consists of three parts. First, it transforms with high probability the input x to a binary representation of x. Then, it simulates N(x) by using arithmetic operations on binary vectors.
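A minimal numerical sketch of the bias-neuron construction described at the start of this appendix, assuming a toy two-input ReLU layer with made-up weights (none of the concrete values come from the paper):

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

# Sketch: a ReLU network whose output neuron has no bias, where an additive
# term B is realized by an extra hidden neuron with fan-in 0 (all incoming
# weights zero) and bias 1, connected to the output with weight B.
B = 3.5
W1 = np.array([[1.0, -2.0],      # ordinary hidden neuron
               [0.0,  0.0]])     # fan-in-0 neuron: no dependence on the input
b1 = np.array([0.0, 1.0])        # the fan-in-0 neuron has bias 1
w2 = np.array([2.0, B])          # output weights; B multiplies the constant neuron

def net(x):                       # x: input vector of length 2
    h = relu(W1 @ x + b1)         # hidden layer; second unit is always relu(1) = 1
    return w2 @ h                 # bias-free output neuron; still contributes +B

x = np.array([0.7, 0.1])
print(net(x))                     # 2*relu(0.7 - 0.2) + B = 1.0 + 3.5 = 4.5
```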
Neural Networks with Small Weights and Depth-Separation Barriers
In studying the expressiveness of neural networks, an important question is whether there are functions which can only be approximated by sufficiently deep networks, assuming their size is bounded. However, for constant depths, existing results are limited to depths 2 and 3, and achieving results for higher depths has been an important open question. In this paper, we focus on feedforward ReLU networks, and prove fundamental barriers to proving such results beyond depth 4, by reduction to open problems and natural-proof barriers in circuit complexity. To show this, we study a seemingly unrelated problem of independent interest: namely, whether there are polynomially-bounded functions which require super-polynomial weights in order to approximate with constant-depth neural networks. We provide a negative and constructive answer to that question, by showing that if a function can be approximated by a polynomially-sized, constant-depth k network with arbitrarily large weights, it can also be approximated by a polynomially-sized, depth 3k + 3 network whose weights are polynomially bounded.
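Schematically (a paraphrase of the sentence above, not the paper's formal theorem; the approximation notion ≈_ε and its parameters are placeholders), the constructive answer has the following shape:

```latex
% Depth-reduction statement sketched from the abstract; norms and the exact
% dependence on the accuracy \epsilon are placeholders, only the size, depth,
% and weight bounds are taken from the text.
\[
  f \approx_{\epsilon} N
  \;\;\text{with}\;\;
  \mathrm{size}(N) \le \mathrm{poly}(d),\quad
  \mathrm{depth}(N) = k,\quad
  \text{weights unbounded}
\]
\[
  \Longrightarrow\qquad
  f \approx_{\epsilon} \widetilde{N}
  \;\;\text{with}\;\;
  \mathrm{size}(\widetilde{N}) \le \mathrm{poly}(d),\quad
  \mathrm{depth}(\widetilde{N}) = 3k + 3,\quad
  \max_i |w_i| \le \mathrm{poly}(d).
\]
```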