To all reviewers: on the limitation of the problem setting (Gaussian inputs and non-overlapping filters)
Our current result is for Gaussian inputs and single-hidden-layer non-overlapping CNNs. On "it would be much better to write down the proof of the claim in line 209 ... are required": we will add the derivation in line 209. On "the proof indicates that η ..." and "where and how the assumptions in Theorem 4.3 are used and to explain what the conditions mean": thank you for pointing out these issues.
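A minimal sketch of the setting named above (one hidden layer, non-overlapping filters, Gaussian input); the ReLU activation and average pooling are assumptions made purely for illustration, not necessarily the exact model analyzed in the paper:

import numpy as np

d, k = 16, 4                         # input dimension and number of patches
p = d // k                           # patch size; stride == filter size, so filters do not overlap
w = np.random.randn(p)               # shared convolutional filter
x = np.random.randn(d)               # Gaussian input, as assumed in the analysis
patches = x.reshape(k, p)            # disjoint (non-overlapping) patches
hidden = np.maximum(patches @ w, 0)  # single hidden layer of ReLU units
output = hidden.mean()               # second layer: average pooling (illustrative choice)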
SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection
Gang Li
Detection of face forgery videos remains a formidable challenge in digital forensics, especially generalization to unseen datasets and robustness to common perturbations. In this paper, we tackle this issue by leveraging the synergy between audio and visual speech, taking a novel approach based on audio-visual speech representation learning. Our work is motivated by the finding that audio signals, enriched with speech content, provide precise information that effectively reflects facial movements. To this end, we first learn precise audio-visual speech representations on real videos via a self-supervised masked prediction task, which encodes both local and global semantic information simultaneously. Then, the derived model is directly transferred to the forgery detection task. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of cross-dataset generalization and robustness, without using any fake videos in model training. The code is available here.
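A minimal sketch of the transfer step described above, assuming a pretrained audio-visual encoder whose per-frame embeddings are compared via cosine similarity; forgery_score and the 0.5 threshold are illustrative stand-ins, not the paper's exact components:

import numpy as np

def forgery_score(audio_feats, visual_feats):
    # Cosine distance between time-aligned audio and visual speech embeddings.
    a = audio_feats / np.linalg.norm(audio_feats, axis=-1, keepdims=True)
    v = visual_feats / np.linalg.norm(visual_feats, axis=-1, keepdims=True)
    cos_sim = (a * v).sum(axis=-1)        # per-frame agreement between the two streams
    return 1.0 - cos_sim.mean()           # higher score => streams disagree => likely forgery

# Example with dummy embeddings (T frames, D dims); a real pipeline would extract
# these with the pretrained self-supervised audio-visual encoder.
T, D = 50, 256
score = forgery_score(np.random.randn(T, D), np.random.randn(T, D))
is_fake = score > 0.5                      # threshold chosen here for illustration only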
PointAD: Comprehending 3D Anomalies from Points and Pixels for Zero-shot 3D Anomaly Detection
Zero-shot (ZS) 3D anomaly detection is a crucial yet unexplored field that addresses scenarios where target 3D training samples are unavailable due to practical concerns like privacy protection. This paper introduces PointAD, a novel approach that transfers the strong generalization capabilities of CLIP for recognizing 3D anomalies on unseen objects. PointAD provides a unified framework to comprehend 3D anomalies from both points and pixels.
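A minimal sketch of one way to read the points-and-pixels idea above, assuming multi-view renderings of the point cloud scored by CLIP against normal/anomalous text prompts; render_views, clip_image_patches, clip_text, and the prompts are hypothetical stand-ins, not PointAD's actual API:

import numpy as np

def zero_shot_3d_anomaly(points, clip_image_patches, clip_text, render_views):
    # Encode normal vs. anomalous text prompts once (prompt wording is illustrative).
    text = clip_text(["a photo of a normal object", "a photo of a damaged object"])  # (2, D)
    scores = np.zeros(len(points))
    counts = np.zeros(len(points))
    for image, pixel_to_point in render_views(points):          # multi-view 2D renderings
        patch_feats = clip_image_patches(image)                  # (num_patches, D) embeddings
        sims = patch_feats @ text.T                               # similarity to both prompts
        anomaly = np.exp(sims[:, 1]) / np.exp(sims).sum(axis=1)   # softmax prob. of "damaged"
        np.add.at(scores, pixel_to_point, anomaly)                # lift pixel scores back to points
        np.add.at(counts, pixel_to_point, 1)
    return scores / np.maximum(counts, 1)                         # per-point anomaly score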
Author Feedback
R1: "I am not entirely convinced that an amortized explanation model is a reasonable thing. R2: "I would appreciate some clarification about what is gained by learning ร and not just reporting ฮฉ directly." We thank R1, R2 and R3 for their insightful feedback. However, ฮฉ can only be computed given ground truth labels. R1: "(The objective) does not attribute importance to features that change how the model goes wrong (...)" R1: "Why is the rank-based method necessary?" R2: "Additionally, can the authors clarify what is being averaged in the definition of the causal objective?" The causal objective is averaged over all N samples in the dataset. Every data point has an ฮฉ. R2: "If the goal is to determine what might happen to our predictions if we change a particular feature slightly Our goal is not to estimate what would happen if a particular feature's value changed, but to provide a causal explanation R2: "Some additional clarity on why the authors are using a KL discrepancy is merited. R3: "Masking one by one; this is essentially equivalent to assuming that feature contributions are additive." We do not define a feature's importance as its additive contribution to the model output, but as it's marginal reduction This subtle change in definition allows us to efficiently compute feature importance one by one. R3: "Replacing a masked value by a point-wise estimation can be very bad, especially when the classifiers output Why would the average value (or, even worse, zero) be meaningful?" We will clarify this point in the next revision. R3: "It would also be interesting to compare the proposed method with causal inference technique for SEMs." Recent work [29] has explored the use of SEMs for model attribution in deep learning. CXPlain can explain any machine-learning model, and (ii) attribution time was considerably slower than CXPlain. R3: "It seems to me that the chosen performance measure may correlate much more with the Granger-causal loss
Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory
Counterfactual explanations provide ways of achieving a favorable model outcome with minimum input perturbation. However, counterfactual explanations can also be leveraged to reconstruct the model by strategically training a surrogate model to give similar predictions as the original (target) model. In this work, we analyze how model reconstruction using counterfactuals can be improved by further leveraging the fact that the counterfactuals also lie quite close to the decision boundary. Our main contribution is to derive novel theoretical relationships between the error in model reconstruction and the number of counterfactual queries required using polytope theory. Our theoretical analysis leads us to propose a strategy for model reconstruction that we call Counterfactual Clamping Attack (CCA) which trains a surrogate model using a unique loss function that treats counterfactuals differently than ordinary instances. Our approach also alleviates the related problem of decision boundary shift that arises in existing model reconstruction approaches when counterfactuals are treated as ordinary instances. Experimental results demonstrate that our strategy improves fidelity between the target and surrogate model predictions on several datasets.
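A minimal sketch of the intuition behind Counterfactual Clamping: counterfactual queries are treated as (approximately) decision-boundary points rather than ordinary labeled instances, here rendered as soft targets of 0.5 for a probabilistic surrogate. This is a simplified stand-in for the paper's loss; surrogate_targets and the 0.5 value are illustrative assumptions:

import numpy as np

def surrogate_targets(query_labels, n_counterfactuals, boundary_value=0.5):
    # Ordinary queries carry the target model's hard 0/1 label; counterfactuals
    # are assumed to lie near the boundary, which counteracts the boundary shift.
    ordinary = query_labels.astype(float)
    boundary = np.full(n_counterfactuals, boundary_value)
    return np.concatenate([ordinary, boundary])

# Example: 4 ordinary queries and 2 counterfactuals -> [0., 1., 1., 0., 0.5, 0.5];
# the surrogate is then fit with a cross-entropy loss on these soft targets.
targets = surrogate_targets(np.array([0, 1, 1, 0]), n_counterfactuals=2)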
Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers
Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James L. Sharpnack
In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastic shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems to the Transformer and BERT in natural language. We find that when used along with widely-used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results.
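A minimal sketch of the SSE-SE idea (stochastically swapping embedding indices during training so updates are shared across embeddings), assuming a PyTorch Embedding module; the swap probability and replacement scheme here are illustrative, not the paper's exact configuration:

import torch

class SSEEmbedding(torch.nn.Module):
    def __init__(self, num_embeddings, dim, swap_prob=0.01):
        super().__init__()
        self.embed = torch.nn.Embedding(num_embeddings, dim)
        self.num_embeddings = num_embeddings
        self.swap_prob = swap_prob

    def forward(self, idx):
        if self.training and self.swap_prob > 0:
            swap = torch.rand(idx.shape, device=idx.device) < self.swap_prob
            random_idx = torch.randint_like(idx, self.num_embeddings)
            idx = torch.where(swap, random_idx, idx)   # stochastic shared embeddings
        return self.embed(idx)

# Usage: a batch of 32 sequences of 10 item indices -> (32, 10, 50) embeddings.
emb = SSEEmbedding(num_embeddings=1000, dim=50)
out = emb(torch.randint(0, 1000, (32, 10)))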
Figure 1: Projection into 3D space (using PCA) of 50-dimensional embeddings obtained by training a simple neural network without SSE (left), with SSE-Graph (center), and with SSE-SE (right).
Table 1: Results of BERT-NER with and without SSE-SE when fine-tuning BERT is allowed for only one epoch and for two epochs, using the same set of hyper-parameters.
We thank the reviewers for their insightful feedback. In the following, we address their concerns and questions.
A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding
Inria
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems, and whose architectures are derived from an optimization algorithm. We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions. This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end. The priors considered include variants of total variation, Laplacian regularization, bilateral filtering, sparse coding on learned dictionaries, and non-local self-similarities. Our models are fully interpretable as well as parameter- and data-efficient. Our experiments demonstrate their effectiveness on a wide range of tasks, from image denoising and compressed sensing for fMRI to dense stereo matching.
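A minimal sketch of the unrolling idea described above, assuming a simple l1 (sparse-coding-style) prior with an identity dictionary, solved by a few proximal-gradient iterations; the step size, iteration count, and the prior itself are illustrative choices rather than the paper's configurations:

import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def unrolled_prox_layer(x, lam=0.1, step=1.0, n_iters=5):
    # Forward pass = a few iterations of proximal gradient descent on
    # min_z 0.5*||x - z||^2 + lam*||z||_1; each iteration is one "layer".
    z = np.zeros_like(x)
    for _ in range(n_iters):
        grad = z - x                                        # gradient of the smooth data-fit term
        z = soft_threshold(z - step * grad, step * lam)     # proximal step for lam*||z||_1
    return z                                                # lam (and step) can be learned end to end

# Usage: denoise a noisy signal by shrinking small coefficients.
denoised = unrolled_prox_layer(np.random.randn(64), lam=0.2)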
Sample Complexity of Posted Pricing for a Single Item
Billy Jin, Thomas Kesselheim, Will Ma
Selling a single item to n self-interested buyers is a fundamental problem in economics, where the two objectives typically considered are welfare maximization and revenue maximization. Since the optimal mechanisms are often impractical and do not work for sequential buyers, posted pricing mechanisms, where fixed prices are set for the item for different buyers, have emerged as a practical and effective alternative. This paper investigates how many samples are needed from buyers' value distributions to find near-optimal posted prices, considering both independent and correlated buyer distributions, and welfare versus revenue maximization. We obtain matching upper and lower bounds (up to logarithmic factors) on the sample complexity for all these settings.
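A minimal sketch of the sample-based approach studied above: pick the single posted price that maximizes empirical revenue over sampled buyer value profiles. The i.i.d. sampling, the single shared price, and the candidate grid are simplifying assumptions for illustration, not the paper's mechanisms or bounds:

import numpy as np

def best_empirical_posted_price(value_samples):
    # value_samples: (num_samples, n) array, one row per sampled profile of n buyers' values.
    candidates = np.unique(value_samples)                    # candidate prices from observed values
    best_price, best_rev = 0.0, -np.inf
    for p in candidates:
        sells = (value_samples >= p).any(axis=1)             # item sells if some buyer accepts price p
        revenue = p * sells.mean()                           # empirical expected revenue at price p
        if revenue > best_rev:
            best_price, best_rev = p, revenue
    return best_price, best_rev

# Example: 3 buyers, 1000 sampled value profiles from a hypothetical distribution.
price, rev = best_empirical_posted_price(np.random.rand(1000, 3))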