Supplementary Material for CrossGNN: Confronting Noisy Multivariate Time Series Via Cross Interaction Refinement
We conduct extensive experiments on 8 real-world datasets following [4]. CrossGNN uses a correlation mechanism to capture cross-time dependency for forecasting, and the channel dimension is set to 16 for efficiency. In the figures, the first row shows performance for a prediction horizon of 96 and the second row for a prediction horizon of 336.

Figure 3: MSE (left Y-axis) and MAE (right Y-axis) of CrossGNN for different numbers of scales (X-axis) on ETTh2, ETTm2, Traffic, and Weather.

Figure 4: MSE (left Y-axis) and MAE (right Y-axis) of CrossGNN for different values of K (X-axis) on ETTh2, ETTm2, Traffic, and Weather.
NeuralSteiner: Learning Steiner Tree for Overflow-avoiding Global Routing in Chip Design
Global routing plays a critical role in modern chip design. The routing paths generated by global routers often form a rectilinear Steiner tree (RST). Recent advances from the machine learning community have shown the power of learning-based route generation; however, the routing paths yielded by existing approaches often suffer from considerable overflow, greatly hindering their application in practice. We propose NeuralSteiner, an accurate approach to overflow-avoiding global routing in chip design. The key idea of NeuralSteiner is to learn Steiner trees: we first predict the locations of highly likely Steiner points with a neural network that considers full-net spatial and overflow information, then select appropriate points via a graph-based post-processing algorithm, and finally connect these points with the input pins to yield overflow-avoiding RSTs. NeuralSteiner offers two advantages over previous learning-based models. First, through its learning scheme, NeuralSteiner ensures the connectivity of generated routes while significantly reducing congestion. Second, NeuralSteiner scales effectively to large nets and transfers to unseen chip designs without any modification or fine-tuning. Extensive experiments on public large-scale benchmarks show that, compared with state-of-the-art deep generative methods, NeuralSteiner achieves up to a 99.8% reduction in overflow while speeding up generation and keeping the wirelength loss within only 1.8%.
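To make the three-stage pipeline concrete, here is a minimal, hypothetical Python sketch (not the authors' implementation): a small fully-convolutional network scores grid cells as likely Steiner points from pin and congestion maps, a selection step keeps the highest-scoring cells, and a Manhattan-distance spanning tree connects the selected points with the pins. All names (SteinerPointNet, select_points, rectilinear_mst) and the two-channel input layout are assumptions for illustration.

```python
# Hypothetical sketch in the spirit of the described pipeline:
# (1) predict a Steiner-point heatmap, (2) pick candidate points,
# (3) connect candidates and pins with a rectilinear (Manhattan) spanning tree.
import numpy as np
import torch
import torch.nn as nn


class SteinerPointNet(nn.Module):
    """Toy fully-convolutional predictor of a per-cell Steiner-point score.
    Assumed input channels: pin map and congestion/overflow map."""

    def __init__(self, in_ch: int = 2, width: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, 1),
        )

    def forward(self, x):  # x: (B, 2, H, W) -> (B, 1, H, W) logits
        return self.net(x)


def select_points(heatmap: np.ndarray, threshold: float = 0.5, top_k: int = 8):
    """Keep the top-k grid cells whose predicted probability exceeds the threshold."""
    ys, xs = np.where(heatmap >= threshold)
    scored = sorted(zip(heatmap[ys, xs], ys, xs), reverse=True)[:top_k]
    return [(int(y), int(x)) for _, y, x in scored]


def rectilinear_mst(points):
    """Prim's MST under Manhattan distance; edges approximate an RST's L-shaped segments."""
    points = list(points)
    in_tree, edges = {0}, []
    while len(in_tree) < len(points):
        best = None
        for i in in_tree:
            for j in range(len(points)):
                if j in in_tree:
                    continue
                d = abs(points[i][0] - points[j][0]) + abs(points[i][1] - points[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges


if __name__ == "__main__":
    grid = torch.zeros(1, 2, 32, 32)           # channel 0: pin map, channel 1: congestion map
    pins = [(3, 4), (20, 25), (28, 6)]
    for y, x in pins:
        grid[0, 0, y, x] = 1.0
    with torch.no_grad():
        probs = torch.sigmoid(SteinerPointNet()(grid))[0, 0].numpy()
    candidates = select_points(probs, threshold=float(probs.mean()))
    tree = rectilinear_mst(pins + candidates)
    print("tree edges:", tree)
```

In the paper's setting the selection step also accounts for overflow; the plain threshold-and-MST post-processing here is only a stand-in to show where the learned heatmap enters the flow.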
AverNet: All-in-one Video Restoration for Time-varying Unknown Degradations
Traditional video restoration approaches were designed to recover clean videos from a specific type of degradation, making them ineffective in handling multiple unknown types of degradation. To address this issue, several studies have been conducted and have shown promising results. However, these studies overlook that the degradations in a video usually change over time, dubbed time-varying unknown degradations (TUD). To tackle this less-touched challenge, we propose an innovative method, termed the All-in-one VidEo Restoration Network (AverNet), which comprises two core modules, i.e., a Prompt-Guided Alignment (PGA) module and a Prompt-Conditioned Enhancement (PCE) module. Specifically, PGA addresses the pixel shifts caused by time-varying degradations by learning and utilizing prompts to align video frames at the pixel level. To handle multiple unknown degradations, PCE recasts restoration as a conditional problem by implicitly establishing a conditional mapping between degradations and ground truths. Thanks to the collaboration between the PGA and PCE modules, AverNet empirically demonstrates its effectiveness in recovering videos from TUD. Extensive experiments are carried out on two synthesized datasets featuring seven types of degradations with random corruption levels.
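The following is a minimal, hypothetical sketch of the two-module idea (not the released AverNet code): a learned prompt conditions both a flow-based alignment of a neighboring frame and a residual enhancement of the current frame. The class names, channel sizes, and the simple flow head are assumptions made for illustration.

```python
# Hypothetical sketch: a learned prompt conditions both frame alignment and
# enhancement, so the network can adapt to degradations that change over time.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptGuidedAlignment(nn.Module):
    """Predict a per-pixel flow from (current frame, neighbor frame, prompt) and warp the neighbor."""

    def __init__(self, ch: int = 3, prompt_dim: int = 8):
        super().__init__()
        self.flow = nn.Sequential(
            nn.Conv2d(2 * ch + prompt_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, cur, nbr, prompt):
        p = prompt.expand(-1, -1, cur.shape[-2], cur.shape[-1])
        flow = self.flow(torch.cat([cur, nbr, p], dim=1))          # (B, 2, H, W)
        B, _, H, W = cur.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
        )
        base = torch.stack([xs, ys], dim=-1).to(cur).unsqueeze(0).expand(B, -1, -1, -1)
        grid = base + flow.permute(0, 2, 3, 1)                     # sampling grid in [-1, 1]
        return F.grid_sample(nbr, grid, align_corners=True)


class PromptConditionedEnhancement(nn.Module):
    """Fuse the aligned neighbor with the current frame, conditioned on the prompt."""

    def __init__(self, ch: int = 3, prompt_dim: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * ch + prompt_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, cur, aligned, prompt):
        p = prompt.expand(-1, -1, cur.shape[-2], cur.shape[-1])
        return cur + self.body(torch.cat([cur, aligned, p], dim=1))  # residual restoration


if __name__ == "__main__":
    cur, nbr = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    prompt = nn.Parameter(torch.zeros(1, 8, 1, 1))                  # learned degradation prompt
    aligned = PromptGuidedAlignment()(cur, nbr, prompt)
    restored = PromptConditionedEnhancement()(cur, aligned, prompt)
    print(restored.shape)  # torch.Size([1, 3, 64, 64])
```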
Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders
The ability to answer causal questions is crucial in many domains, as causal inference allows one to understand the impact of interventions. In many applications, only a single intervention is possible at a given time. However, in some important areas, multiple interventions are concurrently applied. Disentangling the effects of single interventions from jointly applied interventions is a challenging task, especially as simultaneously applied interventions can interact. This problem is made harder still by unobserved confounders, which influence both the treatments and the outcome.
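A toy simulation (our illustration, not from the paper) of the difficulty described above: two treatments interact, and an unobserved confounder U drives both treatments and the outcome, so the naive observational contrast is biased and even a randomized single intervention leaves the solo effect entangled with the interaction term. All coefficients are made up for the example.

```python
# Illustration: interacting joint treatments plus an unobserved confounder.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
U = rng.normal(size=n)                              # unobserved confounder
T1 = (U + rng.normal(size=n) > 0).astype(float)     # both treatments depend on U
T2 = (U + rng.normal(size=n) > 0).astype(float)
Y = 2.0 * T1 + 1.0 * T2 + 1.5 * T1 * T2 + 3.0 * U + rng.normal(size=n)

# Naive observational contrast for T1: biased by U (true solo effect of T1 is 2.0).
naive = Y[T1 == 1].mean() - Y[T1 == 0].mean()

# Randomized single intervention do(T1): replace T1 by a coin flip, keep U and T2.
T1_rand = rng.binomial(1, 0.5, size=n).astype(float)
Y_do = 2.0 * T1_rand + 1.0 * T2 + 1.5 * T1_rand * T2 + 3.0 * U + rng.normal(size=n)
do_contrast = Y_do[T1_rand == 1].mean() - Y_do[T1_rand == 0].mean()

print(f"naive observational contrast for T1: {naive:.2f}")
print(f"do(T1) contrast: {do_contrast:.2f}  (= 2.0 + 1.5 * P(T2=1), mixed with the interaction)")
```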
Learning Group Actions on Latent Representations
In this work, we introduce a new approach to modeling group actions in autoencoders. Diverging from prior research in this domain, we propose to learn the group actions on the latent space rather than strictly on the data space. This adaptation enhances the versatility of our model, enabling it to handle a broader range of scenarios prevalent in the real world, where groups can act on latent factors. Our method allows wide flexibility in the encoder and decoder architectures and does not require group-specific layers. In addition, we show that our model theoretically serves as a superset of methods that learn group actions on the data space. We test our approach on five image datasets with diverse groups acting on them and demonstrate superior performance to recently proposed methods for modeling group actions.
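As a hypothetical, minimal sketch of acting with a group on the latent code rather than on pixels (not the paper's model): here an SO(2) element, represented by an angle, acts on the latent vector through block-diagonal 2x2 rotation matrices with learnable frequencies, so that decoding the transformed latent should match the transformed image. The class name, latent layout, and loss are assumptions for illustration.

```python
# Hypothetical latent-space group action: rotate 2-D latent blocks by a group element (an angle).
import torch
import torch.nn as nn


class LatentGroupAutoencoder(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 16):
        super().__init__()
        assert latent_dim % 2 == 0, "latent is split into 2-D rotation-equivariant blocks"
        self.latent_dim = latent_dim
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))
        # Learnable frequency per 2-D block: block k rotates by (freq_k * angle).
        self.freqs = nn.Parameter(torch.ones(latent_dim // 2))

    def act(self, z, angle):
        """Apply the latent group action rho(angle) to z, block by block."""
        z = z.view(z.shape[0], -1, 2)                          # (B, latent_dim/2, 2)
        theta = angle.view(-1, 1) * self.freqs.view(1, -1)     # (B, latent_dim/2)
        cos, sin = torch.cos(theta), torch.sin(theta)
        x, y = z[..., 0], z[..., 1]
        rotated = torch.stack([cos * x - sin * y, sin * x + cos * y], dim=-1)
        return rotated.reshape(z.shape[0], self.latent_dim)

    def forward(self, x, angle):
        z = self.encoder(x)
        return self.decoder(self.act(z, angle))


if __name__ == "__main__":
    model = LatentGroupAutoencoder()
    x = torch.rand(4, 784)                       # e.g. flattened 28x28 images
    angle = torch.rand(4) * 2 * torch.pi         # one group element per sample
    x_transformed_pred = model(x, angle)
    # Training (not shown) would minimize || x_transformed_pred - g . x ||^2,
    # where g . x is the image transformed in data space.
    print(x_transformed_pred.shape)              # torch.Size([4, 784])
```

Because the action lives on the latent code, the encoder and decoder can be arbitrary networks; no group-specific layers are needed, which matches the flexibility claimed in the abstract.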
A Transfer and finetuning details
Few-shot evaluation. We use the linear adaptation protocol and evaluation sets from [68, 70], reporting 10-shot classification accuracy. For every combination of dataset and model, we run the 10-shot adaptation three times and report the mean (and the standard deviation for key results).

LiT decoder and T5 decoder. To train a multi-task decoder from scratch on top of the frozen representation for classification, captioning, and VQA, we precisely follow the setup and hyperparameters from [2], except for the data mixing strategy, for which we use the "concat image-question pairs" strategy from [2]. For all encoders, we use the full feature sequence before pooling (including the class token for the evaluation of CLIP). Throughout, we rely on a B-sized transformer decoder [60] with 12 layers.
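For readers unfamiliar with this kind of few-shot linear adaptation, the following is a hedged sketch of the general recipe (not the exact solver used in [68, 70]): freeze the encoder, sample 10 labeled examples per class, fit a ridge-regularized linear map from features to one-hot labels, and repeat with three seeds to report mean and standard deviation. The function name, regularizer, and seeds are illustrative assumptions.

```python
# Sketch of a 10-shot linear adaptation loop on frozen features.
import numpy as np


def ten_shot_accuracy(train_feats, train_labels, test_feats, test_labels,
                      num_classes, shots=10, seeds=(0, 1, 2), l2=1e-3):
    accs = []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        idx = np.concatenate([
            rng.choice(np.where(train_labels == c)[0], size=shots, replace=False)
            for c in range(num_classes)
        ])
        X = train_feats[idx]                                   # (shots * C, D) frozen features
        Y = np.eye(num_classes)[train_labels[idx]]             # one-hot targets
        # Closed-form ridge regression: W = (X^T X + l2 I)^-1 X^T Y
        D = X.shape[1]
        W = np.linalg.solve(X.T @ X + l2 * np.eye(D), X.T @ Y)
        pred = (test_feats @ W).argmax(axis=1)
        accs.append((pred == test_labels).mean())
    return float(np.mean(accs)), float(np.std(accs))


if __name__ == "__main__":
    # Stand-in random "frozen features"; in practice these come from the pretrained encoder.
    rng = np.random.default_rng(0)
    C, D = 5, 64
    tr_x, tr_y = rng.normal(size=(500, D)), rng.integers(0, C, 500)
    te_x, te_y = rng.normal(size=(200, D)), rng.integers(0, C, 200)
    mean_acc, std_acc = ten_shot_accuracy(tr_x, tr_y, te_x, te_y, num_classes=C)
    print(f"10-shot accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```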
Image Captioners Are Scalable Vision Learners Too
Michael Tschannen, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer
Contrastive pretraining on image-text pairs from the web is one of the most popular large-scale pretraining strategies for vision backbones, especially in the context of large multimodal models. At the same time, image captioning on this type of data is commonly considered an inferior pretraining strategy. In this paper, we perform a fair comparison of these two pretraining strategies, carefully matching training data, compute, and model capacity. Using a standard encoder-decoder transformer, we find that captioning alone is surprisingly effective: on classification tasks, captioning produces vision encoders competitive with contrastively pretrained encoders, while surpassing them on vision & language tasks. We further analyze the effect of the model architecture and scale, as well as the pretraining data, on the representation quality, and find that captioning exhibits the same or better scaling behavior along these axes. Overall, our results show that plain image captioning is a more powerful pretraining strategy than was previously believed.
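To illustrate the captioning pretraining objective at a glance, here is a hypothetical toy sketch (not the paper's implementation): a vision encoder produces patch features, and a Transformer decoder is trained with next-token cross-entropy on the paired caption under teacher forcing. All sizes, the ToyCaptioner name, and the stand-in projection layer are assumptions.

```python
# Toy captioning-pretraining objective: patch features -> autoregressive caption decoder.
import torch
import torch.nn as nn


class ToyCaptioner(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, patch_dim=768, max_len=128):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, d_model)        # stand-in for the vision encoder output
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, patch_feats, caption_ids):
        memory = self.patch_proj(patch_feats)                  # (B, P, d_model)
        T = caption_ids.shape[1]
        pos = torch.arange(T, device=caption_ids.device)
        tgt = self.token_emb(caption_ids) + self.pos_emb(pos)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # no peeking ahead
        h = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(h)                                 # (B, T, vocab) next-token logits


if __name__ == "__main__":
    model = ToyCaptioner()
    patches = torch.randn(2, 196, 768)                         # e.g. ViT patch features
    caption = torch.randint(0, 1000, (2, 16))                  # tokenized captions
    logits = model(patches, caption[:, :-1])                   # teacher forcing
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.shape[-1]), caption[:, 1:].reshape(-1)
    )
    print(float(loss))
```

The contrastive alternative discussed in the abstract would instead train image and text towers with a symmetric InfoNCE loss; the comparison in the paper matches data, compute, and capacity across the two objectives.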