Goto

Collaborating Authors

 Country


Permutohedral-GCN: Graph Convolutional Networks with Global Attention

arXiv.org Artificial Intelligence

Graph convolutional networks (GCNs) update a node's feature vector by aggregating features from its neighbors in the graph. This ignores potentially useful contributions from distant nodes. Identifying such useful distant contributions is challenging due to scalability issues (too many nodes can potentially contribute) and oversmoothing (aggregating features from too many nodes risks swamping out relevant information and may result in nodes having different labels but indistinguishable features). We introduce a global attention mechanism where a node can selectively attend to, and aggregate features from, any other node in the graph. The attention coefficients depend on the Euclidean distance between learnable node embeddings, and we show that the resulting attention-based global aggregation scheme is analogous to high-dimensional Gaussian filtering. This makes it possible to use efficient approximate Gaussian filtering techniques to implement our attention-based global aggregation scheme. By employing an approximate filtering method based on the permutohedral lattice, the time complexity of our proposed global aggregation scheme only grows linearly with the number of nodes. The resulting GCNs, which we term permutohedral-GCNs, are differentiable and trained end-to-end, and they achieve state of the art performance on several node classification benchmarks.


Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets

arXiv.org Artificial Intelligence

Deep neural nets (DNNs) compression is crucial for adaptation to mobile devices. Though many successful algorithms exist to compress naturally trained DNNs, developing efficient and stable compression algorithms for robustly trained DNNs remains widely open. In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning. Such a co-design enables us to advance the goal of accommodating both sparsity and robustness. With this objective in mind, we leverage the relaxed augmented Lagrangian based algorithms to prune the weights of adversarially trained DNNs, at both structured and unstructured levels. Using a Feynman-Kac formalism principled robust and sparse DNNs, we can at least double the channel sparsity of the adversarially trained ResNet20 for CIFAR10 classification, meanwhile, improve the natural accuracy by $8.69$\% and the robust accuracy under the benchmark $20$ iterations of IFGSM attack by $5.42$\%. The code is available at \url{https://github.com/BaoWangMath/rvsm-rgsm-admm}.


Differentially Private Deep Learning with Smooth Sensitivity

arXiv.org Artificial Intelligence

Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice. One approach to study these concerns is through the lens of differential privacy. In this framework, privacy guarantees are generally obtained by perturbing models in such a way that specifics of data used to train the model are made ambiguous. A particular instance of this approach is through a "teacher-student" framework, wherein the teacher, who owns the sensitive data, provides the student with useful, but noisy, information, hopefully allowing the student model to perform well on a given task without access to particular features of the sensitive data. Because stronger privacy guarantees generally involve more significant perturbation on the part of the teacher, deploying existing frameworks fundamentally involves a trade-off between student's performance and privacy guarantee. One of the most important techniques used in previous works involves an ensemble of teacher models, which return information to a student based on a noisy voting procedure. In this work, we propose a novel voting mechanism with smooth sensitivity, which we call Immutable Noisy ArgMax, that, under certain conditions, can bear very large random noising from the teacher without affecting the useful information transferred to the student. Compared with previous work, our approach improves over the state-of-the-art methods on all measures, and scale to larger tasks with both better performance and stronger privacy ($\epsilon \approx 0$). This new proposed framework can be applied with any machine learning models, and provides an appealing solution for tasks that requires training on a large amount of data.


MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships

arXiv.org Artificial Intelligence

Monocular 3D object detection is an essential component in autonomous driving while challenging to solve, especially for those occluded samples which are only partially visible. Most detectors consider each 3D object as an independent training target, inevitably resulting in a lack of useful information for occluded samples. To this end, we propose a novel method to improve the monocular 3D object detection by considering the relationship of paired samples. This allows us to encode spatial constraints for partially-occluded objects from their adjacent neighbors. Specifically, the proposed detector computes uncertainty-aware predictions for object locations and 3D distances for the adjacent object pairs, which are subsequently jointly optimized by nonlinear least squares. Finally, the one-stage uncertainty-aware prediction structure and the post-optimization module are dedicatedly integrated for ensuring the run-time efficiency. Experiments demonstrate that our method yields the best performance on KITTI 3D detection benchmark, by outperforming state-of-the-art competitors by wide margins, especially for the hard samples.


Environment-agnostic Multitask Learning for Natural Language Grounded Navigation

arXiv.org Artificial Intelligence

Recent research efforts enable study for natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, existing methods tend to overfit training data in seen environments and fail to generalize well in previously unseen environments. In order to close the gap between seen and unseen environments, we aim at learning a generalized navigation model from two novel perspectives: (1) we introduce a multitask navigation model that can be seamlessly trained on both Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks, which benefits from richer natural language guidance and effectively transfers knowledge across tasks; (2) we propose to learn environment-agnostic representations for the navigation policy that are invariant among the environments seen during training, thus generalizing better on unseen environments. Extensive experiments show that training with environmentagnostic multitask learning objective significantly reduces the performance gap between seen and unseen environments and the navigation agent so trained outperforms the baselines on unseen environments by 16% (relative measure on success rate) on VLN and 120% (goal progress) on NDH. Our submission to the CVDN leaderboard establishes a new state-of-the-art for the NDH task outperforming the existing best model by more than 66% (goal progress) on the holdout test set.


Differential Evolution with Individuals Redistribution for Real Parameter Single Objective Optimization

arXiv.org Artificial Intelligence

Differential Evolution (DE) is quite powerful for real parameter single objective optimization. However, the ability of extending or changing search area when falling into a local optimum is still required to be developed in DE for accommodating extremely complicated fitness landscapes with a huge number of local optima. We propose a new flow of DE, termed DE with individuals redistribution, in which a process of individuals redistribution will be called when progress on fitness is low for generations. In such a process, mutation and crossover are standardized, while trial vectors are all kept in selection. Once diversity exceeds a predetermined threshold, our opposition replacement is executed, then algorithm behavior returns to original mode. In our experiments based on two benchmark test suites, we apply individuals redistribution in ten DE algorithms. Versions of the ten DE algorithms based on individuals redistribution are compared with not only original version but also version based on complete restart, where individuals redistribution and complete restart are based on the same entry criterion. Experimental results indicate that, for most of the DE algorithms, version based on individuals redistribution performs better than both original version and version based on complete restart.


A Study on Multimodal and Interactive Explanations for Visual Question Answering

arXiv.org Artificial Intelligence

Explainability and interpretability of AI models is an essential factor affecting the safety of AI. While various explainable AI (XAI) approaches aim at mitigating the lack of transparency in deep networks, the evidence of the effectiveness of these approaches in improving usability, trust, and understanding of AI systems are still missing. We evaluate multimodal explanations in the setting of a Visual Question Answering (VQA) task, by asking users to predict the response accuracy of a VQA agent with and without explanations. We use between-subjects and within-subjects experiments to probe explanation effectiveness in terms of improving user prediction accuracy, confidence, and reliance, among other factors. The results indicate that the explanations help improve human prediction accuracy, especially in trials when the VQA system's answer is inaccurate. Furthermore, we introduce active attention, a novel method for evaluating causal attentional effects through intervention by editing attention maps. User explanation ratings are strongly correlated with human prediction accuracy and suggest the efficacy of these explanations in human-machine AI collaboration tasks.


Towards Automatic Face-to-Face Translation

arXiv.org Artificial Intelligence

In light of the recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term as "Face-to-Face Translation". As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization. In this work, we create an automatic pipeline for this problem and demonstrate its impact on multiple real-world applications. First, we build a working speech-to-speech translation system by bringing together multiple existing modules from speech and language. We then move towards "Face-to-Face Translation" by incorporating a novel visual module, LipGAN for generating realistic talking faces from the translated audio. Quantitative evaluation of LipGAN on the standard LRW test set shows that it significantly outperforms existing approaches across all standard metrics. We also subject our Face-to-Face Translation pipeline, to multiple human evaluations and show that it can significantly improve the overall user experience for consuming and interacting with multimodal content across languages. Code, models and demo video are made publicly available. Demo video: https://www.youtube.com/watch?v=aHG6Oei8jF0 Code and models: https://github.com/Rudrabha/LipGAN


Data Pre-Processing and Evaluating the Performance of Several Data Mining Methods for Predicting Irrigation Water Requirement

arXiv.org Artificial Intelligence

Recent drought and population growth are planting unprecedented demand for the use of available limited water resources. Irrigated agriculture is one of the major consumers of freshwater. A large amount of water in irrigated agriculture is wasted due to poor water management practices. To improve water management in irrigated areas, models for estimation of future water requirements are needed. Developing a model for forecasting irrigation water demand can improve water management practices and maximise water productivity. Data mining can be used effectively to build such models. In this study, we prepare a dataset containing information on suitable attributes for forecasting irrigation water demand. The data is obtained from three different sources namely meteorological data, remote sensing images and water delivery statements. In order to make the prepared dataset useful for demand forecasting and pattern extraction, we pre-process the dataset using a novel approach based on a combination of irrigation and data mining knowledge. We then apply and compare the effectiveness of different data mining methods namely decision tree (DT), artificial neural networks (ANNs), systematically developed forest (SysFor) for multiple trees, support vector machine (SVM), logistic regression, and the traditional Evapotranspiration (ETc) methods and evaluate the performance of these models to predict irrigation water demand. Our experimental results indicate the usefulness of data pre-processing and the effectiveness of different classifiers. Among the six methods we used, SysFor produces the best prediction with 97.5% accuracy followed by a decision tree with 96% and ANN with 95% respectively by closely matching the predictions with actual water usage. Therefore, we recommend using SysFor and DT models for irrigation water demand forecasting.


Turning 30: New Ideas in Inductive Logic Programming

arXiv.org Artificial Intelligence

Common criticisms of state-of-the-art machine learning include poor generalisation, a lack of interpretability, and a need for large amounts of training data. We survey recent work in inductive logic programming (ILP), a form of machine learning that induces logic programs from data, which has shown promise at addressing these limitations. We focus on new methods for learning recursive programs that generalise from few examples, a shift from using hand-crafted background knowledge to \emph{learning} background knowledge, and the use of different technologies, notably answer set programming and neural networks. As ILP approaches 30, we also discuss directions for future research.