MACD: Multilingual Abusive Comment Detection at Scale for Indic Languages

Neural Information Processing Systems

Social media platforms were conceived to act as online 'town squares' where people could get together, share information, and communicate with each other peacefully. However, harmful content from bad actors constantly plagues these platforms, slowly converting them into 'mosh pits' where the bad actors take the liberty to extensively abuse various marginalised groups. Accurate and timely detection of abusive content on social media platforms is therefore very important for facilitating safe interactions between users. However, due to the small scale and sparse linguistic coverage of Indic abusive speech datasets, the development of such detection algorithms for Indic social media users (one-sixth of the global population) is severely impeded.


Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Neural Information Processing Systems

Recent remarkable improvements in unsupervised visual representation learning rely on heavy networks with large-batch training. While recent methods have greatly reduced the gap between supervised and unsupervised performance of deep models such as ResNet-50, this development has been relatively limited for small models. In this work, we propose a novel unsupervised learning framework for small networks that combines deep self-supervised representation learning and knowledge distillation within one-phase training. In particular, a teacher model is trained to produce consistent cluster assignments between different views of the same image. Simultaneously, a student model is encouraged to mimic the prediction of the on-the-fly self-supervised teacher. For effective knowledge transfer, we adopt the idea of a domain classifier so that student training is guided by discriminative features invariant to the representational space shift between teacher and student. We also introduce a network-driven multi-view generation paradigm to capture rich feature information contained in the network itself. Extensive experiments show that our student models surpass state-of-the-art offline distilled networks, even those distilled from stronger self-supervised teachers, as well as top-performing self-supervised models. Notably, our ResNet-18, trained with a ResNet-50 teacher, achieves 68.3% ImageNet Top-1 accuracy on frozen feature linear evaluation, which is only 1.5% below the supervised baseline.
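The "student mimics the teacher's prediction" step above can be illustrated with a minimal sketch. It assumes the teacher and student each output logits over the same set of clusters and that the mimicking objective is a cross-entropy between their softmax distributions. The function names and the `temperature` parameter are illustrative assumptions, not details taken from the abstract, and the domain classifier and multi-view generation are omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy H(p_teacher, q_student) for one sample.

    Assumed form of the mimicking objective: it is smallest when the
    student distribution equals the teacher's cluster assignment.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


# Hypothetical logits over three clusters.
teacher = [2.0, 0.0, -1.0]
matched = distillation_loss(teacher, [2.0, 0.0, -1.0])
mismatched = distillation_loss(teacher, [-1.0, 0.0, 2.0])
```

In an on-the-fly setting, this loss would be computed against the teacher's current outputs at every training step rather than against a frozen, pre-trained teacher.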


Supplementary Material: Model Class Reliance for Random Forests

Neural Information Processing Systems

Unless otherwise specified, all algorithms were timed on single-core versions even though, for instance, the proposed method is in places trivially parallelizable (i.e. during forest build). An exception was the grid search across meta-parameters to find the best (optimal) reference model, where parallelization was used when required, as this stage does not form part of the time comparisons. Hosted on Google Colaboratory, the notebooks enable the use of hosted or local runtime environments. When tested, hosted runtimes were running Python 3.6.9. Please note that while a hosted runtime can be used for ease of replication, all timings reported in the paper were based on a local runtime environment, as previously indicated, NOT a hosted environment. The notebooks, when run in the hosted environment, will automatically install the required packages developed as part of this work.


On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

Neural Information Processing Systems

Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training? In this paper, we explore weak-to-strong specialization using logit arithmetic, facilitating a direct answer to this question. Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge, which leads to suboptimal performance.
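As a rough illustration of logit arithmetic with a static transfer ratio (the baseline setting the abstract criticizes), the sketch below adds the weighted logit shift of each fine-tuned small model to the large model's logits. The formula `large + sum_i w_i * (expert_i - small_base)` and every name here are assumptions made for illustration. The abstract does not spell out the fusion rule, and the paper's dynamic choice of weights is not reproduced.

```python
def fuse_logits(large_logits, small_base_logits, small_expert_logits, weights):
    """Assumed logit-arithmetic rule for one decoding step.

    large_logits:        logits of the large, untuned model
    small_base_logits:   logits of the small, untuned model
    small_expert_logits: one logit list per fine-tuned small model
    weights:             one static transfer ratio per expert (hypothetical)
    """
    fused = list(large_logits)
    for expert, weight in zip(small_expert_logits, weights):
        for j in range(len(fused)):
            # Shift the large model by what fine-tuning changed in the small one.
            fused[j] += weight * (expert[j] - small_base_logits[j])
    return fused


def argmax(values):
    """Index of the largest entry (greedy token choice)."""
    return max(range(len(values)), key=lambda j: values[j])


# Toy two-token vocabulary with two hypothetical experts.
fused = fuse_logits(
    large_logits=[1.0, 2.0],
    small_base_logits=[0.0, 0.0],
    small_expert_logits=[[0.5, -0.5], [0.0, 1.0]],
    weights=[2.0, 1.0],
)
```

With all weights fixed in advance, this is the static scheme. A dynamic variant would recompute `weights` at each decoding step.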


Distributionally Robust Imitation Learning

Neural Information Processing Systems

We consider the imitation learning problem of learning a policy in a Markov Decision Process (MDP) setting where the reward function is not given, but demonstrations from experts are available. Although the goal of imitation learning is to learn a policy that produces behaviors nearly as good as the experts' for a desired task, assumptions of consistent optimality for demonstrated behaviors are often violated in practice. Finding a policy that is distributionally robust against noisy demonstrations based on an adversarial construction potentially solves this problem by avoiding optimistic generalizations of the demonstrated data.


Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation Gengshan Yang

Neural Information Processing Systems

However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked in albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets. We design a lightweight 3D texture field to synthesize visual and tactile textures, guided by 2D diffusion model priors on both visual and tactile domains. We condition the visual texture generation on high-resolution tactile normals and guide the patch-based tactile texture refinement with a customized TextureDreambooth. We further present a multi-part generation pipeline that enables us to synthesize different textures across various regions. To our knowledge, we are the first to leverage high-resolution tactile sensing to enhance geometric details for 3D generation tasks. We evaluate our method in both text-to-3D and image-to-3D settings. Our experiments demonstrate that our method provides customized and realistic fine geometric textures while maintaining accurate alignment between two modalities of vision and touch.


Non-parametric classification via expand-and-sparsify representation

Neural Information Processing Systems

We propose two algorithms for non-parametric classification using the expand-and-sparsify (EaS) representation. For our first algorithm, we use a winners-take-all operation for the sparsification step, show that the proposed classifier admits the form of a locally weighted average classifier, and establish its consistency via Stone's Theorem. Further, assuming that the conditional probability function P(y = 1 | x) = η(x) is Hölder continuous, we show that, for an optimal choice of m, the convergence rate of this classifier is minimax-optimal.
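The following is a minimal sketch of an expand-and-sparsify classifier built on a winners-take-all step, written under several assumptions: the expansion is a random Gaussian projection to m units, sparsification keeps the k largest activations, and each unit stores the average label of the training points that activate it. These choices and all names are illustrative. The abstract does not specify the exact estimator.

```python
import random


def make_projection(d, m, seed=0):
    """Random m x d Gaussian matrix used for the expansion step (assumed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def eas_code(x, projection, k):
    """Winners-take-all: indices of the k most activated units."""
    acts = [sum(w * xi for w, xi in zip(row, x)) for row in projection]
    ranked = sorted(range(len(acts)), key=lambda j: acts[j], reverse=True)
    return set(ranked[:k])


def fit(X, y, projection, k):
    """Per-unit label sums and counts over the training set."""
    m = len(projection)
    label_sum, count = [0.0] * m, [0] * m
    for x, label in zip(X, y):
        for j in eas_code(x, projection, k):
            label_sum[j] += label
            count[j] += 1
    return label_sum, count


def predict_proba(x, model, projection, k):
    """Average of per-unit label means over the active units of x."""
    label_sum, count = model
    active = [j for j in eas_code(x, projection, k) if count[j] > 0]
    if not active:
        return 0.5  # no training point shares an active unit with x
    return sum(label_sum[j] / count[j] for j in active) / len(active)


projection = make_projection(d=2, m=20, seed=0)
model = fit([[1.0, 0.0], [-1.0, 0.0]], [1, 0], projection, k=5)
p_pos = predict_proba([1.0, 0.0], model, projection, k=5)
p_neg = predict_proba([-1.0, 0.0], model, projection, k=5)
```

Because the prediction averages labels of training points that share active units with the query, it has the shape of a locally weighted average. Predicting class 1 when the estimate is at least 1/2 gives a plug-in classifier.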


Supplementary Material: Cross Aggregation Transformer for Image Restoration

Neural Information Processing Systems

These settings are consistent with CAT-R and CAT-A. For CAT-R-2, we apply regular-Rwin, and set [sw, sh] as [4, 16] (same as CAT-R). We set the MLP expansion ratio as 2, consistent with SwinIR [13]. For CAT-A-2, we apply axial-Rwin, and set sl as 4 for all CATB in each RG. The MLP expansion ratio is set as 4. Best and second best results are colored with red and blue.


Hard Negative Mixing for Contrastive Learning

Neural Information Processing Systems

The uniformity experiment is based on Wang and Isola [53]. We follow the same definitions of the losses/metrics as presented in the paper. We set α = 2 and t = 2. All features were L2-normalized, as the metrics are defined on the hypersphere.

B.1 Proxy task: Effect of MLP and Stronger Augmentation

Following our discussion in Section 3, we wanted to verify that the hardness of the proxy task for MoCo [19] is directly correlated with the difficulty of the transformation set, i.e. that proxy task hardness can be modulated via the positive pair.
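For reference, the alignment and uniformity metrics of Wang and Isola used in the uniformity experiment above can be sketched as follows, with α = 2 and t = 2 as stated. Alignment is the mean of ||f(x) − f(y)||^α over positive pairs, and uniformity is the log of the mean of exp(−t ||f(x) − f(y)||²) over pairs of samples. The function names and the plain-list feature format are assumptions made for this illustration.

```python
import math


def l2_normalize(v):
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(positive_pairs, alpha=2):
    """Mean of ||f(x) - f(y)||^alpha over positive pairs (lower is better)."""
    values = [sq_dist(a, b) ** (alpha / 2) for a, b in positive_pairs]
    return sum(values) / len(values)


def uniformity(features, t=2):
    """Log of the mean Gaussian potential over distinct pairs (lower is better)."""
    n = len(features)
    values = [
        math.exp(-t * sq_dist(features[i], features[j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.log(sum(values) / len(values))


# Features spread around the circle versus features bunched together.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
bunched = [l2_normalize(v) for v in ([1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [1.0, 0.05])]
u_spread = uniformity(spread)
u_bunched = uniformity(bunched)
```

Both functions expect features that are already L2-normalized, matching the note that the metrics are defined on the hypersphere.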


Constrained Sampling with Primal-Dual Langevin Monte Carlo

Neural Information Processing Systems

This work considers the problem of sampling from a probability distribution known up to a normalization constant while satisfying a set of statistical constraints specified by the expected values of general nonlinear functions. This problem finds applications in, e.g., Bayesian inference, where it can constrain moments to evaluate counterfactual scenarios or enforce desiderata such as prediction fairness. Methods developed to handle support constraints, such as those based on mirror maps, barriers, and penalties, are not suited for this task. This work therefore relies on gradient descent-ascent dynamics in Wasserstein space to put forward a discrete-time primal-dual Langevin Monte Carlo algorithm (PD-LMC) that simultaneously constrains the target distribution and samples from it. We analyze the convergence of PD-LMC under standard assumptions on the target distribution and constraints, namely (strong) convexity and log-Sobolev inequalities. To do so, we bring classical optimization arguments for saddle-point algorithms to the geometry of Wasserstein space. We illustrate the relevance and effectiveness of PD-LMC in several applications.
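A one-dimensional sketch of a primal-dual Langevin iteration may help make the descent-ascent idea concrete. It assumes a target proportional to exp(−f(x)), a single inequality constraint of the form E[g(x)] ≤ 0, a Langevin step on the Lagrangian f + λg for the sample, and a projected ascent step for the multiplier λ. The update order, step sizes, and all names are assumptions for illustration and may differ from the PD-LMC algorithm analyzed in the paper.

```python
import math
import random


def pd_lmc(grad_f, g, grad_g, gamma=0.01, eta=0.005, steps=20000, seed=0):
    """Single-chain primal-dual Langevin sketch (assumed form).

    Primal: x <- x - gamma * (f'(x) + lam * g'(x)) + sqrt(2 * gamma) * noise
    Dual:   lam <- max(0, lam + eta * g(x))
    Returns the running mean of x and the final multiplier.
    """
    rng = random.Random(seed)
    x, lam, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        drift = grad_f(x) + lam * grad_g(x)
        x = x - gamma * drift + math.sqrt(2.0 * gamma) * rng.gauss(0.0, 1.0)
        lam = max(0.0, lam + eta * g(x))  # projected dual ascent
        total += x
    return total / steps, lam


# Toy problem: standard normal target, f(x) = x^2 / 2, with the
# constraint E[x] >= 1 written as E[g(x)] <= 0 for g(x) = 1 - x.
mean_x, lam = pd_lmc(
    grad_f=lambda x: x,
    g=lambda x: 1.0 - x,
    grad_g=lambda x: -1.0,
)
```

In this toy problem the unconstrained sampler would average near 0, while the multiplier pushes the chain until its long-run mean sits close to the constraint boundary at 1.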