
Collaborating Authors

 Gupta, Gaurav


ProtSi: Prototypical Siamese Network with Data Augmentation for Few-Shot Subjective Answer Evaluation

arXiv.org Artificial Intelligence

Subjective answer evaluation is a time-consuming and tedious task, and the quality of the evaluation is heavily influenced by a variety of subjective personal characteristics. Machine evaluation, by contrast, can help educators save time while also ensuring that evaluations are fair and realistic. However, most existing methods based on conventional machine learning and natural language processing techniques are hampered by a lack of annotated answers and poor model interpretability, making them unsuitable for real-world use. To address these challenges, we propose the ProtSi Network, a semi-supervised architecture that, for the first time, applies few-shot learning to subjective answer evaluation. To evaluate students' answers against similarity prototypes, the ProtSi Network simulates the natural process by which an evaluator scores answers, combining a Siamese Network (consisting of BERT and encoder layers) with a Prototypical Network. We employ ProtAugment, an unsupervised diverse paraphrasing model, to prevent overfitting and enable effective few-shot text classification. By integrating contrastive learning, the discriminative text issue can be mitigated. Experiments on the Kaggle Short Scoring Dataset demonstrate that the ProtSi Network outperforms the most recent baseline models in terms of accuracy and quadratic weighted kappa.
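
To make the prototype-based scoring idea concrete, the minimal sketch below (an illustration using PyTorch and HuggingFace Transformers, not the authors' released code) embeds a few labeled answers with BERT, averages them into per-grade prototypes, and scores new answers by distance to each prototype; all function and variable names are assumptions for exposition.

```python
# Illustrative sketch, not the authors' code: score answers by distance to
# per-grade prototypes built from BERT embeddings of a few labeled examples.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Mean-pool the last hidden states as a simple sentence embedding.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (N, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)            # (N, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # (N, H)

def prototype_scores(support_texts, support_labels, query_texts, num_classes):
    support = embed(support_texts)                           # few labeled answers
    queries = embed(query_texts)                             # unscored answers
    labels = torch.tensor(support_labels)
    prototypes = torch.stack(
        [support[labels == c].mean(dim=0) for c in range(num_classes)]
    )
    # Negative Euclidean distance to each prototype serves as the class score;
    # scores.argmax(dim=1) is the predicted grade band for each query answer.
    return -torch.cdist(queries, prototypes)
```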


STORM: Foundations of End-to-End Empirical Risk Minimization on the Edge

arXiv.org Machine Learning

Empirical risk minimization is perhaps the most influential idea in statistical learning, with applications to nearly all scientific and technical domains in the form of regression and classification models. To analyze massive streaming datasets in distributed computing environments, practitioners increasingly prefer to deploy regression models on the edge rather than in the cloud. By keeping data on edge devices, we minimize the energy, communication, and data security risks associated with the model. Although it would be equally advantageous to train models at the edge, a common assumption is that the model was originally trained in the cloud, since training typically requires substantial computation and memory. To this end, we propose STORM, an online sketch for empirical risk minimization. STORM compresses a data stream into a tiny array of integer counters. This sketch is sufficient to estimate a variety of surrogate losses over the original dataset. We provide rigorous theoretical analysis and show that STORM can estimate a carefully chosen surrogate loss for the least-squares objective. In an exhaustive experimental comparison of linear regression models on real-world datasets, we find that STORM allows accurate regression models to be trained.
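
The flavor of the data structure can be illustrated as follows. The snippet below is a rough, assumed approximation of "a tiny array of integer counters" updated by hashing streamed (x, y) pairs with random hyperplanes; STORM's actual construction and its surrogate-loss estimators are specified in the paper, so treat this only as a sketch of the idea.

```python
# Illustrative sketch only (assumed structure, not the STORM construction):
# stream (x, y) pairs into a small grid of integer counters via signed
# random-projection hashing, so bucket counts can later be combined into
# approximate losses without revisiting the raw data.
import numpy as np

class StreamCounterSketch:
    def __init__(self, dim, rows=10, bits=8, seed=0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per row; each hashes the joint (x, y)
        # point (dim features plus the scalar target) to a bucket index.
        self.planes = rng.standard_normal((rows, bits, dim + 1))
        self.counts = np.zeros((rows, 2 ** bits), dtype=np.int64)

    def _buckets(self, x, y):
        z = np.append(x, y)                                 # joint point to hash
        sign_bits = (self.planes @ z > 0).astype(np.int64)  # (rows, bits)
        return sign_bits @ (2 ** np.arange(sign_bits.shape[1]))

    def update(self, x, y):
        rows = np.arange(self.counts.shape[0])
        self.counts[rows, self._buckets(x, y)] += 1         # integer counters only

    def collision_count(self, x, y):
        # Average count in the buckets a point hashes to: a kernel-density-like
        # quantity that a downstream surrogate-loss estimator could reuse.
        rows = np.arange(self.counts.shape[0])
        return self.counts[rows, self._buckets(x, y)].mean()
```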


Learning in Confusion: Batch Active Learning with Noisy Oracle

arXiv.org Machine Learning

We study the problem of training machine learning models incrementally via active learning with access to imperfect or noisy oracles. We specifically consider the batch active learning setting, in which multiple samples are selected at once, as opposed to a single sample as in classical settings, so as to reduce the training overhead. Our approach bridges uniform random sampling and score-based importance sampling of clusters when selecting a batch of new samples. Experiments on benchmark image classification datasets (MNIST, SVHN, and CIFAR10) show improvements over existing active learning strategies. We also introduce an extra denoising layer in deep networks to make active learning robust to label noise, and show significant improvements.
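
A minimal sketch of the cluster-level sampling step, under assumed names (this is not the paper's implementation): cluster the unlabeled pool, score each cluster by model uncertainty, and draw the batch from a mixture of the uniform and score-proportional distributions.

```python
# Illustrative sketch with assumed names (not the paper's implementation):
# pick a batch by mixing uniform and uncertainty-weighted sampling over
# clusters of the unlabeled pool.
import numpy as np
from sklearn.cluster import KMeans

def select_batch(embeddings, uncertainty, batch_size, n_clusters=50, mix=0.5, seed=0):
    """mix=0 -> uniform over clusters; mix=1 -> purely score-based sampling."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embeddings)

    # Per-cluster importance score: mean model uncertainty of its members.
    cluster_scores = np.array([uncertainty[labels == c].mean() for c in range(n_clusters)])
    score_probs = cluster_scores / cluster_scores.sum()
    probs = (1 - mix) / n_clusters + mix * score_probs       # bridge the two extremes

    chosen = []
    for c in rng.choice(n_clusters, size=batch_size, p=probs):
        members = np.flatnonzero(labels == c)
        chosen.append(rng.choice(members))                    # one point per cluster draw
    return np.array(chosen)                                   # indices into the pool
```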


Learning Latent Fractional Dynamics with Unknown Unknowns

arXiv.org Machine Learning

Despite significant effort in understanding complex systems (CS), we lack a theory for the modeling, inference, analysis, and efficient control of time-varying complex networks (TVCNs) in uncertain environments. From brain activity dynamics to the microbiome, and even chromatin interactions within the genome architecture, many such TVCNs exhibit a pronounced spatio-temporal fractality. Moreover, for many TVCNs only limited information (e.g., a few variables) is accessible for modeling, which hampers the ability of analytical tools to uncover the true degrees of freedom and infer the CS model, its hidden states, and their parameters. Another fundamental limitation is understanding and unveiling unknown drivers of the dynamics that can sporadically excite the network in ways that straightforward modeling cannot capture, owing to our inability to model non-stationary processes. Towards addressing these challenges, in this paper we consider the problem of learning fractional dynamical complex networks under unknown unknowns (i.e., hidden drivers) and partial observability (i.e., only partial data are available). More precisely, we consider a generalized modeling approach for TVCNs consisting of discrete-time fractional dynamical equations and propose an iterative framework to determine the network parameterization and predict the state of the system. We showcase the performance of the proposed framework in the context of task classification using real electroencephalogram data.
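
As a concrete illustration of the kind of discrete-time fractional dynamics referred to here, the sketch below simulates the commonly used Grünwald-Letnikov form Δ^α x[k+1] = A x[k] + B u[k]; the model form and all names are assumptions for exposition, and the paper's iterative estimation framework with unknown inputs is not reproduced.

```python
# Illustrative sketch, assuming the standard discrete-time fractional model
#   Delta^alpha x[k+1] = A x[k] + B u[k],
# with the Grunwald-Letnikov fractional difference applied per state channel.
import numpy as np

def gl_coeffs(alpha, horizon):
    # psi(alpha, 0) = 1;  psi(alpha, j) = psi(alpha, j-1) * (j - 1 - alpha) / j
    psi = np.ones((horizon + 1, alpha.size))
    for j in range(1, horizon + 1):
        psi[j] = psi[j - 1] * (j - 1 - alpha) / j
    return psi                                               # (horizon+1, n)

def simulate(A, B, alpha, u, x0):
    n, horizon = len(x0), u.shape[0]
    psi = gl_coeffs(np.asarray(alpha, dtype=float), horizon)
    x = np.zeros((horizon + 1, n))
    x[0] = x0
    for k in range(horizon):
        # The fractional difference couples x[k+1] to all past states (long memory).
        memory = sum(psi[j] * x[k + 1 - j] for j in range(1, k + 2))
        x[k + 1] = A @ x[k] + B @ u[k] - memory
    return x
```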


Data-driven Perception of Neuron Point Process with Unknown Unknowns

arXiv.org Machine Learning

Identification of patterns from discrete time-series data for statistical inference, threat detection, social opinion dynamics, and brain activity prediction has gained momentum recently. Beyond the sheer data size, the associated challenges include, for example, (i) missing data needed to construct a closed time-varying complex network, and (ii) contributions from unknown sources that are not probed. To this end, the current work focuses on a statistical neuron system model with multiple covariates and unknown inputs. Previous research on neuron activity analysis is mainly limited to effects from the spiking history of the target neuron and its interactions with other neurons in the system, while ignoring the influence of unknown stimuli. We propose to use unknown unknowns, which describe the effect of unknown stimuli, undetected neuron activities, and all other hidden sources of error. Maximum likelihood estimation with a fixed-point iteration method is implemented. The fixed-point iterations converge quickly, and the proposed methods can be efficiently parallelized, offering a computational advantage especially when the input spike trains span a long time horizon. The developed framework provides intuition about the meaning of having extra degrees of freedom in the data to support the need for unknowns. The proposed algorithm is applied to simulated spike trains and to real-world experimental data from mouse somatosensory, mouse retina, and cat retina recordings. The model shows a successful increase in system likelihood with respect to the conditional intensity function, and it also converges across iterations. Results suggest that the neural connection model with unknown unknowns can efficiently estimate the statistical properties of the process by increasing the network likelihood.
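
For readers unfamiliar with point-process likelihoods, the short sketch below shows an assumed, discretized conditional-intensity model with spike-history covariates plus an explicit term for unprobed ("unknown unknown") inputs, together with the log-likelihood that an estimator such as the fixed-point iteration would maximize; it is illustrative only, not the authors' estimator.

```python
# Illustrative sketch (assumed discretized point-process model, not the authors'
# estimator): a conditional intensity driven by spike history plus an explicit
# "unknown unknowns" term, and the log-likelihood an estimator would maximize.
import numpy as np

def conditional_intensity(beta0, history_weights, spike_history, unknown_input):
    # lambda_k = exp(baseline + weighted spike history + effect of unprobed sources)
    return np.exp(beta0 + spike_history @ history_weights + unknown_input)

def log_likelihood(spike_counts, lam, dt):
    # Discretized point-process log-likelihood: sum_k [ n_k * log(lam_k * dt) - lam_k * dt ]
    return np.sum(spike_counts * np.log(lam * dt) - lam * dt)

# Fitting would alternate between updating (beta0, history_weights) and the
# unknown-input term so as to increase this likelihood, in the spirit of the
# fixed-point iterations described above.
```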