AITopics | Gupta, Otkrist

OCR Graph Features for Manipulation Detection in Documents

Joren, Hailey, Gupta, Otkrist, Raviv, Dan

arXiv.org Artificial Intelligence | Sep-14-2020

Detecting manipulations in digital documents is becoming increasingly important for information verification. Due to the proliferation of image editing software, altering key information in documents has become widely accessible. Nearly all methods in this domain are procedural, using carefully generated features and a hand-tuned scoring system, rather than data-driven and generalizable. We frame this issue as a graph comparison problem using the character bounding boxes, and propose a model that leverages graph features using OCR (Optical Character Recognition). Our model relies on a data-driven approach to detect alterations by training a random forest classifier on the graph-based OCR features. We evaluate our algorithm's forgery detection performance on a dataset constructed from real business documents with slight forgery imperfections. Our proposed model dramatically outperforms the most closely related document manipulation detection model on this task.
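
The graph framing in this abstract can be illustrated with a minimal sketch. It assumes each OCR'd character arrives as an (x, y, width, height) box in reading order and that consecutive characters are joined by an edge. The edge features chosen here (centre distance, baseline shift, height ratio) and all function names are hypothetical illustrations, not the feature set used in the paper.

```python
from math import hypot

# Each box is (x, y, width, height) for one OCR'd character, in reading order.
# The edge features below are illustrative assumptions, not the paper's features.

def center(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def edge_features(a, b):
    """Features for the edge between two neighbouring character boxes."""
    (ax, ay), (bx, by) = center(a), center(b)
    return {
        "distance": hypot(bx - ax, by - ay),              # centre-to-centre distance
        "baseline_shift": (b[1] + b[3]) - (a[1] + a[3]),  # bottom-edge offset
        "height_ratio": b[3] / a[3],
    }

def graph_features(boxes):
    """Chain graph: one edge between each pair of consecutive characters."""
    return [edge_features(a, b) for a, b in zip(boxes, boxes[1:])]

def max_feature_difference(reference, candidate, key):
    """Largest per-edge absolute difference for one feature between two graphs."""
    ref = graph_features(reference)
    cand = graph_features(candidate)
    return max(abs(r[key] - c[key]) for r, c in zip(ref, cand))

original = [(0, 0, 10, 20), (12, 0, 10, 20), (24, 0, 10, 20)]
# Hypothetical forgery: the middle character is pasted 3 px too low.
forged = [(0, 0, 10, 20), (12, 3, 10, 20), (24, 0, 10, 20)]
shift = max_feature_difference(original, forged, "baseline_shift")
```

In a data-driven setup such as the one the abstract describes, per-edge feature differences like these would be fed to a classifier (the abstract names a random forest) rather than compared against a hand-tuned threshold.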

detection, machine learning, pattern recognition, (17 more...)

arXiv.org Artificial Intelligence

2009.05158

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.40)

Industry:

Information Technology (0.47)
Media > Photography (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

NoPeek: Information leakage reduction to share activations in distributed deep learning

Vepakomma, Praneeth, Singh, Abhishek, Gupta, Otkrist, Raskar, Ramesh

arXiv.org Machine Learning | Aug-20-2020

For distributed machine learning with sensitive data, we demonstrate how minimizing distance correlation between raw data and intermediary representations reduces leakage of sensitive raw data patterns across client communications while maintaining model accuracy. Leakage (measured using distance correlation between input and intermediate representations) is the risk associated with the invertibility of raw data from intermediary representations. This can prevent client entities that hold sensitive data from using distributed deep learning services. We demonstrate that our method is resilient to such reconstruction attacks and is based on reduction of distance correlation between raw data and learned representations during training and inference with image datasets. We prevent such reconstruction of raw data while maintaining information required to sustain good classification accuracies.
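
The leakage measure named in this abstract, distance correlation, can be computed from paired samples as sketched below. This is the standard sample estimator (double-centred pairwise distance matrices), written in plain Python for readability. The function names are illustrative, and how the quantity is weighted inside the training loss is not specified here.

```python
from math import sqrt

def _distance_matrix(samples):
    """Pairwise Euclidean distances between samples (each a sequence of floats)."""
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in samples]
            for p in samples]

def _double_center(d):
    """Subtract row and column means, then add back the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def distance_correlation(xs, ys):
    """Sample distance correlation between paired samples xs and ys."""
    a = _double_center(_distance_matrix(xs))
    b = _double_center(_distance_matrix(ys))
    n = len(xs)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant sample carries no dependence information
    return sqrt(max(dcov2, 0.0) / sqrt(dvar_x * dvar_y))

raw = [[0.0], [1.0], [2.0], [4.0]]
leaky = [[2.0 * x[0] + 1.0] for x in raw]  # representation that is a linear copy
constant = [[7.0] for _ in raw]            # representation that reveals nothing
leak_high = distance_correlation(raw, leaky)
leak_none = distance_correlation(raw, constant)
```

A representation that is a linear function of the input scores close to 1, while a constant representation scores 0. A training objective in the spirit of the abstract would penalise high values between raw inputs and intermediate activations.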

activation, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

2008.09161

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Printing and Scanning Attack for Image Counter Forensics

James, Hailey, Gupta, Otkrist, Raviv, Dan

arXiv.org Machine Learning | Jun-24-2020

Examining the authenticity of images has become increasingly important as manipulation tools become more accessible and advanced. Recent work has shown that while CNN-based image manipulation detectors can successfully identify manipulations, they are also vulnerable to adversarial attacks, ranging from simple double JPEG compression to advanced pixel-based perturbation. In this paper we explore another method of highly plausible attack: printing and scanning. We demonstrate the vulnerability of two state-of-the-art models to this type of attack. We also propose a new machine learning model that performs comparably to these state-of-the-art models when trained and validated on printed and scanned images. Of the three models, our proposed model outperforms the others when trained and validated on images from a single printer. To facilitate this exploration, we create a dataset of over 6,000 printed and scanned image blocks. Further analysis suggests that variation between images produced from different printers is significant, large enough that good validation accuracy on images from one printer does not imply similar validation accuracy on identical images from a different printer.

dataset, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

2005.0216

Genre: Research Report > Promising Solution (0.55)

Industry: Information Technology > Security & Privacy (0.90)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Maximum-Entropy Fine Grained Classification

Dubey, Abhimanyu, Gupta, Otkrist, Raskar, Ramesh, Naik, Nikhil

Neural Information Processing Systems | Feb-14-2020, 06:11:23 GMT

Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data. Utilizing this notion of small visual diversity, we revisit Maximum-Entropy learning in the context of fine-grained classification, and provide a training routine that maximizes the entropy of the output probability distribution for training convolutional neural networks on FGVC tasks. We provide a theoretical as well as empirical justification of our approach, and achieve state-of-the-art performance across a variety of classification tasks in FGVC, that can potentially be extended to any fine-tuning task. Our method is robust to different hyperparameter values, amount of training data and amount of training label noise and can hence be a valuable tool in many similar problems. Papers published at the Neural Information Processing Systems Conference.
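
One common way to write a training objective that "maximizes the entropy of the output probability distribution" is to subtract a weighted entropy term from the usual cross-entropy loss. The sketch below assumes that form. The weight name `gamma`, its default value, and the exact combination are assumptions for illustration and are not taken from the abstract.

```python
from math import exp, log

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * log(p) for p in probs if p > 0.0)

def max_entropy_loss(logits, label, gamma=0.1):
    """Cross-entropy minus gamma times the entropy of the predicted distribution.

    gamma is a hypothetical regularisation weight; gamma=0 recovers plain
    cross-entropy.
    """
    probs = softmax(logits)
    cross_entropy = -log(probs[label])
    return cross_entropy - gamma * entropy(probs)

logits = [2.0, 0.5, 0.1]
plain = max_entropy_loss(logits, label=0, gamma=0.0)
regularised = max_entropy_loss(logits, label=0, gamma=0.1)
```

Because entropy is non-negative, the regularised loss is never larger than the plain cross-entropy, so minimising it rewards predictions that are correct without being maximally confident.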

artificial intelligence, maximum-entropy fine grained classification, neural network, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Maximal adversarial perturbations for obfuscation: Hiding certain attributes while preserving rest

Ilanchezian, Indu, Vepakomma, Praneeth, Singh, Abhishek, Gupta, Otkrist, Prasanna, G. N. Srinivasa, Raskar, Ramesh

arXiv.org Machine Learning | Sep-27-2019

In this paper we investigate the use of adversarial perturbations for privacy from both human perception and model-based (machine) detection. We employ adversarial perturbations to obfuscate certain variables in raw data while preserving the rest. Current adversarial perturbation methods are used for data poisoning with minimal perturbations of the raw data, such that the machine learning model's performance is adversely impacted while human vision cannot perceive the difference in the poisoned dataset due to the minimal nature of the perturbations. We instead apply relatively maximal perturbations of the raw data to conditionally damage the model's classification of one attribute while preserving the model's performance on another attribute. In addition, the maximal nature of the perturbation adversely impacts human perception of the hidden attribute, apart from impacting model performance. We validate our result qualitatively by showing the obfuscated dataset and quantitatively by showing the inability of models trained on clean data to predict the hidden attribute from the perturbed dataset while still being able to predict the rest of the attributes.

deep learning, neural network, perturbation, (19 more...)

arXiv.org Machine Learning

1909.12734

Country:

North America > United States (0.14)
Asia > India (0.14)

Genre: Research Report > New Finding (0.35)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Detailed comparison of communication efficiency of split learning and federated learning

Singh, Abhishek, Vepakomma, Praneeth, Gupta, Otkrist, Raskar, Ramesh

arXiv.org Machine Learning | Sep-18-2019

We compare the communication efficiency of two compelling distributed machine learning approaches, split learning and federated learning. We show useful settings under which each method outperforms the other in terms of communication efficiency. We consider various practical distributed learning setups and juxtapose the two methods under various real-life scenarios. We consider settings with small and large numbers of clients as well as small models (1M - 6M parameters), large models (10M - 200M parameters) and very large models (1 billion - 100 billion parameters). We show that increasing the number of clients or the model size favors split learning over federated learning, while increasing the number of data samples, with the number of clients or the model size kept low, makes federated learning more communication efficient.
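
The trade-off stated in this abstract can be made concrete with a back-of-the-envelope cost model. The sketch below is a deliberately simplified assumption, not the paper's analysis. It counts values exchanged per client per pass over the data, assumes one federated round per pass, assumes a federated client downloads and uploads the full model, and assumes a split learning client sends one activation vector and receives one gradient vector per sample. All function and parameter names are hypothetical.

```python
def federated_cost(num_params):
    """Values exchanged per client per round: download model, upload update."""
    return 2 * num_params

def split_cost(samples_per_client, activation_size, client_params=0):
    """Values exchanged per client per pass: activations out, gradients back,
    plus any client-side weights handed to the next client (assumed 0 here)."""
    return 2 * samples_per_client * activation_size + client_params

def cheaper_method(total_samples, num_clients, num_params, activation_size):
    """Return which method moves fewer values under the simplified model."""
    samples_per_client = total_samples // num_clients
    split = split_cost(samples_per_client, activation_size)
    federated = federated_cost(num_params)
    return "split" if split < federated else "federated"

# Many clients and a large model: each client holds few samples.
many_clients = cheaper_method(50_000, 100, 100_000_000, 1024)
# Few clients and a small model: each client holds many samples.
few_clients = cheaper_method(50_000, 2, 1_000_000, 1024)
```

Under these assumptions the first setting favours split learning and the second favours federated learning, which matches the direction of the claims in the abstract. The crossover point depends entirely on the assumed activation size and round schedule.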

artificial intelligence, communication efficiency, neural network, (16 more...)

arXiv.org Machine Learning

1909.09145

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Maximum-Entropy Fine Grained Classification

Dubey, Abhimanyu, Gupta, Otkrist, Raskar, Ramesh, Naik, Nikhil

Neural Information Processing Systems | Dec-31-2018

Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data. Utilizing this notion of small visual diversity, we revisit Maximum-Entropy learning in the context of fine-grained classification, and provide a training routine that maximizes the entropy of the output probability distribution for training convolutional neural networks on FGVC tasks. We provide a theoretical as well as empirical justification of our approach, and achieve state-of-the-art performance across a variety of classification tasks in FGVC, that can potentially be extended to any fine-tuning task. Our method is robust to different hyperparameter values, amount of training data and amount of training label noise and can hence be a valuable tool in many similar problems.

bayesian inference, classification, health & medicine, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

No Peek: A Survey of private distributed deep learning

Vepakomma, Praneeth, Swedish, Tristan, Raskar, Ramesh, Gupta, Otkrist, Dubey, Abhimanyu

arXiv.org Machine Learning | Dec-8-2018

We survey distributed deep learning models for training or inference without accessing raw data from clients. These methods aim to protect confidential patterns in data while still allowing servers to train models. The distributed deep learning methods of federated learning, split learning and large batch stochastic gradient descent are compared, in addition to the private and secure approaches of differential privacy, homomorphic encryption, oblivious transfer and garbled circuits in the context of neural networks. We study their benefits, limitations and trade-offs with regard to computational resources, data leakage and communication efficiency, and also share our anticipated future trends.

deep learning, neural network, us government, (17 more...)

arXiv.org Machine Learning

1812.03288

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)

Genre:

Research Report (0.82)
Overview (0.52)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
Health & Medicine > Health Care Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Split learning for health: Distributed deep learning without sharing raw patient data

Vepakomma, Praneeth, Gupta, Otkrist, Swedish, Tristan, Raskar, Ramesh

arXiv.org Machine Learning | Dec-3-2018

Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN does not share raw data or model details with collaborating institutions. The proposed configurations of SplitNN cater to practical settings of i) entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks and iii) learning without sharing labels. We compare performance and resource efficiency trade-offs of SplitNN and other distributed deep learning methods, such as federated learning and large batch synchronous stochastic gradient descent, and show highly encouraging results for SplitNN.
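
The core idea of split learning, that only activations and their gradients cross the client boundary, can be shown with a toy two-layer scalar model. This is a minimal sketch under stated assumptions: the client holds the raw inputs and the first layer, and the server holds the second layer and the labels (the simplest configuration, not the label-private one mentioned above). The class names, learning rate, and data are hypothetical.

```python
class Client:
    """Holds the raw data and the first layer; only activations leave."""
    def __init__(self, weight):
        self.weight = weight
        self._last_input = None

    def forward(self, x):
        self._last_input = x
        return self.weight * x  # activation sent to the server

    def backward(self, grad_activation, lr):
        # Chain rule: dL/dw1 = dL/dactivation * x, computed locally.
        self.weight -= lr * grad_activation * self._last_input


class Server:
    """Holds the remaining layer and, in this configuration, the labels."""
    def __init__(self, weight):
        self.weight = weight

    def train_step(self, activation, target, lr):
        prediction = self.weight * activation
        error = prediction - target
        grad_activation = error * self.weight  # gradient returned to the client
        self.weight -= lr * error * activation
        return 0.5 * error ** 2, grad_activation


def train(client, server, data, epochs=200, lr=0.02):
    """Alternate client forward, server step, client backward for each sample."""
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in data:
            activation = client.forward(x)
            loss, grad = server.train_step(activation, target, lr)
            client.backward(grad, lr)
            total += loss
        losses.append(total)
    return losses

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target = 2 * x
client, server = Client(0.5), Server(0.5)
losses = train(client, server, data)
```

The server never sees `x`, and the client never sees the server's weight. The composed model `server.weight * client.weight` still learns the mapping because the gradient with respect to the activation carries everything the client needs for its local update.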

deep learning, health & medicine, neural network, (17 more...)

arXiv.org Machine Learning

1812.00564

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.95)
Health & Medicine > Diagnostic Medicine (0.72)
Health & Medicine > Health Care Technology (0.69)
Government > Regional Government > North America Government > United States Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
