AITopics | Joshi, Anupam

Collaborating Authors

Joshi, Anupam

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

KIPPS: Knowledge infusion in Privacy Preserving Synthetic Data Generation

Kotal, Anantaa, Joshi, Anupam

arXiv.org Artificial IntelligenceSep-25-2024

The integration of privacy measures, including differential privacy techniques, ensures a provable privacy guarantee for the synthetic data. However, challenges arise for Generative Deep Learning models when tasked with generating realistic data, especially in critical domains such as Cybersecurity and Healthcare. Generative Models optimized for continuous data struggle to model discrete and non-Gaussian features that have domain constraints. Challenges increase when the training datasets are limited and not diverse. In such cases, generative models create synthetic data that repeats sensitive features, which is a privacy risk. Moreover, generative models face difficulties comprehending attribute constraints in specialized domains. This leads to the generation of unrealistic data that impacts downstream accuracy. To address these issues, this paper proposes a novel model, KIPPS, that infuses Domain and Regulatory Knowledge from Knowledge Graphs into Generative Deep Learning models for enhanced Privacy Preserving Synthetic data generation. The novel framework augments the training of generative models with supplementary context about attribute values and enforces domain constraints during training. This added guidance enhances the model's capacity to generate realistic and domain-compliant synthetic data. The proposed model is evaluated on real-world datasets, specifically in the domains of Cybersecurity and Healthcare, where domain constraints and rules add to the complexity of the data. Our experiments evaluate the privacy resilience and downstream accuracy of the model against benchmark methods, demonstrating its effectiveness in addressing the balance between privacy preservation and data accuracy in complex domains.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2409.17315

Country:

North America > United States > Maryland > Baltimore County (0.14)
North America > United States > Maryland > Baltimore (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.69)
Government > Military > Cyberwarfare (0.55)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

KiNETGAN: Enabling Distributed Network Intrusion Detection through Knowledge-Infused Synthetic Data Generation

Kotal, Anantaa, Luton, Brandon, Joshi, Anupam

arXiv.org Artificial IntelligenceMay-26-2024

In the realm of IoT/CPS systems connected over mobile networks, traditional intrusion detection methods analyze network traffic across multiple devices using anomaly detection techniques to flag potential security threats. However, these methods face significant privacy challenges, particularly with deep packet inspection and network communication analysis. This type of monitoring is highly intrusive, as it involves examining the content of data packets, which can include personal and sensitive information. Such data scrutiny is often governed by stringent laws and regulations, especially in environments like smart homes where data privacy is paramount. Synthetic data offers a promising solution by mimicking real network behavior without revealing sensitive details. Generative models such as Generative Adversarial Networks (GANs) can produce synthetic data, but they often struggle to generate realistic data in specialized domains like network activity. This limitation stems from insufficient training data, which impedes the model's ability to grasp the domain's rules and constraints adequately. Moreover, the scarcity of training data exacerbates the problem of class imbalance in intrusion detection methods. To address these challenges, we propose a Privacy-Driven framework that utilizes a knowledge-infused Generative Adversarial Network for generating synthetic network activity data (KiNETGAN). This approach enhances the resilience of distributed intrusion detection while addressing privacy concerns. Our Knowledge Guided GAN produces realistic representations of network activity, validated through rigorous experimentation. We demonstrate that KiNETGAN maintains minimal accuracy loss in downstream tasks, effectively balancing data privacy and utility.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.16476

Country: North America > United States > Maryland > Baltimore County (0.15)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.54)

Add feedback

PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification

Garza, Leon, Elluri, Lavanya, Kotal, Anantaa, Piplai, Aritran, Gupta, Deepti, Joshi, Anupam

arXiv.org Artificial IntelligenceApr-30-2024

Data protection and privacy is becoming increasingly crucial in the digital era. Numerous companies depend on third-party vendors and service providers to carry out critical functions within their operations, encompassing tasks such as data handling and storage. However, this reliance introduces potential vulnerabilities, as these vendors' security measures and practices may not always align with the standards expected by regulatory bodies. Businesses are required, often under the penalty of law, to ensure compliance with the evolving regulatory rules. Interpreting and implementing these regulations pose challenges due to their complexity. Regulatory documents are extensive, demanding significant effort for interpretation, while vendor-drafted privacy policies often lack the detail required for full legal compliance, leading to ambiguity. To ensure a concise interpretation of the regulatory requirements and compliance of organizational privacy policy with said regulations, we propose a Large Language Model (LLM) and Semantic Web based approach for privacy compliance. In this paper, we develop the novel Privacy Policy Compliance Verification Knowledge Graph, PrivComp-KG. It is designed to efficiently store and retrieve comprehensive information concerning privacy policies, regulatory frameworks, and domain-specific knowledge pertaining to the legal landscape of privacy. Using Retrieval Augmented Generation, we identify the relevant sections in a privacy policy with corresponding regulatory rules. This information about individual privacy policies is populated into the PrivComp-KG. Combining this with the domain context and rules, the PrivComp-KG can be queried to check for compliance with privacy policies by each vendor against relevant policy regulations. We demonstrate the relevance of the PrivComp-KG, by verifying compliance of privacy policy documents for various organizations.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2404.19744

Country:

Europe (0.69)
North America > United States > Maryland (0.28)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules for Secure and Confidential Data Synthesis

Kotal, Anantaa, Elluri, Lavanya, Gupta, Deepti, Mandalapu, Varun, Joshi, Anupam

arXiv.org Artificial IntelligenceNov-26-2023

Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy regulations, such as the EU GDPR, the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, there is a lack of compliance with documented data privacy policies in such privacy-preserving efforts. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats and secure data based on applicable regulatory policy rules.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/BigData59044.2023.10386276

2311.1546

Country: North America > United States > Maryland > Baltimore County (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Food & Agriculture > Agriculture (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Change Management using Generative Modeling on Digital Twins

Das, Nilanjana, Kotal, Anantaa, Roseberry, Daniel, Joshi, Anupam

arXiv.org Artificial IntelligenceSep-21-2023

A key challenge faced by small and medium-sized business entities is securely managing software updates and changes. Specifically, with rapidly evolving cybersecurity threats, changes/updates/patches to software systems are necessary to stay ahead of emerging threats and are often mandated by regulators or statutory authorities to counter these. However, security patches/updates require stress testing before they can be released in the production system. Stress testing in production environments is risky and poses security threats. Large businesses usually have a non-production environment where such changes can be made and tested before being released into production. Smaller businesses do not have such facilities. In this work, we show how "digital twins", especially for a mix of IT and IoT environments, can be created on the cloud. These digital twins act as a non-production environment where changes can be applied, and the system can be securely tested before patch release. Additionally, the non-production digital twin can be used to collect system data and run stress tests on the environment, both manually and automatically. In this paper, we show how using a small sample of real data/interactions, Generative Artificial Intelligence (AI) models can be used to generate testing scenarios to check for points of failure.

artificial intelligence, change management, generative modeling, (1 more...)

arXiv.org Artificial Intelligence

2309.12421

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.53)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence (1.00)

Add feedback

Knowledge-enhanced Neuro-Symbolic AI for Cybersecurity and Privacy

Piplai, Aritran, Kotal, Anantaa, Mohseni, Seyedreza, Gaur, Manas, Mittal, Sudip, Joshi, Anupam

arXiv.org Artificial IntelligenceJul-24-2023

Neuro-Symbolic Artificial Intelligence (AI) is an emerging and quickly advancing field that combines the subsymbolic strengths of (deep) neural networks and explicit, symbolic knowledge contained in knowledge graphs to enhance explainability and safety in AI systems. This approach addresses a key criticism of current generation systems, namely their inability to generate human-understandable explanations for their outcomes and ensure safe behaviors, especially in scenarios with \textit{unknown unknowns} (e.g. cybersecurity, privacy). The integration of neural networks, which excel at exploring complex data spaces, and symbolic knowledge graphs, which represent domain knowledge, allows AI systems to reason, learn, and generalize in a manner understandable to experts. This article describes how applications in cybersecurity and privacy, two most demanding domains in terms of the need for AI to be explainable while being highly accurate in complex environments, can benefit from Neuro-Symbolic AI.

artificial intelligence, knowledge, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2308.02031

Country: North America > United States > Maryland (0.29)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

A Practical Entity Linking System for Tables in Scientific Literature

Mulwad, Varish, Finin, Tim, Kumar, Vijay S., Williams, Jenny Weisenberg, Dixit, Sharad, Joshi, Anupam

arXiv.org Artificial IntelligenceJun-11-2023

Entity linking is an important step towards constructing knowledge graphs that facilitate advanced question answering over scientific documents, including the retrieval of relevant information included in tables within these documents. This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base. It describes how we adapt this system for linking domain-specific entities, especially for those entities embedded within tables drawn from COVID-19-related scientific literature. We describe the setup of an efficient offline instance of the system that enables our entity-linking approach to be more feasible in practice. As part of a broader approach to infer the semantic meaning of scientific tables, we leverage the structural and semantic characteristics of the tables to improve overall entity linking performance.

artificial intelligence, natural language, wikidata, (17 more...)

arXiv.org Artificial Intelligence

2306.10044

Country:

North America > United States > Maryland > Baltimore County (0.14)
North America > United States > Maryland > Baltimore (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.94)
Information Technology > Communications > Web > Semantic Web (0.68)

Add feedback

Recognizing and Extracting Cybersecurtity-relevant Entities from Text

Hanks, Casey, Maiden, Michael, Ranade, Priyanka, Finin, Tim, Joshi, Anupam

arXiv.org Artificial IntelligenceAug-2-2022

Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools and create methods for continuous integration of new information extracted from text.

annotation, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2208.01693

Country: North America > United States > Maryland > Baltimore County > Baltimore (0.28)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Generating Fake Cyber Threat Intelligence Using Transformer-Based Models

Ranade, Priyanka, Piplai, Aritran, Mittal, Sudip, Joshi, Anupam, Finin, Tim

arXiv.org Artificial IntelligenceFeb-8-2021

Cyber-defense systems are being developed to automatically ingest Cyber Threat Intelligence (CTI) that contains semi-structured data and/or text to populate knowledge graphs. A potential risk is that fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these systems. Adversaries can use fake CTI examples as training input to subvert cyber defense systems, forcing the model to learn incorrect inputs to serve their malicious needs. In this paper, we automatically generate fake CTI text descriptions using transformers. We show that given an initial prompt sentence, a public language model like GPT-2 with fine-tuning, can generate plausible CTI text with the ability of corrupting cyber-defense systems. We utilize the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus. The poisoning attack introduced adverse impacts such as returning incorrect reasoning outputs, representation poisoning, and corruption of other dependent AI-based cyber defense systems. We evaluate with traditional approaches and conduct a human evaluation study with cybersecurity professionals and threat hunters. Based on the study, professional threat hunters were equally likely to consider our fake generated CTI as true.

cti, cyberwarfare, deep learning, (22 more...)

arXiv.org Artificial Intelligence

2102.04351

Country: North America > United States > Maryland (0.28)

Genre:

Research Report (1.00)
Overview (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Independent Component Analysis for Trustworthy Cyberspace during High Impact Events: An Application to Covid-19

Boukouvalas, Zois, Mallinson, Christine, Crothers, Evan, Japkowicz, Nathalie, Piplai, Aritran, Mittal, Sudip, Joshi, Anupam, Adalı, Tülay

arXiv.org Machine LearningJun-30-2020

Social media has become an important communication channel during high impact events, such as the COVID-19 pandemic. As misinformation in social media can rapidly spread, creating social unrest, curtailing the spread of misinformation during such events is a significant data challenge. While recent solutions that are based on machine learning have shown promise for the detection of misinformation, most widely used methods include approaches that rely on either handcrafted features that cannot be optimal for all scenarios, or those that are based on deep learning where the interpretation of the prediction results is not directly accessible. In this work, we propose a data-driven solution that is based on the ICA model, such that knowledge discovery and detection of misinformation are achieved jointly. To demonstrate the effectiveness of our method and compare its performance with deep learning methods, we developed a labeled COVID-19 Twitter dataset based on socio-linguistic criteria.

deep learning, immunology, tweet, (20 more...)

arXiv.org Machine Learning

2006.01284

Country:

Asia (1.00)
North America > United States > Maryland > Baltimore (0.15)
North America > United States > Maryland > Baltimore County (0.14)
North America > United States > North Carolina > New Hanover County > Wilmington (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback