 presidio


Detection of Personal Data in Structured Datasets Using a Large Language Model

arXiv.org Artificial Intelligence

We propose a novel approach for detecting personal data in structured datasets, leveraging GPT-4o, a state-of-the-art Large Language Model. A key innovation of our method is the incorporation of contextual information: in addition to a feature's name and values, we utilize information from other feature names within the dataset as well as the dataset description. We compare our approach to alternative methods, including Microsoft Presidio and CASSED, evaluating them on multiple datasets: DeSSI, a large synthetic dataset, datasets we collected from Kaggle and OpenML as well as MIMIC-Demo-Ext, a real-world dataset containing patient information from critical care units. Our findings reveal that detection performance varies significantly depending on the dataset used for evaluation. CASSED excels on DeSSI, the dataset on which it was trained. Performance on the medical dataset MIMIC-Demo-Ext is comparable across all models, with our GPT-4o-based approach clearly outperforming the others. Notably, personal data detection in the Kaggle and OpenML datasets appears to benefit from contextual information. This is evidenced by the poor performance of CASSED and Presidio (neither of which utilizes the context of the dataset) compared to the strong results of our GPT-4o-based approach. We conclude that further progress in this field would greatly benefit from the availability of more real-world datasets containing personal information.
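As a rough illustration of the contextual information described in this abstract, the sketch below assembles a prompt from a feature's name, sample values, the other feature names, and the dataset description. The function name `build_context_prompt`, the prompt wording, and the `max_values` parameter are hypothetical and not taken from the paper; the sketch only builds the text and does not call any model.

```python
from typing import Sequence


def build_context_prompt(
    feature_name: str,
    sample_values: Sequence[object],
    other_features: Sequence[str],
    dataset_description: str,
    max_values: int = 5,  # hypothetical cap on how many values are shown
) -> str:
    """Assemble a prompt combining a feature with its dataset context.

    Illustrative only: the wording is an assumption, not the prompt
    used by the authors.
    """
    values = ", ".join(repr(v) for v in list(sample_values)[:max_values])
    others = ", ".join(other_features) if other_features else "(none)"
    return (
        "Decide whether the following dataset feature contains personal data.\n"
        f"Dataset description: {dataset_description}\n"
        f"Other features in the dataset: {others}\n"
        f"Feature name: {feature_name}\n"
        f"Sample values: {values}\n"
        "Answer with 'personal' or 'non-personal'."
    )


if __name__ == "__main__":
    print(
        build_context_prompt(
            feature_name="dob",
            sample_values=["1980-02-11", "1975-07-30"],
            other_features=["patient_id", "admission_time"],
            dataset_description="Admissions to a critical care unit.",
        )
    )
```

The resulting string could then be sent to whichever chat model is being evaluated; leaving out the description and the other feature names gives a context-free baseline for comparison.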


Enhancing the De-identification of Personally Identifiable Information in Educational Data

arXiv.org Artificial Intelligence

Protecting Personally Identifiable Information (PII), such as names, is a critical requirement in learning technologies to safeguard student and teacher privacy and maintain trust. Accurate PII detection is an essential step toward anonymizing sensitive information while preserving the utility of educational data. Motivated by recent advancements in artificial intelligence, our study investigates the GPT-4o-mini model as a cost-effective and efficient solution for PII detection tasks. We explore both prompting and fine-tuning approaches and compare GPT-4o-mini's performance against established frameworks, including Microsoft Presidio and Azure AI Language. Our evaluation on two public datasets, CRAPII and TSCC, demonstrates that the fine-tuned GPT-4o-mini model achieves superior performance, with a recall of 0.9589 on CRAPII. Additionally, fine-tuned GPT-4o-mini significantly improves precision scores (a threefold increase) while reducing computational costs to nearly one-tenth of those associated with Azure AI Language. Furthermore, our bias analysis reveals that the fine-tuned GPT-4o-mini model consistently delivers accurate results across diverse cultural backgrounds and genders. The generalizability analysis using the TSCC dataset further highlights its robustness, achieving a recall of 0.9895 with minimal additional training data from TSCC. These results emphasize the potential of fine-tuned GPT-4o-mini as an accurate and cost-effective tool for PII detection in educational data. It offers robust privacy protection while preserving the data's utility for research and pedagogical analysis. Our code is available on GitHub: https://github.com/AnonJD/PrivacyAI
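For readers unfamiliar with the metrics quoted in this abstract, the sketch below computes precision and recall for PII detection over character spans. Exact span matching and the example spans are assumptions made for illustration; this is not the study's evaluation code, and the paper may score matches differently.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a PII mention


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Return (precision, recall) using exact span matches.

    precision = true positives / number of predicted spans
    recall    = true positives / number of gold spans
    An empty denominator yields 0.0 by convention in this sketch.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up spans for illustration only.
    gold = {(0, 5), (10, 18), (30, 41)}
    predicted = {(0, 5), (10, 18), (50, 55), (60, 62)}
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")
```

High recall means few PII mentions are missed, while low precision means many non-PII spans are flagged, which reduces the utility of the anonymized data.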


Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches

arXiv.org Artificial Intelligence

In the realm of data privacy, the ability to effectively anonymise text is paramount. With the proliferation of deep learning and, in particular, transformer architectures, there is a burgeoning interest in leveraging these advanced models for text anonymisation tasks. This paper presents a comprehensive benchmarking study comparing the performance of transformer-based models and Large Language Models (LLMs) against traditional architectures for text anonymisation. Utilising the CoNLL-2003 dataset, known for its robustness and diversity, we evaluate several models. Our results showcase the strengths and weaknesses of each approach, offering a clear perspective on the efficacy of modern versus traditional methods. Notably, while modern models exhibit advanced capabilities in capturing contextual nuances, certain traditional architectures still maintain high performance. This work aims to guide researchers in selecting the most suitable model for their anonymisation needs, while also shedding light on potential paths for future advancements in the field.


Fulltime Cloud Architect openings in Portland on September 03, 2022

#artificialintelligence

HumanaPharmacy is a leader committed to the health and wellbeing of members through mail-order delivery of maintenance and specialty medicines as well as diabetic supplies. The Senior Cloud Architect leads the planning, design, and engineering of enterprise-level infrastructure and platforms related to cloud computing. The Senior Cloud Architect's work assignments involve moderately complex to complex issues where the analysis of situations or data requires an in-depth evaluation of variable factors. The Senior Cloud Architect performs technical planning, architecture development and modification of specifications for cloud computing environments. Develops specifications for new IT cloud computing products and service offerings. Assesses the compatibility and integration of products/services proposed as standards in order to ensure an integrated architecture across interdependent technologies. Begins to influence department's strategy. Makes decisions on moderately complex to complex issues regarding technical approach for project components, and work is performed without direction. Responsibilities • Advocate and define architecture vision from a strategic perspective, including internal and external platforms, tools, and systems. Required Qualifications • Bachelor's degree • 5 or more years of technical experience • Must be passionate about contributing to an organization focused on continuously improving consumer experiences Preferred Qualifications • Experience in SFCC Additional Information Humana and its subsidiaries require vaccinated associates who work outside of their home to submit proof of vaccination, including COVID-19 boosters. Associates who remain unvaccinated must either undergo weekly negative COVID testing OR wear a mask at all times while in a Humana facility or while working in the field.


A Texas town approved an AI border security camera

#artificialintelligence

The city council of Presidio, Texas, voted on June 7, 2021 to approve locating a new camera system for Customs and Border Patrol on city property. The Sentry camera is a re-deployable 30-foot-tall tower bristling with sensors and powered by solar panels. It's made by Anduril, a security technology startup. As the city council agenda notes, Presidio approved locating one such Sentry "on city property near the City of Presidio Waste Water Treatment Plant." Presidio, population 4,000, sits on the US side of the confluence of the Rio Grande and Rio Conchos rivers, across from Ojinaga in Mexico, in the broader Big Bend region of the state.


Protecting Personal Identifiable Information with Azure AI

#artificialintelligence

TL;DR: This post outlines both first-party and open-source techniques for detecting PII with Azure. Personally identifiable information (PII) is any data that can be used to identify an individual, such as names, driver's license numbers, SSNs, bank account numbers, passport numbers, email addresses, and more. Many regulations, from GDPR to HIPAA, require strict protection of user privacy. If you are new to Azure, you can get started with a free subscription using the link below. Azure Cognitive Search is a cloud solution that provides developers with APIs and tools for adding a rich search experience to their data, content, and applications.


Spark Integration Services (The Bots Are Coming)

#artificialintelligence

Chat bots have been around since the 'early days' of the Internet. I remember using them in mIRC, ICQ and AOL Instant Messenger in the late '90s, during my high school days. Admittedly, I had no vision of their future potential and just thought they were cool. At the time, I was more interested in the network than the bots. I believe what returned chat bots to the headlines is the Internet of Things ("IoT"), the evolution of enterprise chat thanks to companies such as Slack and WeChat, and improvements in Artificial Intelligence ("AI").