privacy regulation
Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG
Fang, Chenhao, Larson, Derek, Zhu, Shitong, Zeng, Sophie, Summer, Wendy, Peng, Yanqing, Hulovatyy, Yuriy, Rao, Rajeev, Forgues, Gabriel, Pudota, Arya, Goncalves, Alex, Robert, Hervé
This paper presents new methods with the potential to improve the efficiency of privacy processes using LLMs and retrieval-augmented generation (RAG). To reduce hallucination, we continually pre-train the base LLM on a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach improves model performance on privacy-related queries (in some cases doubling metrics relative to an out-of-the-box LLM) by grounding responses in factual information, which reduces inaccuracies.
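The abstract describes the pipeline only at a high level. A minimal sketch of the retrieve-then-ground step it implies might look like the following; the toy knowledge base, bag-of-words retrieval, and prompt format are illustrative assumptions, not the paper's implementation:

```python
"""Minimal retrieve-then-ground sketch. All data and retrieval logic here are
illustrative stand-ins; the paper does not publish this code."""
from collections import Counter
import math

KB = [  # toy stand-in for the privacy-specific knowledge base
    "GDPR Art. 17 grants data subjects the right to erasure of personal data.",
    "CCPA gives California residents the right to opt out of data sales.",
    "HIPAA restricts disclosure of protected health information (PHI).",
]

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def grounded_prompt(query: str, k: int = 2) -> str:
    """Retrieve the top-k KB passages and build a grounded prompt for the LLM."""
    ranked = sorted(KB, key=lambda p: _cosine(_vec(query), _vec(p)), reverse=True)
    context = "\n".join(ranked[:k])
    return (f"Answer using ONLY this context; say 'unknown' if it is insufficient.\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

print(grounded_prompt("Can a California resident opt out of having their data sold?"))
```

Constraining the model to retrieved passages is what lets the grounded answer be checked against the knowledge base rather than the model's parametric memory.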
Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory
Li, Haoran, Fan, Wei, Chen, Yulin, Cheng, Jiayang, Chu, Tianshu, Zhou, Xuebing, Hu, Peizhao, Song, Yangqiu
Privacy research has attracted wide attention as individuals worry that their private data can be easily leaked during interactions with smart devices, social platforms, and AI applications. Computer science researchers, on the other hand, commonly study privacy issues through privacy attacks and defenses within segmented fields, including Computer Vision (CV), Natural Language Processing (NLP), and Computer Networks, each with its own formulation of privacy. Though pioneering works on attacks and defenses reveal sensitive privacy issues, they remain narrowly scoped and cannot fully cover people's actual privacy concerns. Consequently, general, human-centric privacy research remains largely unexplored. In this paper, we formulate the privacy issue as a reasoning problem rather than simple pattern matching. We ground our approach in Contextual Integrity (CI) theory, which posits that people's perceptions of privacy are highly correlated with the corresponding social context. Building on this assumption, we develop the first comprehensive checklist that covers social identities, private attributes, and existing privacy regulations. Unlike prior works on CI that either cover limited expert-annotated norms or model incomplete social context, our proposed privacy checklist uses the entire Health Insurance Portability and Accountability Act of 1996 (HIPAA) as an example to show that large language models (LLMs) can be used to cover HIPAA's regulations completely. Additionally, our checklist gathers expert annotations across multiple ontologies to determine private information, including but not limited to personally identifiable information (PII). We use our preliminary results on HIPAA to shed light on future context-centric privacy research covering more privacy regulations, social norms, and standards.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New Jersey (0.04)
- (4 more...)
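Under CI theory, a disclosure is judged against the norms of its social context. A minimal sketch of how a checklist rule over a (sender, recipient, subject, attribute, transmission principle) tuple might be evaluated appears below; the simplified HIPAA norm encoded here is an illustrative assumption, not the paper's actual checklist data:

```python
"""Illustrative Contextual Integrity (CI) norm check. The HIPAA rule below is
heavily simplified for demonstration and is not the paper's checklist."""
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    sender: str      # social identity of the discloser
    recipient: str   # social identity of the receiver
    subject: str     # whose information it is
    attribute: str   # type of information transmitted
    principle: str   # transmission principle, e.g. "treatment", "marketing"

# Toy norm loosely inspired by HIPAA 45 CFR 164.506:
# a provider may share PHI with another provider for treatment purposes.
PERMITTED = {("physician", "physician", "patient", "PHI", "treatment")}

def violates_norm(flow: Flow) -> bool:
    """A flow of a private attribute is a violation unless some norm permits it."""
    key = (flow.sender, flow.recipient, flow.subject, flow.attribute, flow.principle)
    return flow.attribute == "PHI" and key not in PERMITTED

print(violates_norm(Flow("physician", "physician", "patient", "PHI", "treatment")))  # False
print(violates_norm(Flow("physician", "insurer", "patient", "PHI", "marketing")))    # True
```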
Top lawmaker on AI working group says privacy regs should be a priority for Congress
The vice chair of Congress's artificial intelligence caucus says privacy regulations need to be a top short-term priority for Congress as Washington comes to grips with the rapidly emerging technology, which he says poses risks but could also catalyze the next expansion of the U.S. economy. Rep. Jay Obernolte, R-Calif., told Fox News Digital in an interview that he is an optimist about the potential of artificial intelligence, but that Congress needs to make sure it protects Americans from the potential negatives and disruption AI brings. "I think in the short term, the ability of AI to pierce through digital data privacy and to re-aggregate data that has supposedly been disaggregated and use it to create behavioral models that could be used to influence behavior, that's very concerning, and that's something that the government definitely needs to play a role in mitigating," Obernolte said. Obernolte holds a graduate degree in artificial intelligence.
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > California (0.05)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules for Secure and Confidential Data Synthesis
Kotal, Anantaa, Elluri, Lavanya, Gupta, Deepti, Mandalapu, Varun, Joshi, Anupam
Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy regulations, such as the EU GDPR, the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, there is a lack of compliance with documented data privacy policies in such privacy-preserving efforts. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats and secure data based on applicable regulatory policy rules.
- North America > United States > Maryland > Baltimore County (0.14)
- North America > United States > Texas (0.05)
- North America > United States > Maryland > Baltimore (0.04)
- (3 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Food & Agriculture > Agriculture (1.00)
- (2 more...)
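The paper's generator is a deep learning model guided by rules extracted from codes of conduct; the toy sketch below shows only the policy-enforcement idea, with simple perturbation standing in for the generative model. The field names and rules are invented for illustration:

```python
"""Sketch of enforcing extracted policy rules during synthetic record
generation. Rules and fields are hypothetical; the paper's rule extraction
and generator are far richer."""
import random

# Privacy bounds extracted (hypothetically) from an agricultural code of conduct:
POLICY = {
    "farm_id": "suppress",     # direct identifier: never release
    "location": "generalize",  # release only at region granularity
    "yield_tons": "allow",     # non-identifying measurement: perturb and release
}

REGION_OF = {"Baltimore County": "Maryland", "Travis County": "Texas"}

def synthesize(real: dict) -> dict:
    """Generate one synthetic record, applying the policy bound for each field."""
    out = {}
    for field, rule in POLICY.items():
        if rule == "suppress":
            continue                                  # drop identifiers entirely
        if rule == "generalize":
            out[field] = REGION_OF.get(real[field], "unknown")
        else:                                         # "allow": noisy copy
            out[field] = round(real[field] * random.uniform(0.9, 1.1), 1)
    return out

print(synthesize({"farm_id": "F-1042", "location": "Baltimore County", "yield_tons": 87.5}))
```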
Synthetic Data 101: What are the use cases for synthetic data?
Synthetic data accurately mimics real-world data. It serves as a placeholder for production data in development and testing workflows and is also used to improve the quality of machine learning algorithms. Common use cases revolve around product development/testing, machine learning, data analysis, and data privacy and security. For example, financial institutions use synthetic data to generate reliable market data for algorithmic trading and risk analysis, while healthcare providers use it to analyze patient data without compromising sensitive patient information. Additionally, synthetic data is used in machine learning algorithms to improve performance and accuracy and thus accelerate the development process.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Banking & Finance (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.36)
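As a rough illustration of the placeholder idea described above, a minimal sketch that fits per-column statistics on a few real records and samples look-alike rows; real synthetic-data tools model joint distributions, and the columns here are invented:

```python
"""Toy illustration of synthetic data as a stand-in for production data:
fit simple per-column statistics, then sample plausible look-alike rows."""
import random
import statistics

real = [
    {"age": 34, "balance": 1200.0},
    {"age": 51, "balance": 8300.0},
    {"age": 29, "balance": 450.0},
]

def fit(rows, col):
    vals = [r[col] for r in rows]
    return statistics.mean(vals), statistics.stdev(vals)

def sample(n):
    age_mu, age_sd = fit(real, "age")
    bal_mu, bal_sd = fit(real, "balance")
    return [{"age": max(18, round(random.gauss(age_mu, age_sd))),
             "balance": round(max(0.0, random.gauss(bal_mu, bal_sd)), 2)}
            for _ in range(n)]

print(sample(2))  # plausible rows, but tied to no real customer
```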
Extracting personal information from anonymous cell phone data using machine learning
A research team at Illinois Institute of Technology has extracted personal information, specifically protected characteristics like age and gender, from anonymous cell phone data using machine learning and artificial intelligence algorithms, raising questions about data security. The research was conducted by an interdisciplinary team of three Illinois Tech faculty including Vijay K. Gurbani, research associate professor of computer science; Matthew Shapiro, professor of political science; and Yuri Mansury, associate professor of social sciences. They were joined by Illinois Tech alumni Lida Kuang (M.S. CS '19) and Samruda Pobbathi (M.S. CS '19) who worked with Gurbani to publish "Predicting Age and Gender from Network Telemetry: Implications for Privacy and Impact on Policy" in PLOS One. The researchers used data from a Latin American cell phone company to successfully estimate the gender and age of individual users through their private communications with relative ease. The team developed a neural network model to estimate gender with 67% accuracy, which outperforms modern techniques such as decision tree, random forest, and gradient boosting models by a significant margin.
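The study's model and dataset are not reproduced here; a minimal sketch of the general approach, training a small neural network to predict a protected attribute from telemetry-style features, might look like the following. The features, labels, and their relationship are synthetic stand-ins fabricated so the classifier has signal to find:

```python
"""Sketch of predicting a protected attribute from network-telemetry features.
All data below is synthetic; this is not the study's dataset or model."""
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 2000
# Hypothetical features: calls/day, mean call seconds, data MB/day, night-use ratio
X = rng.normal(size=(n, 4))
# Fabricated dependence of the label on the features (illustration only)
y = (0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.7, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print(f"held-out accuracy on the toy data: {clf.score(X_te, y_te):.2f}")
```

The privacy concern is exactly this: even "anonymous" telemetry carries enough statistical signal for a model to recover characteristics the data was never supposed to reveal.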
How data protection can benefit from artificial intelligence
As the attack surfaces of organisations have grown since lockdown took hold and employees migrated to remote devices, pre-pandemic data protection practices have needed a revamp. Increasingly, this has entailed the automation of processes, powered by technologies such as artificial intelligence (AI), which can make data protection more efficient if implemented properly. With this in mind, we take a look at some of the most valuable ways in which artificial intelligence lends itself to data protection. This article explores how artificial intelligence is set to impact organisations in the future, gauging the insights of experts in the space. Because AI algorithms rely heavily on data, organisations may feel hesitant about using the technology to aid data protection practices.
A Proposal for Amending Privacy Regulations to Tackle the Challenges Stemming from Combining Data Sets
Erdélyi, Gábor, Erdélyi, Olivia J., Kempa-Liehr, Andreas W.
We focus on shortcomings in current data protection regulation's ability to adequately address the ramifications of AI-driven data processing practices, in particular those of combining data sets. We propose that privacy regulation rely less on individuals' privacy expectations and recommend regulatory reform in two directions: (1) abolishing the distinction between personal and anonymized data for the purposes of triggering the application of data protection laws, and (2) developing methods to prioritize regulatory intervention based on the level of privacy risk posed by individual data processing actions. This is an interdisciplinary paper that aims to build a bridge between the various communities involved in privacy research. We place special emphasis on linking technical notions with their regulatory implications and on introducing the relevant technical and legal terminology, to foster more efficient coordination between the policymaking and technical communities and enable timely solutions to the problems raised.
- North America > United States (1.00)
- Europe > Hungary > Budapest > Budapest (0.05)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- (4 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
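The combining-data-sets risk the paper targets can be made concrete with a Sweeney-style linkage sketch: two individually "anonymized" releases re-identify a person when joined on shared quasi-identifiers. All records below are invented:

```python
"""Toy linkage attack: joining an 'anonymized' release with a public one
on quasi-identifiers re-identifies the data subject."""

# Release A: "anonymized" health data (name removed, quasi-identifiers kept)
health = [{"zip": "10001", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"}]

# Release B: public voter roll (names are public record)
voters = [
    {"name": "Jane Roe", "zip": "10001", "birth_year": 1985, "sex": "F"},
    {"name": "John Doe", "zip": "10001", "birth_year": 1972, "sex": "M"},
]

def link(a_rows, b_rows, keys=("zip", "birth_year", "sex")):
    """Join two releases on shared quasi-identifiers."""
    for a in a_rows:
        matches = [b for b in b_rows if all(a[k] == b[k] for k in keys)]
        if len(matches) == 1:       # a unique match is a re-identification
            yield {**matches[0], **a}

print(list(link(health, voters)))   # Jane Roe's diagnosis is exposed
```

This fragility of the personal/anonymized boundary is precisely what motivates the paper's first reform direction.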
Q&A With AdSkate Founders
We recently took a deep dive into how AdSkate makes digital advertising smarter with AI contextual targeting. Now we sit down with the founders to learn more about their stories. Tell me about the moment that sparked the idea for your product? Shreyas and Salil were exploring different applications for computer vision and machine learning models, and they discussed the power of these technologies with Akaash. Akaash, who was working at a large ad agency at the time, suggested that the digital ad space was facing a lot of problems around privacy and brand safety, and that this tech might just be the solution.
- Marketing (0.75)
- Information Technology (0.71)
Unstructured Privacy Data Risks: AI Can Help
According to Gartner, 65% of the world population's data will be impacted by privacy regulations by 2023. In fact, it might happen sooner, as many countries pursue economic nationalism by restricting cross-country data transfers and the rationing of data by global technology businesses. Another independent trend, coupled with the rise of tighter privacy regulations, is the growing volume of unstructured data being collected. Combined, structured and unstructured data are projected to grow at 7-12% annually. Technological advances, along with ever-falling storage prices, have made it quite easy to collect unstructured data from customers.
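As a rough illustration of how automated scanning of unstructured data might begin, a minimal pattern-based PII scan is sketched below; real systems pair such patterns with NER models, and the patterns here are illustrative, not exhaustive:

```python
"""Minimal sketch of scanning unstructured text for personal data.
The patterns below are illustrative examples only."""
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan(text: str) -> list[tuple[str, str]]:
    """Return (pii_type, match) pairs found in free-form text."""
    return [(label, m.group()) for label, rx in PATTERNS.items()
            for m in rx.finditer(text)]

doc = "Contact Ana at ana.perez@example.com or 415-555-0199 re: ticket 8841."
print(scan(doc))  # [('email', 'ana.perez@example.com'), ('us_phone', '415-555-0199')]
```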