UN warns law enforcement against using 'big data' to discriminate


Police and border guards must combat racial profiling and ensure that their use of "big data" collected via artificial intelligence does not reinforce biases against minorities, United Nations experts said on Thursday. Companies that sell algorithmic profiling systems to public entities and private companies, often used in screening job applicants, must be regulated to prevent misuse of personal data that perpetuates prejudices, they said. "It's a rapidly developing technological means used by law enforcement to determine, using big data, who is likely to do what. And that's the danger of it," Verene Shepherd, a member of the UN Committee on the Elimination of Racial Discrimination, told Reuters. "We've heard about companies using these algorithmic methods to discriminate on the basis of skin colour," she added, speaking from Jamaica.

Adversarial representation learning for synthetic replacement of private attributes

Machine Learning

Data privacy is an increasingly important aspect of many real-world big data analytics tasks. Data sources that contain sensitive information may have immense potential that could be unlocked using privacy-enhancing transformations, but current methods often fail to produce convincing output. Furthermore, finding the right balance between privacy and utility is often a tricky tradeoff. In this work, we propose a novel two-step approach to data privatization: the first step removes the sensitive information, and the second step replaces it with an independent random sample. Our method builds on adversarial representation learning, which ensures strong privacy by training the model to fool an increasingly strong adversary. While previous methods aim only at obfuscating the sensitive information, we find that adding new random information in its place strengthens the provided privacy and yields better utility at any given level of privacy. The result is an approach that provides stronger privatization of image data while preserving both the domain and the utility of the inputs, entirely independent of the downstream task. The increasing capacity and performance of modern machine learning models lead to increasing amounts of data required to train them (Goodfellow et al., 2016). However, collecting and using large datasets that may contain sensitive information about individuals is often impeded by increasingly strong privacy laws protecting individual rights and by the infeasibility of obtaining individual consent.
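The two-step scheme the abstract describes can be sketched on a toy tabular record (the paper itself targets images with adversarial networks; the attribute names and the marginal distribution below are illustrative assumptions, not from the paper):

```python
import random

# Marginal distribution of the sensitive attribute in the population,
# used to draw an *independent* replacement sample in step 2.
SENSITIVE_MARGINAL = ["A", "B", "C"]

def privatize(record, sensitive_key, rng):
    # Step 1: remove the sensitive information.
    cleaned = {k: v for k, v in record.items() if k != sensitive_key}
    # Step 2: replace it with an independent random sample, so the
    # output still looks like a normal record (domain preservation)
    # but carries no information about the true attribute.
    cleaned[sensitive_key] = rng.choice(SENSITIVE_MARGINAL)
    return cleaned

rng = random.Random(0)
record = {"age": 34, "zip": "19104", "sensitive": "B"}
private = privatize(record, "sensitive", rng)
```

Because the replacement is drawn independently of the original value, an adversary observing the output learns nothing about the true attribute beyond its population marginal, while non-sensitive fields stay usable for downstream tasks.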

Precision Health Data: Requirements, Challenges and Existing Techniques for Data Security and Privacy

Artificial Intelligence

Precision health leverages information from various sources, including omics, lifestyle, environment, social media, medical records, and medical insurance claims, to enable personalized care, prevent and predict illness, and deliver precise treatments. It extensively uses sensing technologies (e.g., electronic health monitoring devices), computation (e.g., machine learning), and communication (e.g., interaction between health data centers). As health data contain sensitive private information, including the identities of patients and carers and the medical conditions of patients, proper care is required at all times. Leakage of this private information can affect personal life, leading to bullying, higher insurance premiums, and loss of employment due to one's medical history. Thus, the security and privacy of, and trust in, this information are of utmost importance. Moreover, government legislation and ethics committees demand the security and privacy of healthcare data. Hence, in light of precision health data security, privacy, and ethical and regulatory requirements, finding the best methods and techniques for utilizing health data, and thus enabling precision health, is essential. To this end, this paper first explores regulations and ethical guidelines around the world, along with domain-specific needs, then presents the requirements and investigates the associated challenges. Second, it investigates secure and privacy-preserving machine learning methods suitable for computing on precision health data, along with their usage in relevant health projects. Finally, it illustrates the best available techniques for precision health data security and privacy through a conceptual system model that enables compliance, ethics clearance, consent management, medical innovation, and developments in the health domain.

Why Businesses Should Adopt an AI Code of Ethics -- Now - InformationWeek


The issue of ethical development and deployment of applications using artificial intelligence (AI) technologies is rife with nuance and complexity. Because humans are diverse -- different genders, races, values, and cultural norms -- AI algorithms and automated processes won't work with equal acceptance or effectiveness for everyone worldwide. What most people agree on is that these technologies should be used to improve the human condition. There are many AI success stories with positive outcomes in fields from healthcare to education to transportation. But there have also been unexpected problems, including flawed facial recognition and unintended bias in numerous other AI applications.

Has Big Data ethics gone out the window?


Their argument is essentially, "If everyone else is doing the same thing in their respective areas, why should we be the ones to start to change?" Consequently, the situation has become chronic, and no solution is reached. Not even some of the most important public agencies that fund this kind of research in the United States take this requirement seriously. Even authors of academic or commercial studies that use health data find ways to get around administrative restrictions such as those imposed by the Family Educational Rights and Privacy Act (FERPA), the Health Insurance Portability and Accountability Act (HIPAA), and other regulatory legislation. The complaints of Kalev Leetaru -- who has worked for Yahoo!, Google, Georgetown University, and the World Economic Forum Global Agenda Council on the Future of Government -- apply to technicians and managers across related fields: big data, data mining, machine learning, artificial intelligence, the internet of things, and so on.

MIT, White House co-sponsor workshop on big-data privacy

AITopics Original Links

On Monday, MIT hosted a daylong workshop on big data and privacy, co-sponsored by the White House as part of a 90-day review of data privacy policy that President Barack Obama announced in a Jan. 17 speech on U.S. intelligence gathering. White House Counselor John Podesta, grounded by snow in Washington, delivered his keynote address and took questions over the phone. But Secretary of Commerce Penny Pritzker was on hand, as were MIT President L. Rafael Reif and a host of computer scientists from MIT, Harvard University, and Microsoft Research, who spoke about the technical challenges of protecting privacy in big data sets. In his brief opening remarks, Reif mentioned the promise of big data and the difficulties that managing it responsibly poses, and he offered the example of MIT's online-learning initiative, MITx, to illustrate both. "We want to study the huge quantities of data about how MITx students interact with our digital courses," he said.

Threshold Bandits, With and Without Censored Feedback

Neural Information Processing Systems

We consider the \emph{Threshold Bandit} setting, a variant of the classical multi-armed bandit problem in which the reward on each round depends on a piece of side information known as a \emph{threshold value}. The learner selects one of $K$ actions (arms), this action generates a random sample from a fixed distribution, and the action then receives a unit payoff in the event that this sample exceeds the threshold value. We consider two versions of this problem, the \emph{uncensored} and \emph{censored} case, that determine whether the sample is always observed or only when the threshold is not met. Using new tools to understand the popular UCB algorithm, we show that the uncensored case is essentially no more difficult than the classical multi-armed bandit setting. Finally we show that the censored case exhibits more challenges, but we give guarantees in the event that the sequence of threshold values is generated optimistically.
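The uncensored setting described above can be sketched with standard UCB on each arm's empirical success rate. The Gaussian arm distributions and the fixed threshold sequence below are illustrative assumptions, not taken from the paper:

```python
import math
import random

# Pulling arm i draws a sample from that arm's fixed distribution; the
# payoff is 1 if the sample exceeds the round's threshold value.
def threshold_bandit_ucb(arm_means, thresholds, rng):
    K = len(arm_means)
    counts = [0] * K      # pulls per arm
    successes = [0] * K   # threshold exceedances per arm
    for t, thr in enumerate(thresholds, start=1):
        if t <= K:
            arm = t - 1   # pull each arm once to initialise
        else:
            # Standard UCB index on the empirical success rate.
            arm = max(
                range(K),
                key=lambda i: successes[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        sample = rng.gauss(arm_means[arm], 1.0)  # arm's fixed distribution
        counts[arm] += 1
        successes[arm] += 1 if sample > thr else 0
    return counts

rng = random.Random(42)
thresholds = [5.0] * 500  # constant threshold sequence, for simplicity
counts = threshold_bandit_ucb([0.0, 10.0], thresholds, rng)
```

With a well-separated toy instance like this, the arm whose distribution usually clears the threshold quickly dominates the pull counts, illustrating why the uncensored case behaves essentially like a classical bandit.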

PrivLogit: Efficient Privacy-preserving Logistic Regression by Tailoring Numerical Optimizers

Machine Learning

Safeguarding privacy in machine learning is highly desirable, especially in collaborative studies across many organizations. Privacy-preserving distributed machine learning (based on cryptography) is a popular way to solve the problem. However, existing cryptographic protocols still incur excess computational overhead. Here, we make a novel observation that this is partially due to naive adoption of mainstream numerical optimization (e.g., Newton's method) and a failure to tailor it for secure computing. This work presents a contrasting perspective: customizing numerical optimization specifically for secure settings. We propose a seemingly less-favorable optimization method that can in fact significantly accelerate privacy-preserving logistic regression. Leveraging this new method, we propose two new secure protocols for conducting logistic regression in a privacy-preserving and distributed manner. Extensive theoretical and empirical evaluations demonstrate the competitive performance of our two secure proposals without compromising accuracy or privacy: speedups of up to 2.3x and 8.1x, respectively, over the state of the art, and even greater as data scale up. This drastic speedup comes on top of, and in addition to, performance improvements from existing (and future) state-of-the-art cryptography. Our work provides a new path toward efficient and practical privacy-preserving logistic regression for the large-scale studies common in modern science.
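One well-known example of tailoring the optimizer for secure computation (shown here in the clear, and only as an illustration of the general idea, not necessarily PrivLogit's exact protocol) is to replace Newton's per-iteration Hessian with the fixed bound X^T X / 4, so the expensive matrix inversion is performed once rather than inside every encrypted iteration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset: one feature plus an intercept column.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

# Fixed Hessian bound H = X^T X / 4 (2x2), inverted exactly once.
# A "seemingly less-favorable" step than full Newton, but each
# iteration now needs only a gradient, which is cheap under encryption.
h00 = sum(r[0] * r[0] for r in X) / 4.0
h01 = sum(r[0] * r[1] for r in X) / 4.0
h11 = sum(r[1] * r[1] for r in X) / 4.0
det = h00 * h11 - h01 * h01
Hinv = [[h11 / det, -h01 / det], [-h01 / det, h00 / det]]

w = [0.0, 0.0]
for _ in range(50):
    p = [sigmoid(w[0] * r[0] + w[1] * r[1]) for r in X]
    g = [sum((y[i] - p[i]) * X[i][j] for i in range(len(X)))
         for j in range(2)]
    # Newton-style step with the *fixed* inverse Hessian.
    w = [w[0] + Hinv[0][0] * g[0] + Hinv[0][1] * g[1],
         w[1] + Hinv[1][0] * g[0] + Hinv[1][1] * g[1]]

preds = [1 if sigmoid(w[0] * r[0] + w[1] * r[1]) > 0.5 else 0 for r in X]
```

Because sigmoid curvature never exceeds 1/4, this fixed-Hessian iteration is a valid majorize-minimize scheme for the logistic loss, trading a few extra iterations for much cheaper per-iteration work in a secure setting.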

Google's Brain Team: 'AIs can be racist and sexist but we can change that'


Google's methodology could have applications in any scoring system, such as a bank's credit-scoring system. In an age where data drives decisions about everything from creditworthiness to insurance and criminal justice, machines could well end up making bad predictions that simply reflect and reinforce past discrimination. The Obama Administration outlined its concerns about this issue in its 2014 big-data report, warning that automated discrimination against certain groups could be an inadvertent outcome of the way big-data technologies are used. At the same time, using social networks or location data to assess a person's creditworthiness could boost access to finance for people who don't have a credit history.
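The kind of per-group score-threshold adjustment such a methodology enables (in the spirit of the equality-of-opportunity work from Google researchers) can be sketched as follows; the score data and target rate are made up for illustration, and `threshold_for_tpr` is a hypothetical helper, not an API from the article:

```python
import math

def threshold_for_tpr(positive_scores, target_tpr):
    """Smallest score threshold at which at least target_tpr of the
    genuinely qualified people score at or above it."""
    ranked = sorted(positive_scores, reverse=True)
    k = math.ceil(target_tpr * len(ranked))  # positives that must pass
    return ranked[k - 1]

# Score distributions of *qualified* applicants in two groups; group B's
# scores run lower, e.g. because of a thinner or biased credit history.
qualified = {"A": [0.9, 0.8, 0.7, 0.6], "B": [0.6, 0.5, 0.4, 0.3]}

# One global threshold would approve qualified B applicants far less
# often; per-group thresholds equalise the true-positive rate instead.
thresholds = {g: threshold_for_tpr(s, 0.75) for g, s in qualified.items()}
tprs = {g: sum(x >= thresholds[g] for x in s) / len(s)
        for g, s in qualified.items()}
```

Both groups end up with the same approval rate among qualified applicants, which is the sense in which such scoring systems can be corrected after the fact rather than by retraining the underlying model.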

Your hiring algorithm might be racist - Philly


If you're wondering why a company's staff lacks diversity, you might want to take a look at the computers behind its hiring process. Corporations use technology in hiring to remedy historical and routine discrimination against applicants, but the same technology can end up simply reinforcing that discrimination, said postdoctoral research associate Solon Barocas during "The Intersection of Data and Poverty," a Philly Tech Week 2016 presented by Comcast symposium organized by Community Legal Services and Philadelphia Legal Assistance and held at Montgomery McCracken Walker & Rhoads in Center City. Barocas spoke on a panel titled "How Big and Open Data Harms the Poor," which focused on the unintended consequences of data technology for vulnerable populations. Companies that use machine learning and big data in their hiring process rely on "training data," typically drawn from prior and current employees. A statistical process then automatically discovers the traits that correlate with high performance in the training data and looks for those traits in applicant pools.
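How a statistical screen trained on past employees can reproduce discrimination can be sketched with a toy example (all records, traits, and rates below are invented for illustration, not from the talk):

```python
from collections import Counter

# Past-employee records: (attended_school, rated_high_performer).
# Hiring skewed toward School X, and past managers rated the few
# School Y hires lower -- so the labels themselves carry the bias.
training = ([("X", 1)] * 30 + [("X", 0)] * 10
            + [("Y", 1)] * 2 + [("Y", 0)] * 2)

def trait_scores(data):
    """P(high performer | trait), as naively estimated from the data."""
    totals, highs = Counter(), Counter()
    for trait, perf in data:
        totals[trait] += 1
        highs[trait] += perf
    return {t: highs[t] / totals[t] for t in totals}

scores = trait_scores(training)
# The screen now prefers School X applicants -- not because of any true
# ability gap, but because biased historical ratings made X *look*
# predictive of performance, and the model dutifully learned that.
```

A screen ranking new applicants by these scores would keep favoring School X, which is exactly the feedback loop Barocas warns about: the "statistical process" is faithful to its training data, bias included.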