Collaborating Authors


Big Data Industry Predictions for 2021 - insideBIGDATA


But the big data industry has significant inertia moving into 2021. In order to give our valued readers a pulse on important new trends leading into next year, we here at insideBIGDATA heard from all our friends across the vendor ecosystem to get their insights, reflections and predictions for what may be coming. We were very encouraged to hear such exciting perspectives. Even if only half actually come true, Big Data in the next year is destined to be quite an exciting ride. The "analytic divide" is going to get worse. Like the much-publicized "digital divide," we're also seeing the emergence of an "analytic divide." Many companies were driven to invest in analytics due to the pandemic, while others have been forced to cut anything they didn't view as critical to keep the lights on – and for these organizations, a proper investment in analytics was on the chopping block. This means that the analytic divide will further widen in 2021, and this trend will continue for ...

How We'll Conduct Algorithmic Audits in the New Economy - InformationWeek


Algorithms are the heartbeat of applications, but they may not be perceived as entirely benign by their intended beneficiaries. Most educated people know that an algorithm is simply any stepwise computational procedure. Most computer programs are algorithms of one sort or another. Embedded in operational applications, algorithms make decisions, take actions, and deliver results continuously, reliably, and invisibly. But on the odd occasion that an algorithm stings -- encroaching on customer privacy, refusing them a home loan, or perhaps targeting them with a barrage of objectionable solicitation -- stakeholders' understandable reaction may be to swat back in anger, and possibly with legal action.

Global Big Data Conference


AutoML is poised to turn developers into data scientists -- and vice versa. Here's how AutoML will radically change data science for the better. In the coming decade, the data scientist role as we know it will look very different than it does today. But don't worry, no one is predicting lost jobs, just changed jobs. Data scientists will be fine -- according to the Bureau of Labor Statistics, the role is still projected to grow at a higher than average clip through 2029.

Federated Learning for Privacy-Preserving AI

Communications of the ACM

There has been remarkable success of machine learning (ML) technologies in empowering practical artificial intelligence (AI) applications, such as automatic speech recognition and computer vision. However, we are facing two major challenges in adopting AI today. One is that data in most industries exist in the form of isolated islands. The other is the ever-increasing demand for privacy-preserving AI. Conventional AI approaches based on centralized data collection cannot meet these challenges.

Deconvoluting Kernel Density Estimation and Regression for Locally Differentially Private Data Machine Learning

Local differential privacy has become the gold standard in the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can distort the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter than the density function of the original data points due to convolution with the density function of the privacy-preserving noise. The effect is especially pronounced when using slow-decaying privacy-preserving noise, such as Laplace noise. This can result in under- or over-estimation of the heavy hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 Census in the United States. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt the results from nonparametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
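The flattening effect and its correction can be illustrated with a small sketch. It is not the paper's implementation: it assumes a Gaussian smoothing kernel and a known Laplace noise scale `b`, for which the standard deconvoluting kernel has the closed form K(u) * (1 + (b/h)^2 * (1 - u^2)). The function names, the synthetic standard-normal data, the sample size, and the bandwidth `h` are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def naive_kde(grid, samples, h):
    """Ordinary kernel density estimate; ignores any privacy noise."""
    u = (grid[:, None] - samples[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h


def deconvoluting_kde(grid, noisy, h, b):
    """Deconvoluting KDE for additive Laplace(0, b) noise.

    Dividing the kernel's Fourier transform by the Laplace characteristic
    function 1 / (1 + b^2 t^2) gives K(u) - (b/h)^2 K''(u); for a Gaussian
    kernel this equals K(u) * (1 + (b/h)^2 * (1 - u^2)).
    """
    u = (grid[:, None] - noisy[None, :]) / h
    correction = 1.0 + (b / h) ** 2 * (1.0 - u**2)
    return (gaussian_kernel(u) * correction).mean(axis=1) / h


# Synthetic illustration (assumed values, not data from the paper).
rng = np.random.default_rng(0)
n, b, h = 20_000, 1.0, 0.4
original = rng.normal(0.0, 1.0, size=n)        # stand-in for sensitive data
noisy = original + rng.laplace(0.0, b, size=n)  # privatized with Laplace noise

grid = np.linspace(-12.0, 12.0, 241)
dx = grid[1] - grid[0]
naive = naive_kde(grid, noisy, h)
deconv = deconvoluting_kde(grid, noisy, h, b)

centre = np.argmin(np.abs(grid))                # grid point at x = 0
true_peak = 1.0 / np.sqrt(2.0 * np.pi)          # N(0, 1) density at 0

print("true density at 0:      ", true_peak)
print("naive KDE at 0:         ", naive[centre])
print("deconvoluting KDE at 0: ", deconv[centre])
print("integral of deconvoluting KDE:", deconv.sum() * dx)
```

In this setup the naive estimate built from the noisy points should come out visibly flatter at the mode than the deconvoluting estimate. The corrected kernel takes negative values for large |u|, so the deconvoluting estimate can dip slightly below zero in the tails; the price of removing the noise is higher variance, which is why the bandwidth here is chosen fairly large relative to what a noise-free KDE would use.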

Senior Data Engineer 3 - Machine Learning and Cyber in RICHLAND, Washington, United States


Do you want to create a legacy of meaningful research for the greater good? Do you want to lead and contribute to work in support of an organization that addresses some of today's most challenging problems that face our Nation? Then join us in the Data Sciences and Analytics Group at the Pacific Northwest National Laboratory (PNNL)! For more than 50 years, PNNL has advanced the frontiers of science and engineering in the service of our nation and the world in the areas of energy, the environment and national security. PNNL is committed to advancing the state-of-the-art in artificial intelligence through applied machine learning and deep learning to support scientific discovery and our sponsors' missions.

Precision Health Data: Requirements, Challenges and Existing Techniques for Data Security and Privacy Artificial Intelligence

Precision health leverages information from various sources, including omics, lifestyle, environment, social media, medical records, and medical insurance claims to enable personalized care, prevent and predict illness, and deliver precise treatments. It extensively uses sensing technologies (e.g., electronic health monitoring devices), computations (e.g., machine learning), and communication (e.g., interaction between the health data centers). As health data contain sensitive private information, including the identity of the patient and carer and the medical conditions of the patient, proper care is required at all times. Leakage of this private information can affect personal life, leading to bullying, high insurance premiums, and loss of job due to medical history. Thus, the security and privacy of, and trust in, the information are of utmost importance. Moreover, government legislation and ethics committees demand the security and privacy of healthcare data. Hence, in light of the security, privacy, ethical, and regulatory requirements for precision health data, finding the best methods and techniques for utilizing health data, and thus for enabling precision health, is essential. In this regard, firstly, this paper explores the regulations, ethical guidelines around the world, and domain-specific needs. Then it presents the requirements and investigates the associated challenges. Secondly, this paper investigates secure and privacy-preserving machine learning methods suitable for the computation of precision health data along with their usage in relevant health projects. Finally, it illustrates the best available techniques for precision health data security and privacy with a conceptual system model that enables compliance, ethics clearance, consent management, medical innovations, and developments in the health domain.

Event Prediction in the Big Data Era: A Systematic Survey Artificial Intelligence

Events are occurrences in specific locations, time, and semantics that nontrivially impact either our society or nature, such as civil unrest, system failures, and epidemics. It is highly desirable to be able to anticipate the occurrence of such events in advance in order to reduce the potential social upheaval and damage caused. Event prediction, which has traditionally been prohibitively challenging, is now becoming a viable option in the big data era and is thus experiencing rapid growth. There is a large amount of existing work that focuses on addressing the challenges involved, including heterogeneous multi-faceted outputs, complex dependencies, and streaming data feeds. Most existing event prediction methods were initially designed to deal with specific application domains, though the techniques and evaluation procedures utilized are usually generalizable across different domains. However, it is imperative yet difficult to cross-reference the techniques across different domains, given the absence of a comprehensive literature survey for event prediction. This paper aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction in the big data era. First, systematic categorization and summary of existing techniques are presented, which facilitate domain experts' searches for suitable techniques and help model developers consolidate their research at the frontiers. Then, comprehensive categorization and summary of major application domains are provided. Evaluation metrics and procedures are summarized and standardized to unify the understanding of model performance among stakeholders, model developers, and domain experts in various application domains. Finally, open problems and future directions for this promising and important domain are elucidated and discussed.

Privacy-preserving Artificial Intelligence Techniques in Biomedicine Artificial Intelligence

Artificial intelligence (AI) has been successfully applied in numerous scientific domains including biomedicine and healthcare. Here, it has led to several breakthroughs ranging from clinical decision support systems and image analysis to whole genome sequencing. However, training an AI model on sensitive data also raises concerns about the privacy of individual participants. Adversary AIs, for example, can abuse even summary statistics of a study to determine the presence or absence of an individual in a given dataset. This has resulted in increasing restrictions on access to biomedical data, which in turn is detrimental to collaborative research and impedes scientific progress. Hence there has been an explosive growth in efforts to harness the power of AI for learning from sensitive data while protecting patients' privacy. This paper provides a structured overview of recent advances in privacy-preserving AI techniques in biomedicine. It places the most important state-of-the-art approaches within a unified taxonomy, and discusses their strengths, limitations, and open problems.

AI Regulation: Has the Time Arrived? - InformationWeek


Is artificial intelligence getting too smart (and intrusive) for its own good? A growing number of nations have concluded that it's time to take a close look at AI's impact on an array of critical issues, including privacy, security, human rights, crime, and finance. A proposal for an international oversight panel, the Global Partnership on AI, already has the support of six members of The Group of Seven (G7), an international organization comprised of nations with the largest and most advanced economies. The G7's dominant member, the United States, remains the only holdout, claiming that regulation could hamper the development of AI technologies and hurt US businesses. The Global Partnership on AI and OECD's G20 AI principles represent a good first step toward building a worldwide AI regulatory structure, noted Robert L. Foehl, an executive-in-residence for business law and ethics at Ohio University.