Understanding Database Reconstruction Attacks on Public Data

Communications of the ACM

There exists a solution universe of all the possible solutions to this set of constraints. If the solution universe contains a single possible solution, then the published statistics completely reveal the underlying confidential data--provided that noise was not added to either the microdata or the tabulations as a disclosure-avoidance mechanism. If there are multiple satisfying solutions, then any element (person) in common among all of the solutions is revealed. If the equations have no solution, either the set of published statistics is inconsistent with the fictional statistical agency's claim that it is tabulated from a real confidential database or an error was made in that tabulation. This doesn't mean that a high-quality reconstruction is not possible.
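The constraint-solving view described above can be illustrated with a minimal brute-force sketch. Everything in it is an assumption made for illustration: the block of three people, the published count, mean, median, and minimum age, the 0 to 125 age range, and the function names are hypothetical and are not taken from the article, which would use a proper constraint solver rather than enumeration. It also assumes no noise was added to the statistics.

```python
from collections import Counter
from functools import reduce
from itertools import combinations_with_replacement
from statistics import median


def solution_universe(count, mean_age, median_age, min_age=None, max_possible_age=125):
    """Enumerate every multiset of ages consistent with the published statistics.

    Hypothetical helper: the statistics are assumed exact (no noise added).
    """
    solutions = []
    # Tuples come out in non-decreasing order, so ages[0] is the minimum.
    for ages in combinations_with_replacement(range(max_possible_age + 1), count):
        if sum(ages) != mean_age * count:
            continue
        if median(ages) != median_age:
            continue
        if min_age is not None and ages[0] != min_age:
            continue
        solutions.append(ages)
    return solutions


def revealed_records(solutions):
    """Return the ages that appear in every solution (multiset intersection)."""
    if not solutions:
        return []  # inconsistent statistics: no database could have produced them
    common = reduce(lambda a, b: a & b, (Counter(s) for s in solutions))
    return sorted(common.elements())


# Illustrative published table: 3 people, mean age 30, median age 24.
universe = solution_universe(count=3, mean_age=30, median_age=24)
print(len(universe), revealed_records(universe))

# Publishing one more statistic (minimum age 8) shrinks the universe.
print(solution_universe(count=3, mean_age=30, median_age=24, min_age=8))
```

With only the count, mean, and median, many databases remain possible, yet the person aged 24 appears in all of them and is therefore revealed. Adding the minimum age leaves a single solution, which corresponds to the case where the published statistics completely reveal the underlying data.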


An Unsupervised Machine Learning Approach to Assess the ZIP Code Level Impact of COVID-19 in NYC

arXiv.org Machine Learning

New York City has been recognized as the world's epicenter of the novel coronavirus pandemic. To identify the key inherent factors that are highly correlated with the increase rate of new COVID-19 cases in NYC, we propose an unsupervised machine learning framework. Based on the assumption that ZIP code areas with similar demographic, socioeconomic, and mobility patterns are likely to experience similar outbreaks, we select the most relevant features to perform a clustering that best reflects the spread, and map them to 9 interpretable categories. We believe that our findings can guide policymakers to promptly anticipate and prevent the spread of the virus by taking the right measures.


A Knowledge Graph-based Approach for Exploring the U.S. Opioid Epidemic

arXiv.org Artificial Intelligence

The United States is in the midst of an opioid epidemic, with recent estimates indicating that more than 130 people die every day due to drug overdose. Over-prescription of, and addiction to, opioid painkillers, heroin, and synthetic opioids have led to a public health crisis and created a huge social and economic burden. Statistical learning methods that use data from multiple clinical centers across the US to detect opioid over-prescribing trends and predict possible opioid misuse are required. However, the semantic heterogeneity in the representation of clinical data across different centers makes the development and evaluation of such methods difficult. We create the Opioid Drug Knowledge Graph (ODKG) -- a network of opioid-related drugs, active ingredients, formulations, combinations, and brand names. We use the ODKG to normalize drug strings in a clinical data warehouse consisting of patient data from over 400 healthcare facilities in 42 different states. We showcase the use of the ODKG to generate summary statistics of opioid prescription trends across US regions. These methods and resources can aid the development of advanced and scalable models to monitor the opioid epidemic and to detect illicit opioid misuse behavior. Our work is relevant to policymakers and pain researchers who wish to systematically assess factors that contribute to opioid over-prescribing and iatrogenic opioid addiction in the US.


Predicting Air Pollution with Prophet on GCP

#artificialintelligence

GCP offers a suite of cloud technologies with fully managed and serverless solutions that make processing, storing, and analyzing data easy. This analysis will utilize BigQuery, Geo Viz, and AI Platform. GCP offers $300 of trial credits to new users. Additionally, BigQuery comes with 1 TB of free processing per month. This demo uses a fraction of these credits.


A Tale of Three Datasets

Communications of the ACM

Internet access in an LTE network is available through base stations, known as eNodeBs, which the network provider operates. User equipment (UE), such as smartphones, tablets, or LTE modems, connects to the eNodeB over the radio link. The eNodeB connects to a centralized cellular core, known as the evolved packet core (EPC), typically via a wired link forming a middle-mile connection. The EPC consists of several network elements, including a packet data network gateway (PGW), which is the connecting node between an end-user device and the public Internet. Thus, LTE broadband access depends on multiple factors, including radio coverage, middle-mile capacity, and interconnection links with other networks--transit providers and content providers, for instance--in the public Internet. However, the focus of this article is to understand last-mile LTE connectivity characterized by the radio coverage of the eNodeB.