
Model-Based Clustering of Nonparametric Weighted Networks

Machine Learning

Water pollution is a major global environmental problem, and it poses a great environmental risk to public health and biological diversity. This work is motivated by assessing the potential environmental threat of coal mining through increased sulfate concentrations in river networks, which do not follow any simple parametric distribution. However, existing network models mainly focus on binary or discrete networks and on weighted networks with known parametric weight distributions. We propose a principled nonparametric weighted network model based on exponential-family random graph models and local likelihood estimation, and study its model-based clustering with application to large-scale water pollution network analysis. We do not require any parametric distribution assumption on network weights. The proposed method greatly extends the methodology and applicability of statistical network models. Furthermore, it is scalable to large and complex networks in large-scale environmental studies and geoscientific research. The power of our proposed methods is demonstrated in simulation studies.

GADGET GOLD MINE Apple robots dig $40M out of discarded iPhones

FOX News

Apple harvested almost $40 million worth of gold from recycled gadgets last year, and is now deploying robots to take iPhones apart in a major environmental push. In its latest annual environmental responsibility report, which was published last week, Apple explained that it gathered 2,204 pounds of recycled gold during its fiscal year 2015. The gold, which weighs more than a ton, is worth $39.6 million. Apple recovered more than 63 million pounds of various materials via its "take-back" recycling initiatives in 2015, according to the company's environmental report. The tech giant gathered over 23 million pounds of steel, making it the most recycled material, and more than 13 million pounds of plastics.

Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing

AAAI Conferences

Crowdsourcing is an effective tool for scalable data annotation in both research and enterprise contexts. Due to crowdsourcing’s open participation model, quality assurance is critical to the success of any project. Present methods rely on EM-style post-processing or manual annotation of large gold standard sets. In this paper we present an automated quality assurance process that is inexpensive and scalable. Our novel process relies on programmatic gold creation to provide targeted training feedback to workers and to prevent common scamming scenarios. We find that it decreases the amount of manual work required to manage crowdsourced labor while improving the overall quality of the results.

BHP lifts lid on major data science project


BHP is applying data science to understand how it services machines located across its mines, in the hope of saving $79 million this financial year alone.