Goto

Collaborating Authors

 shoehorn


Extract, Shoehorn, and Load

Communications of the ACM

A lot of data is moved from system to system in an important and increasing part of the computing landscape. This is traditionally known as ETL (extract, transform, and load). While many systems are extremely good at this process, the source for the extraction and the destination for the load frequently have different representations for their data. It is common for this transformation to squeeze, truncate, or pad the data to make it fit into the target. This is really like using a shoehorn to fit into a shoe that is too small. Sometimes it's a needed step.


Technical Perspective: Expressive Probabilistic Models and Scalable Method of Moments

Communications of the ACM

Across diverse fields, investigators face problems and opportunities involving data. Scientists, scholars, engineers, and other analysts seek new methods to ingest data, extract salient patterns, and then use the results for prediction and understanding. These methods come from machine learning (ML), which is quickly becoming core to modern technological systems, modern scientific workflow, and modern approaches to understanding data. The classical approach to solving a problem with ML follows the "cookbook" approach, one where the scientist shoehorns her data and problem to match the inputs and outputs of a reliable ML method. This strategy has been successful in many domains--examples include spam filtering, speech recognition, and movie recommendation--but it can only take us so far.