Real-Time Aggregation Features for Machine Learning (Part 1)
Machine learning features are derived from an organization's raw data and provide a signal to an ML model. A very common type of feature transformation is a rolling time window aggregation. For example, you may use the rolling 30-minute transaction count of a credit card to predict the likelihood that a given transaction is fraudulent. It's easy enough to calculate rolling time window aggregations offline using window functions in a SQL query against your favorite data warehouse. However, serving this type of feature for real-time predictions in production poses a difficult problem: how can you efficiently serve a feature that aggregates a lot of raw events (1000s), at very high scale (1000s of QPS), at low serving latency (on the order of 100 ms), at high freshness (on the order of 1 s), and with high feature accuracy?
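To make the rolling 30-minute transaction count concrete, here is a minimal offline sketch in Python. It is only an illustration under stated assumptions, not the serving approach the article goes on to discuss: the function name `rolling_counts` is hypothetical, the timestamps are assumed to belong to a single card and to arrive sorted, and the window is taken as half-open, covering events in (t - 30 min, t].

```python
from collections import deque
from datetime import datetime, timedelta


def rolling_counts(timestamps, window=timedelta(minutes=30)):
    """Return, for each event, the number of events in (t - window, t].

    Assumes `timestamps` holds one card's transactions in ascending order.
    """
    recent = deque()  # events still inside the current window
    counts = []
    for t in timestamps:
        recent.append(t)
        # Evict events that are at or before the window's open left edge.
        while recent[0] <= t - window:
            recent.popleft()
        counts.append(len(recent))
    return counts


# Hypothetical transaction times for one card.
day = datetime(2021, 6, 19)
events = [
    day.replace(hour=12, minute=0),
    day.replace(hour=12, minute=10),
    day.replace(hour=12, minute=25),
    day.replace(hour=12, minute=31),
    day.replace(hour=13, minute=5),
]
print(rolling_counts(events))  # [1, 2, 3, 3, 1]
```

Each event is appended and evicted at most once, so a batch pass is linear in the number of events. The difficulty described above comes from doing the equivalent at request time, for many cards at once, while staying fresh and within the latency budget.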
Jun-19-2021, 17:15:35 GMT