Papers in Production Lightning Talks


Shoup: I'm going to share very little of my personal knowledge, in fact none of it, but I'm going to talk about a cool paper that I really like. Then Gwen [Shapira] is going to talk about another cool paper, and Roland [Meertens] is going to talk about yet another one. The paper I want to talk about is about using machine learning to do database indexing better. This is a picture of my bookshelf at home. A while ago, I bought myself the box set of "The Art of Computer Programming", which contains basically all of the algorithms of computer science, written or assembled by Don Knuth. There's a volume 4A, so he's still working on completing it; hopefully that will happen. When we choose a data structure, we typically choose it this way: we look at time complexity, how fast is it going to run, and space complexity, how big is it going to be? We typically evaluate those things asymptotically. We're not looking so much at real-world workloads as at the complexity characteristics of the thing in the limit, when things get very large. We're also, and this is critical, evaluating those things without having seen the data and, typically, without having seen the usage pattern. What we're really asking is: what is the least-worst time and space complexity, given an arbitrary data distribution and an arbitrary usage pattern? It seems like we could do a little better than that, and that's what this paper is about. What we'd like to be able to answer is: how could we achieve the best time/space complexity given a specific real-world data distribution and a specific real-world usage pattern?
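To make "exploit the specific data distribution" concrete, here is a minimal sketch of the general idea of a learned index, assuming an in-memory sorted list of numeric keys. It is an illustration only, not the paper's implementation: it swaps in the simplest possible model (a single least-squares line mapping key to position), and the class name `LinearLearnedIndex` is hypothetical. The model predicts where a key should sit, and the worst prediction error measured over the stored keys bounds the window that a binary search then has to scan.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Hypothetical illustration of a learned index: a straight line
    predicts a key's position, and a bounded binary search corrects it."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position (0..n-1) against key value.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error over the stored keys; this bounds
        # how far the true position can be from the model's guess.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Model output, rounded and clamped to a valid array index.
        p = round(self.slope * key + self.intercept)
        return min(max(p, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted keys, or None if absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Binary search only inside the error window around the guess.
        i = bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


# Evenly spaced keys: the line fits perfectly, so the search window is one slot.
index = LinearLearnedIndex(list(range(0, 2000, 2)))
print(index.max_err, index.lookup(10), index.lookup(11))  # prints: 0 5 None
```

The design trade-off mirrors the point in the talk: for data the model fits well, a lookup costs one model evaluation plus a search over roughly `2 * max_err` slots instead of the whole array; for data the model fits badly, `max_err` grows toward the array size and the lookup degrades to an ordinary binary search, so it is never asymptotically worse.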
