Do you have benchmarks showing speed improvements over pandas? Pandas can be dreadfully slow, and a restricted implementation that uses only a subset of the "most useful" pandas features might be significantly faster. For instance, consider this benchmark for row and column access to a pandas DataFrame vs. a dict of ndarray columns. For row access, the fastest pandas way to iterate through rows (iterrows) is 6x slower than the simple dict implementation: 24 ms vs. 4 ms. Furthermore, a pandas DataFrame, which is a column-based data structure, is a whopping 36x slower than a dict of ndarrays for access to a single column of data.
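For anyone who wants to try this kind of comparison themselves, here is a minimal sketch of how it could be set up. It is not the benchmark quoted above: the table size, the column names (`c0`, `c1`, ...), and the helper `best_time` are assumptions made for illustration, and the numbers it prints will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical table size; the quoted benchmark's dimensions are not given.
n_rows, n_cols = 1_000, 5
rng = np.random.default_rng(0)

# The "simple dict implementation": column name -> ndarray.
columns = {f"c{i}": rng.random(n_rows) for i in range(n_cols)}
# The same data as a pandas DataFrame.
df = pd.DataFrame(columns)


def rows_pandas():
    # iterrows yields one (index, Series) pair per row.
    return sum(1 for _ in df.iterrows())


def rows_dict():
    # Zipping the column arrays walks the data one row at a time.
    return sum(1 for _ in zip(*columns.values()))


def column_pandas():
    # Single-column access on the DataFrame.
    return df["c0"]


def column_dict():
    # Single-column access on the dict of ndarrays.
    return columns["c0"]


def best_time(func, number):
    # Best of three repeats, reported as seconds per call.
    return min(timeit.repeat(func, number=number, repeat=3)) / number


timings = {
    "rows_pandas": best_time(rows_pandas, 5),
    "rows_dict": best_time(rows_dict, 5),
    "column_pandas": best_time(column_pandas, 10_000),
    "column_dict": best_time(column_dict, 10_000),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; treat the ratios as rough indications rather than precise figures.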
Good options exist for numeric data, but text is a pain. Categorical dtypes are a good option there. I need to read and write pandas DataFrames to disk. Typically we use libraries like pickle to serialize Python objects. For dask.frame we really care about doing this quickly, so we're also going to look at a few alternatives.
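As a rough illustration of the baseline, the sketch below round-trips a small DataFrame through pickle twice: once with the text column stored as plain strings and once converted to a categorical dtype. The data, the column names (`name`, `value`), and the helper `roundtrip` are made up for this example; it serializes to bytes in memory rather than to a file, and it only shows the mechanics, not the timings that matter for dask.frame.

```python
import pickle

import numpy as np
import pandas as pd

# Hypothetical data: one repetitive text column and one numeric column.
n = 10_000
rng = np.random.default_rng(0)
names = rng.choice(["Alice", "Bob", "Charlie", "Dennis"], size=n).tolist()

df_text = pd.DataFrame({"name": names, "value": rng.random(n)})
# Same data, with the text column stored as integer codes plus a small
# table of unique categories.
df_cat = df_text.assign(name=df_text["name"].astype("category"))


def roundtrip(frame):
    # Serialize to bytes and back; return the restored frame and payload size.
    payload = pickle.dumps(frame, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(payload), len(payload)


restored_text, size_text = roundtrip(df_text)
restored_cat, size_cat = roundtrip(df_cat)

print(f"text column as strings:    {size_text} bytes")
print(f"text column as categories: {size_cat} bytes")
```

To measure speed rather than size, the same `roundtrip` helper could be wrapped in `timeit`, and `pickle.dump` / `pickle.load` on an open file would exercise the actual disk path.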