XGBoost is one of the most widely used libraries for data science. When XGBoost first appeared, it was lightning fast compared to its nearest rival, scikit-learn's GBM. Over time, though, it has been rivaled by some awesome libraries like LightGBM and CatBoost, on both speed and accuracy. I, for one, use LightGBM for most use cases where I only have a CPU for training. But when I have a GPU or multiple GPUs at my disposal, I still love to train with XGBoost.
Do you have benchmarks showing speed improvements over pandas? Pandas can be dreadfully slow, and a restricted implementation that uses only a subset of the "most useful" pandas features might be significantly faster. For instance, consider this benchmark for row and column access to a pandas DataFrame vs a dict of ndarray columns. For row access, the fastest pandas way to iterate through rows (iterrows) is 6x slower than the simple dict implementation: 24ms vs 4ms. Furthermore, a pandas DataFrame, which is a column-based data structure, is a whopping 36x slower than a dict of ndarrays for access to a single column of data.
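For anyone who wants to try this kind of comparison, here is a minimal sketch of how it might be set up. This is not the original benchmark: the table size, the column names, and the helper functions (`sum_rows_pandas`, `sum_rows_dict`, `time_it`) are assumptions made up for illustration, and the timings will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: three float columns, stored both ways.
N_ROWS = 2_000
rng = np.random.default_rng(0)
columns = {name: rng.random(N_ROWS) for name in ("a", "b", "c")}
df = pd.DataFrame(columns)


def sum_rows_pandas(frame):
    """Visit every row with DataFrame.iterrows and sum column 'a'."""
    total = 0.0
    for _, row in frame.iterrows():
        total += row["a"]
    return total


def sum_rows_dict(cols):
    """Visit every row of a dict of ndarrays by zipping its columns."""
    total = 0.0
    for a, _b, _c in zip(cols["a"], cols["b"], cols["c"]):
        total += a
    return total


def time_it(func, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


if __name__ == "__main__":
    # Row access: iterrows vs. zipping the ndarrays.
    t_rows_df = time_it(lambda: sum_rows_pandas(df))
    t_rows_dict = time_it(lambda: sum_rows_dict(columns))
    print(f"row access    pandas: {t_rows_df * 1e3:.3f} ms  dict: {t_rows_dict * 1e3:.3f} ms")

    # Column access: a single lookup, repeated many times so it is measurable.
    n_lookups = 10_000
    t_col_df = timeit.timeit(lambda: df["a"], number=n_lookups)
    t_col_dict = timeit.timeit(lambda: columns["a"], number=n_lookups)
    print(f"column access pandas: {t_col_df * 1e3:.3f} ms  dict: {t_col_dict * 1e3:.3f} ms")
```

Both row loops compute the same sum, so the difference in timing comes from how each structure hands back a row. The ratios you get will not necessarily match the 6x and 36x figures quoted above.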
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.