Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors