Learning curves theory for hierarchically compositional data with power-law distributed features