An Index-based Approach for Efficient and Effective Web Content Extraction