Web crawler strategies for web pages under robots.txt restrictions