comprehensive benchmark on eight tasks