TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training