The KL3M Data Project: Copyright-Clean Training Resources for Large Language Models