Combining Reinforcement Learning and Configuration Checking for Maximum k-plex Problem