Do personality tests generalize to Large Language Models?