1c6bed78d3813886d3d72595dbecb80b-Supplemental-Datasets_and_Benchmarks.pdf
Neural Information Processing Systems
Table 4 contains the full set of topics for the k = 30 LDA model introduced in Section 4.

Table 4: LDA [6] topic modeling outputs (k = 30 topics) when trained on a random sample of documents from mmc4. Topic frequencies are determined by taking the mean distribution over documents in the corpus. Topic names are generated by GPT-4 conditioned on the top 20 words for each topic, prompted by a request for a short 1-2 word summary.

Table 5 and Table 6 list the top-50 most frequent top-level domains for documents and images, as discussed in Section 4. We show domain statistics in both mmc4 and mmc4-core. The symbol "*" is employed to denote specific patterns, such as digits or location acronyms, commonly utilized to differentiate sub-sites within the same domain.
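As a minimal sketch of how the topic frequencies described for Table 4 could be computed, the snippet below averages per-document topic distributions into one corpus-level distribution. The function name `topic_frequencies` and the toy input are hypothetical and not taken from the paper; the sketch assumes each document has already been assigned a probability vector over the k topics by a fitted LDA model.

```python
def topic_frequencies(doc_topic):
    """Average per-document topic distributions into corpus-level frequencies.

    doc_topic: list of per-document probability vectors, one entry per topic.
    (Hypothetical helper; the paper does not publish this code.)
    """
    if not doc_topic:
        raise ValueError("need at least one document")
    k = len(doc_topic[0])
    totals = [0.0] * k
    for dist in doc_topic:
        if len(dist) != k:
            raise ValueError("all documents must have the same number of topics")
        for i, p in enumerate(dist):
            totals[i] += p
    n_docs = len(doc_topic)
    # Mean over documents: each document contributes equally, regardless of length.
    return [t / n_docs for t in totals]


# Toy input (assumed for illustration): 3 documents, k = 2 topics.
toy = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
freqs = topic_frequencies(toy)
```

Because every row sums to one, the averaged vector also sums to one, so the entries can be read directly as topic shares of the corpus.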