real-world knowledge
CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation
Zhao, Jingqian, Wang, Bingbing, Tu, Geng, Zhang, Yice, Wang, Qianlong, Liang, Bin, Li, Jing, Xu, Ruifeng
Data contamination poses a significant challenge to the fairness of LLM evaluations in natural language processing tasks by inadvertently exposing models to test data during training. Current studies attempt to mitigate this issue by modifying existing datasets or generating new ones from freshly collected information. However, these methods fall short of ensuring contamination-resilient evaluation, as they fail to fully eliminate pre-existing knowledge from models or preserve the semantic complexity of the original datasets. To address these limitations, we propose \textbf{CoreEval}, a \textbf{Co}ntamination-\textbf{re}silient \textbf{Eval}uation strategy for automatically updating data with real-world knowledge. This approach begins by extracting entity relationships from the original data and leveraging the GDELT database to retrieve relevant, up-to-date knowledge. The retrieved knowledge is then recontextualized and integrated with the original data, which is refined and restructured to ensure semantic coherence and enhanced task relevance. Ultimately, a robust data reflection mechanism is employed to iteratively verify and refine labels, ensuring consistency between the updated and original datasets. Extensive experiments on updated datasets validate the robustness of CoreEval, demonstrating its effectiveness in mitigating performance overestimation caused by data contamination.
Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Tewel, Yoad, Shalev, Yoav, Schwartz, Idan, Wolf, Lior
Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences. While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating caption given an image. In this work, we repurpose such models to generate a descriptive text given an image at inference time, without any further training or tuning step. This is done by combining the visual-semantic model with a large language model, benefiting from the knowledge in both web-scale models. The resulting captions are much less restrictive than those obtained by supervised captioning methods. Moreover, as a zero-shot learning method, it is extremely flexible and we demonstrate its ability to perform image arithmetic in which the inputs can be either images or text and the output is a sentence. This enables novel high-level vision capabilities such as comparing two images or solving visual analogy tests.
Machines that answer back
"PRESS one if you are calling to check your balance, press two to set up an appointment." Such automated telephone services have been annoying callers looking for simple help for several years now. On the telephone, people want to deal with a human being, not a recorded voice putting them through hoops (see article). Yet that does not have to be the case. Consider the success that some companies have had with services that respond automatically to inquiries sent by e-mail.
NETL: A system for representing and using real-world knowledge
This report describes a knowledge-base system in which the information is stored in a network of small parallel processing elements??de and link units??ich are controlled by an external serial computer. This network is similar to the semantic network system of Quillian, but is much more tightly controlled. Such a network can perform certain critical deductions and searches very quickly; it avoids many of the problems of current systems, which must use complex heuristics to limit and guided their searches. It is argued (with examples) that the key operation in a knowledge-base system is the intersection of large explicit and semi-explicit sets. The parallel network system does this in a small, essentially constant number of cycles; a serial machine takes time proportional to the size of the sets, except in special cases.