guideline
Appendix Table of Contents
We provide the guidelines presented to the users for the creation of the dataset. To see some examples of how the guidelines can be applied, visit the examples document. You can use it to rate each guideline and leave feedback for each task. The user should be allowed to refuse to give any information. Ask the user to elaborate or rephrase instead.
- North America > United States (0.14)
- Europe > Germany (0.14)
- North America > United States > Ohio (0.04)
- North America > United States > California (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Questionnaire & Opinion Survey (0.46)
- Research Report (0.34)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (0.68)
The Download: the case for AI slop, and helping CRISPR fulfill its promise
If I were to locate the moment AI slop broke through into popular consciousness, I'd pick the video of rabbits bouncing on a trampoline that went viral last summer. For many savvy internet users, myself included, it was the first time we were fooled by an AI video, and it ended up spawning a wave of almost identical generated clips. My first reaction was that, broadly speaking, all of this sucked. That's become a familiar refrain, in think pieces and at dinner parties. Everything online is slop now: the internet "enshittified," with AI taking much of the blame. But then friends started sharing AI clips in group chats that were compellingly weird, or funny.
- Asia > China (0.07)
- North America > United States > New York (0.05)
- North America > United States > New Jersey (0.05)
Overcoming Common Flaws in the Evaluation of Selective Classification Systems
Selective classification, wherein models can reject low-confidence predictions, promises reliable translation of machine-learning-based classification systems to real-world scenarios such as clinical diagnostics. While current evaluation of these systems typically assumes fixed working points based on predefined rejection thresholds, methodological progress requires benchmarking the general performance of systems, akin to the AUROC in standard classification. In this work, we define 5 requirements for multi-threshold metrics in selective classification regarding task alignment, interpretability, and flexibility, and show how current approaches fail to meet them. We propose the Area under the Generalized Risk Coverage curve (AUGRC), which meets all requirements and can be directly interpreted as the average risk of undetected failures. We empirically demonstrate the relevance of AUGRC on a comprehensive benchmark spanning 6 data sets and 13 confidence scoring functions. We find that the proposed metric substantially changes metric rankings on 5 out of the 6 data sets.
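The abstract describes AUGRC as an area under a generalized risk-coverage curve that reads as the average risk of undetected failures. The sketch below makes that reading concrete under stated assumptions that are not taken from the abstract: the generalized risk at a working point is taken to be the fraction of *all* samples that are both accepted and misclassified, and the area is approximated by averaging that risk over the N coverage levels reached by accepting the k most confident samples. The function name `augrc` and the tie-breaking rule are illustrative only; the authors' reference implementation may integrate the curve differently.

```python
from typing import Sequence


def augrc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Area under the Generalized Risk Coverage curve (illustrative sketch).

    Assumptions (hypothetical, not from the source):
    - generalized risk at a working point = (# accepted AND misclassified) / N,
      i.e. normalized by all samples, not only the accepted ones;
    - the curve is evaluated at coverage k/N for k = 1..N by accepting the
      k most confident samples, and the area is the mean over those levels;
    - ties in confidence keep their input order (sorted() is stable).
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need non-empty inputs of equal length")

    # Rank samples from most to least confident.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)

    undetected = 0  # misclassified samples accepted so far
    area = 0.0
    for i in order:
        if not correct[i]:
            undetected += 1
        # Generalized risk at coverage k/N, weighted by the coverage step 1/N.
        area += (undetected / n) / n
    return area


if __name__ == "__main__":
    conf = [0.9, 0.8, 0.7, 0.6]
    # Failures get the lowest confidence, so they are rejected first.
    print(augrc(conf, [True, True, False, False]))  # 0.1875
    # Failures get the highest confidence, so they stay accepted longest.
    print(augrc(conf, [False, False, True, True]))  # 0.4375
```

Because the risk is normalized by all samples, a model with no misclassifications scores 0 at every coverage level, and for a fixed accuracy the score grows as failures receive higher confidence, which matches the "undetected failures" interpretation.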
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- North America > United States (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine (0.93)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Education (0.68)
- Information Technology > Services (0.46)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- Asia > Vietnam > Long An Province > Tân An (0.04)
- Asia > Indonesia > Bali (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.93)
- Information Technology > Security & Privacy (0.68)
- Education (0.67)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
DiffusionPID: Interpreting Diffusion via Partial Information Decomposition
Text-to-image diffusion models have made significant progress in generating naturalistic images from textual inputs, and demonstrate the capacity to learn and represent complex visual-semantic relationships. While these diffusion models have achieved remarkable success, the underlying mechanisms driving their performance are not yet fully accounted for, with many unanswered questions surrounding what they learn, how they represent visual-semantic relationships, and why they sometimes fail to generalize.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.93)