artifact
Towards Reliable LLM Evaluation: Correcting the Winner's Curse in Adaptive Benchmarking
Xu, Yang, Zhang, Jiefu, Sun, Haixiang, Zhou, Zihan, Cao, Tianyu, Aggarwal, Vaneet
Adaptive prompt and program search makes LLM evaluation selection-sensitive. Once benchmark items are reused inside tuning, the observed winner's score need not estimate the fresh-data performance of the full tune-then-deploy procedure. We study inference for this procedure-level target under explicit tuning budgets. We propose SIREN, a selection-aware repeated-split reporting protocol that freezes the post-search shortlist, separates splitwise selection from held-out evaluation, and uses an item-level Gaussian multiplier bootstrap for uncertainty quantification. In a fixed-shortlist regime with smooth stabilized selection, the estimator admits a first-order item-level representation, and the bootstrap yields valid simultaneous inference on a finite budget grid. This supports confidence intervals for procedure-performance curves and pre-specified equal-budget and cross-budget comparisons. Controlled simulations and MMLU-Pro tuning experiments show that winner-based reporting can be optimistic and can change deployment conclusions, while SIREN remains close to the finite-sample reporting target. Code is available at https://github.com/jznmsl/siren.
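The gap between winner-based reporting and split-based reporting described in this abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the SIREN implementation: every function name, the candidate count, the item count, and the shared true accuracy are hypothetical, and the multiplier bootstrap is shown only for a plain mean rather than for the paper's estimator.

```python
import random
import statistics


def simulate_scores(n_items, n_candidates, p_true, rng):
    """scores[c][i] is 1 if hypothetical candidate c answers item i correctly."""
    return [[1 if rng.random() < p_true else 0 for _ in range(n_items)]
            for _ in range(n_candidates)]


def winner_score(scores):
    """Naive report: best mean accuracy on the same items used for selection."""
    return max(statistics.fmean(row) for row in scores)


def split_score(scores, rng):
    """Select the best candidate on one half of the items, score it on the other."""
    idx = list(range(len(scores[0])))
    rng.shuffle(idx)
    half = len(idx) // 2
    select_idx, eval_idx = idx[:half], idx[half:]
    best = max(range(len(scores)),
               key=lambda c: sum(scores[c][i] for i in select_idx))
    return statistics.fmean([scores[best][i] for i in eval_idx])


def repeated_split_score(scores, n_splits, rng):
    """Average the held-out score over several random splits."""
    return statistics.fmean([split_score(scores, rng) for _ in range(n_splits)])


def multiplier_bootstrap_halfwidth(values, n_boot, level, rng):
    """Gaussian multiplier bootstrap half-width for the mean of `values`."""
    n = len(values)
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    draws = sorted(
        abs(sum(rng.gauss(0.0, 1.0) * c for c in centered) / n)
        for _ in range(n_boot)
    )
    return draws[min(n_boot - 1, int(level * n_boot))]


def demo(seed=0, n_trials=40, n_items=200, n_candidates=20, p_true=0.6):
    """Return (mean winner score, mean repeated-split score) over trials."""
    rng = random.Random(seed)
    winners, splits = [], []
    for _ in range(n_trials):
        scores = simulate_scores(n_items, n_candidates, p_true, rng)
        winners.append(winner_score(scores))
        splits.append(repeated_split_score(scores, 10, rng))
    return statistics.fmean(winners), statistics.fmean(splits)
```

Because all simulated candidates share the same true accuracy, any excess of the winner score over `p_true` is pure selection optimism, while the held-out score of the split-selected candidate stays centered on `p_true`.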
Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos
Human motion capture from monocular videos has made significant progress in recent years. However, modern approaches often produce temporal artifacts, e.g., in the form of jittery motion, and struggle to achieve smooth and physically plausible motions. Explicitly integrating physics, in the form of internal forces and exterior torques, helps alleviate these artifacts. Current state-of-the-art approaches make use of an automatic PD controller to predict torques and reaction forces in order to re-simulate the input kinematics, i.e., the joint angles of a predefined skeleton. However, due to imperfect physical models, these methods often require simplifying assumptions and extensive preprocessing of the input kinematics to achieve good performance.
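The PD-controller idea mentioned above, computing a torque that drives a simulated joint toward the input kinematics, can be sketched for a single joint. This is an illustrative toy, not the method of the paper: the unit inertia, the gains `kp` and `kd`, the time step, and the function names are all assumptions.

```python
def pd_torque(q_target, q, q_dot, kp, kd):
    """PD control law: proportional term on angle error, damping on velocity."""
    return kp * (q_target - q) - kd * q_dot


def track_joint(q_target, steps=3000, dt=0.001, kp=100.0, kd=20.0, inertia=1.0):
    """Simulate a 1-DoF joint driven by PD torque with semi-implicit Euler."""
    q, q_dot = 0.0, 0.0
    for _ in range(steps):
        tau = pd_torque(q_target, q, q_dot, kp, kd)
        q_dot += (tau / inertia) * dt  # update velocity from torque first
        q += q_dot * dt                # then update angle with the new velocity
    return q
```

With these hypothetical values, `kd = 2 * sqrt(kp * inertia)` makes the joint critically damped, so `track_joint(0.5)` settles at the target angle without oscillating.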
Can We Leave Deepfake Data Behind in Training Deepfake Detector?
The generalization ability of deepfake detectors is vital for their applications in real-world scenarios. One effective solution to enhance this ability is to train the models with manually-blended data, which we term "blendfake", encouraging models to learn generic forgery artifacts like blending boundaries. Interestingly, current SoTA methods utilize blendfake $\textit{without}$ incorporating any deepfake data in their training process. This is likely because previous empirical observations suggest that vanilla hybrid training (VHT), which combines deepfake and blendfake data, results in inferior performance to methods using only blendfake data (so-called "1+1<2"). Therefore, a critical question arises: Can we leave deepfake behind and rely solely on blendfake data to train an effective deepfake detector? Intuitively, as deepfakes also contain additional informative forgery clues ($\textit{e.g.,}$ deep generative artifacts), excluding all deepfake data in training deepfake detectors seems counter-intuitive.
Roman artifact discovered in the Americas shatters New World history as we know it
The discovery of a Roman artifact in the Americas has sparked a debate about who truly discovered the New World. While Christopher Columbus is hailed as the first in 1492, archaeologists uncovered a small terracotta head of a bearded man carved with distinctive European features tucked inside a Mexican tomb. The artifact, known as the Tecaxic-Calixtlahuaca Head, was discovered in 1933 inside a sealed pre-Hispanic burial beneath multiple intact layers, indicating it had not been disturbed after its placement. Experts say its facial features, beard style and craftsmanship bear a striking resemblance to objects from the ancient Mediterranean rather than indigenous Mesoamerican traditions.
Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding
Neural time-series data contain a wide variety of prototypical signal waveforms (atoms) that are of significant importance in clinical and cognitive research. One of the goals for analyzing such data is hence to extract such `shift-invariant' atoms. Even though some success has been reported with existing algorithms, they are limited in applicability due to their heuristic nature. Moreover, they are often vulnerable to artifacts and impulsive noise, which are typically present in raw neural recordings. In this study, we address these issues and propose a novel probabilistic convolutional sparse coding (CSC) model for learning shift-invariant atoms from raw neural signals containing potentially severe artifacts.
Geometry Based Data Generation
We propose a new type of generative model for high-dimensional data that learns a manifold geometry of the data, rather than density, and can generate points evenly along this manifold. This is in contrast to existing generative models that represent data density, and are strongly affected by noise and other artifacts of data collection. We demonstrate how this approach corrects sampling biases and artifacts, and thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that SUGAR can generate hypothetical populations, and it is able to reveal intrinsic patterns and mutual-information relationships between genes on a single-cell RNA sequencing dataset of hematopoiesis.
0b8aff0438617c055eb55f0ba5d226fa-Supplemental.pdf
In this supplemental material, we first present the detailed network architecture and parameters of the proposed approach in Sec. A. We further provide more analysis of the proposed method and ablation studies in Sec. B. Section C shows some qualitative results for potential applications of the proposed approach on medical imaging and imaging in astronomy. Figure 6: Illustration of learned deep features. (a) The blurry input and ground truth are shown in Figure 1(a)-(b). However, one may actually wonder whether the feature extraction network acts as a denoiser, leading to the observed robustness of the proposed method to various noise levels.