
M5HisDoc: A Large-scale Multi-style Chinese Historical Document Analysis Benchmark

Neural Information Processing Systems

Recognizing and organizing text in the correct reading order plays a crucial role in historical document analysis and preservation. While existing methods have shown promising performance, they often struggle with challenges such as diverse layouts, low image quality, style variations, and distortions. This is primarily because current benchmarks do not account for these issues, which hinders the development and evaluation of historical document analysis and recognition (HDAR) methods in complex real-world scenarios. To address this gap, this paper introduces a complex multi-style Chinese historical document analysis benchmark, named M5HisDoc. The "M5" denotes five properties, i.e., Multiple layouts, Multiple document types, Multiple calligraphy styles, Multiple backgrounds, and Multiple challenges.



Recovery Analysis for Plug-and-Play Priors using the Restricted Eigenvalue Condition

Neural Information Processing Systems

The plug-and-play priors (PnP) and regularization by denoising (RED) methods have become widely used for solving inverse problems by leveraging pre-trained deep denoisers as image priors. While the empirical imaging performance and the theoretical convergence properties of these algorithms have been widely investigated, their recovery properties have not previously been theoretically analyzed. We address this gap by showing how to establish theoretical recovery guarantees for PnP/RED by assuming that the solution of these methods lies near the fixed points of a deep neural network. We also present numerical results comparing the recovery performance of PnP/RED in compressive sensing against that of recent compressive sensing algorithms based on generative models. Our numerical results suggest that PnP with a pre-trained artifact removal network provides significantly better results than existing state-of-the-art methods.
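To make the PnP iteration concrete, here is a minimal pure-Python sketch of PnP proximal-gradient descent for a linear measurement model y = A x with data term 0.5 * ||A x - y||^2. The abstract's method uses a pre-trained deep denoiser; since no network is available here, the sketch assumes a soft-thresholding "denoiser" instead, under which the iteration reduces to ISTA and converges for a step size of at most 1 / ||A||_2^2. The names `pnp_step`, `pnp_pgm`, and `fixed_point_gap` are hypothetical, not taken from the paper. `fixed_point_gap` measures how far an iterate is from a fixed point of the PnP update x ↦ D(x - step * A^T (A x - y)).

```python
import math


def matvec(A, x):
    """Return A @ x, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Return A^T @ r without forming the transpose explicitly."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            out[j] += a * ri
    return out


def soft_threshold(v, tau):
    """Stand-in 'denoiser' (assumption): elementwise soft-thresholding, the prox of tau * ||x||_1."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def pnp_step(A, y, denoiser, step, x):
    """One PnP update: gradient step on 0.5 * ||A x - y||^2, then apply the denoiser."""
    residual = [ax - yi for ax, yi in zip(matvec(A, x), y)]
    grad = rmatvec(A, residual)
    return denoiser([xi - step * gi for xi, gi in zip(x, grad)])


def pnp_pgm(A, y, denoiser, step, iters=200):
    """Run PnP proximal-gradient iterations starting from x = 0."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        x = pnp_step(A, y, denoiser, step, x)
    return x


def fixed_point_gap(A, y, denoiser, step, x):
    """Euclidean norm of x - T(x), where T is one PnP update; zero at a fixed point."""
    tx = pnp_step(A, y, denoiser, step, x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, tx)))
```

For a toy example, take `A = [[2.0, 0.0], [0.0, 1.0]]`, `y = [4.0, 0.05]`, `step = 0.25` (that is, 1 / ||A||_2^2) and the denoiser `lambda v: soft_threshold(v, step * 0.1)`: `pnp_pgm` returns approximately `[1.975, 0.0]`, shrinking the large coefficient and zeroing the small one, and `fixed_point_gap` at that point is essentially zero. Swapping in a learned denoiser only changes the `denoiser` argument, although the convergence guarantee above then no longer applies automatically.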



0234c510bc6d908b28c70ff313743079-AuthorFeedback.pdf

Neural Information Processing Systems

Figure 1: (a) Precision (blue) and recall (orange) for several neighborhood sizes k. Both metrics were evaluated using 20k real and …

Figure 2: (a) Real data covers five modes (1-5) of varying sample count.

Figure 1a illustrates the effect of varying k in the setup used in Figure 4b of the submission (truncation sweep in StyleGAN, VGG-16 features, 50k samples). In general, different values of k yield consistent results and mainly affect the saturation towards 0 or 1. Therefore, selecting k is a trade-off between under- or overestimating the manifolds.
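The caption refers to precision and recall computed from manifolds estimated with k nearest neighbors. Assuming the common construction in which each sample's manifold ball has a radius equal to the distance to its k-th nearest neighbor within the same set, a brute-force pure-Python sketch is shown below. It is quadratic in the number of samples, so it suits only small sets, and it takes feature vectors as given (the VGG-16 embedding step mentioned above is not reproduced). The function names are hypothetical.

```python
import math


def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbor in the same set (excluding itself)."""
    if len(points) <= k:
        raise ValueError("need more than k points")
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(centers, radii, queries):
    """Fraction of query points lying inside at least one ball (centers[i], radii[i])."""
    inside = sum(
        any(math.dist(q, c) <= r for c, r in zip(centers, radii))
        for q in queries
    )
    return inside / len(queries)


def precision_recall(real, fake, k=3):
    """Precision: fake samples on the real manifold; recall: real samples on the fake manifold."""
    precision = coverage(real, knn_radii(real, k), fake)
    recall = coverage(fake, knn_radii(fake, k), real)
    return precision, recall
```

For example, with `real = [(float(i),) for i in range(10)]` and a mode-collapsed `fake = [(0.0,), (0.1,), (0.2,), (0.3,)]`, `precision_recall(real, fake, k=3)` gives precision 1.0 and recall 0.1. Because the radii grow monotonically with k, a larger k pushes both metrics toward 1 and a smaller k pushes them toward 0, which matches the saturation trade-off described in the caption.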






ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

Neural Information Processing Systems

Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They report a frequent positive correlation but, surprisingly, almost never an inverse correlation, which would indicate a necessary trade-off. Such inverse patterns are theoretically possible, and determining whether they occur in practice is important for knowing whether ID performance can serve as a proxy for OOD generalization.
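The sign of the ID/OOD relationship is typically read off a correlation computed across a pool of models, with one (ID accuracy, OOD accuracy) pair per model. The sketch below assumes a plain Pearson coefficient on raw accuracies; the statistic actually used in the paper is not stated here, and the function name `pearson` is illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences (e.g. per-model ID and OOD accuracy)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

With made-up accuracies for three hypothetical models, `id_acc = [0.70, 0.80, 0.90]` and `ood_acc = [0.60, 0.50, 0.40]`, `pearson(id_acc, ood_acc)` is close to -1, which is the inverse pattern the abstract describes; a value near +1 would correspond to the commonly reported positive correlation.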