Multimodal Side-Tuning for Document Classification

Stefano Pio Zingaro, Giuseppe Lisanti, Maurizio Gabbrielli

Jan-23-2023–arXiv.org Artificial Intelligence

Notwithstanding the many technological advances in computer vision and artificial intelligence that are contributing to the "digital transformation" of many companies and industrial processes, a surprising number of tasks are still carried out almost entirely by humans. In particular, many tasks across different industries, from administrative procedures to the archival of old manuscripts, involve the manual processing of a huge number of paper documents, with consequently high costs for the companies and, ultimately, for their clients. There are two main reasons for this situation. The first is rooted in the internal rules and processes of some companies, banks in particular, which retain many legacy procedures and show strong resistance to innovation. The second, which we address in this paper, is the lack of fully satisfactory automatic tools for document classification, especially when documents combine different sources of information such as text, images, and handwritten parts. Although some paper documents could be replaced by electronic ones, paper documentation cannot be eliminated entirely; hence, efficient and trustworthy tools for document classification are essential.

As we discuss in the next section, document classification has been widely investigated, and existing methods can be roughly divided into three categories: methods based on the textual content of the document, often obtained through Optical Character Recognition (OCR); methods based on the visual structure of the document image; and multimodal methods that use both text and image. The latter family of solutions [1-8] has provided significant advances, yet handling both textual and visual content in full generality remains an open problem [8].

Keywords: classification, machine learning, natural language, and 20 others not listed here
