kuronet
KuroNet: Regularized Residual U-Nets for End-to-End Kuzushiji Character Recognition
Kuzushiji, a cursive writing style, had been used in Japan for over a thousand years starting from the eighth century. Over 3 million books on a diverse array of topics, such as literature, science, mathematics and even cooking are preserved. However, following a change to the Japanese writing system in 1900, Kuzushiji has not been included in regular school curricula. Therefore, most Japanese natives nowadays cannot read books written or printed just 150 years ago. Museums and libraries have invested a great deal of effort into creating digital copies of these historical documents as a safeguard against fires, earthquakes and tsunamis.
How Machine Learning Can Help Unlock the World of Ancient Japan
Humanity's rich history has left behind an enormous number of historical documents and artifacts. However, virtually none of these documents, containing stories and recorded experiences essential to our cultural heritage, can be understood by non-experts due to language and writing changes over time. For instance, archaeologist have unearthed tens of thousands of clay tablets from ancient Babylon [1], yet only a few hundred specially trained scholars can translate them. The vast majority of these documents have never been read, even if they were uncovered in the 1800s. To give a further illustration of the challenge posed by this scale, a tablet from the Tale of Gilgamesh was collected in an expedition in 1851, but its significance was not brought to light until 1872.
KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning
Kuzushiji, a cursive writing style, had been used in Japan for over a thousand years starting from the 8th century. Over 3 millions books on a diverse array of topics, such as literature, science, mathematics and even cooking are preserved. However, following a change to the Japanese writing system in 1900, Kuzushiji has not been included in regular school curricula. Therefore, most Japanese natives nowadays cannot read books written or printed just 150 years ago. Museums and libraries have invested a great deal of effort into creating digital copies of these historical documents as a safeguard against fires, earthquakes and tsunamis. The result has been datasets with hundreds of millions of photographs of historical documents which can only be read by a small number of specially trained experts.
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)