Tokyo, Japan – April 28th, 2022 – Morpho AI Solutions, Inc., which is responsible for AI business in the Morpho Group, announced that it has completed the “Research and Development of OCR Processing Program” (hereinafter “the Project”) commissioned by the National Diet Library.
As part of “Vision 2021-2025: The Digital Shift at the National Diet Library”, the National Diet Library is working toward universal access, providing a wide variety of information resources to all users now and in the future. It is also expanding the national digital information infrastructure that will permanently support this goal.
“Vision 2021-2025: The Digital Shift at the National Diet Library”: https://vision2021.ndl.go.jp/en/
In the Project, Morpho researched and developed an OCR processing program that incorporates Morpho’s latest AI and image processing technologies, enabling text data to be created from the images of materials in the Digital Collections of the National Diet Library going forward. In addition, an OCR training dataset of approximately 13 million characters was constructed in cooperation with Toppan Inc.
The OCR processing program developed in 2021 supports a variety of layouts and character types, enabling text conversion of complex materials from the Meiji to Showa periods that existing OCR services cannot handle.
Books and magazines from the 1860s onward can now be recognized with more than 90% accuracy, far higher than commercial OCR achieves. In particular, for modern books and magazines from the Meiji period to the early Showa period, reading accuracy more than doubled compared to commercial OCR (from approximately 40% to over 90%).
“NDLOCR, a Japanese OCR program developed in this Project, was released as open source on April 25, 2022, through the official NDL Lab GitHub account (https://github.com/ndl-lab). NDLOCR supports additional training on top of the original training data. It will be used to create full-text data for materials that the National Diet Library digitizes in the future. In addition to the program, the machine learning dataset used for development will be made available soon (only the portion created from digitized materials whose copyright protection period has expired). We hope that NDLOCR will help improve the accuracy of Japanese OCR as a whole and that many interested parties will benefit from it.”
Morpho AI Solutions is a company engaged in the commercialization of AI (artificial intelligence). It promotes the introduction and practical operation of cutting-edge AI technologies, including AI-OCR, in social infrastructure sectors such as government, electric power, transportation, and manufacturing.
For more information, visit https://www.morphoai.com.