Tokyo, Japan – April 22, 2024 – Morpho AI Solutions, Inc. (hereinafter “Morpho AIS”), which is responsible for the commercialization of AI within the Morpho Group, and the University of Tokyo are pleased to announce the launch of research and development using AI-OCR under the project “Applied research towards practical use of Devanagari OCR and utilization of text database”.
Since July 1, 2022, Morpho AIS has been providing FROG AI-OCR as commercial software. Morpho AIS supplies it to meet the needs of digital archiving projects that university libraries, regional libraries, and local governments are carrying out with the backing of the Digital Garden City Initiative, and to support compliance with reading-related barrier-free laws.
Morpho AIS, which specializes in the development and application of AI-OCR, has joined forces with the University of Tokyo, renowned for its expertise in Sanskrit philology, and TOPPAN Inc., which has expertise in producing training data for AI-OCR of special glyphs. By combining their respective strengths in AI-OCR development, the three parties will engage in research that expedites the creation of a text database of the Devanagari script used in Sanskrit literature.
This research, “Applied research towards practical use of Devanagari OCR and utilization of text database” (JSPS Grant-in-Aid for Scientific Research (B)), aims to develop and put into practical use optical character recognition (OCR) software that recognizes Devanagari script, and to carry out applied research using databases of Sanskrit literature read with this OCR. Among India’s scripts, Devanagari serves as the primary means of inscribing Sanskrit, thereby preserving not only contemporary languages such as Hindi, Marathi, and Nepali, but also a corpus of historical records that illuminate the culture and history of the Indian subcontinent. As with other humanities disciplines, the digital preservation and compilation of Sanskrit literary works is of paramount importance in Sanskrit philology. As a result, numerous initiatives have sprung up around the world to promote projects dedicated to this task.
However, all of these projects involve manual data conversion, which requires a great deal of time and effort on the part of individual researchers.
Given these circumstances, this research has made it possible to convert Devanagari script into text data through OCR by improving the National Diet Library’s NDLOCR*, which is used as the core engine of FROG AI-OCR. As a result, text data collection that was previously performed manually can now be automated.
The current research phase focuses on verifying the recognition results and improving the accuracy of the OCR. The ultimate goal is to significantly reduce the time and effort required for manual transcription and correction.
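As an illustration of how recognition results are commonly verified, the following minimal sketch computes a character error rate (CER) by comparing OCR output with a manually transcribed reference. It is not taken from the project described here: the function names and sample strings are hypothetical, and the sketch assumes that errors are counted per Unicode code point rather than per visual Devanagari syllable.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by reference length, counted in code points.

    Assumption: both strings are NFC-normalized first so that equivalent
    encodings of the same text are not counted as errors.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", ocr_output)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical sample: the word "dharma" (4 code points) with its final
# consonant misread by the OCR.
reference = "\u0927\u0930\u094d\u092e"
ocr_output = "\u0927\u0930\u094d\u0928"
print(character_error_rate(reference, ocr_output))  # 0.25
```

Because a single Devanagari syllable can span several code points, a project may prefer to count errors per grapheme cluster instead; the choice affects the reported rate.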
In the future, Morpho AIS plans to develop new AI-OCR for diverse applications through collaborations with industry, government, and academia.
The primary aim of this research is to develop Devanagari OCR technology capable of automatically converting the Sanskrit manuscripts we specialize in into text data. However, once OCR is implemented in practical settings, numerous possibilities will emerge as extensions of this technology.
In India and beyond, numerous manuscripts remain in handwritten form. Recently, a national initiative has been launched to digitize these manuscripts using digital photography and scanning, thus creating electronic archives. However, this effort alone is insufficient, and there will undoubtedly be a growing need in the near future for converting these manuscripts into text data and structuring the data accordingly. The use of OCR to read Sanskrit manuscripts has so far remained largely unexplored in research. This endeavor is anticipated to be a groundbreaking project on a global scale, pioneering new avenues of study in this field.
The Devanagari OCR project aligns well with our mission of preserving and disseminating cultural heritage through technological advancements. Our partnership with the University of Tokyo reflects our dedication to dismantling barriers to knowledge and fostering a valuable resource accessible to all. We firmly believe that the success of this research will broaden the scope of applications for our AI-OCR technology and foster substantial advancements in literary analysis. We extend our heartfelt gratitude to the University of Tokyo and TOPPAN Inc. for their invaluable collaboration and support.
By improving the National Diet Library’s NDLOCR and performing additional learning for the Devanagari script, it is now possible to process the sample image below.
– Video – Introducing FROG AI-OCR
– Requests and Inquiries
https://frog-ai-ocr.morphoai.com/
A free trial is also available from this page.
4/28/2022
Morpho AI Solutions Developed OCR Program for National Diet Library by Using Latest AI Technologies
https://www.morphoinc.com/en/news/20220428-epr-mais_ndl
5/9/2022
Morpho AI Solutions Develops OCR for the National Diet Library to Aid the Visually Impaired
https://www.morphoinc.com/en/news/20220509-epr-mais_ndl
5/12/2023
Morpho AI Solutions Announces a Researcher Package Plan for FROG AI-OCR, AI-OCR Software for Modern Books
https://www.morphoinc.com/en/news/20241205-epr-mais_frog_aiocr
* FROG AI-OCR uses the National Diet Library’s NDLOCR (https://github.com/ndl-lab/ndlocr_cli) as its core engine.
Morpho AI Solutions is a company engaged in the commercialization of AI (Artificial Intelligence). It promotes the introduction and actual operation of cutting-edge AI technologies, including AI-OCR, in the areas of social infrastructure such as government, electric power, transportation, and manufacturing.
For more information, visit https://www.morphoai.com/ or contact contact@morphoai.com.