Digital Scholarship Tools for Japanese Studies: NLP resources
A guide to resources for digital humanities projects related to Japanese-language materials. Special thanks to Andrew P. Nelson for the initial guide.
Natural language processing resources
- NDL Full Text Data: Supports keyword search across the entire National Diet Library public domain collection. Lists the texts that contain the search term, with links to the items in the NDL digital collection and full OCR text download at 95% accuracy.
- NDL ngram viewer: Sophisticated historical word frequency visualization tool. Draws on the National Diet Library Digital Collection. Permits regex search. Allows download of tabular data on overall frequency per year and publications per year. Covers c. 1860 to c. 2000.
- Minna de Honkoku: Allows access to 3,000+ ongoing and completed transcription project documents. 98.5% transcription accuracy on 23 million characters across 88,472 pages of text, and growing. Registered contributors work with an OCR machine-training interface to render kuzushiji into standard Modern Japanese characters.
- Bukan Complete Collection: Database of daimyo and their personnel, with domain maps, animated sankin kōtai maps and more. Documents are linked to KuroNet for kuzushiji OCR.
- Center for Open Data in the Humanities: Central landing page for a digital humanities center with members from the National Institute of Informatics and the Institute of Statistical Mathematics. Offers numerous open source databases and code packages covering a variety of disciplines. Notable projects are highlighted elsewhere in this document.
- Kuzushiji Database: A database of over 1,000,000 digitized kuzushiji characters prepared for the study of old texts and for machine training.
- KuroNet Kuzushiji OCR software: An AI OCR service trained on the Nihon kotenseki kuzushiji dataset. (Log-in required.)
- Differential Reading Platform: A platform for comparing similar texts and viewing, with AI assistance, minute changes from version to version. Useful for comparing sequential issues of gazetteers, serial portraits, etc. Includes open source code.
- Chamame Online Japanese Text Parser: A parser that draws on the UniDic dictionaries; inserts spacing between words in Japanese text for the purpose of NLP analysis.
- NDL GitHub: Access to 34 repositories that undergird the NDL Lab projects, including code packages for running OCR, as well as prepared OCR databases.
- GeoNLP Python Code for Place Name Extraction: Open source software for extracting place names from historical maps. Builds dictionaries that associate place names with positional information.
- GeoLOD: A service for searching for and sharing toponym data.
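
The per-year tables that the NDL ngram viewer offers for download lend themselves to simple normalization. The following is a minimal sketch, assuming the export is a CSV with raw counts; the column names (`year`, `frequency`, `publications`), the sample values, and the helper `per_publication_rate` are hypothetical and are not taken from the tool's documentation. It divides each year's term frequency by that year's publication count, which gives a rough way to compare decades with very different publishing volumes.

```python
import csv
import io

# Hypothetical sample standing in for a table exported from the ngram viewer.
# The column names and values are assumptions made for illustration only.
SAMPLE_CSV = """year,frequency,publications
1890,12,300
1900,45,500
1910,30,0
"""


def per_publication_rate(csv_text):
    """Return {year: occurrences per publication}.

    Years with zero publications are skipped to avoid division by zero.
    """
    rates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        publications = int(row["publications"])
        if publications == 0:
            continue
        rates[int(row["year"])] = int(row["frequency"]) / publications
    return rates


rates = per_publication_rate(SAMPLE_CSV)
for year, rate in sorted(rates.items()):
    print(f"{year}: {rate:.3f}")
```

With a real export, the sample string would be replaced by the contents of the downloaded file, and the column names adjusted to match its actual header row.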
- Last Updated: Sep 30, 2024 3:24 PM
- URL: https://guides.library.stanford.edu/DigitalResourcesForJapaneseStudies