Background

Tsugio Sekiguchi, a Japanese linguist who specialized in German grammar, spent 30 years writing a gigantic manuscript of 25,000 pages, in which he collected example sentences. Thanks to his eldest son, Ikuya, the manuscript is stored at Keio University, Hamamatsu University School of Medicine, and Osaka University. However, all the researchers who were in charge of managing the manuscript have retired, and it seems that no one uses it. Some examples from the manuscript used to be publicly available on the website of the Cyber Media Center, Osaka University, but they are no longer available.

I happened to obtain all the digitized data of the manuscript stored at Osaka University, so I would like to try to decipher it and make it publicly available as my social responsibility. The manuscript at Osaka University is not the original but a copy.

Work Record

2000: The manuscript was stored at Osaka University, and part of it was digitized.
April 2018: I obtained the digitized data.
May 2018: I started deciphering the index.
January 2019: I finished deciphering the index and started uploading it to the website.

Digitization of the Index

It is a tough job to make the manuscript publicly available on the web. Although all the pages have been digitized, we only have image files (TIF). Of course, that is better than nothing, because there are 25,000 pages! TIF was the best format available when they were scanned.

All the index pages were written on a word processor, so there is no plain-text (ASCII) data. I used "Tesseract," the great OCR software backed by Google, to read them. However, the index is written in several languages, which made recognition difficult.
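As a rough illustration of the multi-language point, Tesseract's command line accepts several language models at once, joined with `+` in its `-l` option. The sketch below only shows how such a call could be assembled from Python. The file names and the `deu` and `jpn` language choice are assumptions for illustration, not the settings actually used for the index.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, languages):
    """Build a Tesseract command line for a page that mixes several languages."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract takes multiple languages joined with '+', for example "deu+jpn".
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path, output_base, languages):
    """Run Tesseract if it is installed; the text goes to output_base + '.txt'."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    command = build_tesseract_command(image_path, output_base, languages)
    subprocess.run(command, check=True)


# Hypothetical file names, used only to show the resulting command.
command = build_tesseract_command("index_0001.tif", "index_0001", ["deu", "jpn"])
print(" ".join(command))
```

Only the command is built here. Calling `run_ocr` requires a local Tesseract installation with the corresponding language data.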

Sekiguchi also used old-style Japanese characters, which Tesseract cannot read, so I wrote a replacement program in Python. I worked on it every weekday and finished after more than six months.
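The actual replacement program is not reproduced here. One plausible shape for such a tool, given purely as a sketch, is a character table applied with Python's built-in `str.translate`. The table name `OLD_TO_NEW`, the function name `modernize`, and the four character pairs are illustrative assumptions. A real table would need many more entries, and would likely also need rules for OCR misreadings.

```python
# Illustrative mapping only: a few well-known old-form to new-form kanji pairs.
OLD_TO_NEW = {
    "國": "国",
    "學": "学",
    "體": "体",
    "舊": "旧",
}

# str.maketrans turns the one-character mapping into a translation table.
TRANSLATION_TABLE = str.maketrans(OLD_TO_NEW)


def modernize(text):
    """Replace every old-form character found in the table; leave the rest as is."""
    return text.translate(TRANSLATION_TABLE)


print(modernize("舊字體"))  # prints 旧字体
```

A single translation table keeps the replacement to one pass over the text, and new character pairs can be added without touching the function.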

Digitization of the Main Contents

To tell the truth, the index pages were not written by Sekiguchi; a researcher wrote them as part of copyediting. I have not reached the main contents yet.

Sekiguchi used many writing styles: not only a typewriter, but also Sütterlin and Fraktur. I can read both scripts because I learned German from a textbook written by Sekiguchi, but generally speaking, they are very difficult for modern readers. The Japanese is also written in the old style. I can read it without hesitation, since I learned from Sekiguchi's textbook, which was published before WW2, but it is not easy for modern Japanese readers.

There are 25,000 pages, so if I digitize one page per day, it will take 68 years. My life's deadline will come before I finish my work.
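The estimate is simple division, sketched below. The function name `years_needed` and the 365-day year are assumptions made for this illustration, and working every single day is of course optimistic.

```python
def years_needed(total_pages, pages_per_day, days_per_year=365):
    """Return how many years the work takes at a constant daily rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return total_pages / (pages_per_day * days_per_year)


estimate = years_needed(25000, 1)
print(f"{estimate:.1f} years")  # prints 68.5 years
```

Under the same assumptions, a hypothetical rate of five pages per day would bring this down to about 13.7 years.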

A page of the manuscript

Plan from Now On

I will move ahead slowly. Please wait.

In the Future

No one will manage this work after I die. Therefore, I hope either to publish it as a book, which I would edit with TeX or something similar, or to have someone take over my work.