Text digitization, recognition and analysis

Author

Koen Hufkens

Published

January 7, 2025

Preface

These are the materials for the course “Text recognition and analysis, 6-7 Feb. 2025” at the Leibniz-Institut für Europäische Geschichte (IEG), Mainz. They also support the research efforts of the COBECORE project at the Free University Brussels, Belgium. This book serves as both a reference and a general introduction to Handwritten Text Recognition (HTR) and Optical Character Recognition (OCR).

This reference gives an overview of the most common tools for historical (handwritten) text recognition, although the same tools can be applied elsewhere, too. I will also briefly discuss the initial digitization and potential citizen science components of such projects, drawing on my experience leading the Congo basin eco-climatological data recovery and valorisation project. The focus is on the practical issues such projects raise and how to resolve them efficiently and cost-effectively. This course is a practical tool, not a theoretical machine learning reference: it will give you an idea of what it takes to start, and complete, a text recognition and analysis effort.