8 Tips and tricks

As highlighted in the project management section, the scale of your project and the capacity of your team define the optimal approach. However, one notion is important across all four scenarios provided: capacity building for the longevity of projects.

Within an academic context, contracts are often limited in time and funds are scarce. In addition, people move frequently between positions in search of more stable or better paid (research) positions. This presents the danger of knowledge leakage. Preventing this slow trickle of disappearing knowledge requires redundancy in your project management approach, where key transcription components are never the sole responsibility of one person. This advice applies not only to transcription projects, but to most academic endeavours.

8.1 Mentoring

When teaching people transcription workflows, or the setup of a particular piece of software, do so in pairs. In addition, extensively document the process. Although courses and documentation exist for all software discussed, these cover generalized workflows and do not account for idiosyncrasies within your dataset.

When using annotation (Section 5.1) to provide tailored training data, dividing this task among trusted authorities, such as students, might speed up the process. Having a good manual at hand, and the capacity within a research team to quickly teach it to someone new, is key to the success of such an approach. Regardless, do not underestimate the time you need to invest in this process, so make it worth your while and pick your battles carefully.

8.2 Community/Citizen science

Most community/citizen science efforts will require half an hour of someone’s time to keep going: someone has to provide feedback and intermittent results to keep the community motivated. Furthermore, one should not use citizen science simply because it is an easy way to get (training) data while other options, such as data augmentation on smaller datasets, have not yet been exhausted. The latter approaches are often required regardless of data size. Assessing the accuracy of a suite of models is required before concluding that more training data is needed than can reasonably be generated within a team.
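As an illustration of assessing the accuracy of a suite of models, the sketch below ranks several models by character error rate (CER) on a small set of manually transcribed reference lines. CER is assumed here as the accuracy metric; it is a common choice for transcription work but not the only one. The function names and the model names in the usage example are hypothetical, and the model outputs are made up for demonstration only.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def rank_models(references, outputs_by_model):
    """Return {model name: pooled CER}, ordered from best to worst.

    `outputs_by_model` maps a (hypothetical) model name to its
    transcriptions, given in the same order as `references`.
    """
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    scores = {}
    for name, hypotheses in outputs_by_model.items():
        if len(hypotheses) != len(references):
            raise ValueError(f"{name}: expected {len(references)} outputs")
        total_edits = sum(
            edit_distance(ref, hyp)
            for ref, hyp in zip(references, hypotheses)
        )
        scores[name] = total_edits / total_chars
    return dict(sorted(scores.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # Made-up reference lines and model outputs, for illustration only.
    references = ["rainfall 12.5 mm", "station Yangambi"]
    outputs = {
        "model_a": ["rainfall 12.5 mm", "station Yangamhi"],
        "model_b": ["rainfa11 12.5 mm", "statlon Yangambi"],
    }
    for name, cer in rank_models(references, outputs).items():
        print(f"{name}: CER = {cer:.3f}")
```

If the best of the available models (including those trained with augmented data) still shows an unacceptable error rate on such a held-out reference set, that is a more defensible basis for seeking additional training data through a community effort.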