Figure 11. Output of scanning page 173 of the Davidson (1886–1888) monograph using optical character recognition (OCR) software. Compared with the original, there are six mistakes, which are highlighted in yellow. None of the mistakes interferes with extensible markup language (XML) parsing, and all but one are easily corrected using the learn function in the OCR dictionary. The large highlighted area marks a region in which the word order of the original text has been slightly altered. This reordering is caused by the presence of separate text boxes on the page, in particular the interaction between the text box containing the figure description and the main taxonomic description. It had no significant effect on XML parsing of the text of the taxonomic description.