User:RP31/Processing books with mathematics

From DPWiki

Background

In the past, DP transcribed books containing mathematics entirely in LaTeX. This seems to have ceased.

It is proposed to transcribe them using the normal DP procedure, except for the maths parts, which can be transcribed as delimited TeX. The normal post-processing procedures will then produce an HTML document that still contains this markup and can be viewed using MathJax. A further processing stage can make SVG images from the markup and link them into the HTML document.

SVG images are best since they can be zoomed larger without losing sharpness.

Proofreading Rounds

The old procedure for LaTeX books was to use the normal DP guidelines, except for the maths, which was handled as in this document. This may need a few changes. The User:RP31/Proof_reading_rounds_with_LaTex_Math_markup document is a first attempt at it. See also User:Donalies/Latex_Trial_Run_2015_Proofreading.

What constitutes mathematics?

A question arises as to what should be transcribed as maths in the text. If a single letter (say "A") is a mathematical object, should it be transcribed as <i>A</i> or \(A\)? The latter would be converted to an image, which would make the resulting book bigger (although the way the PP tool works, the same image would be used for all instances of \(A\)). It would, however, give the best result: if the "A" also appears in an equation, the font used there could be different to the font the reader has chosen for ordinary text, so an <i>A</i> in the text could look significantly different from the A in the equation.
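To get a feel for the size cost, one could count how many distinct inline expressions a text contains, since each distinct expression needs only one image however often it occurs. The following is a rough sketch, not part of the PP tool; the function name `inline_math_counts` and the sample sentence are made up for illustration, and it assumes inline maths is written as \( ... \) without nesting.

```python
import re
from collections import Counter

# Inline maths written as \( ... \); non-greedy, nesting is not handled.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)

def inline_math_counts(text):
    """Count how often each distinct inline expression occurs in the text."""
    return Counter(m.group(1).strip() for m in INLINE_MATH.finditer(text))

# Hypothetical sample text: \(A\) occurs twice but would need only one image.
sample = r"Let \(A\) be a set and \(B\) a subset of \(A\)."
counts = inline_math_counts(sample)
print(len(counts), "distinct expressions,", sum(counts.values()), "occurrences")
```

The number of keys in `counts` is the number of images that would be needed if every occurrence of an expression shared one image.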

Possible changes

Capital Greek letters which look like Latin letters are currently transcribed as the Latin letter. This could cause a problem: for example, if it was clear from the context that a capital rho was intended and the proofer transcribed it as a Latin P, the formatter might think that a Latin P was intended. If it was proofed as \Rho, then the formatter could deal with it as appropriate.

Formatting Rounds

This document attempts to describe the basic procedure.

It is proposed to use only the \[ ... \] and \( ... \) delimiters for display and inline maths. LaTeX also accepts $$ and $, but it seems best to use distinct start and end delimiters so that mistakes can be found automatically. In the proofing guidelines $$ is used with a different meaning, so not using it for formatting should also avoid unnecessary confusion.
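As an illustration of why distinct start and end delimiters help, here is a minimal sketch of an automatic check. It is not an existing DP tool; the name `check_delimiters` and the message wording are assumptions. It reports a delimiter that is opened inside another, closed without being opened, closed by the wrong kind, or never closed.

```python
import re

# The four proposed delimiters. The lookbehind skips a LaTeX line break
# such as \\[2pt], where the second backslash is not a maths delimiter.
DELIMITER = re.compile(r"(?<!\\)\\[()\[\]]")
CLOSER_FOR = {r"\(": r"\)", r"\[": r"\]"}

def check_delimiters(text):
    """Return a list of (offset, message) pairs for unbalanced delimiters."""
    problems = []
    open_token = None  # the opening delimiter we are currently inside, if any
    open_at = None
    for match in DELIMITER.finditer(text):
        token = match.group(0)
        if token in CLOSER_FOR:  # an opening delimiter
            if open_token is not None:
                problems.append((match.start(), f"{token} opened inside {open_token}"))
            open_token, open_at = token, match.start()
        else:  # a closing delimiter
            if open_token is None:
                problems.append((match.start(), f"{token} without an opener"))
            elif CLOSER_FOR[open_token] != token:
                problems.append((match.start(), f"{token} does not close {open_token}"))
            open_token = None
    if open_token is not None:
        problems.append((open_at, f"{open_token} never closed"))
    return problems

for offset, message in check_delimiters(r"Let \(x and \(y\) be real."):
    print(offset, message)
```

With $ ... $ the same check is not possible, because an opening $ cannot be told apart from a closing one.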

A feature has been added to the format preview function to render the markup. This uses MathJax.

Post processing

A tool has been made which processes text with embedded LaTeX maths, generates images from it and embeds links to them in the text. It is described here.

It can be used on the html document resulting from normal procedures.
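The linking step can be pictured roughly as follows. This is only a sketch under stated assumptions, not the actual tool: the function name `link_images`, the `images/math_….svg` naming scheme and the `math-inline` / `math-display` class names are all invented for illustration, and the rendering of each TeX source to SVG is left out entirely.

```python
import hashlib
import html
import re

# Inline \( ... \) or display \[ ... \]; nesting is not handled.
MATH = re.compile(r"\\\((.+?)\\\)|\\\[(.+?)\\\]", re.DOTALL)

def link_images(document):
    """Replace each maths span with an <img> tag; return (new_html, sources)."""
    sources = {}  # hypothetical file name -> TeX source still to be rendered

    def replace(match):
        inline = match.group(1)
        kind = "inline" if inline is not None else "display"
        raw = inline if inline is not None else match.group(2)
        # The HTML may hold entities such as &lt; inside the maths.
        tex = html.unescape(raw).strip()
        digest = hashlib.sha1(f"{kind}:{tex}".encode("utf-8")).hexdigest()[:10]
        name = f"images/math_{digest}.svg"
        sources[name] = tex  # identical expressions share one file name
        alt = html.escape(tex, quote=True)
        return f'<img class="math-{kind}" src="{name}" alt="{alt}">'

    return MATH.sub(replace, document), sources

page = r"<p>If \(A\) then \[A + B\] and again \(A\).</p>"
new_html, sources = link_images(page)
print(new_html)
print(sources)
```

Keeping the TeX source in the `alt` attribute is one possible choice; it leaves the markup recoverable from the HTML.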

What should the plain text version be like? It could keep the LaTeX representation, which is somewhat human-readable.

Conversion of existing incomplete books

LaTeX books

Books which have not yet started formatting can easily be resumed using this procedure.

Otherwise the existing maths formatting could be kept and the non-maths parts re-worked. Existing books mostly use $ markup for inline maths. This could be automatically converted to \( ... \) markup. I have made a tool to do this and some other simple changes, such as converting \emph and \textit to italic tags: https://www.pgdp.org/~rp31/convertor/UnDollar.html. This can be used in the rounds, converting each page. Some types of LaTeX formatting, such as the handling of footnotes, are very different to the normal procedure.
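The kind of conversion involved can be sketched in a few lines. This is not the code of the UnDollar tool, only an illustration of the idea; the function name `undollar` is made up. It assumes that $ ... $ pairs do not span other dollar signs, leaves $$ and escaped \$ untouched, and handles only unnested \emph{...} and \textit{...}.

```python
import re

# A single $ ... $ pair: not escaped as \$ and not part of $$.
INLINE = re.compile(r"(?<![\\$])\$(?!\$)([^$]+?)(?<!\\)\$(?!\$)")
# \emph{...} or \textit{...} with no nested braces.
ITALIC = re.compile(r"\\(?:emph|textit)\{([^{}]*)\}")

def undollar(text):
    """Convert $x$ to \\(x\\) and \\emph / \\textit to <i> tags."""
    text = INLINE.sub(lambda m: r"\(" + m.group(1) + r"\)", text)
    return ITALIC.sub(r"<i>\1</i>", text)

print(undollar(r"Let $x$ be \emph{real} and $y > x$."))
```

A dollar sign used for money would be mistaken for maths by this sketch, so each converted page would still need checking.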

DP math books

There is also a tool for converting the markup used in these books: https://www.pgdp.org/~rp31/convertor/Convert.html. This can be used in the rounds, converting each page by pasting in the text and pasting the result back into the proofreading interface.

A Test Project

It would be possible to make a test project from a few pages of an existing book to try out the procedure.