Jens Nöckel's Homepage


Converting from Word to LaTeX on Macs

This page is for LaTeX users who face one of three scenarios:

Here I collect information that I believe to be workable, although I have not tested it extensively, simply because I don't have MS Office on this Mac. I almost never use OpenOffice or LibreOffice either, so the information on this page may be outdated.

Probably the biggest challenge is the conversion of formula objects into LaTeX. I'll return to this at the end. Conversion is a game that can be played at different levels of sophistication, and I'm looking for the simplest and cheapest routes here.

Different routes to get a LaTeX file

This is not a complete list of possible routes. More alternatives can be found at TUG. I'm only listing the things that I think are really worth trying.

Assume you have a Word file text.doc. Here I'll list some ways of dealing with this file, ranked in order of their quality:
Word → OpenOffice ODT → LaTeX
This produces the best LaTeX (after LyX), to my knowledge.

Use Writer2LaTeX for the ODT → LaTeX step.

Word → RTF → LaTeX

Use the unrtf tool, which can be installed via Fink.
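
A minimal sketch of how this route could be scripted in Python. It assumes that textutil (used further down this page for HTML) also handles the Word → RTF step, and that unrtf is called with its --latex option, which writes the LaTeX to standard output. The function names rtf_route_commands and convert are made up for illustration, and the sketch is untested against real Word files.

```python
import subprocess
from pathlib import Path


def rtf_route_commands(doc_path):
    """Return (rtf_path, tex_path, commands) for the Word -> RTF -> LaTeX route."""
    doc = Path(doc_path)
    rtf = doc.with_suffix(".rtf")
    tex = doc.with_suffix(".tex")
    # Step 1 (assumption): textutil writes the RTF file next to the original.
    to_rtf = ["textutil", "-convert", "rtf", "-output", str(rtf), str(doc)]
    # Step 2 (assumption): unrtf --latex prints LaTeX on standard output.
    to_tex = ["unrtf", "--latex", str(rtf)]
    return rtf, tex, [to_rtf, to_tex]


def convert(doc_path):
    """Run both steps; needs macOS textutil and an installed unrtf."""
    rtf, tex, (to_rtf, to_tex) = rtf_route_commands(doc_path)
    subprocess.run(to_rtf, check=True)
    # Capture unrtf's standard output and save it as the .tex file.
    result = subprocess.run(to_tex, check=True, capture_output=True, text=True)
    tex.write_text(result.stdout, encoding="utf-8")
    return tex
```

Under these assumptions, convert("text.doc") would leave text.rtf and text.tex next to the original file. Building the command lists separately from running them makes it easy to print and check the commands first.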

Word → HTML → LaTeX
To get math and images into the LaTeX document, the simplest method is to treat them all as graphics. The Word file can be converted to HTML on the command line with textutil, which comes with macOS:
textutil -convert html text.doc
The converted HTML document includes the graphics and bitmapped formulas. HTML is in principle a very readable source format, and at this point I would say one gains almost nothing by taking the extra step of converting it to LaTeX. For me, the main point of LaTeX would be the ability to edit math formulas easily, but HTML conversion eliminates that possibility because it turns formulas into bitmaps.

Nevertheless, there are several converters that all share the obvious name html2latex but differ in their capabilities as well as their implementation. An official place to find these converters (plus converters from HTML to other formats) is html2things. Most of them are so old that they don't recognize modern HTML tags or features such as style sheets. I tried and ruled out the sed script, and found LaTeX bugs with nc-html2latex, so the best remaining choice ended up being HTML to LaTeX (version 2.7). The fact that this converter has no graphics support is irrelevant for the reason stated above: the images can't be edited anyway.
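
To judge how much would be lost on this route, one can list the images that ended up in the converted HTML, since each bitmapped formula would have to be retyped by hand in LaTeX. The following is a minimal Python sketch using only the standard library. The helper names ImageLister and list_images are hypothetical, and the file name in the usage example is invented, since the names textutil actually gives to extracted images may differ.

```python
from html.parser import HTMLParser


class ImageLister(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case; attrs is a list of pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def list_images(html_text):
    """Return the image sources in document order."""
    parser = ImageLister()
    parser.feed(html_text)
    parser.close()
    return parser.sources
```

For example, list_images('<p><img src="formula1.png"></p>') returns ['formula1.png']. To check a converted file, read its text first, for instance with pathlib.Path("text.html").read_text(), and pass the result to list_images.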

noeckel@uoregon.edu
Last modified: Fri Nov 18 15:39:31 PST 2016