Batch Converting Word Documents to HTML using Mac textutil

November 6, 2015

We recently had a client create hundreds of Microsoft Word (.doc) files and then ask us to import their contents into their WordPress site. Importing .doc files via PHP isn’t the quickest task to set up, so we decided to batch-convert the files to .html. That way we could easily read their contents and clean up the markup before inserting it into the database.

Unfortunately, every batch-conversion program we tested had trouble with non-English characters and ended up doing more harm than good.

After a bit of research, we found that our beautiful Macs already include a command-line tool called “textutil” that could take care of this in seconds.

Here’s how:

  1. Open Terminal
  2. Navigate to the folder holding the original documents (using the cd command)
  3. Enter the following command:
     textutil -convert html *.doc
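The command above writes each .html file next to its original. If you would rather keep the converted files in their own folder, a small script along these lines loops over the folder and passes each file to textutil with its -output option. It is a minimal sketch, not a polished tool: the script name convert.sh, the helper names convert_all and html_path_for, the default folder name "converted", and the choice to pick up .docx files as well (textutil reads both formats) are our own assumptions.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; these are not part of textutil.

# Print the output path for one input file, e.g.
# html_path_for "Report.doc" "converted" prints converted/Report.html
html_path_for() {
    name=$(basename "$1")
    printf '%s/%s.html\n' "$2" "${name%.*}"
}

# Convert every .doc/.docx file in SRC_DIR into OUT_DIR, leaving originals alone.
convert_all() {
    src_dir=${1:-.}
    out_dir=${2:-converted}

    if ! command -v textutil >/dev/null 2>&1; then
        echo "convert_all: textutil not found (it ships with macOS)" >&2
        return 1
    fi

    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.doc "$src_dir"/*.docx; do
        # In sh, a pattern with no matches stays literal, so skip it.
        [ -e "$f" ] || continue
        textutil -convert html -output "$(html_path_for "$f" "$out_dir")" "$f"
    done
}

# Only run when arguments are given, e.g.: sh convert.sh ~/path/to/docs converted
if [ "$#" -gt 0 ]; then
    convert_all "$@"
fi
```

Save it as convert.sh and run sh convert.sh ~/path/to/docs converted. If a folder contains both Report.doc and Report.docx, the second conversion overwrites the first.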

Open the folder in Finder or run ls, and you’ll see that every .doc file now has a .html companion with the same name. The generated HTML is fairly clean, but it does include some extra markup, such as a <style> block and generated class names, that you may want to clean up.
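How much of that markup you keep depends on where the content is going. textutil typically writes a <style> block of generated CSS rules (p.p1, span.s1 and so on) into the head and tags elements with matching class attributes; if WordPress will style the content anyway, a sed filter like this one can drop both. It is a sketch that assumes the opening and closing style tags each sit on their own line, which is how textutil usually lays them out, so check one converted file before running it over hundreds. The function names strip_style and clean_in_place are hypothetical.

```sh
#!/bin/sh
# Hypothetical filter: read textutil-generated HTML on stdin, write a cleaner copy to stdout.
# Assumes "<style ...>" and "</style>" each appear on their own lines.
strip_style() {
    sed -e '/<style/,/<\/style>/d' \
        -e 's/ class="[^"]*"//g'
}

# Rewrite each named HTML file in place using strip_style.
clean_in_place() {
    for f in "$@"; do
        strip_style < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

For example, clean_in_place *.html rewrites every converted file in the current folder.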

textutil has many options, including some that can clean up the output. See the full manual in the Mac developer library, or run man textutil in Terminal.
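For instance, the wrapper below asks textutil itself to leave out some HTML elements, drop the source document’s metadata, and write UTF-8 output, which is worth being explicit about when the documents contain non-English characters. The -excludedelements, -strip and -encoding options come from textutil’s manual page, but the element list (font,span) is only an illustration and the wrapper name convert_clean is hypothetical; confirm the option syntax with man textutil on your version of macOS before relying on it.

```sh
#!/bin/sh
# Hypothetical wrapper: convert the given Word files to HTML, asking textutil
# to skip some elements, drop document metadata, and write UTF-8.
# The element list below is only an example; adjust it for your documents.
convert_clean() {
    if ! command -v textutil >/dev/null 2>&1; then
        echo "convert_clean: textutil not found (it ships with macOS)" >&2
        return 127
    fi
    textutil -convert html \
        -excludedelements '(font,span)' \
        -strip \
        -encoding UTF-8 \
        "$@"
}
```

Paste the function into Terminal, then run convert_clean *.doc from the documents folder.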

Scott Buckingham

President / Owner
613-801-1350 x101
Scott is a WordPress expert who has worked on hundreds of web design and development projects. He excels at finding creative ways to solve technical problems.