Introduction
During the week, a DocX user asked me when I was going to support converting Word 2007 documents (.docx) into other useful formats such as .doc, .pdf, and .html. I would love to add this functionality to DocX; however, there is a problem.
The Problem
The only easy way to do this conversion is to use Microsoft’s Office interop libraries. For anyone who doesn't know what those are, I envy you.
The Microsoft Office interop libraries are available in the Add Reference dialog.
The Code
Once you have added a reference to Microsoft.Office.Interop.Word, you can use the project below to convert a Word 2007 .docx into .doc, .pdf, and .html.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Word = Microsoft.Office.Interop.Word;
    using Microsoft.Office.Interop.Word;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Convert Input.docx into Output.doc
                Convert(@"C:\users\cathal\Desktop\Input.docx", @"C:\users\cathal\Desktop\Output.doc", WdSaveFormat.wdFormatDocument);

                /*
                 * Convert Input.docx into Output.pdf
                 * Please note: You must have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed
                 * http://www.microsoft.com/downloads/details.aspx?FamilyId=4D951911-3E7E-4AE6-B059-A2E79ED87041&displaylang=en
                 */
                Convert(@"C:\users\cathal\Desktop\Input.docx", @"C:\users\cathal\Desktop\Output.pdf", WdSaveFormat.wdFormatPDF);

                // Convert Input.docx into Output.html
                Convert(@"C:\users\cathal\Desktop\Input.docx", @"C:\users\cathal\Desktop\Output.html", WdSaveFormat.wdFormatHTML);
            }

            // Convert a Word 2007 .docx into the given format (.doc, .pdf, or .html).
            public static void Convert(string input, string output, WdSaveFormat format)
            {
                // Interop requires objects.
                object oMissing = System.Reflection.Missing.Value;
                object isVisible = true;
                object readOnly = false;
                object oInput = input;
                object oOutput = output;
                object oFormat = format;

                // Create an instance of Word.exe
                Word._Application oWord = new Word.Application();
                try
                {
                    // Make this instance of Word invisible (it can still be seen in taskmgr).
                    oWord.Visible = false;

                    // Load a document into our instance of Word.exe
                    Word._Document oDoc = oWord.Documents.Open(ref oInput, ref oMissing, ref readOnly, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref isVisible, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

                    // Make this document the active document.
                    oDoc.Activate();

                    // Save this document in the requested format.
                    oDoc.SaveAs(ref oOutput, ref oFormat, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
                }
                finally
                {
                    // Always close Word.exe, even if the conversion throws.
                    oWord.Quit(ref oMissing, ref oMissing, ref oMissing);
                }
            }
        }
    }
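The three calls in Main differ only in the output extension, so one option is to derive the save format from the output path. What follows is a minimal sketch and not part of the project above: SaveFormats and FormatFor are hypothetical names, the helper only knows the three formats used in this post, and it assumes the same reference to Microsoft.Office.Interop.Word.

```csharp
using System;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class SaveFormats
{
    // Hypothetical helper: picks a WdSaveFormat from the output file's extension.
    // Only the formats used in this post are handled; anything else throws.
    public static Word.WdSaveFormat FormatFor(string outputPath)
    {
        string extension = Path.GetExtension(outputPath).ToLowerInvariant();
        switch (extension)
        {
            case ".doc":
                return Word.WdSaveFormat.wdFormatDocument;
            case ".pdf":
                return Word.WdSaveFormat.wdFormatPDF;
            case ".htm":
            case ".html":
                return Word.WdSaveFormat.wdFormatHTML;
            default:
                throw new NotSupportedException("No save format known for '" + extension + "'.");
        }
    }
}
```

With a helper like this, a call could look like Convert(input, output, SaveFormats.FormatFor(output)), which keeps the extension and the format from drifting apart.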
The result
Please note
This code will only execute on a machine that has Microsoft Office installed on it. The Microsoft Office interop libraries actually execute a “hidden” instance of Word. If you run the above code and take a look at taskmgr while it is running, you will see a WINWORD.EXE process in the list.
If you want to convert to .pdf, you must also have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed.
It is for this reason that I have not included convert functionality into my DocX library. I do not want DocX to have a dependency on Word.exe.
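Because of that dependency, it can be worth checking for Word before attempting a conversion. The following is only a sketch under stated assumptions: WordCheck and IsWordInstalled are hypothetical names, and the check relies on Word registering the "Word.Application" COM ProgID, which is Windows-specific and says nothing about whether the PDF add-in is present.

```csharp
using System;

static class WordCheck
{
    // Hypothetical helper: returns true when the "Word.Application" ProgID
    // is registered, which is normally the case only when Word is installed.
    // Type.GetTypeFromProgID returns null when the ProgID cannot be resolved.
    public static bool IsWordInstalled()
    {
        return Type.GetTypeFromProgID("Word.Application") != null;
    }
}
```

A caller could test WordCheck.IsWordInstalled() up front and report a clear error instead of letting the interop call fail with a COM exception.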
The future
Is there no way to do conversions without having Word.exe installed on my machine? I didn’t say that; I said there is no easy way. This looks very promising, now if I could only find the time.
Donation?
As always, I offer this code to you for free. I am, however, a student, and if you would like to say thank you, you can buy me lunch by sending a €5 donation via PayPal.