About Me

Ireland
Hello, my name is Cathal Coffey. I am best described as a hybrid between a developer and an adventurer. When I am not behind a keyboard coding, I am hiking and climbing the beautiful mountains of my home country, Ireland. I am a full-time student studying Computer Science & Software Engineering at the National University of Ireland Maynooth. I am finishing the final year of a four-year degree in September 2009. I am the creator of an open source project on codeplex.com called DocX. At the moment I spend a lot of my free time advancing DocX, and I enjoy this very much. My aim is to build a community around DocX and add features based on requests from this community. I really enjoy hearing about how people are using DocX in their work or personal projects. So if you are one of these people, please send me an email. Cathal coffey.cathal@gmail.com

Saturday, October 31, 2009

Converting .docx into (.doc, .pdf, .html)

Introduction

During the week a DocX user asked me when I was going to support converting Word 2007 documents (.docx) into other useful formats such as .doc, .pdf, and .html. I would love to add this functionality to DocX; however, there is a problem.

The Problem

The only easy way to do this conversion is to use the Microsoft Office interop libraries. For anyone who doesn't know what the Microsoft Office interop libraries are, I envy you.

The Microsoft Office interop libraries are available in the Add Reference dialog.

[Screenshot: the Add Reference dialog]

The Code

Once you have added a reference to Microsoft.Office.Interop.Word you can use the below project to convert a Word 2007 .docx into .doc, .pdf, and .html.

Code Snippet
using System;
using Word = Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.Word;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Convert Input.docx into Output.doc
            Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.doc", WdSaveFormat.wdFormatDocument);

            /*
             * Convert Input.docx into Output.pdf
             * Please note: You must have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed
             * http://www.microsoft.com/downloads/details.aspx?FamilyId=4D951911-3E7E-4AE6-B059-A2E79ED87041&displaylang=en
             */
            Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.pdf", WdSaveFormat.wdFormatPDF);

            // Convert Input.docx into Output.html
            Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.html", WdSaveFormat.wdFormatHTML);
        }

        // Convert a Word 2007 .docx into the format given by the format parameter.
        public static void Convert(string input, string output, WdSaveFormat format)
        {
            // Create an instance of Word.exe
            Word._Application oWord = new Word.Application();

            // Make this instance of Word invisible (it can still be seen in taskmgr).
            oWord.Visible = false;

            // Interop passes its arguments by reference as objects.
            object oMissing = System.Reflection.Missing.Value;
            object isVisible = true;
            object readOnly = false;
            object oInput = input;
            object oOutput = output;
            object oFormat = format;

            // Load a document into our instance of Word.exe
            Word._Document oDoc = oWord.Documents.Open(ref oInput, ref oMissing, ref readOnly, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref isVisible, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

            // Make this document the active document.
            oDoc.Activate();

            // Save this document in the requested format.
            oDoc.SaveAs(ref oOutput, ref oFormat, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

            // Always close Word.exe.
            oWord.Quit(ref oMissing, ref oMissing, ref oMissing);
        }
    }
}
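
The comment "Always close Word.exe" matters: if Open or SaveAs throws, Quit is never reached and the hidden Word process keeps running. Below is a minimal sketch of one way to guard against that with try/finally. It is not part of DocX. The class name SafeConverter is made up for illustration, and the choices to close the document without saving and to release the COM object explicitly are my assumptions about sensible cleanup, not something the interop libraries require.

```csharp
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, named for illustration only.
static class SafeConverter
{
    public static void Convert(string input, string output, Word.WdSaveFormat format)
    {
        // Interop passes its arguments by reference as objects.
        object oMissing = System.Reflection.Missing.Value;
        object isVisible = true;
        object readOnly = false;
        object oInput = input;
        object oOutput = output;
        object oFormat = format;

        // Assumption: the converted copy is already on disk after SaveAs,
        // so nothing else needs saving when the document and Word are closed.
        object doNotSave = Word.WdSaveOptions.wdDoNotSaveChanges;

        // Create a hidden instance of Word.exe.
        Word._Application oWord = new Word.Application();
        try
        {
            oWord.Visible = false;

            // Load the document and save it in the requested format.
            Word._Document oDoc = oWord.Documents.Open(ref oInput, ref oMissing, ref readOnly, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref isVisible, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
            oDoc.SaveAs(ref oOutput, ref oFormat, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
            oDoc.Close(ref doNotSave, ref oMissing, ref oMissing);
        }
        finally
        {
            // Runs even when Open or SaveAs throws, so Word.exe is always asked to quit.
            oWord.Quit(ref doNotSave, ref oMissing, ref oMissing);

            // Optional: release the COM reference now instead of waiting for the garbage collector.
            Marshal.ReleaseComObject(oWord);
        }
    }
}
```

It is called the same way as the Convert method above, for example SafeConverter.Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.pdf", Word.WdSaveFormat.wdFormatPDF). It still needs Word installed on the machine, so it does not remove the dependency discussed below.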

The result

[Screenshot: Input.docx]

[Screenshots: Output.doc, Output.pdf, Output.html]

Please note

This code will only execute on a machine that has Microsoft Office installed on it. The Microsoft Office interop libraries actually execute a “hidden” instance of Word. If you run the above code and then take a look at taskmgr, you will see the following.

[Screenshot: taskmgr showing the hidden instance of Word]

If you want to convert to .pdf, you must also have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed.

It is for this reason that I have not included conversion functionality in my DocX library. I do not want DocX to have a dependency on Word.exe.

The future

Is there no way to do conversions without having Word.exe installed on my machine? I didn’t say that; I said there is no easy way. This looks very promising, now if I could only find the time.

Donation?

As always, I offer this code to you for free. I am, however, a student, and if you would like to say thank you, you can buy me lunch by sending a €5 donation via PayPal.