About Me

Ireland
Hello, my name is Cathal Coffey. I am best described as a hybrid between a developer and an adventurer. When I am not behind a keyboard coding, I am hiking and climbing the beautiful mountains of my home country, Ireland. I am a full-time student studying Computer Science & Software Engineering at the National University of Ireland Maynooth. I am finishing the final year of a four-year degree in September 2009. I am the creator of an open source project on codeplex.com called DocX. At the moment I spend a lot of my free time advancing DocX, and I enjoy this very much. My aim is to build a community around DocX and add features based on requests from this community. I really enjoy hearing about how people are using DocX in their work or personal projects, so if you are one of these people, please send me an email. Cathal (coffey.cathal@gmail.com)

Saturday, October 31, 2009

Converting .docx into (.doc, .pdf, .html)

Introduction

During the week, a DocX user asked me when I was going to support converting Word 2007 documents (.docx) into other useful formats such as .doc, .pdf, and .html. I would love to add this functionality to DocX; however, there is a problem.

The Problem

The only easy way to do this conversion is to use Microsoft's Office interop libraries. For anyone who doesn't know what Microsoft's Office interop libraries are, I envy you.

The Microsoft Office interop libraries are available in the Add Reference dialog.


The Code

Once you have added a reference to Microsoft.Office.Interop.Word, you can use the project below to convert a Word 2007 .docx into .doc, .pdf, and .html.

Code Snippet
using System;
using Microsoft.Office.Interop.Word;
using Word = Microsoft.Office.Interop.Word;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Convert Input.docx into Output.doc
            Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.doc", WdSaveFormat.wdFormatDocument);

            /*
             * Convert Input.docx into Output.pdf
             * Please note: You must have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed
             * http://www.microsoft.com/downloads/details.aspx?FamilyId=4D951911-3E7E-4AE6-B059-A2E79ED87041&displaylang=en
             */
            Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.pdf", WdSaveFormat.wdFormatPDF);

            // Convert Input.docx into Output.html
            Convert(@"c:\users\cathal\Desktop\Input.docx", @"c:\users\cathal\Desktop\Output.html", WdSaveFormat.wdFormatHTML);
        }

        // Convert a Word 2007 .docx into the requested format (.doc, .pdf, or .html).
        public static void Convert(string input, string output, WdSaveFormat format)
        {
            // Create an instance of Word.exe
            Word._Application oWord = new Word.Application();

            // Make this instance of Word invisible (it can still be seen in taskmgr).
            oWord.Visible = false;

            // Interop requires objects passed by reference.
            object oMissing = System.Reflection.Missing.Value;
            object isVisible = true;
            object readOnly = false;
            object oInput = input;
            object oOutput = output;
            object oFormat = format;
            object doNotSave = WdSaveOptions.wdDoNotSaveChanges;

            Word._Document oDoc = null;
            try
            {
                // Load a document into our instance of Word.exe
                oDoc = oWord.Documents.Open(ref oInput, ref oMissing, ref readOnly, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref isVisible, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

                // Make this document the active document.
                oDoc.Activate();

                // Save this document in the requested format.
                oDoc.SaveAs(ref oOutput, ref oFormat, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
            }
            finally
            {
                // Close the document without prompting to save the original.
                if (oDoc != null)
                    oDoc.Close(ref doNotSave, ref oMissing, ref oMissing);

                // Always close Word.exe, even if the conversion failed.
                oWord.Quit(ref oMissing, ref oMissing, ref oMissing);
            }
        }
    }
}

The result

 

[Image: Input.docx]

[Images: Output.doc, Output.pdf, Output.html]

Please note

This code will only execute on a machine that has Microsoft Office installed on it. The Microsoft Office interop libraries actually execute a "hidden" instance of Word. If you run the above code and then take a look at taskmgr, you will see Word running in the process list.

[Image: taskmgr showing the hidden Word process]

If you want to convert to .pdf, you must also have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed.

It is for this reason that I have not included conversion functionality in my DocX library. I do not want DocX to have a dependency on Word.exe.
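The hidden instance described in the note above can also be checked from code rather than from taskmgr. The following is only a minimal sketch: it assumes that Word's process is named WINWORD (the usual name, but an assumption here), and WordProcessCheck and CountProcesses are hypothetical names that are not part of DocX or the interop libraries.

```csharp
using System;
using System.Diagnostics;

static class WordProcessCheck
{
    // Returns how many running processes have the given name.
    // The name is given without the ".exe" extension.
    public static int CountProcesses(string processName)
    {
        Process[] processes = Process.GetProcessesByName(processName);
        try
        {
            return processes.Length;
        }
        finally
        {
            // Release the handles held by the Process objects.
            foreach (Process p in processes)
                p.Dispose();
        }
    }

    static void Main()
    {
        // Assumption: Word runs under the process name "WINWORD".
        int count = CountProcesses("WINWORD");
        Console.WriteLine("Word processes currently running: " + count);
    }
}
```

Running this before and after a conversion gives a rough indication of whether a hidden instance of Word was left behind, for example when an exception prevented Quit from being called.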

The future

Is there no way to do conversions without having Word.exe installed on my machine? I didn't say that; I said there is no easy way. This looks very promising, now if I could only find the time.

Donation?

As always, I offer this code to you for free. I am, however, a student, and if you would like to say thank you, you can buy me lunch by sending a €5 donation via PayPal.

69 comments:

  1. Cool trick to export to PDF; I was looking for it for quite some time.

    Thanks for sharing

    Replies
    2. Hi, I am a software engineering student working on my final year project, and I need help. I want to convert a PDF file into different formats on the client side, that is, in JavaScript. Could you please help me?

      engg.nashib@gmail.com

    3. It doesn't work; it shows a "command failed" exception on SaveAs.

  2. I just started looking at your open source project DocX and then saw this blog. I would really suggest you look at the OpenXmlPowerTools.HtmlConvertor and iTextSharp. You can use both of these in combination to generate HTML, and then a PDF from the HTML. It is not perfect, but it does the job pretty nicely without the overhead of the PIAs. The HTML is also pretty clean.

    Thanks,

  4. It works good. Excellent article. Thanx.

  5. Its pretty good and very easy to understand.

    Thanks

  8. Same error: oDoc.Activate(); fails with "Object reference not set to an instance of an object." Please help me :( That's why I tried your DLL, but it doesn't have this functionality.

    Replies
    1. Check the setting below:

      Start -> dcomcnfg.exe
      Computer
      Local Computer
      Config DCOM
      Search for Microsoft Word 97-2003 Documents -> Properties
      On the Identity tab, change from Launching User to Interactive User

  9. Hi Cathal,

    This library appears to be optimised for writing and editing documents. If there are good interfaces to enumerate the document, then I would suggest coding up iTextSharp to output to many different potential formats. (From memory, iTextSharp has a generic interface to output to many formats.)

    I would be willing to donate toward such a project. At the moment there is a lock-in to commercial products. Docx to PDF in particular would be great to have in the open source realm, as even commercial products can have bugs which you can't fix yourself.

  10. Hi,
    This is not working in IIS.
    Can you suggest how to do this?

  11. Can we convert .doc to .pdf without installing Microsoft Office or OpenOffice, and without using third-party DLLs?

  12. Not working; you need to have Microsoft Office installed to use Microsoft's Office interop libraries.

  13. I had some errors with this when creating .doc and .html files. I found that I had to use oDoc.Close() to get everything to work properly.

  14. If DocX uses the Microsoft.Office.Interop DLL, then why don't we convert the document to PDF using only that?

  15. Thanks for sharing wonderful tips.
    I am very curious to know how a .doc file can be converted into PDF without installing MS Office. The problem is that there is a restriction on installing MS Office on the production server. Please suggest the best way to convert Word to PDF, or any third-party tools.

  16. To be able to build the solution, you’ll need to have Office installed on your machine. This can be painful if you’re dealing with a CI server.

    I found a simple way around this problem. First change the Embed Interop Types property to false and rebuild your solution. You’ll end up with a Microsoft.Interop.*.dll in your bin/debug directory. Now take this assembly and store it in your project as a .dll. Next, you’ll have to update your project references to that new location, instead of directly referencing the installed version in your installation path.

    After that, you can change the Embed Interop Types to true, and rebuild again. Now you should be able to build without the need of an Office installation on your machine.

  17. I get this error on Server 2012 R2:
    Retrieving the COM class factory for component with CLSID {000209FF-0000-0000-C000-000000000046} failed due to the following error: 80070005 Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)).

  18. Do you know how to convert Word to PDF without Word installed?

  19. Thank you for sharing this. I know Microsoft Office has its own plug-in for saving Word and Excel files as PDF. But if anyone needs to convert Word to PDF in other applications, a third-party converter is necessary. I use RasterEdge, which supports converting Word, Excel and TIFF to PDF, TIFF and Word; I think it's also a good tool for you.

  20. Does anyone else have an issue where it thinks you updated the original Word document and prompts you to save?

  21. object doNotSaveChanges = Word.WdSaveOptions.wdDoNotSaveChanges;
    this.Close(ref doNotSaveChanges, ref missing, ref missing);

  28. Hi Cathal,
    I know very well that we can convert .doc to HTML with this library and also with Office interop. But my requirement is to convert a .doc with its entire content (headers and footers) to HTML. I tried a lot with interop but couldn't find a solution. Can I do this with DocX? If yes, please give references.

    Thanks,
    Naveenraj.S

  30. Hi, I need your help. When I try to load a file with the .doc extension, it throws an exception that says "file contains interrupted data". How can I overcome this?

  31. Is there any way to print HTML tags to Word using DocX?

  52. Is it possible to use DocX in VB 6.0?

  56. In your example you do it with a file on the local computer (using the file path). Do you know how to do it with base64 (Convert(base64Input, base64Output, WdSaveFormat.wdFormatPDF))?
