Introduction
A DocX user asked me during the week when I was going to support converting Word 2007 documents (.docx) into other useful formats such as .doc, .pdf and .html. I would love to add this functionality to DocX; however, there is a problem.
The Problem
The only easy way to do this conversion is to use Microsoft's Office interop libraries. If you don't know what Microsoft's Office interop libraries are, I envy you.
The Microsoft Office interop libraries are available in Visual Studio's Add Reference dialog.
The Code
Once you have added a reference to Microsoft.Office.Interop.Word, you can use the console application below to convert a Word 2007 .docx file into .doc, .pdf and .html.
```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.Word;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Convert Input.docx into Output.doc
            Convert(@"C:\users\cathal\Desktop\Input.docx", @"C:\users\cathal\Desktop\Output.doc", WdSaveFormat.wdFormatDocument);

            /*
             * Convert Input.docx into Output.pdf
             * Please note: You must have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed
             * http://www.microsoft.com/downloads/details.aspx?FamilyId=4D951911-3E7E-4AE6-B059-A2E79ED87041&displaylang=en
             */
            Convert(@"C:\users\cathal\Desktop\Input.docx", @"C:\users\cathal\Desktop\Output.pdf", WdSaveFormat.wdFormatPDF);

            // Convert Input.docx into Output.html
            Convert(@"C:\users\cathal\Desktop\Input.docx", @"C:\users\cathal\Desktop\Output.html", WdSaveFormat.wdFormatHTML);
        }

        // Convert a Word 2007 .docx into the requested format (.doc, .pdf, .html)
        public static void Convert(string input, string output, WdSaveFormat format)
        {
            // Create an instance of Word.exe
            Word._Application oWord = new Word.Application();
            // Make this instance of Word invisible (you can still see it in Task Manager).
            oWord.Visible = false;

            // Interop (before C# 4.0) requires every argument to be passed as a ref object.
            object oMissing = System.Reflection.Missing.Value;
            object isVisible = true;
            object readOnly = true; // the input document is never modified
            object oInput = input;
            object oOutput = output;
            object oFormat = format;
            object doNotSave = WdSaveOptions.wdDoNotSaveChanges;

            Word._Document oDoc = null;
            try
            {
                // Load a document into our instance of Word.exe
                oDoc = oWord.Documents.Open(ref oInput, ref oMissing, ref readOnly, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref isVisible, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
                // Make this document the active document.
                oDoc.Activate();
                // Save this document in the requested format.
                oDoc.SaveAs(ref oOutput, ref oFormat, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
            }
            finally
            {
                // Close the document without saving, so Word does not prompt
                // about Input.docx or keep the file locked.
                if (oDoc != null)
                {
                    oDoc.Close(ref doNotSave, ref oMissing, ref oMissing);
                    Marshal.ReleaseComObject(oDoc);
                }
                // Always close Word.exe, even if the conversion failed.
                oWord.Quit(ref doNotSave, ref oMissing, ref oMissing);
                Marshal.ReleaseComObject(oWord);
            }
        }
    }
}
```
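The long runs of `ref oMissing` are only needed on C# 3.0 and earlier, which has no optional parameters. On C# 4.0 or later, COM interop lets you omit optional arguments and the `ref` keyword and pass the remaining ones by name. A minimal sketch under that assumption, still requiring a reference to Microsoft.Office.Interop.Word and an installed copy of Word; the class name `DocxConverter` is hypothetical and not part of DocX.

```csharp
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper (not part of DocX). Assumes C# 4.0+ and a reference to
// Microsoft.Office.Interop.Word on a machine with Word installed.
public static class DocxConverter
{
    public static void Convert(string input, string output, Word.WdSaveFormat format)
    {
        // Start a hidden instance of Word.exe.
        Word._Application word = new Word.Application();
        word.Visible = false;

        Word._Document doc = null;
        try
        {
            // Optional parameters are omitted; the ones we need are passed by name.
            doc = word.Documents.Open(input, ReadOnly: true, Visible: false);
            doc.SaveAs(FileName: output, FileFormat: format);
        }
        finally
        {
            if (doc != null)
            {
                // Close without saving so Word never prompts about the input file.
                doc.Close(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
                Marshal.ReleaseComObject(doc);
            }
            // Shut down Word.exe even when the conversion throws.
            word.Quit(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
            Marshal.ReleaseComObject(word);
        }
    }
}
```

Usage would look like `DocxConverter.Convert(@"C:\users\cathal\Desktop\Input.docx", @"C:\users\cathal\Desktop\Output.html", Word.WdSaveFormat.wdFormatHTML);`. The try/finally is the important design choice: without it, an exception in Open or SaveAs leaves a hidden WINWORD.EXE running.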
The result
Please note
This code will only run on a machine that has Microsoft Office installed. The Office interop libraries actually launch a "hidden" instance of Word. If you run the code above and then take a look at Task Manager, you will see a WINWORD.EXE process.
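If you would rather check for that hidden instance from code than in Task Manager, `System.Diagnostics.Process` can count the running WINWORD.EXE processes. A count that keeps growing after each conversion suggests Word.exe is not being shut down. A small sketch; the class and method names are illustrative only.

```csharp
using System.Diagnostics;

// Illustrative helper (hypothetical name): counts running Word processes.
public static class WordProcessCheck
{
    public static int CountWordProcesses()
    {
        // Process names are matched without the ".exe" extension.
        Process[] processes = Process.GetProcessesByName("WINWORD");
        int count = processes.Length;

        // Release the Process handles; this does not stop Word.
        foreach (Process p in processes)
        {
            p.Dispose();
        }
        return count;
    }
}
```

Calling `WordProcessCheck.CountWordProcesses()` before and after a conversion is a quick way to spot a leaked instance.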
If you want to convert to .pdf, you must also have the Microsoft Office 2007 Add-in: Microsoft Save as PDF or XPS installed.
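With that add-in installed, Word 2007 also exposes `ExportAsFixedFormat` on the document, a dedicated PDF/XPS export method, as an alternative to `SaveAs` with `wdFormatPDF`. A minimal sketch, assuming C# 4.0+ (so the optional export parameters keep their defaults), a reference to Microsoft.Office.Interop.Word, and the add-in; `PdfExporter` is a hypothetical name.

```csharp
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper (not part of DocX): exports a document to PDF.
public static class PdfExporter
{
    public static void ExportPdf(string input, string output)
    {
        Word._Application word = new Word.Application();
        word.Visible = false;

        Word._Document doc = null;
        try
        {
            doc = word.Documents.Open(input, ReadOnly: true, Visible: false);
            // All other ExportAsFixedFormat parameters keep their default values.
            doc.ExportAsFixedFormat(output, Word.WdExportFormat.wdExportFormatPDF);
        }
        finally
        {
            if (doc != null)
            {
                doc.Close(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
                Marshal.ReleaseComObject(doc);
            }
            // Always shut down the hidden Word.exe.
            word.Quit(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
            Marshal.ReleaseComObject(word);
        }
    }
}
```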
Because this approach needs Word installed, I have not included conversion functionality in my DocX library. I do not want DocX to have a dependency on Word.exe.
The future
Is there no way to do conversions without having Word.exe installed on my machine? I didn't say that; I said there is no easy way. This looks very promising; now if only I could find the time.
Donation?
As always, I offer this code to you for free. I am, however, a student, and if you would like to say thank you, you can buy me lunch by sending a €5 donation via PayPal.
Comments
Cool trick to export to PDF; I was looking for it for quite some time.
Thanks for sharing
Hi, I am a software engineering student working on my final year project and I need help. I want to convert PDF files into different formats on the client side, i.e. in JavaScript. Could you please help me?
engg.nashib@gmail.com
It doesn't work: it throws a "command failed" exception on the SaveAs call.
I just started looking at your open source project DocX and then saw this blog. I would really suggest you look at OpenXmlPowerTools' HtmlConverter and iTextSharp. You can use them in combination to generate HTML, and then PDF from that HTML. It is not perfect, but it does the job pretty nicely without the overhead of the PIAs. The HTML is also pretty clean.
Thanks,
Thanks, very helpful!
It works well. Excellent article. Thanks.
It's pretty good and very easy to understand.
Thanks
Thank you
Same error: oDoc.Activate(); fails with "Object reference not set to an instance of an object." Please help me :( That's why I tried your DLL, but it doesn't have this functionality...
Check the settings below:
Start -> dcomcnfg.exe
Computers -> Local Computer -> DCOM Config
Find "Microsoft Word 97-2003 Documents" -> Properties
On the Identity tab, change from "The launching user" to "The interactive user".
Hi Cathal,
This library appears to be optimised for writing/editing documents. If there are good interfaces to enumerate the document, I would suggest using iTextSharp to output to many different formats (from memory, iTextSharp has a generic interface for outputting to many formats).
I would be willing to donate towards such a project. At the moment this space is locked up by commercial products. DOCX to PDF in particular would be great to have in the open source realm, as even commercial products can have bugs that you can't fix yourself.
Hi...
But it is not working in IIS.
Can you suggest how to do this?
Can we convert .doc to PDF without installing Microsoft Office or OpenOffice, and also without using third-party DLLs?
Not working: you need to have Microsoft Office installed to use Microsoft's Office interop libraries.
I had some errors with this when creating .doc and .html files. I found that I had to call oDoc.Close() to get everything to work properly.
If DocX uses the Microsoft.Office.Interop DLL, then why don't we convert the document to PDF using just that?
Thanks for sharing these wonderful tips.
I am very curious to know how a .doc file can be converted into PDF without installing MS Office. The problem is that there is a restriction on installing MS Office on the production server. Please suggest the best way to convert Word to PDF, or any third-party tools.
To be able to build the solution, you’ll need to have Office installed on your machine. This can be painful if you’re dealing with a CI server.
I found a simple way around this problem. First, change the Embed Interop Types property to false and rebuild your solution. You'll end up with a Microsoft.Office.Interop.*.dll in your bin/debug directory. Now take this assembly and store it in your project as a .dll. Next, update your project references to point to that new location, instead of directly referencing the installed version in your installation path.
After that, you can change the Embed Interop Types to true, and rebuild again. Now you should be able to build without the need of an Office installation on your machine.
I get this error on Windows Server 2012 R2:
Retrieving the COM class factory for component with CLSID {000209FF-0000-0000-C000-000000000046} failed due to the following error: 80070005 Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)).
Do you know how I can convert Word to PDF without Word installed?
Has anyone else had an issue where Word thinks you updated the original document and asks whether you would like to save it?
object doNotSaveChanges = Word.WdSaveOptions.wdDoNotSaveChanges;
oDoc.Close(ref doNotSaveChanges, ref oMissing, ref oMissing);
Hi Cathal,
I know very well that we can convert .doc to HTML with this library and also with Office interop. But my requirement is to convert a .doc with its entire content (headers and footers) to HTML. I tried a lot with interop but couldn't find a solution. Can I do this with DocX? If yes, please give references.
Thanks,
Naveenraj.S
Hi, I need your help. When I try to load a file with a .doc extension, it throws an exception saying "file contains interrupted data". How can I overcome this?
Is there any way to output HTML tags to Word using DocX?
Is it possible to use DocX in VB 6.0?
In your example you work with a local file (a file path). Do you know how to do it with Base64, e.g. Convert(base64Input, base64Output, WdSaveFormat.wdFormatPDF)?