A Better Office .docx Converter - OpenOffice.org Ninja

Posted by Andrew Z at Monday, December 8, 2008 | Permalink

There are plenty of ways to convert Microsoft Office 2007 file formats (.docx, .xlsx, .pptx) to OpenOffice.org. OpenOffice.org 3.0 now imports these Office OpenXML files natively, but native support doesn't guarantee a fluent translation.

Any translation is subject to imperfections, and OpenOffice.org 3.0 is the first public release of the Office 2007 converters. Overall the result is very good, but it struggles mainly with tracked changes, comments, tables, and drawings. You can wait for OpenOffice.org 3 to mature, but there's another way for the impatient.

Knowing where OpenOffice.org 3.0.0 is weak, I designed a .docx document specifically to torture it and expose those weaknesses. Here is the first page in Microsoft Office Word 2007, where the document was designed:

Here is the same .docx document on the same computer (Windows XP) in OpenOffice.org 3.0.0 using its native filters:

Finally, here is the same .docx document on the same computer in the same OpenOffice.org 3.0.0, but the .docx was passed through odf-converter-integrator 0.2.1. Notice the conversion is much more accurate:

Originally I planned to retire odf-converter-integrator when OpenOffice.org 3.0.0 was released with native .docx, .xlsx, and .pptx support, but then I realized there is still a need for high-accuracy translations. Instead of retiring odf-converter-integrator, I've upgraded it with more features (such as handling templates: .dotx, .xltx, and .potx). The latest version is powered by OdfConverter 2.5, a popular converter usually used the other way: to open ODF files in Microsoft Office.
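
To make the format pairs concrete, here is a minimal Python sketch of how the Office 2007 extensions line up with their OpenDocument counterparts, templates included. The names `ODF_EXTENSION` and `odf_target_name` are hypothetical and are not taken from odf-converter-integrator; pairing each template with an ODF template extension is also an assumption about the desired output, not a statement of what the tool does.

```python
import os

# Office 2007 extension -> OpenDocument extension.
# Documents first, then the matching template formats.
ODF_EXTENSION = {
    ".docx": ".odt", ".xlsx": ".ods", ".pptx": ".odp",
    ".dotx": ".ott", ".xltx": ".ots", ".potx": ".otp",
}

def odf_target_name(source_path):
    """Return the file name a converted copy of source_path could use."""
    root, ext = os.path.splitext(source_path)
    try:
        # Extensions are matched case-insensitively (REPORT.DOCX works too).
        return root + ODF_EXTENSION[ext.lower()]
    except KeyError:
        raise ValueError("not an Office 2007 file: %s" % source_path)
```

For example, `odf_target_name("budget.xltx")` gives `"budget.ots"`, while an old-style `.doc` file raises `ValueError` because it is not an Office OpenXML format.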

Expect similarly better conversions with .pptx and .xlsx files, and try it yourself: download odf-converter-integrator and the reference document OpenOfficeOrg300_docx_bugs.docx.

31 comments:

Unknown said...

Could you make the Windows build available as a zipped package? Can't download .exe files due to firewall restrictions.

Anonymous said...

Why not contribute to OOo's internal converters rather than build a separate tool?

I've found the development community pretty responsive when I've filed bugs on document incompatibilities. I'm sure OOo would be even better with another talented developer testing and fixing these issues proactively.

Anonymous said...

Could you provide that .docx file you have tested with native oo.org 3 or compare it with go-oo.org version?

Andrew Z said...

gabix: Good idea. I will try to publish the zipped installer this week.

slewis: Mainly because I am just writing a little glue (a small project) between OdfConverter (a large project) and OpenOffice.org. My project is cheap and agile. Also, as I mentioned in the article, I originally wrote it to help OpenOffice.org 2.x users before OpenOffice.org 3 had any docx/xlsx/pptx converters. For the OpenOffice.org project alone, I have filed 176 issues (mostly bugs) and assisted (mostly as a QA volunteer) on 658 other issues. Going back to 2004, 49 of my 133 bug reports are still open. I test, and I have long encouraged others to test too.

Anonymous: Yes, I had it linked at the end of the article, but the link was broken. I fixed it.

Andrew Z said...

slewis: It's also worth noting that many people cannot upgrade to OpenOffice.org 3.0.0 for various reasons, and odf-converter-integrator is great for them. One reason is that there are showstopper regressions in OpenOffice.org 3.0.0. I used to run OpenOffice.org on a heterogeneous office network, and I would not run 3.0.0 on it today because of the file locking changes.

Pengurus said...

It's good that you keep releasing new versions, because some Ubuntu users (Hardy LTS) do not get the update to OO3, so we have to use OO2 :-(

Before, I used odf-converter-integrator-chocolate_0.2.0-1_i386.deb with no errors. Then I upgraded to odf-converter-integrator-chocolate_0.2.1-1_i386.deb, and now I cannot open XLSX files anymore. I always get this error message:

Read-Error
Data could not be read from the file.

My office desktop and my notebook get the same error. I think something is wrong with the latest version.

Andrew Z said...

Lutfi: Thank you. I see it now, and the bug is tracked upstream here: Novell bug 450399: odf-converter-2.5 is unable to convert any XLSX <-> ODP. For now, anyone who needs xlsx conversion should avoid version 0.2.1 (downgrade to 0.2.0 or wait for a future version).

Anonymous said...

Andrew, that's great. I think a standalone converter like this is really good, because people with other apps like AbiWord or KOffice can also benefit. If the OO team wants to, they can fix up their converters. Still, I think what you're doing is fantastic and much more useful than just improving the OO converters.

Anonymous said...

Is there any reason that it is not in OXT extension format?

Just curious. :-)

Andrew Z said...

int: Good question. It's not really an OpenOffice.org extension. As far as I know, OpenOffice.org does not provide the infrastructure to do everything this tool needs to do.

Those who use Novell's edition of OpenOffice.org can install OdfConverter as an oxt, but even then, I think it doesn't associate the Microsoft Office 2007 file extensions with OpenOffice.org in the operating system.

Andrew Z said...

gabix: A zipped Windows installer is now available.

Anonymous said...

I have followed your writing for a long time. You have really given very useful information.
In spite of my trouble with English, I am trying to read and understand your writing.
And I am following frequently. I hope that you will stay with us and share much more.
I hope that your success will go on.

Todd Sharp said...

I'm trying to use your project to open a PPTX created in Mac Office 2008. Every time I open with Impress it crashes OO. Also, the command line client fails. Any idea if the converter is compatible with Office 2008? I'd have thought a PPTX is a PPTX....

Andrew Z said...

Todd Sharp: Click on my name at the very bottom of the blog, and email me the PPTX. Also I need to know your operating system and version (e.g. Ubuntu 8.10 or Windows XP), your OpenOffice.org version, and what exactly you mean by crashed (any error message verbatim).

Todd Sharp said...

I'll send it up tonight for you.

Thanks.

Anonymous said...

Andrew, thanks for the test files you've posted.

I've noticed a failure regarding Notes in odt files that your test file can't pick up without a slight mod.

Some wp display the Note but then drop all the remaining content in that paragraph (AbiWord 2.6.6 anyway via portableapps.com). It resumed only with the following new paragraph. There was no way of knowing anything had been dropped.

Since the current test file has nothing to drop, it always looks perfect. So, I'd add "SENTENCE ONE, SENTENCE TWO" and a "NOTE BETWEEN SENTENCE ONE AND SENTENCE TWO" in a single paragraph, and then start a "THIS IS A NEW PARAGRAPH" paragraph.

(seen with notes from OOo 3.0.0 and 3.0.1 using default built in template and saved as odt; not checked with old style notes of 2x nor with saving as doc or rtf)

bh

Anonymous said...

Hi Andrew,

You might be interested to see that I've fixed the tables import in go-oo.
Here is a post on that fix.

Regards

brackish said...

Thank you so much for your work on this!

Anonymous said...

Hi Andrew,

I have now fixed several bugs in this area and written several posts about them. You'll be able to find them all using the OOXML tag on my blog:

http://cedric.bosdonnat.free.fr/wordpress/?tag=ooxml.

Steve said...

Andrew Z - deep respect. Brilliant. Converter works a treat.
Very grateful to ya.

Unknown said...

Many thanks for this!

Benjamin said...

Does this help out with OpenOffice 3.1? Did OpenOffice correct the conversion problem when they upgraded from 3.0 to 3.1? If not, would this converter be compatible with OpenOffice 3.1?

Thanks, and God bless! John 3:16

Andrew Z said...

Benjamin: docx conversion was improved in OpenOffice.org 3.1, but there are still issues. The odf-converter-integrator application should be compatible with all future OpenOffice.org versions, and if you have issues, I still recommend trying it.

Matthew said...

Great work!! We all appreciate that there are people like you out there who make very useful software available to the Open Source community. KUDOS!!

Del said...

Cool, and still helps out OOo 3.1.1. I had 2.3 and 3.1 installed on my system, and the converter picks 2.3; I then removed 2.3 from my OS (CentOS 5.3) and now the converter puts up a file browser (presumably a zip browser). Does it follow my PATH or does it use some other method to find OOo? Thanks!

Andrew Z said...

Del: odf-converter-integrator uses a combination of techniques including gnome-open and xdg-open. If you can read Python code, it is the linux_open_document() function.
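
For readers who want the general idea without reading the source, the following is a minimal sketch of that kind of fallback lookup. It is not the actual linux_open_document() code: the names `OPENERS`, `find_opener`, and `open_document` are hypothetical, and the order in which the openers are tried is an assumption.

```python
import shutil
import subprocess

# Hypothetical candidate list; the real tool may try others or a different order.
OPENERS = ("xdg-open", "gnome-open")

def find_opener(candidates=OPENERS, which=shutil.which):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None

def open_document(path):
    """Hand the converted file to the desktop's default application."""
    opener = find_opener()
    if opener is None:
        raise RuntimeError("no desktop opener (xdg-open, gnome-open) found on PATH")
    # The opener consults the desktop's file associations, not PATH, to pick
    # the application. That may explain a non-OOo program (such as an archive
    # browser) opening the converted file.
    return subprocess.call([opener, path])
```

With a lookup like this, the program that ends up opening the file depends on the desktop's file-type associations rather than on where OpenOffice.org sits in PATH.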

REA said...

That's perfect!... That precisely works in my OpenOffice.org 3.1.1 in Ubuntu 9.04.

But I'm still confused about the versions:

In Windows XP, I'm using Go-oo 3.1.1 which is similar to Novell's OpenOffice.org. For that reason, Novell's "OpenOffice.OpenXML Translator 3.0 e-Media Kit" can work as an extension in Go-oo. Now, which one is better; odf-converter-integrator Strawberry for Windows or Novell's "OpenOffice.OpenXML Translator 3.0 e-Media Kit"?

Plus, is there any plan or option available for Mac OS X's OpenOffice.org and Go-oo? I know that NeoOffice provides odf-converter source support; but what about the others?

Andrew Z said...

CHAPLAIN_VIRUS:
If you use the Novell edition of OpenOffice.org, use their add-on instead of OCI. OCI is mainly for editions of OpenOffice.org that don't support Novell's OdfConverter.

No, there is no plan for Mac support, but maybe I'll change my mind if you donate a Mac. :)

REA said...

Andrew Z:
All right, if one day I will win the lottery, you will be the second one to gift a Mac; first one is me... >;-P

Are you planning to add odf-converter-integrator to the Ubuntu repositories? Having a Launchpad PPA would be wonderful.

Andrew Z said...

CHAPLAIN_VIRUS: I am not an official Ubuntu packager, so it's up to them to add it. However, I'm not sure Ubuntu will accept it because of the "funny" code, and Ubuntu seems to prefer their own solution (which is to modify OpenOffice.org itself, which often leads to bugs such as crashes).

Mark Q said...

Brilliant! A well-appreciated solution for receiving .docx files from others. Works great on Ubuntu using version 0.2.2-2.