Convert OpenXML (.docx, etc.) in Linux using command line - OpenOffice.org Ninja

Convert OpenXML (.docx, etc.) in Linux using command line

Posted by Andrew Z at Monday, January 7, 2008 | Permalink

Microsoft Office 2007 came out about a year ago. Have you started receiving .docx, .xlsx, or .pptx files yet? Whether you are an OpenOffice.org user or a Microsoft Office 2003 user, you are probably frustrated trying to find some way to convert, import, or otherwise open these documents.

These new document formats are called Office Open XML, OOXML, or OpenXML. They are significantly different from the Microsoft Office 97, 2000, 2003, or XP formats. The new formats are based on XML while the old formats are binary.

If you don't use Linux or you are afraid of the terminal, skip to the very end. Otherwise here is one easy way to convert these documents thanks to Novell. They developed a tool called the OpenOffice OpenXML Translator, and soon it will be fully integrated into the mainstream OpenOffice.org.

Because this procedure is for the command line (and does not directly integrate into the vanilla OpenOffice.org), this procedure is well suited for automated, batch conversions between OpenXML and OpenDocument Format. This procedure should work on all Linux distributions: Ubuntu, Fedora, SUSE, Mandriva, Debian, and so on.

Installation prerequisites

Install the programs rpm2cpio and cpio. If you run a system such as Fedora, run this command:

sudo yum -y install cpio rpm

If you run Ubuntu, run this command:

sudo apt-get install rpm libgif4

Installation procedure

The general idea is the same no matter which Linux distribution you use. You are basically copying one file out of the RPM as if it were a tarball or a zip file. You are not installing the RPM in the traditional sense, so don't worry if you run a non-RPM-based system such as Ubuntu, Debian, or Slackware.

  • Download odf-converter-1.1-7.i586.rpm for i386 systems or odf-converter-1.1-7.x86_64.rpm for x86_64 systems.
  • Open a console.
  • Change directory to your download directory. Depending on your setup, it may be: cd ~/Desktop
  • To unpackage the rpm, run this command: rpm2cpio odf-converter*rpm | cpio -ivd
  • To copy the binary run this command: sudo cp usr/lib/ooo-2.0/program/OdfConverter /usr/bin
  • Optionally you may now delete the opt and usr directories you just unpacked as well as the .rpm file you downloaded. However, you may wish to keep the files under usr if you are interested in documentation and sample OpenXML documents.
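The steps above can be sketched as a pair of shell functions. This is only an illustration: the function names odf_rpm_name and odf_extract_binary are made up for this example, the RPM is assumed to be downloaded already, and the lib64 path is taken from a reader report about a later 64-bit build rather than verified for 1.1-7.

```shell
#!/bin/sh
# Sketch only: function names here are hypothetical, not part of odf-converter.

# Print the RPM file name from this article for a machine type
# (pass the output of `uname -m`).
odf_rpm_name() {
    case "$1" in
        x86_64) echo "odf-converter-1.1-7.x86_64.rpm" ;;
        *)      echo "odf-converter-1.1-7.i586.rpm" ;;
    esac
}

# Unpack a downloaded RPM in a scratch directory and copy the OdfConverter
# binary out of it.
# $1 = path to the RPM, $2 = destination directory (for example /usr/bin).
odf_extract_binary() {
    rpm_dir=$(cd "$(dirname "$1")" && pwd) || return 1
    rpm_path="$rpm_dir/$(basename "$1")"
    dest_dir=$2
    work_dir=$(mktemp -d) || return 1
    status=1
    # Unpack inside a subshell so the caller's working directory is unchanged.
    if (cd "$work_dir" && rpm2cpio "$rpm_path" | cpio -id); then
        # Assumption: 32-bit builds use usr/lib, 64-bit builds may use usr/lib64.
        for lib in lib lib64; do
            candidate="$work_dir/usr/$lib/ooo-2.0/program/OdfConverter"
            if [ -f "$candidate" ]; then
                cp "$candidate" "$dest_dir/" && status=0
                break
            fi
        done
    fi
    rm -rf "$work_dir"
    return "$status"
}

# Example (not run here):
#   odf_extract_binary ~/Desktop/"$(odf_rpm_name "$(uname -m)")" "$HOME/bin"
```

Copying into /usr/bin needs root, so either run the script as root or point the second argument at a directory you own that is on your PATH.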

Usage

The usage is simple. To convert a .docx file (Word 2007) to a .odt (OpenDocument Format) file, just run:

OdfConverter /i example.docx

The .odt file appears in the same directory as the .docx. Open it in OpenOffice.org or your favorite office suite.

For more help on arguments, just run OdfConverter without arguments. If you are curious, there are some OpenXML sample documents included in the RPM: check the directory usr/share/doc/packages/odf-converter/.
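For batch conversion, a loop along these lines may be enough. It is a rough sketch: convert_all_docx and the ODF_CONVERTER variable are names invented for this example, only the /i argument shown above is used, and it assumes OdfConverter exits with a non-zero status when a conversion fails, which I have not verified.

```shell
#!/bin/sh
# Sketch only: convert every .docx file in a directory, one call per file.
# $1 = directory to scan (defaults to the current directory).
# ODF_CONVERTER is a made-up override for this example; it defaults to the
# OdfConverter binary installed earlier.
convert_all_docx() {
    dir=${1:-.}
    converter=${ODF_CONVERTER:-OdfConverter}
    failed=0
    for f in "$dir"/*.docx; do
        # With no matches the pattern stays literal, so skip non-files.
        [ -f "$f" ] || continue
        if "$converter" /i "$f"; then
            echo "converted: $f"
        else
            echo "FAILED: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every conversion succeeded (assumes a meaningful exit code).
    [ "$failed" -eq 0 ]
}

# Example (not run here):
#   convert_all_docx ~/Documents/incoming
```

The file names are quoted throughout, so documents with spaces in their names are handled.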

60 comments:

Redric said...

Thank you so much!

Andrew Z said...

Redric,

You're welcome!

Tsiolkovsky said...

One thing to always keep in mind is that OOXML is a broken format, and the user sending files in this format should be warned about this. See more on NoOOXML.

Anonymous said...

helped a lot!

Thanks!!

Scott said...

Trying to run it on CentOS, and I'm getting:

OdfConverter: error while loading shared libraries: libgif.so.4: cannot open shared object file: No such file or directory

Thoughts?

Andrew Z said...

Scott:

Here is the command (which you can enter in a terminal) to find which package you need:
sudo yum whatprovides libgif.so


Here is how to install the necessary package:
sudo yum install giflib


Andrew
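Before searching for packages, it can help to list every library the binary is missing in one go. The helper below is a small sketch (missing_libs is a name made up for this example); it filters the output of the standard ldd tool and assumes the usual "name => not found" line format.

```shell
#!/bin/sh
# Sketch only: print the shared libraries that ldd reports as missing.
# Reads ldd output on standard input, prints one library name per line.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example (not run here):
#   ldd /usr/bin/OdfConverter | missing_libs
```

Each name it prints can then be passed to your distribution's package search, such as the yum whatprovides command above.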

jeremo said...

I'm trying to run this in ubuntu, but i get:

OdfConverter: error while loading shared libraries: libtiff.so.3: cannot open shared object file: No such file or directory

how can i resolve this?
many thanks!!

jeremo said...

ok i figured it out.

by running "locate libtiff.so" I found that i have libtiff.so.4

so i made a symbolic link with
"sudo ln -s /usr/lib/libtiff.so.4 /usr/lib/libtiff.so.3"

then a bit of "sudo ldconfig" to update the library cache and bob is now my uncle

i'd love to take credit for that bit of wisdom but i actually found it here:
http://linux.derkeiler.com/Mailing-Lists/Ubuntu/2006-04/msg02897.html

thanks for the guide, it's a real treat

jeremo said...

http://linux.derkeiler.com/Mailing-Lists/Ubuntu/2006-04/msg02897.html

Josiah said...

thanks, jeremo. That fixed my problem!

Anonymous said...

I've just installed it, but I only get

bash: /usr/bin/OdfConverter: cannot execute binary file

any help?

Andrew Z said...

@anonymous: Which Linux distribution do you use? Is it 32-bit or 64-bit? Do you have the 32-bit libraries installed? Have you tried searching for the error?

Charlie said...

thanks to everyone's help (especially jeremo's follow up) I was able to get this running on Ubuntu as well...turns out google-earth had libtiff.so.3 and i just copied it into /usr/lib ...don't get confused when you run OdfConvert and it doesn't work, it's "OdfConverter" :)

Giray said...

thanks for all

it's great to be able to find whatever you want in linux.
Linux forever

etittley said...

To unpackage, I had to filter through bunzip2:

rpm2cpio odf-converter-*.i586.rpm | bunzip2 - | cpio -ivd

Debian 4.0

Anonymous said...

Thanks!

This worked fine on Fedora 8 and default OpenOffice.

I had to 'rpm -i --nodeps odf-convertor...' because it complained about openoffice not being > 2.0 (which it is: 2.3)

Thanks, it was a big help.

Uwe said...

Mathematical equations are not translated.

Hello, I wrote an ODT file containing enumerated lists, tables, colors, and mathematical formulas. I converted it to docx format.

- I could not open that file with the Oxygen OpenOffice, which contains a docx import filter: the file was reported as corrupted.
- I converted the docx file back to odt, but then the math formulas were gone.

can anybody comment on this?

TJ said...

Thanks a lot, this is REALLY helpful. I was using zamzar to convert and that works but is slow with the free account.

One suggestion to make it easier: if you prefer a GUI to the CLI, you can right-click on a docx file, choose "Open with", and then choose a custom command. For the custom command, type "OdfConverter /i". After that, go to the properties of any docx file, open the "Open With" tab, and choose OdfConverter as the default program. From now on, clicking a docx file will not open it but will instead write an .odt file to the same directory, which you can then open. This works well in Gnome/Nautilus. I am sure the instructions are slightly different in KDE/Konqueror.

BK SimonB said...

No joy with PCLinuxOS 2007 yet. This is similar to Mandriva.

I get
error: Failed dependencies:
OpenOffice_org >= 2.0 is needed by odf-converter-1.1-7.i586
libgif.so.4 is needed by odf-converter-1.1-7.i586

OO is installed, just has a different package name. Same with libgif, since libgif.so.4 is present in the /usr/lib directory.

If I install with nodeps I get
/var/tmp/rpm-tmp.6321: line 22: SuSEconfig: command not found

When it finishes there is no file called OdfConverter installed in /usr/bin.

Andrew Z said...

BK SimonB: Did you install OdfConverter with "rpm -Uvh ...."? You aren't supposed to do that unless you run the Novell version, so please read the instructions again.

dmk said...

In order to get the conversion in the action context menu under KDE/Konqueror you need to do two things:
1. save the following as $HOME/.kde/share/apps/konqueror/servicemenus/convertmsXMLtoodt.desktop

#Word XML>Openoffice.odt
[Desktop Entry]
ServiceTypes=application/mswordXML
Actions=docx2odt

[Desktop Action docx2odt]
Name=Convert XML to Openoffice .odt
Icon=doc
Exec=OdfConverter /i %f


2. create an application/mswordXML mimetype by
Control Center -> KDE Components -> File Associations -> Add... Group = applications, Type Name = mswordXML, Filename Patterns = *docx

bhups said...

Is it necessary to have openoffice installed on the system before using OdfConverter?
I am able to convert pptx into odp's but the output is not good enough i.e. in lots of slides TEXT in the slide is getting clipped.
So is OdfConverter somehow dependent on OpenOffice, i.e. does it look for some schema files in the OO installation directory?
Thanks!

Andrew Z said...

dmk: I was thinking of something like that, and you encouraged me finally to do it. Check out odf-converter-integrator which should be usable in the next few days.

bhups: odf-converter is completely independent of OpenOffice.org, and you can use odf-converter even if OpenOffice.org was never installed. Check odf-converter on how to report a bug.

Anonymous said...

SWEET! Thanks! This worked like a champ :)

Jan said...

A cool thing, this converter :) . BTW, in your guide you mention the "OdfConvert" command instead of "OdfConverter". I spent a few minutes figuring out what the problem with the command was :).

Tried with SuSe 10.1

Andrew Z said...

Jan: Thanks. I fixed it and made a few updates.

Lacsi said...

Are there any other conversion tools available for the common office formats (XLS, DOC), like this handy docx converter?
I need them as console apps on a Linux server without oo.org installed.
Thanks for any help!

Kseniya [magenta@tut.by] said...

Hi to all! Thanks for a cool thing, but there's a small problem. It takes time to write some junk to the log (I mean "[INFO]...process:cprocess:cprocess" etc.). I use odf-converter to process very large amounts of data (I convert xlsx files to ods). Could you help me get the source code of the converter? I'd like to stop using the AddLog() method, which writes the log according to the /LEVEL parameter. I really need this code! Please help me.

Andrew Z said...

Kseniya: The source code is here
http://odf-converter.sourceforge.net/

Boo Radley said...

Just FYI, but the latest OpenOffice (2.4.1) included with Ubuntu (Hardy Heron & above) seems to open docx files *natively* (Office 2007 format)...

Though oddly enough the same version of OO in Windows won't open the file(?!) o_O

Andrew Z said...

Boo: Ubuntu ships a heavily modified version of OpenOffice.org. In some ways it is even a fork. Some of the features are great, but I've seen scores of reports of significant bugs specific to Ubuntu OOo/Go-oo/ooo-build (same thing basically).

You can also get something similar for Windows from Go-oo.org, but on all my systems I use the vanilla edition.

Last time I checked about six months ago, the odf-converter and odf-converter-integrator had a higher quality of conversion than Ubuntu (which was a backport of the OOo 3.0 alpha code).

Anonymous said...

for gentoo you can use rpm2targz to convert the rpm to tar.gz.
in case you get the libgif.so.4 message: you need giflib

Anonymous said...

Thnx for the nice tutorial! really helpful...keep it up!

Olle said...

Thank you, after linking a library in Ubuntu 8.04 (Hardy Heron) by
sudo ln -s libtiff.so.4 libtiff.so.3
OdfConverter works

Adrian said...

Many thanks for docx converter.
Work nice!

Adi

Abhinit Tiwari said...

Use zamzar.com and convert your Files for free. thats it! I did this coz my libtiff.so.3 was non-existent or whatever.

Andrew Z said...

Abhinit Tiwari: The new odf-converter-integrator 0.2.0 released this week doesn't require libtiff.so anymore.

Robin said...

I'm really grateful for this tutorial - thanks.

Does anyone know a command line tool for converting xlsx to csv (the file has more rows than OOCalc can handle)?

I was wondering about a xsl translation kit for OOXML...

Anonymous said...

I have a much easier and simpler solution for those of us who aren't so tech savvy.

From a gmail account, attach the docx file and send it to yourself or to another gmail account.

When you receive it you can view it as html, then just copy and paste.

It's not perfect, you'll have to do some ironing out of the format, but I found it much easier than trying to install programs & dependencies etc.

P.S. After this process send an email to the eedjit who sent you the docx telling them how irritating it is!

Andrew Z said...

Anonymous: Even easier than Gmail is odf-converter-integrator. It converts docx/xlsx/pptx files with a click.

Andrew said...

Thanks a lot.

No problems on Fedora 8

Anonymous said...

Please donate your old boxes to a church-group or some needy student in these hard times! To comply with the law, and with Microsoft's leasing policy, you can now replace Microsoft OS with the free (download from the net) Ubuntu OS, which can be set to erase the hard drive of all traces of the “illegal to give away ” Microsoft system and your private information, before donation! Now, explain to your lucky recipient that all the manuals they will ever need are available for free on the internet! Just ask for them in Google! OpenOffice, which is installed already is plenty adequate for homework assignments and with a little exploring, everything else can work well too! Happy computing!

Kjt said...

Yes man, it works!! Thanks for your post, it helps me a lot.

All the best,
Mihai

pangea33 said...

This helped me immensely, thanks for posting it. Also big thanks to jeremo. I don't think I would have gotten this working without his post.

Anonymous said...

Helped me too
thankyou
saved my day actually

Anonymous said...

If you want to output PDF, you may also wish to consider anytopdf, a perl based script that wraps and installs the openoffice.org macros for you.

Usage: anytopdf <infile> <outfile.pdf>

oub said...

Hello

I just visited that site:
anytopdf has no download candidate

Uwe Brauer

Pitro said...

thanks!

btw. it's not a very good idea to make link from libtiff.so.3 to libtiff.so.4. you better download, and build the libtiff3 package. it's not in Ubuntu packages anymore, but it is here:

ftp://ftp.remotesensing.org/pub/libtiff/

also you may want to replace the link to odfconverter above that points to that specific version with a link pointing to download directory... there is a new version from April laying around.

http://download.go-oo.org/red-carpet/ooo-680/sled-10-sp-i586/

sumit said...

An even easier method...

Upload the file to Google Docs and download it in the format you want.

I appreciate your work, sir; I just mentioned an even easier method for "those who are afraid of the command line."

Keep up the good work...

Take Care
Sumit Asok

http://sumitasok.wordpress.com

esilver said...

FYI - simply linking libtiff.so.4 as libtiff.so.3 didn't work for me -- the converter would hang -- so I had to install libtiff.so.3 by building it from source here:

ftp://ftp.remotesensing.org/pub/libtiff/

Chandru said...

Below link is broken

http://download.go-oo.org/red-carpet/ooo-680/

Is any other link available to download the same?

I already downloaded the rpm using the below link

Chandru said...

Is there any odf-converter-integrator rpm download for centos 64 bit linux?

I searched it in the below link.

http://katana.oooninja.com/w/odf-converter-integrator/download

Any help appreciated

Andrew Z said...

Chandru: For now you need to use the 32-bit version on 64-bit systems.

xi'an said...

Thanks! I get this error message when trying to install the converter:

$ rpm2cpio odf-converter-1.1-7.i586.rpm | cpio -ivd
argument is not an RPM package
cpio: premature end of archive

Andrew Z said...

xi'an: Your download may be incomplete, so try downloading again.

jayeshlalk said...

It worked. Thanks a lot.

MarceloSegura said...

How can I uninstall the plugin? I see it created copies in multiple places of my computer (running peppermint linux)

Jannik said...

Installation in Ubuntu 64Bit:

1. sudo apt-get install rpm libgif4
2. wget http://download.go-oo.org/tstnvl/odf-converter/builds/4.0-rc2/odf-converter-4.0-12.1.x86_64.rpm
3. rpm2cpio odf-converter*rpm | cpio -ivd
4. sudo cp usr/lib64/ooo-2.0/program/OdfConverter /usr/bin
5. sudo ln -s /usr/lib/libtiff.so.4 /usr/lib/libtiff.so.3
6. sudo ldconfig