Now available for testing is the PDF import extension, which also includes hybrid PDF-ODF export. PDFs are designed for layout rather than for further editing, so when a word processor, spreadsheet, or presentation application exports to a PDF, much of the document structure is lost. To avoid disappointment, keep in mind that creating a PDF is not a reversible process because of the limitations of PDF as a format.
Implementations may favor either editing or layout. This extension's current implementation favors layout, so it imports PDFs into Draw. Keeping all that in mind, this extension still has many uses, such as adding annotations, filling out forms, making minor edits, using a PDF as a picture, and reusing PDFs for which the original source is lost. Because of the limitations of PDFs, these uses may not be completely painless.
The extension installs as easily as any OpenOffice.org or Firefox extension. OpenOffice.org extensions cannot register file associations with the operating system (though you can set them up manually), but importing a PDF is as simple as clicking on File and then Open. The import process takes a long time compared to opening an OpenOffice.org document because of the necessary guesswork caused by the limitations of PDF.
For a test, I exported ODF_text_reference_v1_1.odt from OpenOffice.org and imported it again. When the initial screen appeared with the results, I stared at it in disbelief. It looked just like the original. The text, layout, font faces, text colors, bold, italics, underline, and picture were well preserved.
Below are the original in Writer and the imported document in Draw. Doesn't it take more than a glance to identify which is the original?
It was only later, after a closer look, that imperfections appeared. For example, interactive PDF form elements (including a text input form and a button) were visibly mangled. That may be fixed in a future release. Then, there are the limitations of PDF import. Each line of text is one or more text boxes. Hyperlinks are merely blue, underlined text without interaction. Superscript is just a smaller font positioned in a higher text box. Comments are discarded.
Overall, this is a remarkable result for an early release. Future versions of the extension may take on other forms of PDF import, such as favoring text streams.
Alternative PDF import
OpenOffice.org did not pioneer PDF import, not even in the open source market. Some of the work in OpenOffice.org is done by xpdf, a PDF viewer. Open source alternatives for importing PDFs include pdftohtml, AbiWord, KWord, and Inkscape. There are also a host of proprietary applications.
Depending on your needs, there are other ways to import PDFs into OpenOffice.org. To import PDFs into Writer or Impress, you may be able to combine the new PDF import extension with copy and paste. If you just need to extract text, copy the text in Adobe Acrobat Reader and paste it into OpenOffice.org. This retains some formatting. If you just need to place a picture from a PDF into OpenOffice.org, take a screenshot or use Adobe Acrobat Reader's snapshot tool. To place a whole PDF as a read-only image on Windows, insert the PDF as an OLE object: click Insert - Object - OLE Object - Create from file. On Linux, you can convert PDFs to bitmap images (such as PNGs) using ImageMagick's convert tool with a command such as:
convert foo.pdf bar.png
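For multi-page PDFs, or when the default rendering looks too coarse, a slightly fuller invocation may help. The following is only a sketch, assuming ImageMagick is installed with PDF support; the function name pdf_to_pngs and the 150 DPI default are illustrative choices, not part of ImageMagick or of the extension.

```shell
# Hypothetical helper: render every page of a PDF to a numbered PNG.
# Usage: pdf_to_pngs INPUT.pdf OUTPUT_PREFIX [DPI]
pdf_to_pngs() {
    input=$1
    prefix=$2
    dpi=${3:-150}   # assumed default; raise it for sharper output
    # -density goes before the input file so it applies while the PDF is rasterized.
    # %03d in the output name is replaced by a zero-padded page index.
    convert -density "$dpi" "$input" "${prefix}-%03d.png"
}
```

For example, `pdf_to_pngs foo.pdf bar 200` would rasterize foo.pdf at 200 DPI and write one PNG per page, each named with the bar- prefix.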
What makes OpenOffice.org stand out from these other solutions is hybrid PDFs.
Hybrid ODF-PDF files
Combine the viewing and printing portability of a PDF with the editing capabilities of OpenDocument Format. "Have your cake and eat it too" is the promise when ODF is embedded in PDF. When these two open standards team up, you better watch your back, OOXML.
Most applications (such as Adobe Acrobat Reader) ignore the ODF bits and treat the whole hybrid file as a normal PDF. Presentation is pixel perfect. Wait. That's not all. OpenOffice.org 3.0 with this extension treats the hybrid as a normal ODF, so the ODF document opens in Writer, Impress, Calc, or Draw according to the original. (You didn't just expect Writer, did you?) Now you have lossless, editable, round-trip PDFs.
To export hybrid PDF files, you need the (inaccurately named) PDF import extension, which adds a new checkbox to the PDF export dialog box. To import hybrid files, you also need this extension.
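If you are handed a PDF and want a rough idea of whether it carries an embedded ODF copy, you can search the file for a telltale marker. The sketch below assumes that the exporter records the embedded document under an /AdditionalStreams key in the PDF trailer; that is my understanding of how hybrids are tagged, not something this article confirms, and the function name is hypothetical.

```shell
# Hypothetical check: does this PDF look like a hybrid PDF-ODF file?
# Assumption: hybrid files contain an /AdditionalStreams key in the trailer.
is_hybrid_pdf() {
    # -F matches the marker as a fixed string; -q prints nothing and
    # reports the result through the exit status only.
    LC_ALL=C grep -q -F '/AdditionalStreams' "$1"
}
```

A call such as `is_hybrid_pdf report.pdf && echo "looks like a hybrid"` would print the message only when the marker is found. A match is a hint rather than proof, since it does not verify that the embedded ODF is intact.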
One downside of this hybrid system is that adoption by users will be slow. Especially at first, not everyone will have OpenOffice.org 3.0 and this extension. Other applications may not adopt support for hybrid PDF-ODFs, but the genius is that the dual-format strategy mitigates the problem, so everyone will gain at least some use from these hybrids.
Another downside is that hybrid documents are larger files because some of the information is duplicated. For what it's worth, ODFs are compressed, and storage capacities are steadily climbing. Monster 2 TB drives are coming next year. Space is cheap, and you probably won't store all your documents as hybrids anyway.
What will be Microsoft's reaction? Office 2007 already has PDF export, and next year Office 2007 SP2 will support ODF natively. Will Microsoft Office one day see hybrid OOXML-PDFs, ODF-PDFs, both, or neither? Don't forget that any move Microsoft makes will likely favor XPS, its PDF competitor.
Test this extension in OpenOffice.org 3.0 or later. Though 3.0 comes out in September, the 3.0 beta and developer snapshots are available now. Currently the developer snapshot DEV300_m14 is newer than 3.0 beta.
Remember that this extension is not yet a stable 1.0 release. PDF import extension builds are currently available for Linux, for Windows, or for Mac (download links updated 9 June 2008) courtesy of Pavel Janik. (Update 11 June 2008) The extension is available on the OpenOffice.org extensions web site. Despite what is written there, it does not strictly require OpenOffice.org 3.0 Beta 2 (not yet available); it works in DEV300_m18 (available now).