Example regular expressions for Writer - OpenOffice.org Ninja

Example regular expressions for Writer

Posted by Andrew Z on Sunday, December 30, 2007


Here are some sample regular expressions for OpenOffice.org Writer. Use these examples as is or as a basis for building your own regular expressions.

In the Find & Replace dialog box, don't forget to check the box Regular Expressions. Also, you usually will want Match case to be unchecked.

Description                                       Search for
Empty paragraph without whitespace                ^$
Empty paragraph with whitespace                   ^[ \t]+$
MM/DD/YYYY and M/D/YY dates                       [01]?[0-9]/[0-3]?[0-9]/[21]?[0-9]{0,3}
Email addresses (not a perfect regex pattern)     \<[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}\>
10-digit US phone number such as 123-456-6789     [0-9]{3}-[0-9]{3}-[0-9]{4}
Second letter in a word is capitalized, like FOo  \<[A-Za-z][A-Z][a-z]*\>
Long words (10 characters or more)                \<[a-z]{10,}\>
Paragraphs beginning with demonstrative pronouns  ^(This|That|These|Those)
Palindromes with three letters                    \<(.).\1\>
HTML, XML, SGML, and similar tags                 <[a-z/][a-z]*>
IP addresses                                      \<((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\>

Note: For the capitalized second letter pattern, make sure the box Match case is checked.
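
To try a few of the patterns above outside Writer, here is a minimal Python sketch. It is an illustration only, not part of Writer: Python's re module does not support the \< and \> word anchors, so \b is swapped in for both, and the sample sentence and variable names are made up for this example.

```python
import re

# Python's re module has no \< and \> anchors, so \b (word boundary)
# stands in for both of them in these translations.
PHONE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")
LONG_WORD = re.compile(r"\b[a-z]{10,}\b", re.IGNORECASE)  # like Match case unchecked
IP_ADDRESS = re.compile(
    r"\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)

# Hypothetical sample text for the demonstration.
sample = "Call 123-456-6789 about the misconfigured router at 192.168.1.254."

phones = [m.group(0) for m in PHONE.finditer(sample)]
long_words = [m.group(0) for m in LONG_WORD.finditer(sample)]
ips = [m.group(0) for m in IP_ADDRESS.finditer(sample)]

print(phones, long_words, ips)
```

Testing a pattern on a short sample like this can be quicker than running Find All repeatedly in a long document, but Writer's regex engine may differ in details, so confirm the result in Writer as well.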

Search and replace

Here are some common regular expressions for search and replace.

Description                                       Search for                        Replace with
Replace each tab with three spaces                \t                                (three spaces)
Replace three spaces with a tab                   (three spaces)                    \t
Replace non-breaking spaces with regular spaces   ([:space:])                       (one space)
Replace manual line breaks with paragraph breaks  \n                                \n
Replace double smart quotes with straight quotes  [\x201C\x201D\x201F]              "
Replace single smart quotes with straight quotes  [\x2018\x2019\x201B]              '
Convert YYYY-MM-DD dates to MM/DD/YYYY format     ([0-9]{4})-([0-9]{2})-([0-9]{2})  $2/$3/$1

Note: Smart quotes are also known as book quotes or curly quotes, and straight quotes are also known as dumb quotes.
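
As a rough illustration of how the backreference and character-class replacements above behave, here is a small Python sketch. It is not Writer's own engine: Python writes backreferences as \1, \2, \3 where Writer's Replace with field uses $1, $2, $3, and the function names are hypothetical.

```python
import re


def iso_to_us_date(text: str) -> str:
    """Rewrite YYYY-MM-DD dates as MM/DD/YYYY.

    Group 1 is the year, group 2 the month, group 3 the day, so the
    replacement reorders them just like $2/$3/$1 in the table.
    """
    return re.sub(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", r"\2/\3/\1", text)


def straighten_quotes(text: str) -> str:
    """Replace curly double and single quotes with straight ones."""
    text = re.sub("[\u201C\u201D\u201F]", '"', text)
    return re.sub("[\u2018\u2019\u201B]", "'", text)


print(iso_to_us_date("Due 2007-12-30."))
print(straighten_quotes("\u201cIt\u2019s fine\u201d"))
```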

Removing non-breaking spaces without regular expressions

Non-breaking spaces (gray rectangles) often appear when pasting from PDFs or web pages. To view them, choose View > Field Shadings. An alternate way to remove them is to:

  1. Highlight one non-breaking space.
  2. Copy it to the clipboard.
  3. Paste it into the Search for field.
  4. Type a space in Replace with.
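
The same literal replacement can be sketched in Python for text handled outside Writer. This assumes the non-breaking space is the Unicode character U+00A0; the function name is hypothetical.

```python
NBSP = "\u00a0"  # Unicode non-breaking space


def replace_nbsp(text: str) -> str:
    """Swap every non-breaking space for a regular space (no regex needed)."""
    return text.replace(NBSP, " ")


print(replace_nbsp("10\u00a0km"))
```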

Replacing with line breaks

Using the standard Find & Replace dialog, it is not possible to use manual line breaks in the Replace with field (issue 46165).

One source of confusion is that \n has different meanings in the Search for and Replace with fields. In Search for, \n matches a line break. In Replace with, \n inserts a paragraph ending. (Paragraph breaks are more common, and they are inserted in a document by simply striking the ENTER key.) Another source of confusion is the term carriage return, which may refer to either paragraph breaks or line breaks.

To replace with a line break, your choices include these:

  1. Do it manually.
  2. Record a macro to do it once. Assign the macro to a keyboard shortcut. Then, type the shortcut key as many times as required.
  3. Use Tomas Bilek's Alternative dialog Find & Replace for Writer. It is an extension, so it is relatively easy to install.
  4. Use Ian Laurenson's Find and Replace macro. It is not packaged as an extension, so it requires more steps to install.

If you are replacing paragraph breaks with line breaks, keep in mind that you could end up creating a very long paragraph, and Writer has a limitation of 65,534 characters per paragraph (issue 17171).
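
To make the paragraph length limit concrete, here is a hedged Python sketch that joins paragraphs with line breaks but starts a new paragraph before the limit would be exceeded. It assumes each line break counts as one character toward the limit, which is a guess rather than documented behavior, and the function name and the use of "\n" for a line break are choices made for this example only.

```python
MAX_PARAGRAPH_CHARS = 65_534  # limit quoted in the text above


def merge_paragraphs(paragraphs, limit=MAX_PARAGRAPH_CHARS):
    """Join paragraphs with line breaks without exceeding the limit.

    A single paragraph that is already longer than the limit is left
    unchanged; this sketch does not try to split it.
    """
    merged = []
    current = None  # None means no paragraph is being built yet
    for paragraph in paragraphs:
        if current is None:
            current = paragraph
        elif len(current) + 1 + len(paragraph) <= limit:
            # Assumption: the line break itself counts as one character.
            current = current + "\n" + paragraph
        else:
            merged.append(current)
            current = paragraph
    if current is not None:
        merged.append(current)
    return merged


print(merge_paragraphs(["a", "b", "c"], limit=3))
```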

Replacing dumb quotes with smart quotes

To replace all the straight quotation marks with curly quotation marks, follow these steps:

  1. Type any word in smart quotes.
  2. Highlight the right smart quote (”).
  3. Copy the selection to the clipboard (shortcut is CTRL+C).
  4. Choose Edit > Find & Replace from the menu.
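  5. If you place periods inside the quotation mark, type ([\.\?!])" in the Search for field.
  6. If you place periods outside the quotation mark, type "([\.\?!]) in the Search for field.
  7. In the Replace with field, paste the clipboard contents (by right clicking or typing CTRL+V), then type $1 before the pasted quote (periods inside) or after it (periods outside). Without the $1 backreference, the replacement would delete the matched punctuation mark.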
  8. Press the More Options button.
  9. Check the box Regular Expressions.
  10. Click the Replace All button.
  11. Highlight the left smart quote (“).
  12. Copy the selection to the clipboard.
  13. In the Search for field, type a single dumb quote (").
  14. In the Replace with field, paste the clipboard contents.
  15. Click the Replace All button.
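
The two-pass idea behind these steps (closing quotes next to punctuation first, then every remaining straight quote as an opening quote) can be sketched in Python. This is an illustration under assumptions, not Writer's behavior: the function name and the periods_inside flag are hypothetical, and a capture group is used so the punctuation mark survives the replacement.

```python
import re

LEFT, RIGHT = "\u201c", "\u201d"  # left and right curly double quotes


def curl_quotes(text: str, periods_inside: bool = True) -> str:
    """Convert straight double quotes to curly quotes in two passes."""
    if periods_inside:
        # Pass 1: punctuation followed by a quote becomes punctuation + right quote.
        text = re.sub(r'([.?!])"', r"\1" + RIGHT, text)
    else:
        # Pass 1: a quote followed by punctuation becomes right quote + punctuation.
        text = re.sub(r'"([.?!])', RIGHT + r"\1", text)
    # Pass 2: every straight quote that is left is treated as an opening quote.
    return text.replace('"', LEFT)


print(curl_quotes('She said "Stop!" and left.'))
print(curl_quotes('He called it "odd".', periods_inside=False))
```

Like the manual steps, this sketch only recognizes a closing quote when it sits next to a period, question mark, or exclamation mark; a closing quote in the middle of a sentence would be turned into an opening quote and needs a manual fix.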

Removing manual page breaks

This is the simplest way to remove all manual page breaks:

  1. Highlight all text in the document (shortcut is CTRL+A).
  2. Choose Format > Paragraph from the menu.
  3. Choose the Text Flow tab.
  4. Uncheck the box Break > Insert.

Finding, removing, or inserting manual page breaks cannot be done with regular expressions or with the Find & Replace dialog (issues 26719 and 63606).

OpenOffice.org version

A few of these examples use backreferences (like $1) which require OpenOffice.org 2.4.

Introduction to regular expressions

In case you missed it, read the introduction to regular expressions.

1 comment:

Anonymous said...

I read dozens of pages in forums and did not find an answer to such a natural question: how can one automatically replace all double blank lines with a single blank line? (If configured correctly, this will also change a triple repetition of a blank line into a single blank line.)

Something so easy in emacs and so difficult in OOW.