Regular expressions and backreferences: a real world example - OpenOffice.org Ninja

Posted by Andrew Z on Tuesday, January 1, 2008

On the blog Rebel Without A Mouse, Nyenyec expressed frustration with OpenOffice.org's regular expressions. He was trying to convert numbers like "40.7 MB" to "40,700,000" for further calculation.

At the time of his post, OpenOffice.org did not support backreferences in regular expression replacements. The feature requires OpenOffice.org 2.4 or later.

Without the new feature, a more tedious solution would be to strip the unit suffixes (MB, KB, and so on) using Find & Replace and then multiply with Calc formulas.
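For comparison, that suffix-stripping approach can be sketched outside Calc in Python. This is a minimal illustration, not something OpenOffice.org runs: the function name to_number and the multiplier table are hypothetical, and the sketch assumes decimal units (MB = 1,000,000, KB and K = 1,000), as the "40.7 MB" to "40,700,000" example implies.

```python
# Hypothetical multiplier table, assuming decimal units as in the
# "40.7 MB" -> 40,700,000 example (use 1024-based values if your data needs them).
MULTIPLIERS = {"MB": 1_000_000, "KB": 1_000, "K": 1_000}


def to_number(cell: str) -> int:
    """Strip the unit suffix, then multiply, as a Calc formula would."""
    # str.strip() also removes non-breaking spaces, since Python treats U+00A0 as whitespace.
    text = cell.strip()
    for suffix, factor in MULTIPLIERS.items():
        if text.endswith(suffix):
            # float() tolerates the space left between the number and the unit.
            return round(float(text[: -len(suffix)]) * factor)
    return round(float(text))  # no recognized suffix: treat as a plain number


print(to_number("40.7 MB"))  # 40700000
print(to_number("512 KB"))   # 512000
```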

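Backreferences are available now, so simply copy a table from the web page (I chose "50 most edited articles") and paste it into Calc. Open Find & Replace, make sure Regular expressions is checked, then search for

([0-9]+)\.([0-9]).?MB.*

and replace with

$1$200000

Here $1 is the whole-number part, $2 is the single digit after the decimal point, and the five zeros complete the multiplication by one million, so "40.7 MB" becomes 40700000. The pattern assumes exactly one digit after the decimal point. Then repeat accordingly for KB, K, and any other suffixes, adjusting the number of zeros.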
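If you want to check what the substitution does before running it over a whole sheet, the same passes can be reproduced with Python's re module. This is a sketch under assumptions: the RULES list and expand function are hypothetical names, the KB and K passes assume a factor of 1,000, and Python writes backreferences as \1 or \g<1> where OpenOffice.org uses $1.

```python
import re

# Each rule mirrors one Find & Replace pass. The replacement keeps the
# whole-number part (group 1) and the single decimal digit (group 2), then
# pads with zeros. Assumed decimal multipliers: MB = 10**6, KB and K = 10**3.
RULES = [
    (re.compile(r"([0-9]+)\.([0-9]).?MB.*"), r"\g<1>\g<2>00000"),
    (re.compile(r"([0-9]+)\.([0-9]).?KB.*"), r"\g<1>\g<2>00"),
    (re.compile(r"([0-9]+)\.([0-9]).?K.*"), r"\g<1>\g<2>00"),
]


def expand(cell: str) -> str:
    """Apply each pass in turn, like repeating Find & Replace for every suffix."""
    for pattern, replacement in RULES:
        cell = pattern.sub(replacement, cell)
    return cell


print(expand("40.7 MB"))  # 40700000
print(expand("12.3 KB"))  # 12300
```

The \g<2> form matters in Python: written as \200000, the replacement would not be read as group 2 followed by five zeros, whereas the original $1$200000 works in OpenOffice.org as written.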

Did you notice the .* at the end of the regular expression? It removes whatever follows the unit, which in this table is a regular space followed by a non-breaking space. Without it, those characters stay in the cell, OpenOffice.org treats the values as text instead of numbers, and calculations on them go wrong.
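To see the effect of the trailing .*, this Python sketch runs the pattern with and without it on a cell that ends in a regular space and a non-breaking space. The is_plain_number helper is hypothetical: a strict digits-only check standing in for Calc deciding whether a cell holds a number.

```python
import re

CELL = "40.7 MB \u00a0"  # value, unit, then a regular space and a non-breaking space

with_tail = re.sub(r"([0-9]+)\.([0-9]).?MB.*", r"\g<1>\g<2>00000", CELL)
without_tail = re.sub(r"([0-9]+)\.([0-9]).?MB", r"\g<1>\g<2>00000", CELL)


def is_plain_number(text: str) -> bool:
    # Hypothetical strict check: digits only, nothing before or after.
    return re.fullmatch(r"[0-9]+", text) is not None


print(repr(with_tail), is_plain_number(with_tail))        # '40700000' True
print(repr(without_tail), is_plain_number(without_tail))  # '40700000 \xa0' False
```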

Yes, I agree the online help is not as helpful as it could be. That's why I wrote an introduction to regular expressions, an introduction to backreferences in substitutions, and many examples.

1 comment:

Sigurd said...

Thank you very much for your efforts!

I'm just learning regular expressions to speed up my editing and your work is very helpful.

I hope it applies to Writer as well as Calc.