O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Print Subscribe to Databases Subscribe to Newsletters

Using Regular Expressions in PostgreSQL

by Robert Bernier
02/02/2006

You know what I love about open source? The tricks you learn in one activity can effectively transfer over to another one. It's like using the skills learned from mathematics on the dance floor. Take regular expressions, for example. Did you know that becoming an expert in using regular expressions in Perl also makes you a PostgreSQL power user?

About Regular Expressions

Regular expressions, for the uninitiated, are a set of key combinations that allow people to search for character matches, thus enabling a large variety of control in a text environment. Regular expressions are a staple of the Unix-like operating systems, although they are virtually unknown in one notable exception (go ahead and guess). They are a common feature of many applications, including text editors, command-line utilities, and programming languages to search and manipulate bodies of text based on certain patterns.

My own experiences with regular expressions began with vi (actually, Vim).

PostgreSQL Pattern Matching

PostgreSQL supports many forms of pattern matching: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL 1999), and the Posix-style regular expressions (also implemented in the pattern-matching function substring).

More than likely a database administrator will have cut his teeth by first learning some form of programming or, better yet, systems administration on his chosen platform before playing with a database engine. In other words, he likely has already seen regular expressions long before writing his first SQL statement.

What's the big advantage using regular expressions in PostgreSQL, besides the fact that you already know about regular expressions? Two words: power and speed.

Regular expressions offer the DBA the ability to compose queries using highly complex criteria in a compact and sophisticated manner. Better, they can make your statements execute faster. Regular expressions are simpler than the longer and, sometimes, more resource-intensive operators LIKE and SIMILAR TO. Many SQL statements also need to couple traditional operators with user-defined functions to accomplish the same query as a single statement using regular expressions.

Using Regexes in PostgreSQL

To use regexes in PostgreSQL, you must have sufficient privileges in your installation to create and populate a table. Check the postgresql.conf configuration file; you need either advanced or extended permissions.

The rest of this article presents examples based upon two single-column tables. For the sake of brevity, I've omitted the results. Therefore review the query results with a replicated session on your machine.

First, create a new database containing a table:

CREATE DATABASE regex;
CREATE TABLE myrecords(record text);

From the psql command line, populate the myrecords table. (Note: you must include a backslash followed by a period on a separate line to end date input for the COPY command.)

COPY myrecords (record) FROM stdin;
a
ab
abc
123abc
132abc
123ABC
abc123
4567
5678
6789
\.

Matching single characters

The simplest query uses the tilde operator, ~, followed by literal quoted characters. These examples return all records that contain specific character(s) as defined by the criteria:

SELECT record FROM myrecords WHERE record ~ '1';
SELECT record FROM myrecords WHERE record ~ 'a';
SELECT record FROM myrecords WHERE record ~ 'A';
SELECT record FROM myrecords WHERE record ~ '3a';

The addition of the asterisk, *, after the tilde renders the query search case-insensitive:

SELECT record FROM myrecords WHERE record ~* 'a';
SELECT record FROM myrecords WHERE record ~* '3a';

The exclamation mark (or "bang"), !, modifies the tilde operator and excludes strings containing the character(s):

SELECT record FROM myrecords WHERE record !~ '1';

You can combine case insensitivity and exclusion:

SELECT record FROM myrecords WHERE record !~* 'c';

The caret, ^, known as an anchor, returns records with the specific character(s) located at the beginning of the string:

SELECT record FROM myrecords WHERE record ~ '^1';
SELECT record FROM myrecords WHERE record ~ '^a';
SELECT record FROM myrecords WHERE record ~* '^a';

The dollar sign, $, is also an anchor but at the end of the string:

SELECT record FROM myrecords WHERE record ~ 'c$';
SELECT record FROM myrecords WHERE record ~ 'bc$';
SELECT record FROM myrecords WHERE record ~* 'bc$';

Pages: 1, 2, 3

Next Pagearrow




Tagged Articles

Be the first to post this article to del.icio.us

Related to this Article

Data Jujitsu: The Art of Turning Data into Product Data Jujitsu: The Art of Turning Data into Product
November 2012
$0.00 USD

Designing Great Data Products Designing Great Data Products
March 2012
$0.00 USD

Sponsored Resources

  • Inside Lightroom
Advertisement
O'reilly

© 2013, O’Reilly Media, Inc.

(707) 827-7019 (800) 889-8969

All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.

About O'Reilly

  • Academic Solutions
  • Jobs
  • Contacts
  • Corporate Information
  • Press Room
  • Privacy Policy
  • Terms of Service
  • Writing for O'Reilly

Community

  • Authors
  • Community & Featured Users
  • Forums
  • Membership
  • Newsletters
  • O'Reilly Answers
  • RSS Feeds
  • User Groups

Partner Sites

  • makezine.com
  • makerfaire.com
  • craftzine.com
  • igniteshow.com
  • PayPal Developer Zone
  • O'Reilly Insights on Forbes.com

Shop O'Reilly

  • Customer Service
  • Contact Us
  • Shipping Information
  • Ordering & Payment
  • The O'Reilly Guarantee