Batch Updates with PL/pgSQL
by David Wheeler09/07/2006
The previous article in this series, Practical PL/pgSQL: Managing Ordered Sets, created four functions to simplify the management of ordered collections as many-to-many relationships. The two more complex functions, entry_coll_tag_set() and entry_coll_tag_add(), take an iterative approach to managing those relationships. By iterative,
I mean that they use loops to iterate over an array of IDs in order to do the right thing for each.
The downside to this approach is that the performance of those functions is directly proportional to the number of IDs in the array (Ο(n)). It would be ideal to make the runtime of the functions constant, regardless of the number of IDs in the array (Ο(1)).
Fortunately, there is a way to do just that in PostgreSQL. Before then however, think back to the Fibonacci examples from the first article in this series, Introduction to PostgreSQL PL/pgSQL. Returning to those examples, I'll introduce some new concepts in a simpler format than the collection functions allow.
Set-Returning Functions
As I mentioned in the first article, PostgreSQL functions can return a value of any supported data type. I didn't mention that they can also return sets of a particular type. A set is a list of values of a particular type, but rather than returning those values as a list or an array (as you might expect in a dynamic programming language), PostgreSQL functions return them as rows of data.
Suppose that you need to get a list of Fibonacci numbers up to a particular
place in the Fibonacci sequence. Writing such a function in PL/pgSQL is as
simple as modifying the fib_fast() function to return each
Fibonacci number as it's calculated. It does so using the PL/pgSQL
RETURN NEXT statement. Here are fib_fast() and the
new set-returning function fibs_to():
1 CREATE OR REPLACE FUNCTION fib_fast(
2 fib_for integer
3 ) RETURNS integer AS $$
4 DECLARE
5 ret integer := 0;
6 nxt integer := 1;
7 tmp integer;
8 BEGIN
9 FOR num IN 1..fib_for LOOP
10
11 tmp := ret;
12 ret := nxt;
13 nxt := tmp + nxt;
14 END LOOP;
15
16 RETURN ret;
17 END;
18 $$ LANGUAGE plpgsql;
1 CREATE OR REPLACE FUNCTION fibs_to(
2 max_num integer
3 ) RETURNS SETOF integer AS $$
4 DECLARE
5 ret integer := 0;
6 nxt integer := 1;
7 tmp integer;
8 BEGIN
9 FOR num IN 1..max_num LOOP
10 RETURN NEXT ret;
11 tmp := ret;
12 ret := nxt;
13 nxt := tmp + nxt;
14 END LOOP;
15
16 RETURN NEXT ret;
17 END;
18 $$ LANGUAGE plpgsql;
There are really only three differences aside from the function names, and I've emphasized them in fibs_to(). The first difference is on line three, where the fibs_to() declaration indicates that it returns a SETOF integer instead of simply an integer. The SETOF keyword tells PostgreSQL that this function returns a set.
The other differences are that, rather than simply returning the value of the ret integer variable, fibs_to() uses the RETURN NEXT statement to return each Fibonacci number after its calculation in the loop. The final RETURN NEXT statement returns the final Fibonacci number in the sequence.
Those are the only changes necessary to create a set-returning function. As such a function, fibs_to() must also be called in a different context. While you can call fib_fast() in a SELECT statement:
try=% select fib_fast(8);
fib_fast
----------
21
(1 row)
fibs_to() essentially behaves like a table, and you must treat it as such by using it in a FROM clause:
try=% select * from fibs_to(8);
fibs_to
---------
0
1
1
2
3
5
8
13
21
(9 rows)
Be warned, however, that while it looks like fibs_to() and behaves like a continuation, (and, for most practical purposes, is treatable like a continuation), PostgreSQL actually buffers all of the values returned by RETURN NEXT and only returns them to the calling context after the function has calculated them all. That means that if you write a set-returning function that returns a lot of values, you need to make sure that your server's memory can handle it.
That caveat aside, set-returning functions can be extremely useful.



