The writings of Merlin Moncure, professional database developer, about work, life, family, and everything else.

Monday, September 03, 2007

PostgreSQL 8.3 Features: Arrays of Compound Types

This will be the first in what hopefully be a series of entries describing the new and interesting features in PostgreSQL 8.3. The database is entering in the final stages of the development cycle before going into beta with all the major patches having been submitted weeks ago (with one possible stunning exception, lazy xid assignment described here. Thus, aside from the hard decisions that are going to have to be made on the HOT patch (more on that later), 8.3 is shaping up nicely. On a scale of one to ten, this release is a strong eight -- and if all the outstanding patches get in -- a vicious '10'. Excited yet? If not, let me prime the pump with my favorite new feature, Arrays of Composite Types.

David Fetter laid the groundwork for this new feature and was ultimately accepted with the help from many others, including PostgreSQL core developers Tom Lane and Andrew Dunstan. The feature combines two neat features of PostgreSQL, the ability to build complex types out of Plain Old Data Types and arrays (follow the links for an overview of both features).
create table foo(id int, b text);
create table bar(id int, foos foo[]); -- create an array of the type 'foo' in bar
create table baz(id int, bars bar[]); -- create an array of the type 'bar' in baz

insert into foo values(1, 'abc');
insert into foo values(2, 'def');
insert into bar values(1, array(select foo from foo)); -- there are two elements in this array
insert into bar values(2, array(select foo from foo)); -- and this
insert into bar values(3, array(select foo from foo)); -- and this
insert into baz values(1, array(select bar from bar)); -- three elements in here

select ((bars[1]).foos[2]).b from baz; -- use parenthesis to get elements out of composites
postgres=# select ((bars[1]).foos[2]).b from baz;
b
-----
def
(1 row)
Being able to nest composite types in array structures is a powerful feature. While abusive in the extreme to good normalization and relational principles it allows
flexible approaches to some difficult problems:

  • Input Data to Functions: pl/sql and pg/pgsql functions have some limitations in how you can get data into the function. There are various workarounds for this that all have application in specific circumstances, but composite arrays allow you to pass a heap of data to a function that doesn't happen to be in a table. While this could have scalability issues for large arrays (> 10k elements), it can be convenient.
  • Sometimes, Arrays are Just Better. Although this is rarely a good idea, sometimes its beneficial to store data in arrays and the ability to nest complex structures is helpful in this regard. Just remember that in most cases the entire array has to be read from and written back to disk (the server only understands the array as a field and treats it as such).
  • Targeted Denormalization: When passing data between functions or to the client, it may be efficient to accumulate a particular column into an array to reduce duplication of data in the other columns. We may now do this with composites:


drop table bar, baz cascade;       -- we would rarely rely directly on the generated types
alter table foo add column data text default 'data';
insert into foo values (1, 'ghi'); -- make some duplicated on foo by id
insert into foo values (2, 'jlk');
create type foo_data as (b text, data text);
select id, array_accum ((b, data)::foo_data) from foo group by id order by id;

id | array_accum
----+-----------------------------
1 | {"(abc,data)","(ghi,data)"}
2 | {"(def,data)","(jlk,data)"}


Arrays perform best as a server side feature. Because it's only currently recommended to render them to the client in text, Arrays of Composites introduce some interesting parsing issues going back and forth from the server. While this is ok in limited cases like the above example, it's inefficient and error prone to build and properly escape a nested array structure. While PostgreSQL has the ability to transfer arrays in binary over the protocol, there is no API to access the custom format data on the client side. Until that changes, I expect this feature will be most useful to stored procedure developers.

While relatively short on substance, I hope that this writing provides a good introduction to this interesting feature. Some of you will probably seize on this feature for it's utility in solving a narrow, but important class of problems. It's one more weapon in the mighty arsenal of the PostgreSQL developer who is already armed with the most practical, versatile, and generally kick-ass piece of software for solving the world's data management issues.

4 comments:

Michael said...

Right, we DO want more new posts about Postgres 8.3 features! Rock on!

Anonymous said...

More news on Postgresql 8.3, thanks

Einhverfr said...

One additional note, this may be extremely good to certain types of applications, such as those which have very complex set-based constraints over a write set. For example, using standard procedures, there is no way to enforce the constraint that accounting transactions must be balanced. YOu could pass an array of account entries to a function which could check this before storing the data. Hence this need not affect the normalization of tables.

Anonymous said...

Hi Einhverfr
I wish to borrow anal sex, but I do not know from what to begin! Can eat experience in this question?