PostgreSQL, partitioned index, concurrently, procedure, stored procedure, transaction

Pankaj kushwaha
3 min readJan 16, 2021

PostgreSQL, partitioned index, concurrently, procedure, stored procedure, transaction

Background
I have a partition table with tens of thousands of partitions, and I need to create indexes on tens of thousands of partitions. What should I do?

1. I am a quite busy or lazy

2. I am afraid that creating an index will affect the business (creating an index will block the dml if it is not added concurrently), I am afraid of getting out.

3. I don’t want to write a bunch of sql

4. I want to complete it quickly. As far as I know, concurrently has improved after PG 12. If there are a lot of DML in the previous version, concurrently indexing will slow down.

5. There are many partitions. If they are placed in a do, the relevant dml will be blocked from the beginning to the end. If the time is long, I am afraid that it will affect the business and then get fucked.

6. Each partition is relatively small, and it can be done in a few seconds by adding a partition (a few seconds has little impact on the business)

To sum up, how to quickly create indexes for all partition tables without blocking dml (or temporarily blocking dml)?

Yes, procedure (PG 11 as an example)

set lock_timeout=xx; - Avoid waiting for a long time for the lock to cause an avalanche.
loop xx..xxxx
create index [if not exists] [concurrently] ?; - just block dml in the process
commit or rollback; - Each time an index is created, the transaction ends and the lock related to this index is released.
end loop;
reset lock_timeout;

Example:

Example
Create several indexes on one table, and create indexes on several tables, the effect is the same, just change the content of the stored procedure.
create table tab2(uid int, info text);
insert into tab2 select generate_series(1,100), random()::text;
Create a function to execute SQL and support setting lock timeout:
create or replace function exec_sql(text,text) returns boolean as $$
declare
begin
execute format('set lock_timeout=%L', $1);
execute $2;
reset lock_timeout;
return true;
exception when others then
reset lock_timeout;
return false;
end;
$$ language plpgsql strict;
Create a stored procedure, create a bunch of indexes on tab2
create or replace procedure do_sqls() as $$
declare
begin
for i in 1..65 loop
if exec_sql('1s', format('create index IF NOT EXISTS idx_%s on tab2 (uid)', i)) then
raise notice'success index: %','idx_'||i;
commit; - After each partition index is created, the transaction ends and the lock is automatically released
else
rollback; - After each partition index is created, the transaction ends and the lock is automatically released
raise notice'not success, lock_timeout index: %','idx_'||i;
end if;
- perform pg_sleep(5); - Adding a sleep can lengthen the entire time, so that there is a time window to simulate the dml non-blocking situation.
end loop;
- reset lock_timeout;
end;
$$ language plpgsql;

If there are any partitions that are successfully added (for example, since lock timeout was not successfully added), check the note, run again until all of them are active.

Advantage:
1. Only the partition in the index is generated during the operation to block dml relevant to this partition.
2. After each partition’s index has been established, it is immediately available.

Notice that if you are using a previous version of PG12, and you have direct access to the main writing table, congestion can still occur. Direct access to partitions not in the index development process will not be congested, or PG 12 or later versions will be used..

I like to learn new and better ways of doing things when working on a scale, and feel free to ask questions and make suggestions.
Also, check out another story on this.
Thanks for reading this.

--

--

Pankaj kushwaha

Database/System Administrator | DevOPS | Cloud Specialist | DevOPS