Vacuum and Vacuum Full POSTGRESQL

4 min readJun 25, 2020

The vacuum of the PostgreSQL system is a routine maintenance process. The system will also start the autovacuum daemon when starting the service to maintain this. Of course, the vacuum command also allows users to manually perform the vacuum operation. In addition to clearing the dead tuples, vacuum also has the function of freezing the transaction ID, because now the transaction ID of PostgreSQL only supports 32 bits (about 4 billion transactions), when the transaction ID reaches the maximum value, it will start from 0 again, before this time All transactions will become “future” transactions become invisible.

The following mainly introduces the process of vacuum cleaning dead tuples. When clearing dead tuples, the system has two processing methods: vacuum and vacuum full. The main differences are as follows:

Vacuum process

The execution process of vacuum is mainly divided into the following three steps:

1. Clear the index tuples pointed to by dead tuples

During this process, vacuum will scan the target table in sequence and build a list list composed of dead tuples. The list list will be stored in the maintenance_work_mem cache. Then vacuum removes the index pointed to by the dead tuples according to the dead tuples list.

2. Remove dead tuples, update VM and FSM

The removal of the dead tuples here is only marked as reusable space, and there is no real physical deletion. So after vacuum cleaning the table, the actual space of the table has not decreased. After the dead tuples are marked for removal, the vacuum will rearrange the remaining tuples for defragmentation. Then, the VM (Visibility Map File) and FSM (Free Space Map File) of the target table need to be updated.

3. Update statistics and related system tables

Finally, you need to update the statistics of the vacuum target table (to adapt to the latest query optimization) and related system tables.

Vacuum Full process

The biggest difference between Vacuum Full and Vacuum is that Vacuum Full physically deletes dead tuples and re-releases the released space to the operating system, so after vacuum full, the size of the table will be reduced to the actual space size. The processing process is very different from vacuum, the processing steps are as follows:

1. When vacuum full starts to execute, the system will first create an AccessExclusiveLock on the target, which does not allow outside access (preparation for later copy), and then create a new table with the same table structure as the target table.

2. Scan the target table and copy the live tuples from the table to the new table.

3. Delete the target table, recreate the index on the new table, update the VM, FSM and statistical information, related system tables, etc.

Therefore, the essence of vacuum full is to generate a new data file, and then store the live tuples of the original table in the data file. Compared with vacuum, the disadvantage of vacuum full is that the table cannot be accessed during execution. Due to the need to import live tuples data into the new table, its execution efficiency will be very slow. The advantage is that after execution, the table space only stores live tuples, and there is no redundant dead tuples, which will improve the efficiency of query execution.

Note:1. Vacuum full will lock the table, vacuum will not.2. The vacuum just converts the space of the deadtuple row into a usable state. The vacuum full will move the data behind these space fragments up.

Since vacuum is a routine maintenance of the database system, after the system is started, an autovacuum daemon will be started to deal specifically with this. The autovacuum cleaning process generally uses vacuum instead of vacuum full. The reason is that although vacuum can maintain the minimum size of the table, if the table still has room to continue to grow, this does not make much sense, especially for frequently written tables. And vacuum is also better than vacuum full in execution efficiency.

The autovacuum process mentioned above is a launcher process, it does not perform vacuum operation, it is responsible for starting the vacuum worker process, and then the vacuum worker process performs the corresponding vacuum and analyze operations.

The autovacuum process starts the autovacuum worker process for the database every autovacuum_naptime interval. Then each worker process will check every table in the database and perform vacuum or analyze operations when needed. If there are N databases, a new worker process will be started every autovacuum_naptime/N seconds. Only a maximum of autovacuum_max_workers worker processes are allowed to run at the same time. If there are more than autovacuum_max_workers databases that need to be processed, the next database will be the first worker It is processed immediately after the end.

Vacuum and Vacuum Full POSTGRESQL

Vacuum process

Vacuum Full process

Written by Pankaj kushwaha

Responses (1)