debian-mirror-gitlab/doc/administration/troubleshooting/postgresql.md
2023-03-05 14:24:40 +05:30


stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments

PostgreSQL (FREE SELF)

This page contains information about PostgreSQL that the GitLab Support team uses when troubleshooting. GitLab makes this information public so that anyone can make use of the Support team's collected knowledge.

WARNING: Some procedures documented here may break your GitLab instance. Use at your own risk.

If you're on a paid tier and aren't sure how to use these commands, contact Support for assistance with any issues you're having.

Other GitLab PostgreSQL documentation

This section is for links to information elsewhere in the GitLab documentation.

Procedures

Support topics

Database deadlocks

References:

ERROR: deadlock detected

Three applicable timeouts are identified in issue #30528. Our recommended settings are as follows:

deadlock_timeout = 5s
statement_timeout = 15s
idle_in_transaction_session_timeout = 60s

Quoting from issue #30528:

"If a deadlock is hit, and we resolve it through aborting the transaction after a short period, then the retry mechanisms we already have will make the deadlocked piece of work try again, and it's unlikely we'll deadlock multiple times in a row."

NOTE: In Support, our general approach to reconfiguring timeouts (applies also to the HTTP stack) is that it's acceptable to do it temporarily as a workaround. If it makes GitLab usable for the customer, then it buys time to understand the problem more completely, implement a hot fix, or make some other change that addresses the root cause. Generally, the timeouts should be put back to reasonable defaults after the root cause is resolved.

In this case, the guidance we had from development was to lower deadlock_timeout or statement_timeout, but to leave the third setting at 60 seconds. Setting idle_in_transaction_session_timeout protects the database from sessions potentially hanging for days. There's more discussion in the issue relating to introducing this timeout on GitLab.com.

PostgreSQL defaults:

  • statement_timeout = 0 (never)
  • idle_in_transaction_session_timeout = 0 (never)

Comments in issue #30528 indicate that these should both be set to at least a few minutes for all Omnibus GitLab installations (so that statements and sessions don't hang indefinitely). However, 15s for statement_timeout is very short, and is only effective if the underlying infrastructure is very performant.

See current settings with:

sudo gitlab-rails runner "c = ApplicationRecord.connection ; puts c.execute('SHOW statement_timeout').to_a ;
puts c.execute('SHOW deadlock_timeout').to_a ;
puts c.execute('SHOW idle_in_transaction_session_timeout').to_a ;"

It may take a little while to respond.

{"statement_timeout"=>"1min"}
{"deadlock_timeout"=>"1s"}
{"idle_in_transaction_session_timeout"=>"1min"}

These settings can be updated in /etc/gitlab/gitlab.rb with:

postgresql['deadlock_timeout'] = '5s'
postgresql['statement_timeout'] = '15s'
postgresql['idle_in_transaction_session_timeout'] = '60s'

Once saved, reconfigure GitLab for the changes to take effect.

NOTE: These are Omnibus GitLab settings. If you use an external database, such as a customer's PostgreSQL installation or Amazon RDS, these values are not applied and must be set externally.
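For an external database, one possible way to apply the two session-level timeouts is per database with `ALTER DATABASE ... SET`. The following is a minimal sketch, not an official procedure: the helper name `timeout_sql` is hypothetical, the database name `gitlabhq_production` is taken from the examples on this page, and the values are the recommendations above. `deadlock_timeout` is left out because changing it normally requires superuser rights, or the provider's parameter settings on managed services.

```shell
# timeout_sql (hypothetical helper): print SQL that sets the two
# session-level timeouts for one database. It only prints text;
# nothing is sent to PostgreSQL.
timeout_sql() {
  db="${1:-gitlabhq_production}"   # assumed database name
  cat <<EOF
ALTER DATABASE "$db" SET statement_timeout = '15s';
ALTER DATABASE "$db" SET idle_in_transaction_session_timeout = '60s';
EOF
}

# Review the output first, then pipe it to psql as a role that is
# allowed to alter the database, for example:
#   timeout_sql gitlabhq_production | psql -h <host> -U <admin-role> -d postgres
```

Settings applied with `ALTER DATABASE ... SET` take effect for new sessions only, so existing connections keep their previous values until they reconnect.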

Temporarily changing the statement timeout

WARNING: The following advice does not apply if PgBouncer is enabled, because the changed timeout might affect more transactions than intended.

In some situations, it may be desirable to set a different statement timeout without having to reconfigure GitLab, which in this case would restart Puma and Sidekiq.

For example, a backup may fail with the following errors in the output of the backup command because the statement timeout was too short:

pg_dump: error: Error message from server: server closed the connection unexpectedly

You may also see errors in the PostgreSQL logs:

canceling statement due to statement timeout
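To gauge how often this message occurs, the log can be searched for it. A minimal sketch follows; the helper name `count_statement_timeouts` is hypothetical, and the default path assumes the Omnibus GitLab log location `/var/log/gitlab/postgresql/current`, which may differ on your installation.

```shell
# count_statement_timeouts (hypothetical helper): count the log lines
# that report a statement cancelled by statement_timeout.
count_statement_timeouts() {
  log_file="${1:-/var/log/gitlab/postgresql/current}"  # assumed default path
  grep -c 'canceling statement due to statement timeout' "$log_file"
}

# Example:
#   sudo sh -c '. ./helpers.sh; count_statement_timeouts'
# or pass another log file as the first argument.
```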

To temporarily change the statement timeout:

  1. Open /var/opt/gitlab/gitlab-rails/etc/database.yml in an editor

  2. Set the value of statement_timeout to 0, which sets an unlimited statement timeout.

  3. Confirm in a new Rails console session that this value is used:

    sudo gitlab-rails runner "puts ActiveRecord::Base.connection_config[:variables]"
    
  4. Perform the action for which you need a different timeout (for example, the backup or a Rails command).

  5. Revert the edit in /var/opt/gitlab/gitlab-rails/etc/database.yml.
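The edit and the revert in the steps above can be sketched as two small shell helpers. This is an illustration under assumptions, not a supported tool: the function names are hypothetical, and the `sed` expression assumes the file contains lines of the form `statement_timeout: <value>`. Every such line in the file is changed.

```shell
# set_statement_timeout (hypothetical helper): keep a backup copy of
# the file, then rewrite every "statement_timeout: <value>" line.
set_statement_timeout() {
  file="$1"
  value="$2"
  cp -p "$file" "$file.bak" || return 1
  # Writing back into "$file" keeps its existing owner and mode.
  sed -E "s/^([[:space:]]*statement_timeout:[[:space:]]*).*/\1$value/" \
    "$file.bak" > "$file"
}

# restore_statement_timeout (hypothetical helper): put the backup back.
restore_statement_timeout() {
  file="$1"
  mv "$file.bak" "$file"
}

# Example (run as root):
#   set_statement_timeout /var/opt/gitlab/gitlab-rails/etc/database.yml 0
#   ... run the backup or Rails command ...
#   restore_statement_timeout /var/opt/gitlab/gitlab-rails/etc/database.yml
```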

Troubleshooting

Database is not accepting commands to avoid wraparound data loss

This error likely means that autovacuum is failing to complete its run:

ERROR:  database is not accepting commands to avoid wraparound data loss in database "gitlabhq_production"

To resolve the error, run VACUUM manually:

  1. Stop GitLab with the command gitlab-ctl stop.

  2. Place the database in single-user mode with the command:

    /opt/gitlab/embedded/bin/postgres --single -D /var/opt/gitlab/postgresql/data gitlabhq_production
    
  3. In the backend> prompt, run VACUUM;. This command can take several minutes to complete.

  4. Wait for the command to complete, then press Control + D to exit.

  5. Start GitLab with the command gitlab-ctl start.
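After GitLab is running again, it can be useful to check how old the oldest unfrozen transaction ID is in each database. The PostgreSQL documentation gives the query `SELECT datname, age(datfrozenxid) FROM pg_database` for this. The sketch below filters that output; the helper name `flag_old_xids` is hypothetical, and the default threshold of 200000000 is an assumption based on the PostgreSQL default for `autovacuum_freeze_max_age`.

```shell
# flag_old_xids (hypothetical helper): read "datname|age" lines on
# stdin and print the databases whose age exceeds the threshold.
flag_old_xids() {
  threshold="${1:-200000000}"   # assumed: default autovacuum_freeze_max_age
  awk -F'|' -v t="$threshold" '$2 + 0 > t + 0 { print $1 }'
}

# Example (-A: unaligned output, -t: rows only, fields separated by "|"):
#   sudo gitlab-psql -At \
#     -c "SELECT datname, age(datfrozenxid) FROM pg_database" | flag_old_xids
```

Any database printed by the helper has gone longer than the threshold without a completed freeze, which suggests autovacuum is still not keeping up.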

GitLab database requirements

The database requirements for GitLab include:

Serialization errors in the production/sidekiq log

If you receive errors like this example in your production/sidekiq log, read about setting default_transaction_isolation to read committed to fix the problem:

ActiveRecord::StatementInvalid PG::TRSerializationFailure: ERROR:  could not serialize access due to concurrent update

PostgreSQL replication slot errors

If you receive errors like this example, read about how to resolve PostgreSQL HA replication slot errors:

pg_basebackup: could not create temporary replication slot "pg_basebackup_12345": ERROR:  all replication slots are in use
HINT:  Free one or increase max_replication_slots.
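Before following the linked procedure, it may help to see which slots exist and whether they are in use. The `pg_replication_slots` view lists every slot with an `active` flag. A minimal sketch, with a hypothetical helper name, that prints only the slots not currently in use:

```shell
# inactive_slots (hypothetical helper): read "slot_name|active" lines
# on stdin (psql prints booleans as t or f) and print inactive slots.
inactive_slots() {
  awk -F'|' '$2 == "f" { print $1 }'
}

# Example, run on the PostgreSQL node that reports the error:
#   sudo gitlab-psql -At \
#     -c "SELECT slot_name, active FROM pg_replication_slots" | inactive_slots
```

The helper only lists slots. Whether an inactive slot is safe to free depends on your replication setup, so check the linked documentation before removing anything.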

Geo replication errors

If you receive errors like this example, read about how to resolve Geo replication errors:

ERROR: replication slots can only be used if max_replication_slots > 0

FATAL: could not start WAL streaming: ERROR: replication slot "geo_secondary_my_domain_com" does not exist

Command exceeded allowed execution time

PANIC: could not write to file 'pg_xlog/xlogtemp.123': No space left on device

Review Geo configuration and common errors

When troubleshooting problems with Geo, you should:

Mismatch in pg_dump and psql versions

If you receive errors like this example, read about how to back up and restore a non-packaged PostgreSQL database:

Dumping PostgreSQL database gitlabhq_production ... pg_dump: error: server version: 13.3; pg_dump version: 14.2
pg_dump: error: aborting because of server version mismatch
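To compare the client and server versions before running a backup, the major version can be extracted from each version string. A minimal sketch, assuming PostgreSQL 10 or later (where the major version is the first number); the helper name `pg_major` is hypothetical.

```shell
# pg_major (hypothetical helper): print the first number found in a
# version string, which is the major version on PostgreSQL 10 and later.
pg_major() {
  printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+).*/\1/'
}

# Example:
#   client="$(pg_major "$(pg_dump --version)")"
#   server="$(pg_major "$(psql -At -c 'SHOW server_version')")"
#   [ "$client" = "$server" ] || echo "pg_dump $client does not match server $server"
```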

Extension btree_gist is not allow-listed

Deploying PostgreSQL on an Azure Database for PostgreSQL - Flexible Server may result in this error:

extension "btree_gist" is not allow-listed for "azure_pg_admin" users in Azure Database for PostgreSQL

To resolve this error, allow-list the extension before installing it.