
Mastering PSQL: 7 Essential Commands for Database Efficiency


PostgreSQL (commonly called Postgres, and often accessed through the psql client) is a relational database widely used across the industry. Big tech corporations across the globe rely on it to store mission-critical data, and the language it understands is SQL. Most people are already familiar with the common DML and DDL commands for interacting with a database. However, there are some important commands that beginners often overlook. Knowing them can make you far more confident while working with a Postgres database.

Explain Analyze

This command is the database world's equivalent of "think before you speak". We often struggle with slow-running queries, and to understand why they are slow, you need to see how the database engine actually executes your query. In brief, the planner creates an execution plan. This plan includes an estimate of how many rows the planner thinks it will have to scan and which indexes (if any) it will use. Note that plain EXPLAIN only shows the estimated plan, while EXPLAIN ANALYZE actually executes the query and reports real timings, so be careful using it with statements that modify data.

EXPLAIN ANALYZE 
SELECT * FROM table_name 
WHERE col_name = 'value';
QUERY PLAN
---------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..11614.43 rows=1 width=244) (actual time=0.345..185.120 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on users  (cost=0.00..10614.33 rows=1 width=244) (actual time=120.450..180.200 rows=1 loops=3)
         Filter: ((email)::text = 'john.doe@example.com'::text)
         Rows Removed by Filter: 333333
 Planning Time: 0.120 ms
 Execution Time: 185.250 ms

As you can see in the output above, executing this select query results in a parallel sequential scan that filters out roughly a million rows just to return a single one. Comparing the planner's estimates with the actual execution time helps you identify opportunities for improvement.
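As an illustration (assuming the users table and email column from the sample plan above), adding a B-tree index on the filtered column would typically replace the sequential scan with a much cheaper index scan:

```sql
-- Hypothetical index for the example plan above.
CREATE INDEX idx_users_email ON users (email);

-- Re-run the same query; the plan should now show an Index Scan.
EXPLAIN ANALYZE
SELECT * FROM users
WHERE email = 'john.doe@example.com';
```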

Index Usage

Creating indexes is crucial for ensuring fast read performance. However, it's also important to regularly review the indexes you've created and remove any that aren't being used. To find indexes that are rarely used in your table, use the following command:

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM
    pg_stat_user_indexes
WHERE
    relname = 'your_table_name'
ORDER BY
    idx_scan DESC;

The number_of_scans column in the result above shows how many times the Postgres query planner used each index to execute queries. Indexes with a zero or very low scan count usually contribute negligibly to read performance while adding overhead to every write. Dropping them can be a good option.
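If you decide an index is dead weight, dropping it is a one-liner. The CONCURRENTLY option avoids taking a lock that would block other operations on the table (the index name below is just a placeholder):

```sql
-- Drop an unused index without locking the table.
-- Note: DROP INDEX CONCURRENTLY cannot run inside a transaction block.
DROP INDEX CONCURRENTLY IF EXISTS idx_users_unused;
```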

Data Distribution Inside a Table

Ever faced a situation where an index you created wasn't being used? This next command is all you need. An important factor often overlooked when creating a new index is the data distribution within the table. For columns with very few distinct values, adding an index doesn't improve performance in large tables. Use the following command to see how data is distributed inside your table.

SELECT
    attname AS column_name,
    n_distinct,
    most_common_vals AS top_values,
    most_common_freqs AS frequencies
FROM
    pg_stats
WHERE
    tablename = 'your_table_name';

The query planner uses these statistics to estimate how many rows it will have to fetch to execute a query. At a high level, it multiplies the frequency of a value by the total number of rows in the table. For example, if a value appears with a frequency of 0.4 in a table of 1,000,000 rows, the planner estimates 400,000 matching rows and will almost certainly choose a sequential scan over an index scan.
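These statistics are only as fresh as the last time they were collected. If pg_stats looks stale, for example right after a bulk load, you can refresh it manually:

```sql
-- Refresh the planner's statistics for one table.
ANALYZE your_table_name;
```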

Vacuum - For Cleaning Up Dead Rows

This is one of the most useful commands. Every time you update or delete a row, Postgres does not immediately remove the old row from physical storage; it just marks that row version as dead. This makes update/delete operations incredibly fast. Dead tuples are then removed periodically by a separate process known as vacuum. You can run it manually or let the built-in autovacuum daemon run it at automatic intervals. Its regular execution ensures the space held by dead tuples is reclaimed for reuse. Running this command manually is especially important after performing bulk update/delete operations.

VACUUM your_table_name;

Note: while plain VACUUM does not take a lock that blocks reads or writes on the table, it can push up overall CPU and I/O usage a bit, so try running it when the DB is not at peak load. Also note that plain VACUUM makes space reusable within the table but does not shrink the file on disk; VACUUM FULL does, but it locks the table while it runs.
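To decide which tables actually need a manual vacuum, you can check the dead-tuple counters that Postgres tracks per table:

```sql
-- Tables with the most dead tuples, plus when they were last vacuumed.
SELECT
    relname AS table_name,
    n_dead_tup AS dead_tuples,
    last_vacuum,
    last_autovacuum
FROM
    pg_stat_user_tables
ORDER BY
    n_dead_tup DESC;
```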

Monitoring Connections

To run any query, your application must first connect to the database. The max_connections setting caps how many connections can be open at one time. Usually, your application will close the connection (or return it to a pool) after running the query. However, sometimes due to a bug in your code, connections may stay open, causing connection leaks. It's crucial to find and fix these issues before they exhaust the limit. Use the following command to check the connection status of your database at any time.

SELECT
    pid,
    usename AS username,
    datname AS database_name,
    client_addr AS client_ip,
    application_name,
    state,
    now() - query_start AS duration,
    query
FROM
    pg_stat_activity
ORDER BY
    duration DESC;

When multiple applications share the same database, you can use the above query to identify which application is holding the most connections.
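A quick way to see the same information aggregated, assuming each application sets application_name on its connections:

```sql
-- Connection counts per application and state.
SELECT
    application_name,
    state,
    count(*) AS connections
FROM
    pg_stat_activity
GROUP BY
    application_name, state
ORDER BY
    connections DESC;
```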

Long Executing Queries

Your DB is choking. CPU utilization is very high. Freeable memory is depleting very fast. You don’t know what to do. Relax, run this command.

SELECT
    pid,
    usename AS user,
    pg_blocking_pids(pid) as blocked_by,
    now() - query_start AS duration,
    state,
    query
FROM
    pg_stat_activity
WHERE
    state = 'active'
    AND (now() - query_start) > interval '1 minute'
ORDER BY
    duration DESC;

It will display all queries that have been executing for over a minute, along with the PIDs of any sessions blocking them. Killing the worst offenders can give breathing space to your choked database.
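To stop such queries from piling up in the first place, you can also set a server-side timeout; any statement that exceeds it is cancelled automatically (the 30-second value below is just an example, tune it to your workload):

```sql
-- Cancel any statement in this session that runs longer than 30 seconds.
SET statement_timeout = '30s';

-- Or make it the default for a database (takes effect on new sessions).
ALTER DATABASE your_database_name SET statement_timeout = '30s';
```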

Kill Process

I may be exaggerating a bit, but this final command is the Brahmastra. In production, if you ever see a long-running query hurting the database, the first thing you should do is kill that process manually. Identify the PID of the long-running process using the previous command, then run one of the commands below to actually kill it.

-- Soft Kill (Cancel): sends a polite signal to stop the query but keeps the connection alive.
SELECT pg_cancel_backend(process_id);

-- Hard Kill (Terminate): closes the entire connection (use if cancel doesn't work).
SELECT pg_terminate_backend(process_id);
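Combining this with pg_stat_activity lets you clear every runaway query in one shot. A sketch, assuming a five-minute threshold (adjust to taste), that also excludes your own session:

```sql
-- Terminate every active query running longer than 5 minutes,
-- excluding the current session.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
  AND pid <> pg_backend_pid();
```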

The End

That's it for now. Postgres is a great database with many built-in features. You can dive much deeper into the query planner and how it estimates the cost of running a query; that knowledge makes you far more confident when debugging production issues. I'm also learning, so if I missed anything, feel free to leave a comment below. Thanks for reading. Namaste!