In March 2018 Microsoft announced general availability of Azure Database for PostgreSQL, a simple and fully managed database service for PostgreSQL that removes the complexities around infrastructure management, data availability, protection, and scale. The service continues to see tremendous growth, and in the months since Amit Banerjee's blog posting Performance best practices for using Azure Database for PostgreSQL we have added even more performance improvements to the service. In addition, through hands-on experience with customers' workloads we have determined several best practices for tuning workloads. This post describes service performance improvements and performance tuning best practices to consider based on the characteristics of your workload. We recommend testing all these best practices because of the potential tradeoffs associated with each one.
The recent service improvements relate to storage and CPU optimizations resulting in faster IO latency and CPU efficiency. I/O intensive workloads and read heavy workloads will experience the most benefit from these improvements.
Tuning best practices
There are many ways to optimize your PostgreSQL database and applications for better performance. In this post, we share ideas around optimizing:
- Bulk inserts
- Client applications
PostgreSQL uses MVCC to allow greater database concurrency. Every update results in an insert and delete, and every delete results in the row(s) being soft-marked for deletion. This results in dead tuples that have to be purged later. PostgreSQL achieves this by running a vacuum job.
The vacuum job can be triggered manually or automatically. There will be more dead tuples when the database is experiencing heavy update or delete operations and fewer when idle, making the need for running vacuum more frequently under load. This makes running vacuum jobs manually inconvenient.
Autovacuum on the other hand, can be configured and benefits from tuning. The default values that PostgreSQL ships with try to ensure the product works on all kinds of devices including Raspberry Pis, and the ideal configuration values depend on a number of factors:
- Total resources available - SKU and storage size.
- Resource usage.
- Individual object characteristics.
Visit "Optimizing autovacuum on Azure Database for PostgreSQL server" for detailed information about using, configuring, monitoring and assessing the cost of autovacuum.
PostgreSQL documentation – Chapter 18, Server Configuration.
PostgreSQL documentation – Chapter 24, Routine Database Maintenance Tasks.
Optimizing bulk inserts
For customers that have workload operations that involve transient data or that insert large datasets in bulk, consider using unlogged tables.
Visit "Optimizing bulk inserts and use of transient data on Azure Database for PostgreSQL server" for more detailed information about using unlogged tables.
PostgreSQL documentation – Create Table SQL Commands
Optimizing client applications
By default, TCP uses Nagle's algorithm which optimizes by batching up outgoing packets. This means fewer sends and this works well if the application sends packets frequently and latency is not the highest priority. Latency gains can be realized by sending on sockets created with the TCP_NODELAY option enabled. This results in lower latency but more sends. TCP_NODELAY is a client-side setting that should be considered on a client Virtual Machine (VM). Applications that benefit from the TCP_NODELAY option typically tend to do smaller infrequent writes and are particularly sensitive to latency. As an example, latency can be reduced from 15-40 ms to 2-3 ms with this setting.
If CPU usage/CPU percent for an Azure Database for PostgreSQL server is saturated at 100 percent, first use pg_stat_statements in PostgreSQL to determine which queries are taking longest. If after optimizing the long running queries CPU usage is still high, consider scaling up to the next vCore tier. For example, if the CPU usage is hovering around 100 percent consistently for General Purpose 4 vCore, scale up to a General Purpose 8 vCore.
High CPU usage is not the only indicator of CPU bottleneck. Single-threaded applications can also result in CPU exhaustion of one CPU while the other CPUs are under-utilized. Consider parallelizing your workload to take advantage of all the vCores available.
Visit "Optimize query statistics collection in Azure Database for PostgreSQL server" for detailed information about using pg_stat_statemetns and Query Store.
Optimizing query time with toast table storage strategy
There are four different strategies to store toast-able columns on disk representing various combinations between compression and out-of-line storage. The strategy can be set at the level of data type and at the column level.
Visit "Optimizing query time with TOAST table storage strategy" for detailed information about toast table storage strategies.
PostgreSQL documentation – Chapter 66, Database Physical Storage.
Special thanks to Saikat Sen, Sunil Kamath, Fady Sedrak, Raymond Martin, Sameer Arora, Kamil Kedzia, Rachel Agyemang, Harini Gupta, Jan Engelsberg, Bhavin Gandhi, Arun Sundaram and Michal Primke for their contributions to this posting.
We hope that you will take advantage of the Azure Database for PostgreSQL service. Please continue to provide feedback on the features and functionality that you want to see next. Start today by creating your free Azure account, and then create an instance of Azure Database for PostgreSQL. If you need any help or have questions, please check out the Azure Database for PostgreSQL documentation. You can also reach out to us by using the Ask Azure DB for PostgreSQL alias, and be sure to follow us on Twitter @AzureDBPostgres and #postgresql for the latest news and announcements.