Lille, 14 September 2023
On 12 September, the PG Day UK event took place in London. With travel from Lille to London being so convenient, it seemed fitting to attend and meet members of the PostgreSQL community from the UK (and the rest of Europe). Here is a summary of the topics covered by the speakers.
Welcome and Opening
The 2023 edition introduces a new governance structure with the involvement of PostgreSQL Europe, building on the reputation and experience gained from other events such as PGConf Europe, PGConf Germany, PGDay Paris and FOSDEM PGDay. You can find a list of all events on this page.
Dave Page takes the stage to give an overview of the history of PostgreSQL events in the UK. It is an opportunity to recall that the previous editions (CHAR, then PGDay UK, and later PostgresLondon) were run for many years by Simon Riggs and the 2ndQuadrant team, and later by the EDB team. A brief pre-recorded message from Simon Riggs himself opens this day of conferences.
What’s new in PostgreSQL 16
The presentation is available at this address.
The floor is given to Magnus Hagander, one of the members of the Core Team, as tradition dictates at these community conferences. Magnus provides an engaging overview of the timeline for a major release, including the schedules of the Commitfests from the past year and the freeze period that began several months ago in preparation for the release of version 16, along with the release of interim release candidates.
Magnus then highlights the numerous new features that can be found in the release notes. I’d like to mention the ones that caught my attention.
- Removal of the promote_trigger_file parameter (which had replaced the trigger_file parameter in version 12), in favor of other promotion methods.
- Support for regular expressions to simplify (or not?) the access rules in the pg_hba.conf file.
- The ability to connect to a random server from a list of hosts using the load_balance_hosts=random option of the libpq library.
- Optimisation of the VACUUM FREEZE process.
- A new pg_stat_io view for better disk access statistics.
- The COPY command now accepts a default marker to substitute the default value of a column during insertion (see the sketch after this list).
- Logical replication benefits from replay across multiple processes, and it is possible to replicate changes from a standby instance.
- Numerous optimisations.
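By way of a hedged illustration of two of these points, here is a minimal sketch of the pg_stat_io view and the COPY default marker (the table, file path and marker string are hypothetical):

```sql
-- New in PostgreSQL 16: I/O statistics broken down by backend type, object and context.
SELECT backend_type, object, context, reads, writes
  FROM pg_stat_io
 WHERE backend_type = 'client backend';

-- New in PostgreSQL 16: a marker in the input can stand in for the column default.
COPY measurements (sensor_id, read_at, value)
  FROM '/tmp/measurements.csv'
  WITH (FORMAT csv, DEFAULT '\D');
```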
Using buffers for query optimisation
The presentation is available at this address.
Michael Christofides introduced himself. He has been involved for several years in projects such as pgMustard and postgres.fm, focusing on performance-related topics in PostgreSQL. At Dalibo, we highly appreciate his work, as it serves as an inspiration for our experts and for tools like PEV2 (also known as explain.dalibo.com).
During his presentation, Michael discussed the EXPLAIN (ANALYSE, BUFFERS) command. He emphasised that, when optimising queries, it is not enough to consider only the execution time (provided with ANALYSE). The number of pages (BUFFERS) actually manipulated is equally important and should be examined more often to ensure stable long-term performance.
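As a minimal sketch of this idea (the table and query are hypothetical), the BUFFERS option adds page counts next to the timings:

```sql
-- Hypothetical query: the plan reports both timings and page accesses.
EXPLAIN (ANALYSE, BUFFERS)
SELECT order_id, total_amount
  FROM orders
 WHERE customer_id = 42;
-- Each node then shows a line such as "Buffers: shared hit=12 read=3",
-- i.e. how many 8 kB pages were found in cache (hit) or fetched from disk (read).
```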
I found this perspective very relevant, especially as he gradually presented optimisation examples where execution time remained stable but the number of pages decreased with each new solution.
He summed it up by concluding: “Less, but better.”
IoT with PostgreSQL
Chris Ellis, Lead Tech at Nexteam, introduced himself as a generalist who has had the opportunity to work on various data collection systems, particularly in the field of the Internet of Things (IoT). He discussed his experience and presented PostgreSQL as the universal solution for storing data collected by IoT devices and sensors.
He delved into storing time-series data, explaining that PostgreSQL natively works as a time-series database thanks to advanced data types that allow several measurements to be stored uniformly within a single row, without the need to create relationships and waste precious bytes on per-row metadata (24 bytes per row), as sketched after the following list:
- Range types (tsrange, daterange)
- Arrays (bigint[], etc.)
- The JSONB format
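Here is a minimal sketch of this idea with a hypothetical schema, where one row carries a whole range of measurements instead of one row per sample:

```sql
-- Hypothetical schema: one row per sensor and per hour, holding all of its samples.
CREATE TABLE sensor_readings (
    sensor_id   bigint    NOT NULL,
    period      tsrange   NOT NULL,   -- time range covered by this row
    temperature numeric[] NOT NULL,   -- one array element per sample
    metadata    jsonb
);

-- Retrieve the rows whose range overlaps a given time window.
SELECT sensor_id, period, temperature
  FROM sensor_readings
 WHERE period && tsrange('2023-09-12 10:00', '2023-09-12 11:00');
```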
He also shared a few tricks, such as using the COPY statement to load data, which is more efficient than individual or batched INSERT statements. PostgreSQL also offers UNLOGGED tables, which skip WAL logging to further accelerate data loading, and its robustness allows rows to be inserted concurrently by multiple processes.
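A minimal sketch of this loading pattern (the table and file path are hypothetical):

```sql
-- Hypothetical staging table: UNLOGGED skips WAL writes, speeding up bulk loads
-- at the cost of losing the data after a crash.
CREATE UNLOGGED TABLE readings_staging (
    sensor_id bigint,
    read_at   timestamptz,
    value     numeric
);

-- COPY is usually much faster than repeated INSERT statements.
COPY readings_staging FROM '/tmp/readings.csv' WITH (FORMAT csv, HEADER true);
```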
Chris highlighted the benefits of time-range partitioning, which provides flexibility for data retention and for archiving to slower disks. In terms of access performance and storage optimisation, he emphasised that data physically correlated on disk can benefit from BRIN indexing, which is ideal for range-based searches and highly storage-efficient.
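As a hedged sketch of how these two ideas can be combined (all names are hypothetical):

```sql
-- Hypothetical measurements table, partitioned by month.
CREATE TABLE measurements (
    sensor_id bigint,
    read_at   timestamptz NOT NULL,
    value     numeric
) PARTITION BY RANGE (read_at);

CREATE TABLE measurements_2023_09 PARTITION OF measurements
    FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');

-- A BRIN index stays tiny as long as read_at correlates with the physical row order.
CREATE INDEX ON measurements_2023_09 USING brin (read_at);
```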
PostgreSQL addresses various challenges using SQL and its advanced features. Chris explained how PostgreSQL can easily group (GROUP BY) and combine (JOIN) data sets to extract meaningful insights. He also presented window functions and the generate_series function for filling or interpolating gaps in the data.
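A minimal sketch of the gap-filling idea (table and column names are hypothetical): generate_series produces one row per minute, even for minutes without any reading.

```sql
-- Hypothetical gap filling: one output row per minute, NULL where no sample exists.
SELECT gs.minute,
       avg(r.value) AS avg_value
  FROM generate_series('2023-09-12 10:00'::timestamptz,
                       '2023-09-12 11:00'::timestamptz,
                       interval '1 minute') AS gs(minute)
  LEFT JOIN readings r
         ON r.read_at >= gs.minute
        AND r.read_at <  gs.minute + interval '1 minute'
 GROUP BY gs.minute
 ORDER BY gs.minute;
```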
During the Q&A session, an audience member asked why he had not mentioned the specialised solution TimescaleDB. Chris responded that he appreciates TimescaleDB but prefers to keep full control over the underlying techniques.
Ensuring database security in DevSecOps
Vincent O’Dea from Fujitsu took on the challenging task of making the security component of a DevOps approach appealing. Personally, I wasn’t particularly captivated by the first part, which discussed the very commercial and theoretical aspects of DevOps, including the continuous deployment loop.
During his talk, Vincent mentioned that the Center for Internet Security (CIS) regularly publishes hardening recommendations for PostgreSQL. Among these, he highlighted several key points:
- Keeping minor versions up to date every quarter.
- Strengthening authentication with an external system like PAM or LDAP.
- Encrypting data (disks, file systems, TDE, pgcrypto).
- Encrypting communications (SSL/TLS).
- Masking data in compliance with the PCI-DSS standard.
- Changing default settings (listening port, instance owner).
- Revoking rights on the public schema (before PostgreSQL v14), as sketched after this list.
- Auditing operations (DML, DDL) and centralising logs.
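As a hedged illustration of two of these points (the table, secret value and passphrase are hypothetical):

```sql
-- Before PostgreSQL 14, any role may create objects in the public schema;
-- revoking that right is a common hardening step.
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- Hypothetical column-level encryption with the pgcrypto extension.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE secrets (
    id      bigint GENERATED ALWAYS AS IDENTITY,
    payload bytea   -- ciphertext produced by pgp_sym_encrypt()
);

INSERT INTO secrets (payload)
VALUES (pgp_sym_encrypt('sensitive value', 'a-strong-passphrase'));

-- Decrypt on read, only in sessions that know the passphrase.
SELECT pgp_sym_decrypt(payload, 'a-strong-passphrase') FROM secrets;
```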
What I learned interviewing the PostgreSQL Community
Multiple versions of this presentation are available at this link.
Andreas Scherbaum, one of the founders of PostgreSQL Europe and a key figure in the PostgreSQL community in Germany, took the stage after a short standing lunch break. I was delighted to attend his presentation, even though he had already given it at other events in 2022 and 2023, starting with the very first one at Citus Con 2022.
Andreas shared anecdotes about the origins of the idea for community interviews, which he discussed with his colleagues Devrim and Magnus in Rome in 2019. For those unfamiliar with the concept, it involves a weekly publication on the site https://postgresql.life.
He briefly touched on the logistics behind these publications: the data is spread across many tabs of Google Docs spreadsheets, and the articles are written in Markdown and published with the Hugo framework. Finally, he shared statistical results on various questions, such as:
- Who pronounces it as “Post-gray”?
- Which countries are the community members from?
- Which version of PostgreSQL did they first use?
- Is there diversity in the community?
- How many share the hobby of chess?
(Extra questions time)
With some time to spare on the schedule, the presenters took the initiative to engage in an open question session, allowing as many attendees as possible to benefit from the presence of the experts in the room. Current topics were discussed, including the debate on multi-threaded architecture and the logical replication of DDL instructions.
During this session, the presenters revisited how the pgsql-hackers mailing list works and the importance of taking part in code reviews. According to them, writing code can be quite challenging, but compiling a patch and validating that a feature meets user expectations is relatively accessible. They also mentioned that the community wiki contains several articles on the various processes to follow, such as how to submit a patch or how to review one.
BRIN improvements and new opclasses
An earlier version of his presentation is available at this link.
Tomas Vondra is one of the committers of the PostgreSQL project. He contributes to internal performance mechanisms like statistics and BRIN indexing. He is a strong advocate for BRIN indexing, as he believes it deserves more attention and exploration. According to him, its development is quite isolated from the rest of the project. His presentation started with a theoretical reminder of the differences between a B-tree index (the default) and a BRIN index.
- The first type of index (B-tree) attaches each row to an index entry, optimising access to information based on equality criteria or membership in a subset. Its main drawback is its high storage cost.
- BRIN indexing, on the other hand, records one entry per range of rows, keeping only summary values (the min and max) and assuming that the data within the range correlates with its physical storage on disk. Tomas' illustrations help to understand this concept, showing that these block ranges are particularly space-efficient, with reasonable access times.
Tomas then went through the enhancements brought by version 14, which added several optimisations such as the new operator classes int8_minmax_multi_ops and *_bloom_ops. The latter is particularly useful when values are essentially random (like the uuid type) and strict equality criteria are used.
Version 16 also brings several significant advances for BRIN indexing, including partial support for handling NULL values in the index. In addition, writes benefit from an optimisation of the HOT mechanism, allowing the index update to be skipped when the modified row does not move to another page.
Finally, Tomas made predictions about features and optimisations that might appear in version 17 (he has been actively working on them over several Commitfests). These include parallel creation of BRIN indexes, stable planning for searches involving IN (...) with many values, and the planner's ability to use an index in a sort node.
Embeddings, not embedded - Postgres in the age of AI
Alastair Turner from VMware delivered the last presentation of the day, focusing on a topic quite distant from traditional DBA concerns: machine learning (ML) and natural language processing (NLP).
What I gathered from the presentation is that developments in semantic analysis and large language models (LLMs) are progressing rapidly. Part of the PostgreSQL community is actively contributing to the emergence of tools for storing and searching the vector embeddings these models produce. In this race, the contenders include pgvector and pg_embedding.
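To illustrate what such an extension enables, here is a minimal sketch assuming the pgvector extension is installed (table, dimensions and values are hypothetical):

```sql
-- Hypothetical similarity search over embeddings with the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigint GENERATED ALWAYS AS IDENTITY,
    content   text,
    embedding vector(3)   -- real embeddings typically have hundreds of dimensions
);

INSERT INTO documents (content, embedding)
VALUES ('hello world', '[0.1, 0.2, 0.3]');

-- Return the documents closest to a query embedding (Euclidean distance).
SELECT content
  FROM documents
 ORDER BY embedding <-> '[0.1, 0.25, 0.3]'
 LIMIT 5;
```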
Conclusion
I was delighted to participate in this event. The topics covered were diverse and of good to very good quality. Thanks to the local organisers and those from the PostgreSQL Europe association for making this day possible!
The next major community event will be the PGConf.EU in Prague, taking place from December 12th to 15th, 2023. It promises to be another great opportunity for the PostgreSQL community to come together and share knowledge and experiences.