Advanced database index for time-series

Master Thesis

Background

We are a data & visualisation company. We have built a distributed database system on top of PostgreSQL that currently serves over 100 customers in Europe, the Middle East and South America. Our largest installations process and store 4 TB (10^12 data points) per day, and the total data under management is measured in hundreds of terabytes.

Assignment

With standard PostgreSQL tables and indexes, working with time-series data can be quite heavy, especially when inserting data into a big table (100 million rows) with standard indexes on the relevant columns. Our data is structured as “time”, “key columns” and “data columns”. Key columns are metadata used for grouping. Data columns are the actual measurements, which in general are not indexed.

Currently we have the following properties for online data insertions:

  • The time always increases. That is, we do not insert data for the past.
  • The combination of the time column and all key columns is always unique
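As an illustration of why these two properties matter, here is a minimal in-memory sketch in Python. It is not the solution the thesis is expected to deliver, and it is not PostgreSQL code; the class name AppendOnlyTimeIndex and its methods are hypothetical. It assumes that "always increases" means non-decreasing, so that several keys may share one timestamp. Because rows arrive in time order, an insert is a plain append, and the uniqueness check only has to look at the most recent row for the same key.

```python
from bisect import bisect_left
from collections import defaultdict


class AppendOnlyTimeIndex:
    """Toy in-memory index (hypothetical) that relies on non-decreasing time."""

    def __init__(self):
        self.times = []                        # timestamp per row id, non-decreasing
        self.rows_by_key = defaultdict(list)   # key tuple -> row ids in time order

    def insert(self, time, key):
        # Property 1: time never goes backwards, so nothing is ever re-sorted.
        if self.times and time < self.times[-1]:
            raise ValueError("out-of-order insert")
        # Property 2: (time, key) is unique. With ordered time, a duplicate can
        # only collide with the latest row stored for the same key.
        posting = self.rows_by_key[key]
        if posting and self.times[posting[-1]] == time:
            raise ValueError("duplicate (time, key)")
        row_id = len(self.times)
        self.times.append(time)    # amortised O(1) append
        posting.append(row_id)     # amortised O(1) append
        return row_id

    def time_range(self, start, end):
        """Row ids with start <= time < end, found by binary search."""
        lo = bisect_left(self.times, start)
        hi = bisect_left(self.times, end)
        return range(lo, hi)

    def key_range(self, key, start, end):
        """Row ids for one key with start <= time < end."""
        ids = self.time_range(start, end)
        posting = self.rows_by_key.get(key, [])
        # Row ids grow with time, so the posting list can be bisected on row id.
        lo = bisect_left(posting, ids.start)
        hi = bisect_left(posting, ids.stop)
        return posting[lo:hi]


index = AppendOnlyTimeIndex()
index.insert(1, ("probe-a",))
index.insert(1, ("probe-b",))
index.insert(2, ("probe-a",))
print(index.key_range(("probe-a",), 1, 3))  # row ids of probe-a in [1, 3)
```

In this toy, inserts are amortised constant time while lookups stay logarithmic. A real index would additionally have to deal with disk pages, concurrency and crash recovery, none of which is modelled here.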

We are interested in an index that can be made more efficient considering these properties. It should still be efficient to search in the key columns, both for aggregation and finding individual entries. The properties of the index that we are hoping to achieve with this are:

  • Constant-time inserts.
  • Search and aggregation performance on par with a standard B-tree index in PostgreSQL.

Performance metrics for testing the solution should include insert performance, lookup performance and aggregation performance, both on time and on key columns. The implementation should be done as a standalone extension to PostgreSQL.
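The three metrics could be measured with a small harness along the following lines. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a B-tree-indexed baseline, not PostgreSQL, and the table name, column names and the functions timed and run_benchmark are all hypothetical. A real evaluation would run against PostgreSQL with realistic data volumes.

```python
import sqlite3
import time


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_benchmark(n_times=1000, n_keys=10):
    # SQLite is only a stand-in baseline here; its indexes are B-trees.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (ts INTEGER, series_key TEXT, value REAL)")
    conn.execute("CREATE UNIQUE INDEX samples_key_ts ON samples (series_key, ts)")

    # Rows are generated in time order, matching the insert pattern described above.
    rows = [(t, f"k{k}", float(t * k)) for t in range(n_times) for k in range(n_keys)]

    _, insert_s = timed(
        lambda: conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    )
    lookup, lookup_s = timed(
        lambda: conn.execute(
            "SELECT value FROM samples WHERE series_key = ? AND ts = ?", ("k3", 5)
        ).fetchone()
    )
    aggregate, aggregate_s = timed(
        lambda: conn.execute(
            "SELECT series_key, COUNT(*), AVG(value) FROM samples "
            "WHERE ts >= ? AND ts < ? GROUP BY series_key ORDER BY series_key",
            (0, 100),
        ).fetchall()
    )
    conn.close()
    return {
        "insert_s": insert_s,
        "lookup_s": lookup_s,
        "aggregate_s": aggregate_s,
        "lookup": lookup,
        "aggregate": aggregate,
    }


result = run_benchmark()
for name in ("insert_s", "lookup_s", "aggregate_s"):
    print(f"{name}: {result[name]:.6f}")
```

Running the same three measurements against both the baseline index and the new index, at several table sizes, would show whether insert cost stays flat while lookup and aggregation remain comparable.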

Apply

To learn more about this master thesis or to apply, contact:

Aner Gusic
aner.gusic@agama.tv
