
TimescaleDB: Time-series with PostgreSQL

Published: April 7, 2026  •  postgresql

In this blog post, we will take a look at TimescaleDB, a PostgreSQL extension for storing and analyzing time-series data. You can find the source code for TimescaleDB on GitHub.

Storing time-series data comes with some demanding challenges: high ingest rates, large data volumes, and the need for fast queries on recent data.

TimescaleDB is designed to meet these challenges by extending PostgreSQL with time-series-aware storage and query features while keeping the familiar PostgreSQL interface.

For developers, that means you do not have to learn a completely new database or query language. You can still write SQL, use normal Postgres drivers, and keep your existing operational model.

Core features

TimescaleDB is a PostgreSQL extension that adds time-series capabilities on top of the standard relational model. Its features are designed to improve write throughput and keep analytical queries fast as data grows over time.

Running TimescaleDB

TimescaleDB is available as a Docker image, which makes it easy to get up and running without installing anything directly on your machine. You can also install it on a local Postgres instance if you prefer.

For the following examples, I use this Docker Compose configuration. The timescale/timescaledb image contains everything you need to run TimescaleDB, including the PostgreSQL server and the TimescaleDB extension.
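
A minimal Docker Compose file might look like the following sketch. The service name, volume name, password, and image tag are illustrative assumptions; pick the tag that matches the PostgreSQL version you want:

```yaml
services:
  timescaledb:
    # latest-pg17 is one of the published tags; adjust to your PG version
    image: timescale/timescaledb:latest-pg17
    environment:
      POSTGRES_PASSWORD: secret   # choose your own password
    ports:
      - "5432:5432"
    volumes:
      - timescale-data:/var/lib/postgresql/data

volumes:
  timescale-data:
```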

For more information about installing and running TimescaleDB, see the official documentation.


Before you use TimescaleDB features in a database, make sure the extension is enabled in that database.

CREATE EXTENSION IF NOT EXISTS timescaledb;

timescaledb.sql

Creating a hypertable

For the following examples, I use Go with the pgx driver to connect to TimescaleDB. All the SQL features shown here are available in any language that can connect to PostgreSQL.

The first step is to create a table that stores your time-series data. In this example, we create a simple table for sensor data with a timestamp, sensor ID, temperature, and humidity.

CREATE TABLE IF NOT EXISTS sensor_data (
    time        TIMESTAMPTZ NOT NULL,
    sensor_id   TEXT NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION,
    PRIMARY KEY (time, sensor_id)
);

001_schema.sql

A hypertable does not require a primary key, but if you define one (or any other unique constraint), TimescaleDB requires it to include the partitioning column. In this example, the primary key is a composite of time and sensor_id, which works well for queries that filter by time and by sensor.

Next, we convert this table into a hypertable. To do that, we use create_hypertable, which takes the table name, a partitioning definition, and optional parameters. The if_not_exists option tells TimescaleDB to skip the command if the hypertable already exists.

SELECT create_hypertable('sensor_data', by_range('time'), if_not_exists => TRUE);

timescaledb.sql

Under the hood, TimescaleDB stores rows in chunk tables partitioned by time. You still query sensor_data, but the planner can prune irrelevant chunks for time-bounded queries. This improves performance because inserts are routed to active chunks, range queries usually touch only a subset of chunks, and indexes stay local to chunks instead of one giant global structure.
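
You can look at these chunks yourself. show_chunks lists the chunk tables backing a hypertable, and the timescaledb_information.chunks view shows their time ranges (the output depends on the data you have inserted):

```sql
-- List the chunk tables backing the hypertable
SELECT show_chunks('sensor_data');

-- Inspect the time range each chunk covers
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
```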

For this project, I chose a one-day chunk interval. You can set the chunk interval when creating the hypertable, and you can alter it later with set_chunk_time_interval. Note that a new interval only applies to chunks created after the change; existing chunks keep their original range.

SELECT set_chunk_time_interval('sensor_data', INTERVAL '1 day');

timescaledb.sql

The chunk interval determines how data is partitioned and has a big impact on query performance. Setting it too small or too large can lead to suboptimal performance, so it is important to choose an interval that aligns with your ingest rate, retention period, and query patterns. If you query recent data very frequently, a smaller chunk interval can help. If queries commonly scan broader windows and ingest is moderate, a larger interval may be more appropriate.

Inserting data

There is nothing special about inserting data into a hypertable. You can use the same INSERT statements you would use for a regular table, and TimescaleDB will automatically route the data to the correct chunk based on the timestamp.

INSERT INTO sensor_data (time, sensor_id, temperature, humidity)
VALUES (NOW(), 'sensor_1', 22.5, 60.0);
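
To have some data to work with in the following queries, you can generate synthetic readings with generate_series. This sketch inserts one reading per minute for three hypothetical sensors over the last 24 hours; the value ranges are arbitrary:

```sql
INSERT INTO sensor_data (time, sensor_id, temperature, humidity)
SELECT ts,
       'sensor_' || s,
       20 + 5 * random(),    -- temperature between 20 and 25
       50 + 20 * random()    -- humidity between 50 and 70
FROM generate_series(NOW() - INTERVAL '24 hours', NOW(), INTERVAL '1 minute') AS ts,
     generate_series(1, 3) AS s;
```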

Querying

TimescaleDB's bucketing functions group rows into fixed time intervals, a common pattern in time-series analysis.

time_bucket

The time_bucket function groups data into fixed time intervals. For example, to get the average temperature per hour for the last 24 hours, you can use the following query:

SELECT time_bucket('1 hour', time)::timestamptz AS hour,
       AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour DESC
LIMIT 10;

timescaledb.sql


time_bucket_gapfill

The time_bucket_gapfill function is similar to time_bucket, but it also generates buckets for time intervals that have no data. This is useful for visualizations and analysis where you want to see gaps in the data explicitly. For example, to get the average temperature every 10 minutes for the last 30 minutes, including empty buckets, you can use the following query. The empty buckets will have NULL values for avg_temp.

SELECT time_bucket_gapfill(
           '10 minutes',
           time,
           start => NOW() - INTERVAL '30 minutes',
           finish => NOW()
    )::timestamptz AS bucket,
       sensor_id,
    CASE
        WHEN COUNT(temperature) = 0 THEN NULL::float8
        ELSE AVG(temperature)::float8
    END AS avg_temp
FROM sensor_data
WHERE sensor_id = $1
    AND time >= NOW() - INTERVAL '30 minutes'
    AND time < NOW()
GROUP BY bucket, sensor_id
ORDER BY bucket DESC
LIMIT 5;

timescaledb.sql


Fill missing values with locf and interpolate

time_bucket_gapfill only creates the missing timestamps. By default, value columns are still NULL for empty buckets.

Sometimes that is exactly what you want. For monitoring and incident analysis, NULL clearly means "no sample arrived." In other cases, for example in trend charts, you may want to fill the missing values.

TimescaleDB provides two helper functions for this: locf (last observation carried forward) and interpolate (linear interpolation).

Example with carry-forward. When a value is missing, it repeats the previous value instead of showing NULL. This is useful for state-like metrics where the last known value is still relevant until a new one arrives.

SELECT time_bucket_gapfill('10 minutes', time) AS bucket,
       sensor_id,
       locf(AVG(temperature)) AS temp_locf
FROM sensor_data
WHERE sensor_id = 'sensor_1'
  AND time >= NOW() - INTERVAL '2 hours'
  AND time < NOW()
GROUP BY bucket, sensor_id
ORDER BY bucket;

With linear interpolation, when a value is missing, it is estimated from the values before and after the gap. This is useful for smoother physical signals like temperature, where you expect gradual changes over time.

SELECT time_bucket_gapfill('10 minutes', time) AS bucket,
       sensor_id,
       interpolate(AVG(temperature)) AS temp_interp
FROM sensor_data
WHERE sensor_id = 'sensor_1'
  AND time >= NOW() - INTERVAL '2 hours'
  AND time < NOW()
GROUP BY bucket, sensor_id
ORDER BY bucket;

For more in-depth explanations, check out this blog post on gap-filling functions.


When you use gap-filling functions, you might want to distinguish between original values and filled values in your results. This can be important for analysis and visualizations, so you do not accidentally treat filled values as real data points.

You can do this by adding an extra column that indicates whether the value is original or filled. For example, with interpolated values, you can add a value_type column that shows original for real data points and filled for interpolated points:

SELECT time_bucket_gapfill('10 minutes', time) AS bucket,
       sensor_id,
       interpolate(AVG(temperature)) AS temp_interp,
       CASE
         WHEN AVG(temperature) IS NULL THEN 'filled'
         ELSE 'original'
       END AS value_type
FROM sensor_data
WHERE sensor_id = 'sensor_1'
  AND time >= NOW() - INTERVAL '2 hours'
  AND time < NOW()
GROUP BY bucket, sensor_id
ORDER BY bucket;

Continuous aggregates

Raw hypertable queries are great for ad-hoc analysis, but they can have variable read latency. If you need predictable read latency, for example for dashboards, you usually want to pre-aggregate data into rollups. You could do this with regular PostgreSQL materialized views, but continuous aggregates are usually a better fit for ongoing time-series workloads.

This query creates a continuous aggregate that calculates hourly averages, minimums, maximums, and counts for each sensor. The WITH (timescaledb.continuous) clause tells TimescaleDB to treat this as a continuous aggregate, which means it is refreshed incrementally in the background as data changes in the underlying hypertable.

CREATE MATERIALIZED VIEW IF NOT EXISTS sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    sensor_id,
    avg(temperature) AS avg_temp,
    min(temperature) AS min_temp,
    max(temperature) AS max_temp,
    avg(humidity) AS avg_humidity,
    count(*) AS sample_count
FROM sensor_data
GROUP BY bucket, sensor_id;

timescaledb.sql

You also need to set up a refresh policy for the continuous aggregate. This tells TimescaleDB how often to refresh the materialized view and how far back to reprocess data. The following policy refreshes every 10 minutes, reprocessing data from the last 2 days while leaving a 10-minute buffer near now() to avoid refreshing very hot buckets too aggressively.

SELECT add_continuous_aggregate_policy(
    'sensor_hourly',
    start_offset => INTERVAL '2 days',
    end_offset => INTERVAL '10 minutes',
    schedule_interval => INTERVAL '10 minutes',
    if_not_exists => true
);

timescaledb.sql

Policy semantics:

- start_offset: how far back from now() the refresh window begins; buckets older than 2 days are left untouched by the policy.
- end_offset: how close to now() the refresh window ends; the 10-minute buffer avoids constantly reprocessing the still-changing newest bucket.
- schedule_interval: how often the background refresh job runs.

When you query dashboard data, you do not select from the raw hypertable. Instead, you select from the materialized view. This is much faster because it is pre-aggregated and optimized for reads.

One important detail: in current TimescaleDB versions, real-time aggregation is disabled by default. That means the continuous aggregate view lags slightly behind the newest raw rows until the next scheduled refresh runs, unless you enable real-time aggregation on the view.
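
If you want queries on the view to also include the newest, not-yet-materialized rows, you can turn on real-time aggregation by unsetting materialized_only:

```sql
ALTER MATERIALIZED VIEW sensor_hourly
SET (timescaledb.materialized_only = false);
```

With this setting, TimescaleDB combines the materialized buckets with an on-the-fly aggregation of the raw rows newer than the last refresh, trading a bit of read latency for up-to-date results.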

SELECT
    bucket,
    sensor_id,
    round(avg_temp::numeric, 2)::float8 AS avg_temp,
    round(min_temp::numeric, 2)::float8 AS min_temp,
    round(max_temp::numeric, 2)::float8 AS max_temp,
    sample_count
FROM sensor_hourly
WHERE bucket > now() - interval '7 days'
ORDER BY bucket DESC, sensor_id
LIMIT 10;

timescaledb.sql

Columnstore, compression, and retention policies

When you store millions of time-series events, it is important to have a clear data lifecycle strategy. TimescaleDB's storage-management features help you manage storage growth and keep your database performant over time.

TimescaleDB uses Hypercore / columnstore as the current recommended path for optimizing storage and query performance of older data. The older compression API is still supported, but if you are starting fresh on a current TimescaleDB release, use columnstore / Hypercore.


Columnstore

With Hypercore, new data stays in the rowstore for fast writes, and older chunks can be moved to the columnstore for better analytical performance and lower storage usage. In practice, this gives you a hybrid row-columnar model: hot data remains write-friendly, while colder data becomes scan-friendly.

You enable columnstore on the hypertable with ALTER TABLE. In this example, data is segmented by sensor_id and ordered by time DESC. These settings should match your query patterns.

ALTER TABLE sensor_data
SET (
    timescaledb.enable_columnstore = true,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time DESC'
);

timescaledb.sql

You can then add a policy that automatically moves older chunks to the columnstore. The following policy converts chunks older than 7 days.

CALL add_columnstore_policy(
    'sensor_data',
    after => INTERVAL '7 days',
    if_not_exists => true
);

timescaledb.sql
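
To verify which chunks the policy has already converted, you can check the is_compressed flag in the timescaledb_information.chunks view:

```sql
SELECT chunk_name, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
```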

For more details, see the Hypercore documentation, the ALTER TABLE Hypercore reference, and add_columnstore_policy.


Compression

The older compression API is still supported, and you will still see it in existing deployments and older examples. However, Hypercore / columnstore is recommended for new projects on current TimescaleDB releases.


Retention policies

Retention policies are similar to compression policies, but instead of compressing old data, they drop it. This is useful for data that is no longer relevant after a certain period. The following policy automatically drops chunks that are older than 30 days. For a long-running system, it is important to have a retention policy in place to prevent unbounded storage growth. You can adjust the interval based on your retention requirements.

Retention works at the chunk level, not row by row. In practice, that means TimescaleDB can efficiently remove old data only when a whole chunk falls outside your retention window.

SELECT add_retention_policy(
    'sensor_data',
    drop_after => INTERVAL '30 days',
    if_not_exists => true
);

timescaledb.sql
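
Before enabling a retention policy, you can preview which chunks it would drop. show_chunks with the older_than argument returns only the chunks that fall entirely outside the window:

```sql
SELECT show_chunks('sensor_data', older_than => INTERVAL '30 days');
```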

Wrapping up

TimescaleDB is a powerful extension that brings time-series capabilities to PostgreSQL. If you only store a few events per day, a regular Postgres table may be enough. But as your data grows, you are more likely to run into performance problems with high ingest rates and large data volumes. TimescaleDB is designed to handle these challenges with hypertables, time bucketing, continuous aggregates, storage-management features, and retention policies.

With TimescaleDB, you also stay in the PostgreSQL ecosystem, which means you can use the tools and libraries that already work with Postgres. You can reuse your existing SQL and Postgres knowledge instead of learning a completely different database or query language.