pgvector 0.8.0 Released — HNSW Improvements and Sparse Vectors

October 30, 2024

pgvector 0.8.0 has been released, bringing a significant set of improvements for AI and vector similarity workloads running inside PostgreSQL.

What’s new

Sparse vector type (`sparsevec`)

pgvector provides a sparsevec type, first added in 0.7.0, for storing sparse vectors, that is, vectors in which most dimensions are zero. This is particularly useful for high-dimensional sparse representations such as those produced by SPLADE or by BM25-style term weighting. Because only the non-zero elements are stored, sparse vectors can consume significantly less storage than their dense equivalents, and distance computations only need to visit the stored elements.
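To make the storage argument concrete, here is a minimal Python sketch that turns a dense vector into the text form pgvector accepts for sparsevec values, `{index:value,...}/dimensions`, with indices starting at 1. The helper name `to_sparsevec_literal` is hypothetical; it is not part of pgvector or of any client library.

```python
def to_sparsevec_literal(dense):
    """Render a dense sequence of numbers as a sparsevec text literal.

    Only non-zero elements are written, and indices are 1-based.
    """
    nonzero = [
        f"{index}:{float(value)!r}"
        for index, value in enumerate(dense, start=1)
        if value != 0
    ]
    return "{" + ",".join(nonzero) + "}/" + str(len(dense))


# A 10-dimensional vector with only two non-zero elements.
dense = [0.0] * 10
dense[1] = 1.5   # dimension 2 (1-based)
dense[6] = 0.25  # dimension 7 (1-based)

literal = to_sparsevec_literal(dense)
print(literal)  # {2:1.5,7:0.25}/10
```

A string in this form can be sent as a query parameter and cast to sparsevec on the server side.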

HNSW build performance

The HNSW index build algorithm has been optimised for better parallelism. Build times on large datasets are reduced by up to 30% compared to 0.7.x, and peak memory usage during index construction is lower.

Iterative index scans

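A new iterative scan mode for HNSW indexes improves recall on filtered queries. Because a WHERE clause is applied after the index scan, a selective filter could previously discard most of the candidates the index returned, leaving fewer rows than requested. With iterative scans enabled (`hnsw.iterative_scan` set to `strict_order` or `relaxed_order`; the default is off), pgvector keeps scanning the index for further candidates until enough rows pass the filter or a scan limit (`hnsw.max_scan_tuples`) is reached. This substantially reduces the recall loss caused by a tight WHERE clause, although a query can still return fewer rows than requested if the scan limit is reached first.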
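The difference between the two strategies can be sketched with a toy model in Python. This is not pgvector's implementation: the candidate list simply stands in for rows arriving nearest-first from an index, and the parameters `ef_search` and `max_scan_tuples` are ordinary function arguments named after the corresponding pgvector settings for readability.

```python
def fixed_scan(candidates, passes_filter, k, ef_search):
    """Take the first ef_search candidates, then filter; may return fewer than k."""
    return [c for c in candidates[:ef_search] if passes_filter(c)][:k]


def iterative_scan(candidates, passes_filter, k, max_scan_tuples):
    """Consume candidates until k pass the filter or the scan budget is spent."""
    results = []
    for scanned, candidate in enumerate(candidates, start=1):
        if scanned > max_scan_tuples:
            break
        if passes_filter(candidate):
            results.append(candidate)
            if len(results) == k:
                break
    return results


def passes_filter(row_id):
    """A selective filter: only 1 row in 50 qualifies."""
    return row_id % 50 == 0


# Row ids 0..999, assumed to be ordered nearest-first.
candidates = list(range(1000))

fixed = fixed_scan(candidates, passes_filter, k=5, ef_search=40)
iterative = iterative_scan(candidates, passes_filter, k=5, max_scan_tuples=500)
print(fixed)      # [0]
print(iterative)  # [0, 50, 100, 150, 200]
```

In this model the fixed scan finds a single match among its 40 candidates, while the iterative scan keeps going until five rows qualify. Lowering the scan budget shows the remaining trade-off: with a budget of 120 candidates, only three matches are found.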

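Distance functions for binary vectors

hamming_distance() for binary vectors, also available as the `<~>` operator
jaccard_distance() for binary vectors, also available as the `<%>` operator

Both functions operate on the PostgreSQL bit type and were first added in 0.7.0.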
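For readers unfamiliar with the two measures, the following pure-Python sketch computes them over bit strings written as text, such as "1010". It illustrates the definitions only and does not call pgvector; the handling of two all-zero inputs is an assumption made here to avoid division by zero and may not match pgvector's behaviour.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """Return 1 - |A AND B| / |A OR B| over the set bits of two bit strings."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Assumption: two all-zero inputs are treated as maximally distant.
    if either == 0:
        return 1.0
    return 1.0 - both / either


print(hamming_distance("1010", "0110"))  # 2
print(jaccard_distance("1010", "0110"))  # 1 - 1/3
```

Hamming distance counts every differing bit, whereas Jaccard distance ignores positions where both inputs are zero, which makes it the better fit when the set bits represent sparse features.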

Supported PostgreSQL versions

pgvector 0.8.0 supports PostgreSQL 13 through 17.

Installation and upgrade instructions are available on the pgvector GitHub repository.