Graph DB with Postgres

󰃭 2024-07-10

The Big Idea

In this post we will leverage the power of PostgreSQL Common Table Expression to create a makeshift graph database. The goal is to represent nodes (things like stories, tags, etc.) and edges (connections between those things) using relational tables. This allows us to query and traverse the data as if we were working with a ‘real’ graph database.

entity_map: This table is like our directory of entity types. It maps different types of entities (e.g., stories, tags) to their respective tables. It tells us where to find the actual data for each type of entity.
nodes: Think of nodes as the individual items in your graph. For example, each story or tag is a node. Each node gets a unique ID, and it references an entry in the entity_map to tell us what type of entity it is (story, tag, etc.).
edges: These are the connections between nodes. For example, if a story is tagged with a particular tag, there’s an edge between the “story” node and the “tag” node. The parent_node_id and child_node_id represent that connection.

Example table schema

CREATE TABLE stories (
    id          uuid default uuid_generate_v4() PRIMARY KEY,
    title       TEXT NOT NULL,
    body        TEXT NOT NULL
);

CREATE TABLE tags (
    id          uuid default uuid_generate_v4() PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);

-- entity types map to tables
-- will have tables for
-- stories & tags  etc
CREATE TABLE entity_map (
    -- singular entity type e.g. Story
    entity_type     TEXT NOT NULL,
    -- table that holds these entities e.g. 'stories'
    entity_table    TEXT NOT NULL,
    PRIMARY KEY (entity_type)
);

-- Nodes of our graph.
-- A row represents a node which represents a single entity
-- via a link through the entity_type and entity_id.
CREATE TABLE nodes (
    id              uuid default uuid_generate_v4() NOT NULL,
    entity_type     TEXT NOT NULL constraint nodes_entity_map_entity_type_fk references entity_map,
    entity_id       uuid NOT NULL,
    PRIMARY KEY (id)
);

CREATE UNIQUE INDEX nodes_entity_type_entity_id_uindex ON nodes(entity_type, entity_id);

-- Edges of a graph. Also known as vertex in graph theory
-- It represents the connections between nodes in a graph
CREATE table edges (
    parent_node_id   uuid NOT NULL,
    child_node_id    uuid NOT NULL,
    PRIMARY KEY (parent_node_id, child_node_id)
);

CREATE INDEX edges_child_node_id_parent_node_id_uindex ON edges(child_node_id, parent_node_id);

Querying with Recursive CTEs

We will use PostgreSQL’s Common Table Expressions (CTEs) to perform a recursive search through the graph. Here’s a quick breakdown of how this works:

First, we grab the node that represents a specific Story (by its UUID).
Then, the UNION ALL part recursively joins nodes through the edges table. This means it follows the connections between nodes, expanding the search.
Finally, we filter down to just the nodes that represent Tag entities connected to the Story.

Example to find all Tag entities for a Story

-- find all `tags` connected to a `story`
-- by traversing the graph using a recursive CTE
WITH RECURSIVE traverse(id, entity_type, entity_id) AS (
    -- This initial part (before the UNION) runs once
    SELECT
        id,
        entity_type,
        entity_id
    FROM
        nodes
    WHERE
        nodes.entity_type = 'Story' AND
        nodes.entity_id = $1 -- the story uuid
    UNION ALL
    -- This part does the recursion
    SELECT
        nodes.id,
        nodes.entity_type,
        nodes.entity_id
    -- with each 'iteration' traverse will have new data
    FROM traverse
        JOIN
    edges ON traverse.id = edges.child_node_id
        JOIN
    nodes ON edges.parent_node_id = nodes.id
)
SELECT
    traverse.entity_id as stor_id
FROM traverse
WHERE traverse.entity_type = 'Tag'
GROUP BY traverse.entity_id
ORDER BY traverse.entity_id
LIMIT 10;

Now, the keen minded among you might be thinking this is gross over complication just to get the tags for some stories and you would be right. The power of the graph comes into play when you add more types and walk the graph.

This setup is perfect when you need to model relationships like:

A story with multiple tags.
A system where categories can be nested, and stories belong to multiple categories.

By using the recursive CTE, we can explore relationships between entities that aren’t just linear or hierarchical. You can use this kind of structure to model more complex networks of data, like social connections, product recommendations, or content tagging systems.

Wrapping up

While PostgreSQL isn’t a what most would consider a graph database, the method presented in the post show how you can still model graph-like data in a relational database using recursive queries. However, if your graph becomes super complex or deep (like a social network with millions of connections), performance could start to suffer. But you probably are not Facebook or Google and this approach will work just fine—and you get to stick with PostgreSQL, keeping added complexity at bay and work with a tool which you already know and love.