Graph DB with Postgres
2024-07-10
The Big Idea
In this post we will leverage the power of PostgreSQL Common Table Expression to create a makeshift graph database. The goal is to represent nodes (things like stories, tags, etc.) and edges (connections between those things) using relational tables. This allows us to query and traverse the data as if we were working with a ‘real’ graph database.
- entity_map: This table is like our directory of entity types. It maps different types of entities (e.g., stories, tags) to their respective tables. It tells us where to find the actual data for each type of entity.
- nodes: Think of nodes as the individual items in your graph. For example, each story or tag is a node. Each node gets a unique ID, and it references an entry in the entity_map to tell us what type of entity it is (story, tag, etc.).
- edges: These are the connections between nodes. For example, if a story is tagged with a particular tag, there’s an edge between the “story” node and the “tag” node. The
parent_node_id
andchild_node_id
represent that connection.
Example table schema
CREATE TABLE stories (
id uuid default uuid_generate_v4() PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL
);
CREATE TABLE tags (
id uuid default uuid_generate_v4() PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
-- entity types map to tables
-- will have tables for
-- stories & tags etc
CREATE TABLE entity_map (
-- singular entity type e.g. Story
entity_type TEXT NOT NULL,
-- table that holds these entities e.g. 'stories'
entity_table TEXT NOT NULL,
PRIMARY KEY (entity_type)
);
-- Nodes of our graph.
-- A row represents a node which represents a single entity
-- via a link through the entity_type and entity_id.
CREATE TABLE nodes (
id uuid default uuid_generate_v4() NOT NULL,
entity_type TEXT NOT NULL constraint nodes_entity_map_entity_type_fk references entity_map,
entity_id uuid NOT NULL,
PRIMARY KEY (id)
);
CREATE UNIQUE INDEX nodes_entity_type_entity_id_uindex ON nodes(entity_type, entity_id);
-- Edges of a graph. Also known as vertex in graph theory
-- It represents the connections between nodes in a graph
CREATE table edges (
parent_node_id uuid NOT NULL,
child_node_id uuid NOT NULL,
PRIMARY KEY (parent_node_id, child_node_id)
);
CREATE INDEX edges_child_node_id_parent_node_id_uindex ON edges(child_node_id, parent_node_id);
Querying with Recursive CTEs
We will use PostgreSQL’s Common Table Expressions (CTEs) to perform a recursive search through the graph. Here’s a quick breakdown of how this works:
- First, we grab the node that represents a specific
Story
(by its UUID). - Then, the UNION ALL part recursively joins nodes through the
edges
table. This means it follows the connections between nodes, expanding the search. - Finally, we filter down to just the nodes that represent
Tag
entities connected to theStory
.
Example to find all Tag
entities for a Story
-- find all `tags` connected to a `story`
-- by traversing the graph using a recursive CTE
WITH RECURSIVE traverse(id, entity_type, entity_id) AS (
-- This initial part (before the UNION) runs once
SELECT
id,
entity_type,
entity_id
FROM
nodes
WHERE
nodes.entity_type = 'Story' AND
nodes.entity_id = $1 -- the story uuid
UNION ALL
-- This part does the recursion
SELECT
nodes.id,
nodes.entity_type,
nodes.entity_id
-- with each 'iteration' traverse will have new data
FROM traverse
JOIN
edges ON traverse.id = edges.child_node_id
JOIN
nodes ON edges.parent_node_id = nodes.id
)
SELECT
traverse.entity_id as stor_id
FROM traverse
WHERE traverse.entity_type = 'Tag'
GROUP BY traverse.entity_id
ORDER BY traverse.entity_id
LIMIT 10;
Now, the keen minded among you might be thinking this is gross over complication just to get the tags for some stories and you would be right. The power of the graph comes into play when you add more types and walk the graph.
This setup is perfect when you need to model relationships like:
- A story with multiple tags.
- A system where categories can be nested, and stories belong to multiple categories.
By using the recursive CTE, we can explore relationships between entities that aren’t just linear or hierarchical. You can use this kind of structure to model more complex networks of data, like social connections, product recommendations, or content tagging systems.
Wrapping up
While PostgreSQL isn’t a what most would consider a graph database, the method presented in the post show how you can still model graph-like data in a relational database using recursive queries. However, if your graph becomes super complex or deep (like a social network with millions of connections), performance could start to suffer. But you probably are not Facebook or Google and this approach will work just fine—and you get to stick with PostgreSQL, keeping added complexity at bay and work with a tool which you already know and love.