Database Schema

DupeTrace uses PostgreSQL for persistent storage. Let's break down the database structure, indexing strategy, and query patterns.


Why PostgreSQL?

Speed. Scalability. Reliability.

  • Handles millions of rows without breaking a sweat

  • Superior indexing and query optimization

  • JSON support for future expansion

  • Widely available (even free hosting options exist)

SQLite would choke on multi-million-row transfer logs and concurrent writes. MySQL would work, but... why settle for second best?


Database Tables

dupetrace_items

Stores every unique item UUID that's been seen.

CREATE TABLE IF NOT EXISTS dupetrace_items (
    id UUID PRIMARY KEY,
    first_seen TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Columns

Column       Type        Description
id           UUID        Unique item identifier (primary key)
first_seen   TIMESTAMP   When this UUID was first registered

Sample Data
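
A couple of purely illustrative rows (the UUIDs and timestamps below are made up) to show the shape of the data:

id                                     first_seen
3f2a1c9e-7b4d-4e21-9a0c-5d6e7f8a9b0c   2025-01-15 14:32:07
b81d4fae-7dec-11d0-a765-00a0c91e6bf6   2025-01-15 14:35:12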

Growth Rate

Slow. This table only grows when a new unique item is seen for the first time.

  • Small server (10-50 players): ~100-1,000 rows per day

  • Large server (200+ players): ~1,000-10,000 rows per day

After a few months, growth stabilizes (most items have already been seen).


dupetrace_item_transfers

Logs every interaction with tracked items. This is where the magic (and most of the data) lives.
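
The plugin creates this table for you. The exact DDL may differ slightly, but based on the column list below it looks roughly like this:

-- sketch only; the NOT NULL / DEFAULT clauses are assumptions
CREATE TABLE IF NOT EXISTS dupetrace_item_transfers (
    id BIGSERIAL PRIMARY KEY,
    item_uuid UUID NOT NULL,
    player_uuid UUID NOT NULL,
    action VARCHAR(64) NOT NULL,
    location TEXT,
    ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);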

Columns

Column        Type          Description
id            BIGSERIAL     Auto-incrementing primary key
item_uuid     UUID          Item identifier (foreign reference)
player_uuid   UUID          Player who interacted with the item
action        VARCHAR(64)   Type of action (PICKUP, CRAFTED, etc.)
location      TEXT          World coordinates (format: world:x,y,z)
ts            TIMESTAMP     When the interaction occurred

Sample Data
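
Illustrative rows only (made-up values), showing a single item passing through two logged events:

id  item_uuid                             player_uuid                           action   location            ts
1   3f2a1c9e-7b4d-4e21-9a0c-5d6e7f8a9b0c  9c8b7a6d-5e4f-4a3b-2c1d-0e9f8a7b6c5d  CRAFTED  world:120,64,-340   2025-01-15 14:32:07
2   3f2a1c9e-7b4d-4e21-9a0c-5d6e7f8a9b0c  9c8b7a6d-5e4f-4a3b-2c1d-0e9f8a7b6c5d  PICKUP   world:118,65,-338   2025-01-15 14:35:12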

Growth Rate

Fast. This is your high-traffic table.

  • Small server: ~10,000-50,000 rows per day

  • Medium server: ~50,000-200,000 rows per day

  • Large server: ~200,000-1,000,000+ rows per day

Storage Planning:

  • Each row is roughly 150-200 bytes

  • 1 million rows ≈ 150-200 MB

  • Indexes add another 50-100 MB

Plan for 1-2 GB per month on active servers.


Indexes

Indexes make queries fast. Without them, PostgreSQL would scan millions of rows for every query (slow!).
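
Based on the index names and the queries they're meant to serve, their definitions presumably look something like this (the exact DDL used by the plugin may differ):

CREATE INDEX IF NOT EXISTS idx_transfers_item_uuid   ON dupetrace_item_transfers (item_uuid);
CREATE INDEX IF NOT EXISTS idx_transfers_player_uuid ON dupetrace_item_transfers (player_uuid);
CREATE INDEX IF NOT EXISTS idx_transfers_ts          ON dupetrace_item_transfers (ts);
CREATE INDEX IF NOT EXISTS idx_transfers_item_player ON dupetrace_item_transfers (item_uuid, player_uuid);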

idx_transfers_item_uuid

Purpose: Speed up queries that filter by item_uuid (like /dupetest uuid)

Query Example:
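
Something along these lines (? stands in for the item's UUID):

SELECT player_uuid, action, location, ts
FROM dupetrace_item_transfers
WHERE item_uuid = ?
ORDER BY ts;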


idx_transfers_player_uuid

Purpose: Speed up queries that filter by player_uuid (like /dupetest search)

Query Example:
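
For example (? is the player's UUID):

SELECT DISTINCT item_uuid
FROM dupetrace_item_transfers
WHERE player_uuid = ?;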


idx_transfers_ts

Purpose: Speed up time-based queries and sorting by most recent transfers

Query Example:
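
For example, fetching the last 24 hours of activity, newest first:

SELECT *
FROM dupetrace_item_transfers
WHERE ts > NOW() - INTERVAL '24 hours'
ORDER BY ts DESC;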


idx_transfers_item_player (Composite Index)

Purpose: Optimize queries that filter by both item_uuid AND player_uuid (used by auto-removal logic)

Query Example:
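
Roughly this shape (both placeholders are UUIDs):

SELECT MIN(ts)
FROM dupetrace_item_transfers
WHERE item_uuid = ? AND player_uuid = ?;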

Why Composite? PostgreSQL can use the leading columns of a composite index, so this single index serves queries that filter on:

  • item_uuid only

  • item_uuid + player_uuid

  • item_uuid + player_uuid, ordered by ts (if ts is also included as a trailing column of the index)

This reduces the number of indexes needed!


Common Queries

1. Check if an Item Exists

Used By: /dupetest uuid, /dupetest history

Index Used: Primary key on id (automatic)
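
The plugin's exact query may differ, but conceptually it's a primary-key lookup:

SELECT first_seen
FROM dupetrace_items
WHERE id = ?;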


2. Get Item Transfer History

Used By: /dupetest uuid, /dupetest history

Index Used: idx_transfers_item_uuid
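
Conceptually:

SELECT player_uuid, action, location, ts
FROM dupetrace_item_transfers
WHERE item_uuid = ?
ORDER BY ts ASC;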


3. Get Player's Items

Used By: /dupetest search

Index Used: idx_transfers_player_uuid
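
One likely shape for this lookup (the plugin may select different columns):

SELECT item_uuid, MAX(ts) AS last_seen
FROM dupetrace_item_transfers
WHERE player_uuid = ?
GROUP BY item_uuid;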


4. Get Earliest Transfer Timestamp

Used By: Auto-removal logic (determining who had the item first)

Index Used: idx_transfers_item_player
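
A sketch of the idea; the plugin may instead run a per-player MIN(ts) query, but either way the composite index does the filtering:

SELECT player_uuid, MIN(ts) AS first_held
FROM dupetrace_item_transfers
WHERE item_uuid = ?
GROUP BY player_uuid
ORDER BY first_held;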


5. Get Database Statistics

Used By: /dupetest stats
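
Likely something like the following counts (the exact stats shown by the command may vary):

SELECT
    (SELECT COUNT(*) FROM dupetrace_items)          AS tracked_items,
    (SELECT COUNT(*) FROM dupetrace_item_transfers) AS logged_transfers;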

Note: These are slow on large datasets (full table scans). Consider caching results if running frequently.


Data Retention Strategies

As your database grows, you might want to prune old records to save storage space.

Strategy 1: Delete Old Transfers

Keep items table forever, but delete transfers older than X days:
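
For example, keeping the most recent 90 days (adjust the interval to taste):

DELETE FROM dupetrace_item_transfers
WHERE ts < NOW() - INTERVAL '90 days';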

Pros:

  • Reduces database size

  • Keeps recent history for investigations

Cons:

  • Loses historical forensic data

  • Can't track long-term item movements

Strategy 2: Archive to Cold Storage

Move old records to a separate archive table or export to CSV:
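
A sketch using a hypothetical dupetrace_item_transfers_archive table; the data-modifying CTE moves and deletes in one statement:

-- one-time setup for the archive table (name is an example)
CREATE TABLE IF NOT EXISTS dupetrace_item_transfers_archive
    (LIKE dupetrace_item_transfers INCLUDING ALL);

-- move everything older than 90 days into the archive
WITH moved AS (
    DELETE FROM dupetrace_item_transfers
    WHERE ts < NOW() - INTERVAL '90 days'
    RETURNING *
)
INSERT INTO dupetrace_item_transfers_archive
SELECT * FROM moved;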

Pros:

  • Keeps all data (just elsewhere)

  • Main table stays fast

Cons:

  • More complex to query archived data

Strategy 3: Partition by Date

Use PostgreSQL table partitioning to split data into monthly tables:
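
A rough sketch of declarative range partitioning (table and partition names are illustrative; migrating an existing table into a partitioned one means copying the data across):

-- new parent table, partitioned by the transfer timestamp
CREATE TABLE dupetrace_item_transfers_partitioned (
    id BIGSERIAL,
    item_uuid UUID NOT NULL,
    player_uuid UUID NOT NULL,
    action VARCHAR(64) NOT NULL,
    location TEXT,
    ts TIMESTAMP NOT NULL
) PARTITION BY RANGE (ts);

-- one partition per month; create new ones ahead of time
CREATE TABLE dupetrace_item_transfers_2025_01
    PARTITION OF dupetrace_item_transfers_partitioned
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');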

Pros:

  • Automatic data organization

  • Can drop old partitions easily

  • Queries on recent data stay fast

Cons:

  • Requires PostgreSQL 10+

  • More complex setup


Database Maintenance

Vacuum Regularly

PostgreSQL needs periodic vacuuming to reclaim space from deleted rows:
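
For example (autovacuum handles routine cleanup, but a manual pass after bulk deletes helps):

VACUUM ANALYZE dupetrace_item_transfers;
VACUUM ANALYZE dupetrace_items;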

Run this:

  • After bulk deletes

  • Weekly on high-traffic tables

  • Monthly on low-traffic tables

Reindex Occasionally

Rebuild indexes to optimize performance:
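
For example (CONCURRENTLY avoids blocking writes but requires PostgreSQL 12+; drop it on older versions):

REINDEX TABLE CONCURRENTLY dupetrace_item_transfers;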

Run this:

  • After major data changes

  • If queries become slow

  • Monthly on active servers


Monitoring & Optimization

Check Table Sizes
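
One way to check, using the built-in size functions:

SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;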

Check Index Usage
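
The pg_stat_user_indexes view tracks how often each index is scanned:

SELECT indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE relname LIKE 'dupetrace_%'
ORDER BY idx_scan;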

If an index has 0 scans, it's not being used and can be dropped.

Slow Query Analysis

Enable slow query logging in postgresql.conf:
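
For example, to log anything slower than 500 ms (pick a threshold that suits your server):

# postgresql.conf
log_min_duration_statement = 500   # milliseconds; -1 disables, 0 logs everything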

Review logs to find inefficient queries.


Backup Strategies

Full Backup (pg_dump)
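
A typical invocation (database and user names here are examples; adjust to your setup). The custom format (-F c) is compressed and restorable with pg_restore:

pg_dump -U dupetrace -d dupetrace -F c -f dupetrace_$(date +%F).dump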

Incremental Backup

Use PostgreSQL's WAL (Write-Ahead Logging) for point-in-time recovery:
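
WAL archiving is configured in postgresql.conf. The archive path below is just an example, and a complete setup also needs periodic base backups (e.g. via pg_basebackup):

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /var/backups/dupetrace/wal/%f'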

Automated Backups

Set up a cron job:
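
A minimal crontab entry (paths and names are examples; note that % must be escaped as \% inside crontab):

0 2 * * * pg_dump -U dupetrace -d dupetrace -F c -f /var/backups/dupetrace/dupetrace_$(date +\%F).dump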

Runs daily at 2 AM, keeps compressed backups.


Scaling Considerations

When Database Gets Large (100M+ rows)

  1. Partition tables by date

  2. Archive old data to cold storage

  3. Use read replicas for /dupetest queries

  4. Increase connection pool size in config

  5. Upgrade to better hardware (more RAM = faster queries)

When Database Gets REALLY Large (1B+ rows)

  1. Consider switching to a time-series database (TimescaleDB)

  2. Implement data retention policies (auto-delete old records)

  3. Use materialized views for statistics queries

  4. Split into multiple databases by world or date range


Connection Pool Tuning

Config Setting: database.postgres.pool-size

Default: 10 connections

Tuning Guide:

Server Size                Recommended Pool Size
Small (< 50 players)       5-10
Medium (50-200 players)    10-20
Large (200-500 players)    20-30
Mega (500+ players)        30-50

Formula:

Don't go crazy: More connections ≠ better performance. PostgreSQL has overhead per connection. Past 50 connections, you'll see diminishing returns.


Security Best Practices

Use a Dedicated Database User

Don't use the postgres superuser! Create a limited user:
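
For example (role and database names are illustrative; pick a strong password):

CREATE ROLE dupetrace LOGIN PASSWORD 'change-me';
CREATE DATABASE dupetrace OWNER dupetrace;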

Restrict Network Access

Edit pg_hba.conf to limit connections:
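
For example, allowing only the Minecraft server's address (replace 10.0.0.5 with your server's IP):

# TYPE  DATABASE   USER       ADDRESS       METHOD
host    dupetrace  dupetrace  10.0.0.5/32   scram-sha-256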

Use SSL/TLS

For remote databases, enforce encrypted connections:
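
On the server side, enable SSL in postgresql.conf and require it for remote clients in pg_hba.conf (certificate setup not shown):

# postgresql.conf
ssl = on

# pg_hba.conf — only accept SSL connections for this database
hostssl  dupetrace  dupetrace  0.0.0.0/0  scram-sha-256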


What's Next?

Dive into how events are tracked:

  • Event System → – Complete guide to event monitoring

Want to contribute? Check out the Architecture Overview → first!
