Database Schema
DupeTrace uses PostgreSQL for persistent storage. Let's break down the database structure, indexing strategy, and query patterns.
Why PostgreSQL?
Speed. Scalability. Reliability.
Handles millions of rows without breaking a sweat
Superior indexing and query optimization
JSON support for future expansion
Widely available (even free hosting options exist)
SQLite would choke on large transfer logs. MySQL would work but... why settle for second best?
Database Tables
dupetrace_items
Stores every unique item UUID that's been seen.
CREATE TABLE IF NOT EXISTS dupetrace_items (
    id UUID PRIMARY KEY,
    first_seen TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
)

Columns
id (UUID): Unique item identifier (primary key)
first_seen (TIMESTAMP): When this UUID was first registered
Sample Data
Growth Rate
Slow. This table only grows when a new unique item is seen for the first time.
Small server (10-50 players): ~100-1,000 rows per day
Large server (200+ players): ~1,000-10,000 rows per day
After a few months, growth stabilizes (most items have already been seen).
dupetrace_item_transfers
Logs every interaction with tracked items. This is where the magic (and most of the data) lives.
Columns
id (BIGSERIAL): Auto-incrementing primary key
item_uuid (UUID): Item identifier (foreign reference)
player_uuid (UUID): Player who interacted with the item
action (VARCHAR(64)): Type of action (PICKUP, CRAFTED, etc.)
location (TEXT): World coordinates (format: world:x,y,z)
ts (TIMESTAMP): When the interaction occurred
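Based on the column list above, the table definition is roughly the following sketch (the exact constraints and defaults created by the plugin may differ):

CREATE TABLE IF NOT EXISTS dupetrace_item_transfers (
    id BIGSERIAL PRIMARY KEY,
    item_uuid UUID NOT NULL,       -- logically references dupetrace_items.id
    player_uuid UUID NOT NULL,
    action VARCHAR(64) NOT NULL,   -- PICKUP, CRAFTED, etc.
    location TEXT,                 -- "world:x,y,z"
    ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);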
Sample Data
Growth Rate
Fast. This is your high-traffic table.
Small server: ~10,000-50,000 rows per day
Medium server: ~50,000-200,000 rows per day
Large server: ~200,000-1,000,000+ rows per day
Storage Planning:
Each row is roughly 150-200 bytes
1 million rows ≈ 150-200 MB
Indexes add another 50-100 MB
Plan for 1-2 GB per month on active servers.
Indexes
Indexes make queries fast. Without them, PostgreSQL would scan millions of rows for every query (slow!).
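The four indexes below are most likely plain B-tree indexes; a sketch consistent with their names and stated purposes (the plugin's own migration may phrase these differently):

CREATE INDEX IF NOT EXISTS idx_transfers_item_uuid   ON dupetrace_item_transfers (item_uuid);
CREATE INDEX IF NOT EXISTS idx_transfers_player_uuid ON dupetrace_item_transfers (player_uuid);
CREATE INDEX IF NOT EXISTS idx_transfers_ts          ON dupetrace_item_transfers (ts);
CREATE INDEX IF NOT EXISTS idx_transfers_item_player ON dupetrace_item_transfers (item_uuid, player_uuid);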
idx_transfers_item_uuid
Purpose: Speed up queries that filter by item_uuid (like /dupetest uuid)
Query Example:
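An illustrative query of the shape this index serves (parameter placeholders shown as $1; the plugin's actual SQL may differ slightly):

SELECT player_uuid, action, location, ts
FROM dupetrace_item_transfers
WHERE item_uuid = $1
ORDER BY ts DESC;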
idx_transfers_player_uuid
Purpose: Speed up queries that filter by player_uuid (like /dupetest search)
Query Example:
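For instance, listing every item a player has touched (illustrative, not the plugin's literal SQL):

SELECT DISTINCT item_uuid
FROM dupetrace_item_transfers
WHERE player_uuid = $1;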
idx_transfers_ts
Purpose: Speed up time-based queries and sorting by most recent transfers
Query Example:
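For instance, pulling the last hour of activity (illustrative):

SELECT item_uuid, player_uuid, action
FROM dupetrace_item_transfers
WHERE ts > NOW() - INTERVAL '1 hour'
ORDER BY ts DESC
LIMIT 100;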
idx_transfers_item_player (Composite Index)
Purpose: Optimize queries that filter by both item_uuid AND player_uuid (used by auto-removal logic)
Query Example:
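For instance, the kind of lookup the auto-removal logic needs (illustrative):

SELECT MIN(ts)
FROM dupetrace_item_transfers
WHERE item_uuid = $1
  AND player_uuid = $2;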
Why Composite? PostgreSQL can use a single composite index for queries that filter on:
item_uuid only
item_uuid + player_uuid
item_uuid + player_uuid, sorted by ts
This reduces the number of indexes needed!
Common Queries
1. Check if an Item Exists
Used By: /dupetest uuid, /dupetest history
Index Used: Primary key on id (automatic)
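A minimal form of this check (the plugin's actual statement may differ slightly):

SELECT EXISTS (
    SELECT 1 FROM dupetrace_items WHERE id = $1
);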
2. Get Item Transfer History
Used By: /dupetest uuid, /dupetest history
Index Used: idx_transfers_item_uuid
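Roughly the following, returning the item's full chain of custody in chronological order (illustrative):

SELECT player_uuid, action, location, ts
FROM dupetrace_item_transfers
WHERE item_uuid = $1
ORDER BY ts ASC;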
3. Get Player's Items
Used By: /dupetest search
Index Used: idx_transfers_player_uuid
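Roughly the following, grouping transfers so each item appears once with its most recent sighting (illustrative):

SELECT item_uuid, MAX(ts) AS last_seen
FROM dupetrace_item_transfers
WHERE player_uuid = $1
GROUP BY item_uuid
ORDER BY last_seen DESC;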
4. Get Earliest Transfer Timestamp
Used By: Auto-removal logic (determining who had the item first)
Index Used: idx_transfers_item_player
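Something along these lines, finding the first player associated with the item (illustrative):

SELECT player_uuid, MIN(ts) AS first_seen
FROM dupetrace_item_transfers
WHERE item_uuid = $1
GROUP BY player_uuid
ORDER BY first_seen ASC
LIMIT 1;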
5. Get Database Statistics
Used By: /dupetest stats
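Essentially a pair of row counts (illustrative):

SELECT
    (SELECT COUNT(*) FROM dupetrace_items)          AS tracked_items,
    (SELECT COUNT(*) FROM dupetrace_item_transfers) AS logged_transfers;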
Note: These are slow on large datasets (full table scans). Consider caching results if running frequently.
Data Retention Strategies
As your database grows, you might want to prune old records to save storage space.
Strategy 1: Delete Old Transfers
Keep items table forever, but delete transfers older than X days:
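For example, keeping 90 days of history (pick whatever window suits your investigations, and run it during off-peak hours):

DELETE FROM dupetrace_item_transfers
WHERE ts < NOW() - INTERVAL '90 days';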
Pros:
Reduces database size
Keeps recent history for investigations
Cons:
Loses historical forensic data
Can't track long-term item movements
Strategy 2: Archive to Cold Storage
Move old records to a separate archive table or export to CSV:
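One way to do the table-based variant, using a hypothetical dupetrace_item_transfers_archive table:

-- one-time setup: an archive table with the same structure
CREATE TABLE IF NOT EXISTS dupetrace_item_transfers_archive
    (LIKE dupetrace_item_transfers INCLUDING ALL);

-- copy old rows across, then remove them from the hot table
BEGIN;

INSERT INTO dupetrace_item_transfers_archive
SELECT * FROM dupetrace_item_transfers
WHERE ts < NOW() - INTERVAL '90 days';

DELETE FROM dupetrace_item_transfers
WHERE ts < NOW() - INTERVAL '90 days';

COMMIT;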
Pros:
Keeps all data (just elsewhere)
Main table stays fast
Cons:
More complex to query archived data
Strategy 3: Partition by Date
Use PostgreSQL table partitioning to split data into monthly tables:
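A sketch of what a range-partitioned version of the transfers table could look like; this is not the schema the plugin creates by default, and moving existing data into it is a separate migration:

CREATE TABLE dupetrace_item_transfers_part (
    id          BIGSERIAL,
    item_uuid   UUID NOT NULL,
    player_uuid UUID NOT NULL,
    action      VARCHAR(64) NOT NULL,
    location    TEXT,
    ts          TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) PARTITION BY RANGE (ts);

-- one partition per month; an old month can be dropped instantly with DROP TABLE
-- (on PostgreSQL 10 indexes must be created per partition; 11+ can index the parent)
CREATE TABLE dupetrace_item_transfers_2025_01
    PARTITION OF dupetrace_item_transfers_part
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');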
Pros:
Automatic data organization
Can drop old partitions easily
Queries on recent data stay fast
Cons:
Requires PostgreSQL 10+
More complex setup
Database Maintenance
Vacuum Regularly
PostgreSQL needs periodic vacuuming to reclaim space from deleted rows:
Run this:
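For the DupeTrace tables that usually means:

VACUUM ANALYZE dupetrace_item_transfers;   -- ANALYZE also refreshes planner statistics
VACUUM ANALYZE dupetrace_items;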
After bulk deletes
Weekly on high-traffic tables
Monthly on low-traffic tables
Reindex Occasionally
Rebuild indexes to optimize performance:
Run this:
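For example (a plain REINDEX takes a lock that blocks writes, so schedule it for quiet hours; PostgreSQL 12+ also offers REINDEX ... CONCURRENTLY):

REINDEX TABLE dupetrace_item_transfers;
REINDEX TABLE dupetrace_items;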
After major data changes
If queries become slow
Monthly on active servers
Monitoring & Optimization
Check Table Sizes
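One way to do it with PostgreSQL's built-in size functions:

SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname LIKE 'dupetrace_%'
ORDER BY pg_total_relation_size(relid) DESC;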
Check Index Usage
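The statistics collector tracks how often each index is hit:

SELECT indexrelname AS index_name,
       idx_scan     AS times_used
FROM pg_stat_user_indexes
WHERE relname = 'dupetrace_item_transfers'
ORDER BY idx_scan ASC;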
If an index has 0 scans, it's not being used and can be dropped.
Slow Query Analysis
Enable slow query logging in postgresql.conf:
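For example, logging anything slower than one second (the value is in milliseconds; reload the configuration afterwards):

# postgresql.conf
log_min_duration_statement = 1000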
Review logs to find inefficient queries.
Backup Strategies
Full Backup (pg_dump)
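A typical invocation, assuming both the database and the user are named dupetrace (adjust names and paths to your setup):

# custom-format dump, compressed and restorable with pg_restore
pg_dump -U dupetrace -d dupetrace -F c -f dupetrace-$(date +%F).dump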
Incremental Backup
Use PostgreSQL's WAL (Write-Ahead Logging) for point-in-time recovery:
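The server-side settings look roughly like this (you also need periodic base backups, e.g. with pg_basebackup; the archive path is a placeholder):

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /var/backups/pg_wal_archive/%f'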
Automated Backups
Set up a cron job:
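Something along these lines (user, database, and paths are assumptions; note that % has to be escaped inside crontab):

0 2 * * * pg_dump -U dupetrace dupetrace | gzip > /var/backups/dupetrace-$(date +\%F).sql.gz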
Runs daily at 2 AM, keeps compressed backups.
Scaling Considerations
When Database Gets Large (100M+ rows)
Partition tables by date
Archive old data to cold storage
Use read replicas for /dupetest queries
Increase connection pool size in config
Upgrade to better hardware (more RAM = faster queries)
When Database Gets REALLY Large (1B+ rows)
Consider switching to a time-series database (TimescaleDB)
Implement data retention policies (auto-delete old records)
Use materialized views for statistics queries
Split into multiple databases by world or date range
Connection Pool Tuning
Config Setting: database.postgres.pool-size
Default: 10 connections
Tuning Guide:
Small (< 50 players): 5-10
Medium (50-200 players): 10-20
Large (200-500 players): 20-30
Mega (500+ players): 30-50
Formula:
Don't go crazy: More connections ≠ better performance. PostgreSQL has overhead per connection. Past 50 connections, you'll see diminishing returns.
Security Best Practices
Use a Dedicated Database User
Don't use the postgres superuser! Create a limited user:
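A minimal sketch, assuming the database is named dupetrace (pick a real password; the plugin needs CREATE so it can make its own tables on first run):

CREATE ROLE dupetrace_app LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE dupetrace TO dupetrace_app;
GRANT USAGE, CREATE ON SCHEMA public TO dupetrace_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO dupetrace_app;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO dupetrace_app;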
Restrict Network Access
Edit pg_hba.conf to limit connections:
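For example, allowing only the Minecraft server's address to connect as the limited user (the address is a placeholder):

# TYPE  DATABASE   USER           ADDRESS          METHOD
host    dupetrace  dupetrace_app  203.0.113.10/32  scram-sha-256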
Use SSL/TLS
For remote databases, enforce encrypted connections:
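On the server that means turning SSL on and using hostssl entries so unencrypted connections are rejected (certificate setup is omitted here):

# postgresql.conf (requires a server certificate and key to be configured)
ssl = on

# pg_hba.conf
hostssl  dupetrace  dupetrace_app  203.0.113.10/32  scram-sha-256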
What's Next?
Dive into how events are tracked:
Event System → – Complete guide to event monitoring
Want to contribute? Check out the Architecture Overview → first!