Designed and delivered a production-grade PostgreSQL banking database covering the full data engineering
lifecycle: schema design, data ingestion, security, performance, and analytics. Built a 7-table normalised
schema (branches, customers, employees, accounts, transactions, loans, and account_balance_audit_log)
using a parent-first migration strategy, creating parent tables before their children, to enforce FK
integrity across all tables. Implemented a staging-to-production CSV loader that reads raw files into
temporary all-text tables, cleanses and normalises the data (phone formatting, email validation with
fallback generation, type standardisation), then merges it into production via upsert. The whole load runs
atomically inside a single transaction, with automatic rollback on failure. Hardened the schema with regex
CHECK constraints on emails and UK phone numbers, non-negative
balance guards, audit metadata (created_at, updated_at, created_by) on every table, and PL/pgSQL trigger
functions that automatically maintain updated_at and write an immutable audit record on every account
balance change. Configured role-based access control (app_user for CRUD, readonly_analyst for
SELECT-only), added B-tree indexes on high-frequency join and filter columns, and wrote 10+ business
analytics queries covering customer geography, portfolio breakdown, loan exposure, transaction trends, and
branch performance.
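
A minimal sketch of the cleansing step described above, written in Python for illustration. The regex patterns, the function names, and the fallback domain are assumptions made for this example; the loader's actual rules may differ.

```python
import re

# Hypothetical patterns: stand-ins for the project's real validation rules.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
UK_PHONE_RE = re.compile(r"^\+44\d{10}$")


def normalise_uk_phone(raw):
    """Return the number as +44 followed by 10 digits, or None if it cannot be normalised."""
    digits = re.sub(r"\D", "", raw or "")  # keep digits only
    # Drop an international prefix if one is present.
    for prefix in ("0044", "44"):
        if digits.startswith(prefix):
            digits = digits[len(prefix):]
            break
    # Drop the national trunk zero, e.g. 07911... or +44 (0)7911...
    if digits.startswith("0"):
        digits = digits[1:]
    candidate = "+44" + digits
    return candidate if UK_PHONE_RE.match(candidate) else None


def clean_email(raw, customer_id, fallback_domain="example.invalid"):
    """Return a trimmed, lower-cased email, or a generated fallback if the input is invalid."""
    email = (raw or "").strip().lower()
    if EMAIL_RE.match(email):
        return email
    # Assumed fallback scheme: a placeholder address derived from the customer id.
    return f"customer{customer_id}@{fallback_domain}"
```

Returning None for an unrecoverable phone number leaves the decision (reject the row or load a NULL) to the merge step, which keeps the cleansing functions free of database concerns.
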
Customer records arrive with validated email addresses and correctly formatted UK phone numbers (+44).
Account balances can never go negative. Foreign key constraints mean no transaction or loan can reference
a non-existent account or customer. The bank's data is reliable from the moment it enters the system — not
after a manual cleanup cycle.
Every account balance change, regardless of which application or user triggered it, is automatically
captured in account_balance_audit_log with the old value, the new value, the amount of change, a
timestamp, and the responsible database user. This supports the audit-trail expectations of financial
regulation (e.g. FCA rules, SOX) without relying on application developers to implement it correctly.
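
The balance guard and audit trigger described above can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the trigger is written in SQLite's dialect rather than PL/pgSQL. The column names are assumptions, and the database-user column is left out because SQLite has no user concept (in PostgreSQL it could be filled from current_user).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    balance    NUMERIC NOT NULL CHECK (balance >= 0)   -- non-negative balance guard
);

CREATE TABLE account_balance_audit_log (
    log_id        INTEGER PRIMARY KEY,
    account_id    INTEGER NOT NULL,
    old_balance   NUMERIC NOT NULL,
    new_balance   NUMERIC NOT NULL,
    change_amount NUMERIC NOT NULL,
    changed_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Fires for every balance change, whichever client issued the UPDATE.
CREATE TRIGGER trg_audit_balance
AFTER UPDATE OF balance ON accounts
WHEN OLD.balance <> NEW.balance
BEGIN
    INSERT INTO account_balance_audit_log
        (account_id, old_balance, new_balance, change_amount)
    VALUES
        (OLD.account_id, OLD.balance, NEW.balance, NEW.balance - OLD.balance);
END;
""")

conn.execute("INSERT INTO accounts (account_id, balance) VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 250 WHERE account_id = 1")

audit_rows = conn.execute(
    "SELECT account_id, old_balance, new_balance, change_amount "
    "FROM account_balance_audit_log"
).fetchall()

# An update that would make the balance negative is rejected by the CHECK constraint.
try:
    conn.execute("UPDATE accounts SET balance = -1 WHERE account_id = 1")
    overdraft_rejected = False
except sqlite3.IntegrityError:
    overdraft_rejected = True

audit_count = conn.execute(
    "SELECT COUNT(*) FROM account_balance_audit_log"
).fetchone()[0]
```

Because the rejected update never changes the row, no audit record is written for it; the log only reflects balance changes that actually took effect.
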
Built a production-ready data engineering pipeline for Bonga Ecommerce that ingests CSV datasets into
PostgreSQL using Docker, SQL, and automation scripts. The project includes clean folder organization,
ERD-backed schema design, idempotent load scripts, validation queries, local dev and production-like
environment profiles, and CI/CD automation with GitHub Actions.
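
As a rough illustration of what "idempotent load scripts" and "validation queries" mean in practice, the sketch below loads the same rows twice and checks that the row counts and foreign-key relationships are unchanged. It runs against an in-memory SQLite database purely so that it is self-contained; the table columns and sample rows are assumptions, not the project's real schema or data.

```python
import sqlite3

# Hypothetical sample rows standing in for the CSV contents.
CUSTOMERS = [(1, "Ada"), (2, "Grace")]
ORDERS = [(10, 1), (11, 2), (12, 2)]  # (order_id, customer_id)


def load(conn):
    """Create the schema if needed and upsert the rows; safe to run repeatedly."""
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    """)
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT INTO customers (customer_id, name) VALUES (?, ?) "
            "ON CONFLICT(customer_id) DO UPDATE SET name = excluded.name",
            CUSTOMERS,
        )
        conn.executemany(
            "INSERT INTO orders (order_id, customer_id) VALUES (?, ?) "
            "ON CONFLICT(order_id) DO UPDATE SET customer_id = excluded.customer_id",
            ORDERS,
        )


def validate(conn):
    """Return per-table row counts and the number of orders with no matching customer."""
    counts = {
        table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in ("customers", "orders")
    }
    orphans = conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customers c ON c.customer_id = o.customer_id "
        "WHERE c.customer_id IS NULL"
    ).fetchone()[0]
    return counts, orphans


conn = sqlite3.connect(":memory:")
load(conn)
load(conn)  # second run must not duplicate anything
counts, orphans = validate(conn)
```

The upsert form shown here (INSERT ... ON CONFLICT ... DO UPDATE) is also available in PostgreSQL, which is why it was chosen for the sketch.
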
Results:
- 100 realistic rows loaded for each core table (products, customers, orders, orderitems).
- Repeatable pipeline execution with schema creation, data load, and integrity validation.
- Documented query questions and executable SQL solutions for analytics tasks.
- Secure setup with environment separation and GitHub Secrets for CI credentials.
Client Benefits:
- Faster onboarding for new engineers through a simple, documented workflow.
- Reliable and consistent data loading process with fewer manual errors.
- Better decision-making readiness through structured, queryable ecommerce data.
- Safer operations through secret management and data exposure policy controls.
Project Overview:
Bonga CommerceFlow is a production-ready ecommerce data engineering project that automates the ingestion
of CSV datasets into PostgreSQL using Docker, SQL scripts, Bash automation, and GitHub Actions. The
pipeline was designed to be repeatable, structured, and secure, with clear separation between development
and production-style workflows.
A key upgrade in the project is the introduction of Amazon S3 as the source for private production
datasets. Instead of depending only on local files, the pipeline can now fetch protected CSV files from an
S3 bucket before loading them into the database. This makes the workflow more realistic, more secure, and
better aligned with real-world data engineering practices.
I designed and implemented a complete data pipeline that:
- Creates the PostgreSQL schema with the right table relationships and constraints.
- Loads ecommerce datasets for products, customers, orders, and orderitems.
- Validates row counts and referential integrity after every run.
- Supports repeatable local execution with Docker and environment-specific profiles.
- Automates pipeline execution in GitHub Actions.
- Integrates Amazon S3 for secure private-data ingestion in CI/CD and local runs.
- Falls back to demo data in `data/raw/` when S3 data is unavailable.
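
The S3-with-fallback behaviour in the list above could be sketched roughly as follows. This is an illustration only: `resolve_dataset` and the `fetch_private` callable are hypothetical names, and the real pipeline's download step (which would wrap the actual S3 client or CLI call) is not shown.

```python
import tempfile
from pathlib import Path


def resolve_dataset(filename, fetch_private, demo_dir):
    """Return (path, source): the private copy if it can be fetched, else the demo file.

    fetch_private is a hypothetical callable that downloads the file from the
    private bucket and returns its local path, or raises if that is not possible.
    """
    try:
        private_path = fetch_private(filename)
        if private_path is not None and Path(private_path).is_file():
            return Path(private_path), "s3"
    except Exception as exc:  # any fetch problem triggers the fallback
        print(f"private fetch failed for {filename}: {exc}")

    demo_path = Path(demo_dir) / filename
    if not demo_path.is_file():
        raise FileNotFoundError(f"no private or demo copy of {filename}")
    return demo_path, "demo"


def unavailable_fetch(filename):
    """Simulates a run where the bucket cannot be reached."""
    raise ConnectionError("bucket not reachable")


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "data" / "raw"
    demo_dir.mkdir(parents=True)
    (demo_dir / "products.csv").write_text("product_id,name\n1,Widget\n")

    path, chosen_source = resolve_dataset("products.csv", unavailable_fetch, demo_dir)
    chosen_name = path.name
```

Passing the fetch step in as a callable keeps the fallback logic testable without cloud credentials, which matters when the same code has to run both locally and in CI.
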
Result:
The project delivers a reliable and reusable pipeline that successfully loads structured ecommerce data
into PostgreSQL and verifies data quality at each run. It supports secure secret handling, cloud-backed
file ingestion, and a documented workflow that is easy to run, test, and maintain.
Client Benefit:
This project gives the client a cleaner and more production-aligned data workflow. It reduces manual data
loading effort, improves consistency across environments, protects sensitive datasets by moving them to
S3, and makes the pipeline easier to operate in both local development and CI/CD. The result is faster
onboarding, safer data handling, and a stronger foundation for reporting, analytics, and future data
platform growth.
Designed and deployed a relational data platform for a logistics company, replacing fragmented operational
records with a governed PostgreSQL system hosted on Supabase. The solution organizes customers, orders,
deliveries, drivers, vehicles, warehouses, inventory, and payments across four business schemas while
preserving end-to-end referential integrity.
Built a one-command Bash loader that initializes the schema, applies validation constraints, imports CSV
files in foreign-key-safe order, repairs PostgreSQL sequences, and verifies row counts after ingestion.
Added DDL, DML, DQL, and DCL layers, environment-based credential handling, cascading rules, and
role-based access patterns for production-minded database operations.
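
One way to derive a foreign-key-safe import order like the one the loader uses is a topological sort over the table dependencies. The sketch below does this in Python with the standard-library graphlib module; the dependency map is an assumption made for illustration, since the actual foreign keys between these tables are not listed here.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical map: each table -> the parent tables it references via foreign keys.
FK_PARENTS = {
    "customers": set(),
    "drivers": set(),
    "vehicles": set(),
    "warehouses": set(),
    "orders": {"customers"},
    "inventory": {"warehouses"},
    "deliveries": {"orders", "drivers", "vehicles"},
    "payments": {"orders"},
}


def load_order(fk_parents):
    """Return the tables ordered so that every parent is loaded before its children."""
    return list(TopologicalSorter(fk_parents).static_order())


order = load_order(FK_PARENTS)
position = {table: index for index, table in enumerate(order)}
```

Several valid orders exist for the same dependency map, so a loader built this way should rely only on the parent-before-child guarantee, not on one exact sequence. A circular dependency would make static_order() raise graphlib.CycleError, which surfaces a schema problem before any data is imported.
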
Measured Results:
- 3,960 records integrated across 8 relational tables and 4 schemas.
- 600 orders, deliveries, and payments analyzed across the operational lifecycle.
- 110.2M in total order value surfaced for revenue and customer-value analysis.
- 62 pending deliveries and 10 low-stock items identified for management action.
Converted the SQL analysis into stakeholder-ready reports that translate customer value, driver workload,
delivery performance, payment health, revenue, and inventory risk into clear operational recommendations.
To better understand customer behavior in the culinary business, data was collected from sales records,
customer feedback, and purchase patterns. The analysis focused on identifying popular menu items, peak
ordering times, and customer preferences.
Based on the findings, key recommendations were made, including optimizing the menu by promoting
high-demand items, adjusting pricing strategies, and improving service during peak hours. Additionally,
targeted promotions were introduced to increase customer engagement.
As a result, the business experienced improved customer satisfaction, increased repeat orders, and
noticeable growth in overall sales. The data-driven approach enabled more informed decision-making and
enhanced operational efficiency.