My Portfolio

From Raw CSVs to a Hardened Production Database: MetroBank Case Study

Designed and delivered a production-grade PostgreSQL banking database covering the full data engineering lifecycle: schema design, data ingestion, security, performance, and analytics.

Built a 7-table normalised schema (branches, customers, employees, accounts, transactions, loans, and account_balance_audit_log) using a parent-first migration strategy to enforce foreign key integrity across all tables. Implemented a staging-to-production CSV loader that reads raw files into temporary all-text tables, cleanses and normalises the data (phone formatting, email validation with fallback generation, type standardisation), then merges it into production via upsert inside a single transaction that rolls back automatically on failure.

Hardened the schema with regex CHECK constraints on emails and UK phone numbers, non-negative balance guards, audit metadata (created_at, updated_at, created_by) on every table, and PL/pgSQL trigger functions that maintain updated_at and write an immutable audit record on every account balance change. Configured role-based access control (app_user for CRUD, readonly_analyst for SELECT-only), added B-tree indexes on high-frequency join and filter columns, and wrote 10+ business analytics queries covering customer geography, portfolio breakdown, loan exposure, transaction trends, and branch performance.
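The cleansing step described above can be illustrated with a small Python sketch. It is not the project's loader: the regular expressions, the assumption that a UK national number has 10 digits after the country code, and the placeholder address format used for fallback generation are all hypothetical choices made for illustration.

```python
import re

# Assumed patterns: the project uses regex checks, but the exact expressions are not shown.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
UK_PHONE_RE = re.compile(r"^\+44\d{10}$")


def normalise_uk_phone(raw):
    """Return the number as +44 plus 10 digits, or None if it cannot be normalised."""
    digits = re.sub(r"\D", "", raw or "")  # keep digits only
    if digits.startswith("0044"):
        digits = digits[4:]
    elif digits.startswith("44"):
        digits = digits[2:]
    if digits.startswith("0"):  # drop the national trunk prefix
        digits = digits[1:]
    candidate = "+44" + digits
    return candidate if UK_PHONE_RE.match(candidate) else None


def clean_email(raw, customer_id):
    """Return a trimmed, lower-cased email, or a generated placeholder if invalid."""
    email = (raw or "").strip().lower()
    if EMAIL_RE.match(email):
        return email
    # Hypothetical fallback format; .invalid is a reserved top-level domain.
    return f"customer{customer_id}@placeholder.invalid"


print(normalise_uk_phone("020 7946 0958"))
print(clean_email("not-an-email", 42))
```

Running functions like these against the all-text staging tables, before the merge, keeps bad values from ever reaching the CHECK constraints in production.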

Customer records arrive with validated email addresses and correctly formatted UK phone numbers (+44). Account balances can never go negative. Foreign key constraints mean no transaction or loan can reference a non-existent account or customer. The bank's data is reliable from the moment it enters the system — not after a manual cleanup cycle.

Every account balance change, regardless of which application or user triggered it, is automatically captured in account_balance_audit_log with the old value, new value, the amount of change, a timestamp, and the responsible database user. This supports the audit-trail requirements typical of financial regulation (e.g. FCA rules, SOX) without relying on application developers to implement logging correctly.
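The idea of trigger-driven auditing can be shown in a self-contained sketch. This is an analogue, not the project's code: it uses SQLite from the Python standard library so it runs without a server, whereas the project uses PL/pgSQL trigger functions in PostgreSQL. The column names, the trigger name, and the whole-number balances are assumptions, and the database-user column is omitted because SQLite has no database users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 0)  -- non-negative guard
    );
    CREATE TABLE account_balance_audit_log (
        log_id        INTEGER PRIMARY KEY,
        account_id    INTEGER NOT NULL,
        old_balance   INTEGER NOT NULL,
        new_balance   INTEGER NOT NULL,
        change_amount INTEGER NOT NULL,
        changed_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Fires on every balance change, whichever client issued the UPDATE.
    CREATE TRIGGER trg_balance_audit
    AFTER UPDATE OF balance ON accounts
    WHEN OLD.balance <> NEW.balance
    BEGIN
        INSERT INTO account_balance_audit_log
            (account_id, old_balance, new_balance, change_amount)
        VALUES
            (OLD.account_id, OLD.balance, NEW.balance, NEW.balance - OLD.balance);
    END;
    INSERT INTO accounts (account_id, balance) VALUES (1, 100);
""")

conn.execute("UPDATE accounts SET balance = 250 WHERE account_id = 1")
audit_rows = conn.execute(
    "SELECT account_id, old_balance, new_balance, change_amount "
    "FROM account_balance_audit_log"
).fetchall()
print(audit_rows)
```

Because the logging lives in the database rather than in application code, an update that violates the balance guard is rejected before the trigger fires and leaves no audit row.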

Bonga CommerceFlow: Production-Ready Ecommerce Data Pipeline

Built a production-ready data engineering pipeline for Bonga Ecommerce that ingests CSV datasets into PostgreSQL using Docker, SQL, and automation scripts. The project includes clean folder organization, ERD-backed schema design, idempotent load scripts, validation queries, local dev and production-like environment profiles, and CI/CD automation with GitHub Actions.
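One common way to make a load script idempotent is to upsert on the primary key inside a single transaction, sketched below. This is an illustration under stated assumptions, not the project's script: SQLite stands in for PostgreSQL so the example runs anywhere, and the `load_products` helper and the `name` and `price` columns are hypothetical.

```python
import sqlite3


def load_products(conn, rows):
    """Upsert rows keyed on product_id so a repeated run leaves the table unchanged."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.executemany(
            """
            INSERT INTO products (product_id, name, price)
            VALUES (?, ?, ?)
            ON CONFLICT(product_id) DO UPDATE SET
                name = excluded.name,
                price = excluded.price
            """,
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "product_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "price REAL NOT NULL CHECK (price >= 0))"
)

rows = [(1, "Kettle", 24.99), (2, "Toaster", 31.50)]
load_products(conn, rows)
load_products(conn, rows)  # second run updates in place instead of duplicating
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(count)
```

PostgreSQL accepts a very similar `INSERT ... ON CONFLICT ... DO UPDATE` statement, so the same pattern carries over to a real load script.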

Results:

100 realistic rows loaded for each core table (products, customers, orders, orderitems).

Repeatable pipeline execution with schema creation, data load, and integrity validation.

Documented analytics questions with executable SQL answers.

Secure setup with environment separation and GitHub Secrets for CI credentials.

Client Benefits:

Faster onboarding for new engineers through a simple, documented workflow.

Reliable and consistent data loading process with fewer manual errors.

Better decision-making readiness through structured, queryable ecommerce data.

Safer operations through secret management and data exposure policy controls.

Bonga CommerceFlow: Secure Cloud-Integrated Ecommerce Data Pipeline

Project Overview:
Bonga CommerceFlow is a production-ready ecommerce data engineering project that automates the ingestion of CSV datasets into PostgreSQL using Docker, SQL scripts, Bash automation, and GitHub Actions. The pipeline was designed to be repeatable, structured, and secure, with clear separation between development and production-style workflows.

A key upgrade in the project is the introduction of Amazon S3 as the source for private production datasets. Instead of depending only on local files, the pipeline can now fetch protected CSV files from an S3 bucket before loading them into the database. This makes the workflow more realistic, more secure, and better aligned with real-world data engineering practices.

I designed and implemented a complete data pipeline that:

Creates the PostgreSQL schema with the right table relationships and constraints.

Loads ecommerce datasets for products, customers, orders, and orderitems.

Validates row counts and referential integrity after every run.

Supports repeatable local execution with Docker and environment-specific profiles.

Automates pipeline execution in GitHub Actions.

Integrates Amazon S3 for secure private-data ingestion in CI/CD and local runs.

Falls back to demo data in `data/raw/` when S3 data is unavailable.
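The fallback in the last step could look roughly like the sketch below. It assumes an earlier step has already tried to download the private CSVs from S3 into a local folder. The `data/private` folder, the file names, and the `resolve_data_dir` helper are hypothetical; only `data/raw/` comes from the project description.

```python
from pathlib import Path

# Assumed file names, one per core table.
REQUIRED_FILES = ("products.csv", "customers.csv", "orders.csv", "orderitems.csv")


def resolve_data_dir(private_dir, demo_dir, required=REQUIRED_FILES):
    """Use the S3-downloaded folder only if every required CSV is present."""
    private_dir = Path(private_dir)
    if all((private_dir / name).is_file() for name in required):
        return private_dir
    # Anything missing: fall back to the demo data committed with the repo.
    return Path(demo_dir)


data_dir = resolve_data_dir("data/private", "data/raw")
print(f"Loading CSVs from {data_dir}")
```

Requiring the complete file set avoids a mixed load in which some tables come from private data and others from the demo files.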

Result:
The project delivers a reliable, reusable pipeline that loads structured ecommerce data into PostgreSQL and verifies data quality on every run. It supports secure secret handling, cloud-backed file ingestion, and a documented workflow that is easy to run, test, and maintain.
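A post-load quality check of this kind can be sketched as a row-count comparison plus an orphan query. The example below is illustrative only: it runs against an in-memory SQLite database rather than PostgreSQL, and the `validate` helper, the column names, and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical orphan checks: each query counts child rows whose parent is missing.
ORPHAN_CHECKS = {
    "orders without a customer": """
        SELECT COUNT(*) FROM orders o
        LEFT JOIN customers c ON c.customer_id = o.customer_id
        WHERE c.customer_id IS NULL
    """,
}


def validate(conn, expected_counts):
    """Return a list of readable problems; an empty list means the run passed."""
    problems = []
    for table, expected in expected_counts.items():
        # Table names come from trusted config here, never from user input.
        actual = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if actual != expected:
            problems.append(f"{table}: expected {expected} rows, found {actual}")
    for label, query in ORPHAN_CHECKS.items():
        orphans = conn.execute(query).fetchone()[0]
        if orphans:
            problems.append(f"{label}: {orphans} orphan row(s)")
    return problems


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);  -- 99 has no customer
""")
problems = validate(conn, {"customers": 2, "orders": 3})
print(problems)
```

In a CI job, a non-empty list would typically fail the run so that a bad load never goes unnoticed.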

Client Benefit:
This project gives the client a cleaner and more production-aligned data workflow. It reduces manual data loading effort, improves consistency across environments, protects sensitive datasets by moving them to S3, and makes the pipeline easier to operate in both local development and CI/CD. The result is faster onboarding, safer data handling, and a stronger foundation for reporting, analytics, and future data platform growth.

SwiftRide Logistics: Production-Ready PostgreSQL Data Platform

Designed and deployed a relational data platform for a logistics company, replacing fragmented operational records with a governed PostgreSQL system hosted on Supabase. The solution organizes customers, orders, deliveries, drivers, vehicles, warehouses, inventory, and payments across four business schemas while preserving end-to-end referential integrity.

Built a one-command Bash loader that initializes the schema, applies validation constraints, imports CSV files in foreign-key-safe order, repairs PostgreSQL sequences, and verifies row counts after ingestion. Added DDL, DML, DQL, and DCL layers, environment-based credential handling, cascading rules, and role-based access patterns for production-minded database operations.
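A foreign-key-safe import order can be derived from the table dependencies rather than maintained by hand. The sketch below shows the idea with a topological sort. The project's loader is a Bash script, so this is an illustration rather than its implementation, and the dependency map is an assumption about how the eight tables reference each other.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each table maps to the parent tables it references.
DEPENDS_ON = {
    "customers": set(),
    "drivers": set(),
    "vehicles": set(),
    "warehouses": set(),
    "inventory": {"warehouses"},
    "orders": {"customers"},
    "deliveries": {"orders", "drivers", "vehicles"},
    "payments": {"orders"},
}


def load_order(depends_on):
    """Return tables ordered so that every parent is loaded before its children."""
    return list(TopologicalSorter(depends_on).static_order())


order = load_order(DEPENDS_ON)
print(order)
```

If two tables ever referenced each other, `static_order()` would raise `graphlib.CycleError`, which flags a circular dependency before any data is imported.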

Measured Results

3,960 records integrated across 8 relational tables and 4 schemas.
600 orders, deliveries, and payments analyzed across the operational lifecycle.
110.2M in total order value surfaced for revenue and customer-value analysis.
62 pending deliveries and 10 low-stock items identified for management action.

Converted SQL analysis into stakeholder-ready outputs. The reports translate customer value, driver workload, delivery performance, payment health, revenue, and inventory risk into clear operational recommendations.

Customer Analysis & Business Impact

To better understand customer behavior in the culinary business, data was collected from sales records, customer feedback, and purchase patterns. The analysis focused on identifying popular menu items, peak ordering times, and customer preferences.

Based on the findings, key recommendations were made, including optimizing the menu by promoting high-demand items, adjusting pricing strategies, and improving service during peak hours. Additionally, targeted promotions were introduced to increase customer engagement.

As a result, the business experienced improved customer satisfaction, increased repeat orders, and a noticeable growth in overall sales. The data-driven approach enabled more informed decision-making and enhanced operational efficiency.

Sulaimon Ekundayo Portfolio