Sulaimon Ekundayo Portfolio

This portfolio presents my data engineering projects, demonstrating my ability to design scalable data pipelines,
manage data infrastructure, and deliver data-driven solutions.

From Raw CSVs to a Hardened Production Database: MetroBank Case Study

Designed and delivered a production-grade PostgreSQL banking database covering the full data engineering lifecycle: schema design, data ingestion, security, performance, and analytics.

  • Built a 7-table normalised schema (branches, customers, employees, accounts, transactions, loans, and an account_balance_audit_log) using a parent-first migration strategy to enforce foreign key integrity across all tables.
  • Implemented a staging-to-production CSV loader that reads raw files into temporary all-text tables, cleanses and normalises the data (phone formatting, email validation with fallback generation, type standardisation), then merges into production via upsert, atomically inside a single transaction with automatic rollback on failure.
  • Hardened the schema with regex CHECK constraints on emails and UK phone numbers, non-negative balance guards, audit metadata (created_at, updated_at, created_by) on every table, and PL/pgSQL trigger functions that automatically maintain updated_at and write an immutable audit record on every account balance change.
  • Configured role-based access control (app_user for CRUD, readonly_analyst for SELECT-only access), added B-tree indexes on high-frequency join and filter columns, and wrote 10+ business analytics queries covering customer geography, portfolio breakdown, loan exposure, transaction trends, and branch performance.
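
For illustration, the staging-to-production merge follows the pattern sketched below. This is a minimal sketch rather than the project's actual code: the table names (stg_customers, customers) and the cleansing rules shown are assumptions.

```sql
-- Minimal sketch of a staging-to-production upsert;
-- identifiers and cleansing rules are illustrative, not the real ones.
BEGIN;

-- Stage the raw CSV into an all-text temporary table.
CREATE TEMP TABLE stg_customers (
    customer_id TEXT,
    full_name   TEXT,
    email       TEXT,
    phone       TEXT
);

-- \copy is a psql client-side meta-command for loading the file.
\copy stg_customers FROM 'data/customers.csv' WITH (FORMAT csv, HEADER true)

-- Cleanse, cast, and merge into production in one atomic step;
-- any error aborts the transaction and rolls everything back.
INSERT INTO customers (customer_id, full_name, email, phone)
SELECT customer_id::INT,
       trim(full_name),
       lower(trim(email)),
       regexp_replace(phone, '[^0-9+]', '', 'g')  -- strip phone punctuation
FROM stg_customers
ON CONFLICT (customer_id) DO UPDATE
SET full_name = EXCLUDED.full_name,
    email     = EXCLUDED.email,
    phone     = EXCLUDED.phone;

COMMIT;
```

Wrapping the whole load in one transaction is what gives the all-or-nothing behaviour: either every cleansed row lands in production, or none do.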

Customer records arrive with validated email addresses and correctly formatted UK phone numbers (+44). Account balances can never go negative. Foreign key constraints mean no transaction or loan can reference a non-existent account or customer. The bank's data is reliable from the moment it enters the system — not after a manual cleanup cycle.
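
A rough sketch of how such guards look in PostgreSQL (the exact regexes and column names in the project may differ):

```sql
-- Illustrative constraints only; production regexes and names may differ.
ALTER TABLE customers
    ADD CONSTRAINT chk_email_format
        CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
    ADD CONSTRAINT chk_uk_phone
        CHECK (phone ~ '^\+44[0-9]{9,10}$');

ALTER TABLE accounts
    ADD CONSTRAINT chk_balance_non_negative CHECK (balance >= 0);
```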

Every account balance change — regardless of which application or user triggered it — is automatically captured in account_balance_audit_log with the old value, new value, the amount of change, a timestamp, and the responsible database user. This satisfies a core requirement of financial audit frameworks (e.g. FCA, SOX) without relying on application developers to implement it correctly.
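
A trigger of this kind can be sketched as follows; the audit-log column names here are assumptions rather than the project's actual definitions.

```sql
-- Sketch of the balance-audit trigger; column names are illustrative.
CREATE OR REPLACE FUNCTION log_balance_change() RETURNS trigger AS $$
BEGIN
    IF NEW.balance IS DISTINCT FROM OLD.balance THEN
        INSERT INTO account_balance_audit_log
            (account_id, old_balance, new_balance,
             change_amount, changed_at, changed_by)
        VALUES
            (OLD.account_id, OLD.balance, NEW.balance,
             NEW.balance - OLD.balance, now(), current_user);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_audit_balance_change
AFTER UPDATE OF balance ON accounts
FOR EACH ROW
EXECUTE FUNCTION log_balance_change();
```

Because the trigger fires at the database layer, it captures changes from every client and application, which is the property the audit requirement depends on.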

Bonga CommerceFlow: Production-Ready Ecommerce Data Pipeline

Built a production-ready data engineering pipeline for Bonga Ecommerce that ingests CSV datasets into PostgreSQL using Docker, SQL, and automation scripts. The project includes clean folder organization, ERD-backed schema design, idempotent load scripts, validation queries, local dev and production-like environment profiles, and CI/CD automation with GitHub Actions.
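
To show what "idempotent" means here, load scripts written in this style can be re-run safely. The table definition below is a simplified stand-in for the real schema, and stg_products is an assumed staging table populated from the CSV.

```sql
-- Simplified illustration of an idempotent schema + load step.
CREATE TABLE IF NOT EXISTS products (
    product_id INT PRIMARY KEY,
    name       TEXT NOT NULL,
    price      NUMERIC(10, 2) NOT NULL CHECK (price >= 0)
);

-- Re-running the load inserts only rows that are not already present.
INSERT INTO products (product_id, name, price)
SELECT product_id::INT, name, price::NUMERIC
FROM stg_products  -- assumed staging table populated from the CSV
ON CONFLICT (product_id) DO NOTHING;
```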

Results:

  • 100 realistic rows loaded for each core table (products, customers, orders, orderitems).
  • Repeatable pipeline execution with schema creation, data load, and integrity validation.
  • Documented query questions and executable SQL solutions for analytics tasks.
  • Secure setup with environment separation and GitHub Secrets for CI credentials.

Client Benefits:

  • Faster onboarding for new engineers through a simple, documented workflow.
  • Reliable and consistent data loading process with fewer manual errors.
  • Better decision-making readiness through structured, queryable ecommerce data.
  • Safer operations through secret management and data exposure policy controls.

Bonga CommerceFlow: Secure Cloud-Integrated Ecommerce Data Pipeline

Project Overview:
Bonga CommerceFlow is a production-ready ecommerce data engineering project that automates the ingestion of CSV datasets into PostgreSQL using Docker, SQL scripts, Bash automation, and GitHub Actions. The pipeline was designed to be repeatable, structured, and secure, with clear separation between development and production-style workflows.

A key upgrade in this version of the project is the introduction of Amazon S3 as the source for private production datasets. Instead of depending only on local files, the pipeline can now fetch protected CSV files from an S3 bucket before loading them into the database. This makes the workflow more secure and more closely aligned with real-world data engineering practice.

I designed and implemented a complete data pipeline that:

  • Creates the PostgreSQL schema with the right table relationships and constraints.
  • Loads ecommerce datasets for products, customers, orders, and orderitems.
  • Validates row counts and referential integrity after every run (see the validation sketch after this list).
  • Supports repeatable local execution with Docker and environment-specific profiles.
  • Automates pipeline execution in GitHub Actions.
  • Integrates Amazon S3 for secure private-data ingestion in CI/CD and local runs.
  • Falls back to demo data in `data/raw/` when S3 data is unavailable.
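
The validation step referenced above can be pictured with queries along these lines. The table names come from the project, but the key columns (e.g. order_id) and exact checks are assumptions.

```sql
-- Hedged example of post-load validation; key column names are assumed.
-- Row counts per core table:
SELECT 'products'   AS table_name, count(*) AS row_count FROM products
UNION ALL SELECT 'customers',  count(*) FROM customers
UNION ALL SELECT 'orders',     count(*) FROM orders
UNION ALL SELECT 'orderitems', count(*) FROM orderitems;

-- Referential integrity: every order item must point at a real order.
SELECT oi.*
FROM orderitems oi
LEFT JOIN orders o ON o.order_id = oi.order_id
WHERE o.order_id IS NULL;  -- expected result: zero rows
```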

Result:
The project delivers a reliable and reusable pipeline that loads structured ecommerce data into PostgreSQL and verifies data quality on each run. It supports secure secret handling, cloud-backed file ingestion, and a documented workflow that is easy to run, test, and maintain.

Client Benefit:
This project gives the client a cleaner and more production-aligned data workflow. It reduces manual data loading effort, improves consistency across environments, protects sensitive datasets by moving them to S3, and makes the pipeline easier to operate in both local development and CI/CD. The result is faster onboarding, safer data handling, and a stronger foundation for reporting, analytics, and future data platform growth.

Customer Analysis in Airline Business

Conducted a data-driven analysis of airline customers to understand travel behavior, preferences, and pain points. The analysis focused on booking patterns, customer segmentation, and service feedback to identify the key factors influencing customer satisfaction and retention.

Analysis of the customer data identified distinct segments based on travel frequency, class preference, and purchasing behavior. The insights showed that delays, pricing transparency, and personalized offers were the main drivers of customer experience.

Based on these findings, recommendations included improving communication during delays, introducing targeted promotions, and enhancing loyalty programs for frequent travelers.

As a result, the airline achieved improved customer satisfaction, increased repeat bookings, and more effective marketing campaigns tailored to specific customer segments.

Customer Analysis & Business Impact

To better understand customer behavior in the culinary business, data was collected from sales records, customer feedback, and purchase patterns. The analysis focused on identifying popular menu items, peak ordering times, and customer preferences.

Based on the findings, key recommendations were made, including optimizing the menu by promoting high-demand items, adjusting pricing strategies, and improving service during peak hours. Additionally, targeted promotions were introduced to increase customer engagement.

As a result, the business experienced improved customer satisfaction, increased repeat orders, and noticeable growth in overall sales. The data-driven approach enabled more informed decision-making and enhanced operational efficiency.