cohort mentorship program

Learn Cloud, DevOps and Platform Engineering

An intensive 8-week hands-on program for absolute beginners. You will not just follow tutorials. You will think like an engineer, break things on purpose, and build infrastructure that actually runs.

8 Weeks
$0 Cost
100% Hands-on
Live Mentorship
Starts 29 Jun

Learning How to Think Like an Engineer

Four principles that separate engineers who understand what they are doing from those who just copy commands.

Why Before How

Before running any command, you will understand the problem it solves and the tradeoff it makes. Context first, syntax second.

Break Things on Purpose

Intentional failure is a learning technique. You will break containers, misconfigure IAM, and corrupt state to understand how systems recover.

Document Everything

Engineers who cannot explain their work cannot improve it. Every week you write a post-mortem, a diagram, or a runbook alongside the code.

Real Infrastructure Only

No simulations, no sandboxed toy environments. You will deploy to actual cloud accounts and operate real workloads from day one.

The Full Curriculum

Eight focused weeks. Each one builds directly on the last. By the end you will have a running Kubernetes workload, an IaC repo, a CI/CD pipeline, and dashboards proving it all works.

WEEK 01 Linux, Networking and Git
bash  ·  tcp/ip  ·  git

Why this week exists

Almost everything in the cloud runs on Linux. Most deployments are triggered by a Git push. Every network call obeys TCP/IP. This week you build the mental models before they appear inside an abstraction layer.

  • Linux filesystem hierarchy: what lives in /etc, /var, /proc and why
  • Processes, signals, and the init system (systemd)
  • Networking primitives: IP addressing, CIDR, routing, DNS resolution, TCP handshake
  • SSH: key-based auth, agent forwarding, port tunnelling
  • Git internals: blobs, trees, commits, the reflog, detached HEAD
  • Branching strategy: trunk-based development vs GitFlow and when each fits
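
The CIDR item above is easier to reason about once you compute with it. Here is a minimal sketch using Python's standard ipaddress module. The 10.0.0.0/24 range and the two host addresses are illustrative values, not part of the lab.

```python
import ipaddress

# Hypothetical subnet chosen for illustration.
subnet = ipaddress.ip_network("10.0.0.0/24")

# A /24 leaves 32 - 24 = 8 host bits, so 2**8 addresses in total.
total_addresses = subnet.num_addresses
netmask = str(subnet.netmask)
broadcast = str(subnet.broadcast_address)


def in_subnet(address: str, network: ipaddress.IPv4Network) -> bool:
    """Return True when the address falls inside the network's range."""
    return ipaddress.ip_address(address) in network


print(total_addresses, netmask, broadcast)
print(in_subnet("10.0.0.42", subnet), in_subnet("10.0.1.42", subnet))
```

Changing the prefix length to /25 halves the range. That is the kind of tradeoff you weigh when carving a VPC into subnets in Week 3.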

What you will build

Provision a raw Ubuntu VM on any free tier (GCP e2-micro, AWS t2.micro, or local Multipass). Do not use the console wizard for networking config.

  • Write a bash script that audits open ports and writes a report to /var/log/audit.txt on a schedule via cron
  • Diagnose a deliberately broken DNS config using only dig, ss, and traceroute
  • Create a Git repo, force a merge conflict, resolve it, squash the fix into one clean commit, and write a useful commit message
  • Set up SSH key auth and harden /etc/ssh/sshd_config to refuse password login
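
The lab asks for the port audit in bash. The sketch below only illustrates the parsing logic, in Python, so you can see what the script has to extract. It assumes the column layout that `ss -tln` normally prints. The sample text is hand-written for the example, not captured from a real host.

```python
def listening_ports(ss_output: str) -> list[int]:
    """Extract sorted, de-duplicated listening TCP ports from `ss -tln` text."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows start with the socket state; this also skips the header row.
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        # Field 3 is "Local Address:Port"; splitting on the last colon
        # also handles IPv6 forms such as [::]:80.
        _, _, port = fields[3].rpartition(":")
        if port.isdigit():
            ports.add(int(port))
    return sorted(ports)


# Hand-written sample input for illustration.
SAMPLE = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*
LISTEN  0       511     [::]:80             [::]:*
LISTEN  0       128     [::]:22             [::]:*
"""

print(listening_ports(SAMPLE))
```

Whichever language you use, keep the parsing in a small function like this so you can test it against saved output before wiring it to cron.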

Week Deliverable

GitHub repo with your audit script, a network troubleshooting runbook in Markdown, and a post-mortem on the one thing that broke during the lab.

Decisions you will encounter in the real world

  • Bash vs Python for automation scripts: bash is everywhere but brittle at scale; Python is portable but adds a runtime dependency
  • SSH keys vs certificates (HashiCorp Vault SSH, AWS SSM): keys do not expire, certificates carry metadata and rotate automatically
  • Merge commits vs squash vs rebase: merge preserves history, squash keeps main clean, rebase rewrites history and can break shared branches
  • Monorepo vs polyrepo: monorepo simplifies atomic changes across services but CI scales poorly without tooling like Turborepo or Nx

WEEK 02 Cloud Fundamentals and IAM Security
gcp  ·  iam  ·  least-privilege

The cloud mental model

Cloud is not a magic computer farm. It is an API over hardware with a billing model. Understanding the shared responsibility model tells you exactly where your obligations start.

  • IaaS vs PaaS vs SaaS and where your code lives on that spectrum
  • Regions, zones, and why you spread workloads across them
  • IAM: principals (users, service accounts, groups), resources, roles, and the policy evaluation logic
  • Principle of least privilege: why roles/owner on a service account is a security incident waiting to happen
  • Service account keys vs Workload Identity Federation: the key is a credential that lives on disk; Workload Identity is ephemeral
  • Audit logging: what Cloud Audit Logs record and why you turn on Data Access logs from day one
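
To make the least-privilege idea concrete, here is a deliberately simplified model of an allow-policy check. It is a teaching sketch, not how Google's policy engine works: it ignores deny policies, conditions, and inheritance through the resource hierarchy. The role-to-permission map is trimmed for illustration and the service account name is hypothetical.

```python
# Simplified, illustrative role definitions. Real predefined roles contain
# more permissions than shown here; check the IAM reference before relying on them.
ROLE_PERMISSIONS = {
    "roles/storage.objectCreator": {"storage.objects.create"},
    "roles/owner": {
        "storage.objects.create",
        "storage.objects.delete",
        "resourcemanager.projects.getIamPolicy",
        "resourcemanager.projects.setIamPolicy",
    },
}


def is_allowed(bindings, principal, permission):
    """Allow only if some binding gives the principal a role with the permission."""
    for binding in bindings:
        if principal in binding["members"]:
            if permission in ROLE_PERMISSIONS.get(binding["role"], set()):
                return True
    return False  # No matching grant means the request is denied.


# Hypothetical service account used only for this example.
UPLOADER = "serviceAccount:uploader@example-project.iam.gserviceaccount.com"
policy = [{"role": "roles/storage.objectCreator", "members": [UPLOADER]}]

print(is_allowed(policy, UPLOADER, "storage.objects.create"))
print(is_allowed(policy, UPLOADER, "resourcemanager.projects.getIamPolicy"))
```

Swap the binding to roles/owner and the second check also passes. That is exactly why a broad role on a service account widens the blast radius of a leaked credential.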

What you will build

Set up a GCP project from scratch using only the CLI. Zero console clicks for resource creation.

  • Create a project, enable billing alerts at $5 and $10, and enable only the APIs you need
  • Create a service account with the minimum roles to write to a GCS bucket and nothing else. Verify it cannot read IAM policies.
  • Enable Data Access audit logs and trigger a denied action. Find it in Cloud Logging within 2 minutes.
  • Simulate a credential leak: commit a fake key to a public, test-only GitHub repo, watch GitHub secret scanning flag it, and rotate the key immediately

Week Deliverable

A written IAM policy document for a hypothetical three-tier web application listing every principal, what role they hold, and why no broader role was appropriate.

Decisions you will encounter in the real world

  • Predefined roles vs custom roles: predefined are maintained by Google, custom give you precision but require you to track permission changes yourself
  • Per-project service accounts vs shared service accounts: shared simplifies management, per-project contains blast radius
  • Organization policy constraints vs IAM: org policies are guardrails you cannot override with IAM; use them for non-negotiable controls
  • Billing alerts vs budget caps: alerts notify, caps actually stop spending but can take down production if limits are set too low

WEEK 03 Compute, Storage and Cloud Networking
vpc  ·  gce  ·  gcs

Where your code actually runs

Before you containerise anything you need to understand the machine underneath. This week you learn how virtual machines, object storage, and virtual networks combine into a working application environment.

  • VPC architecture: subnets, routes, firewall rules, NAT, Private Google Access
  • Compute Engine: machine families, persistent disks, startup scripts, preemptible vs standard
  • Cloud Storage: buckets, object lifecycle policies, signed URLs, storage classes (Standard, Nearline, Coldline)
  • Load balancing concepts: L4 vs L7, health checks, backend services
  • Cloud DNS and how internal DNS resolution differs from public
  • VPC peering vs Shared VPC: when each model applies in an org context

What you will build

Deploy a two-tier application: a backend VM in a private subnet with no public IP, fronted by an HTTP load balancer. All network config written as gcloud commands you can repeat.

  • Create a custom-mode VPC with two subnets, one public and one private (GCP subnets are regional, so zones are chosen per VM, not per subnet)
  • Deploy a backend VM in the private subnet, configure Cloud NAT for outbound internet
  • Serve a static site from GCS with a custom domain and HTTPS via a managed certificate
  • Set up a lifecycle rule to move objects older than 30 days to Nearline and delete after 90
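
The lifecycle rule in the last item can be drafted as data before you apply it. This sketch builds the two rules as a Python dictionary and prints them as JSON. The shape follows the Cloud Storage lifecycle configuration format as I understand it, so verify it against the current documentation before using the output with a real bucket. The function name is this example's own.

```python
import json


def lifecycle_config(nearline_after_days: int, delete_after_days: int) -> dict:
    """Build a lifecycle configuration with one transition and one delete rule."""
    if nearline_after_days >= delete_after_days:
        raise ValueError("objects must transition before they are deleted")
    return {
        "rule": [
            {
                # Move objects to the cheaper Nearline class once they reach this age.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                # Remove objects entirely once they reach this age.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


config = lifecycle_config(30, 90)
print(json.dumps(config, indent=2))
```

Generating the file from two numbers keeps the 30-day and 90-day thresholds in one reviewable place instead of a hand-edited JSON blob.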

Week Deliverable

A network architecture diagram (draw.io or Excalidraw) showing every subnet, firewall rule, and traffic path. Annotate each decision with one sentence explaining why.

Decisions you will encounter in the real world

  • VM vs Cloud Run vs GKE for a stateless service: VMs give control, Cloud Run eliminates operations, GKE gives you the full platform surface
  • Custom-mode vs auto-mode VPC: auto-mode is fast to start with but assigns predefined CIDR ranges you do not choose, which can collide when you peer VPCs
  • External LB vs internal LB: external terminates TLS, internal is cheaper but only reachable inside the VPC
  • Object storage vs block storage vs file storage: each has a different access pattern and cost model; choosing wrong costs real money

WEEK 04 Docker and Container Security
docker  ·  oci  ·  artifact-registry

Containers are not virtual machines

A container shares the host kernel. Understanding that single fact explains every security property, every limitation, and every escape vector containers have.

  • Linux namespaces and cgroups: the primitives Docker wraps
  • Image layers, union filesystem, and why layer order matters for cache and size
  • Dockerfile best practices: multi-stage builds, non-root users, minimal base images
  • Container security: read-only filesystems, dropped capabilities, seccomp profiles
  • Artifact Registry: pushing, pulling, image vulnerability scanning with Container Analysis
  • Container runtime threat model: what an attacker can and cannot do inside a container

What you will build

Containerise a small web application with deliberate security mistakes, then fix every one.

  • Write a Dockerfile that runs as root. Measure the image size. Then rewrite it: multi-stage build, non-root user, minimal base. Compare.
  • Push to GCP Artifact Registry. Enable vulnerability scanning. Fix the first CVE it reports.
  • Run the container with --read-only --cap-drop ALL --security-opt no-new-privileges. Debug why it crashes and fix the app, not the flags.
  • Deploy the container to Cloud Run and confirm it is reachable over HTTPS

Week Deliverable

Two Dockerfiles (before and after), a written security audit listing each original vulnerability and how it was addressed, and the live Cloud Run URL.

Decisions you will encounter in the real world

  • Distroless vs Alpine vs Ubuntu base images: distroless has the smallest attack surface but no shell for debugging; Alpine is small and has a shell; Ubuntu is familiar but large
  • One process per container vs multiple: one process makes health checks precise and restarts fast; multiple can simplify sidecar patterns at the cost of lifecycle coupling
  • Building in CI vs building locally: local builds are fast to iterate, CI builds are reproducible and auditable. Both are necessary.
  • Image tagging strategy: latest is ambiguous in production. Always tag by git SHA or semantic version.
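
The tagging rule in the last item is easy to enforce in a build script. The sketch below is one possible guard: it accepts only tags that look like a git SHA or a plain semantic version and builds an Artifact Registry style reference. The location, project, repository, and image names are placeholders, and the two patterns are a simplification (they do not cover pre-release versions, for example).

```python
import re

# A short or full git SHA: 7 to 40 lowercase hex characters.
GIT_SHA = re.compile(r"[0-9a-f]{7,40}")
# A plain semantic version such as 1.4.2, with an optional leading "v".
SEMVER = re.compile(r"v?\d+\.\d+\.\d+")


def image_reference(location, project, repository, image, tag):
    """Build an image reference, rejecting tags that are not a SHA or a version."""
    if not (GIT_SHA.fullmatch(tag) or SEMVER.fullmatch(tag)):
        raise ValueError(f"tag {tag!r} is not a git SHA or semantic version")
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


# All names below are placeholders for illustration.
print(image_reference("europe-west1", "example-project", "apps", "web", "3f2a9c1"))
```

Calling the function with the tag latest raises ValueError, which is the point: the build fails before an ambiguous image reaches the registry.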

WEEK 05 Infrastructure as Code with Terraform
terraform  ·  opentofu  ·  state

Infrastructure that explains itself

Clicking through the console does not scale, does not survive staff turnover, and does not pass a security audit. IaC is the practice of treating infrastructure with the same engineering discipline as application code.

  • Declarative vs imperative: you declare the desired state, Terraform figures out the diff
  • Terraform core workflow: init, plan, apply, destroy
  • State: what it is, why it must be stored remotely, and what happens when it drifts
  • Modules: how to encapsulate reusable infrastructure patterns
  • Variables, outputs, and data sources
  • Locking state with GCS backend: preventing concurrent apply collisions
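
The declarative idea in the first item can be modelled in a few lines. This is a conceptual sketch only, not Terraform's implementation: real planning also tracks dependencies, distinguishes in-place updates from replacements, and reads recorded state. The resource names and attributes are invented for the example.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compare desired and actual resources and list the changes needed."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
        "update": sorted(
            name
            for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
    }


# Hypothetical resources: names map to their configured attributes.
desired = {
    "vpc": {"mode": "custom"},
    "subnet-private": {"cidr": "10.0.2.0/24"},
    "bucket": {"versioning": True},
}
actual = {
    "vpc": {"mode": "custom"},
    "bucket": {"versioning": False},
    "old-vm": {"machine_type": "e2-micro"},
}

print(plan(desired, actual))
```

Notice that you never wrote the steps. You described the end state and the diff fell out of the comparison. That is also why drift matters: if the recorded view of what exists is wrong, the computed diff is wrong too.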

What you will build

Recreate everything you built in Weeks 3 and 4 using Terraform. Delete the manually-created resources first. Your Terraform code is the only source of truth.

  • Write a Terraform module for VPC + subnets that accepts region and CIDR as variables
  • Store state in a GCS bucket with versioning enabled. Verify you can roll back state after an accidental terraform apply.
  • Run terraform plan before every apply in a CI-like loop and review the diff
  • Deliberately cause state drift by deleting a resource in the console. Run terraform plan -refresh-only (the replacement for the deprecated terraform refresh command) and document what happened.

Week Deliverable

A Terraform repo in GitHub with a VPC module, a Cloud Run module, and a README explaining how to deploy the full stack from scratch in one terraform apply.

Decisions you will encounter in the real world

  • Terraform vs Pulumi vs CDK: Terraform is the lingua franca; Pulumi and CDK let you use real programming languages but have smaller communities and less third-party module coverage
  • Monolithic root module vs small modules: large modules are simpler early on but become dangerous to apply as the plan grows; split early
  • Remote state locking: GCS provides object-level locking which is good enough; Terraform Cloud provides locking with a UI and team access controls
  • When to import existing resources: importing is the right answer when you cannot afford downtime to recreate; it adds complexity and should be cleaned up afterward

WEEK 06 CI/CD Pipelines and Automation
github-actions  ·  cloud-build  ·  gitops

Every merge is a deployment decision

A CI/CD pipeline is not a deployment script. It is the automated enforcement of your quality and security policy. Every step is a gate that protects production from humans.

  • CI vs CD vs CD: continuous integration, continuous delivery, and continuous deployment, and what distinguishes each
  • GitHub Actions: workflow syntax, triggers, jobs, steps, contexts, and secrets
  • Workload Identity Federation: why you do not store GCP service account keys as GitHub secrets
  • Pipeline stages: lint, test, build, scan, deploy, smoke test
  • GitOps: the repo as the single source of truth for cluster state
  • Rollback strategies: redeploy previous tag vs feature flags vs canary deployments

What you will build

Wire your Week 4 containerised app to a full GitHub Actions pipeline that deploys to Cloud Run on every push to main.

  • Configure Workload Identity Federation so the pipeline authenticates to GCP without any long-lived key
  • Add a job that runs container vulnerability scanning and fails the pipeline on CRITICAL severity CVEs
  • Add a manual approval step before production deployment using GitHub Environments
  • Simulate a bad deploy: push a broken image and practice rolling back to the previous revision in Cloud Run
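
The scanning gate in the second item comes down to a threshold check on the scanner's findings. The sketch below shows that logic in isolation. The findings list stands in for whatever your scanner emits, the IDs are placeholders rather than real CVEs, and the four severity names are an assumption about the scanner's vocabulary.

```python
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def gate(findings, fail_at="CRITICAL"):
    """Return (exit_code, blocking), where blocking lists findings at or above fail_at."""
    threshold = SEVERITY_ORDER.index(fail_at)
    blocking = [
        f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= threshold
    ]
    return (1 if blocking else 0), blocking


# Hypothetical scanner output; the IDs are placeholders, not real CVEs.
findings = [
    {"id": "EXAMPLE-0001", "severity": "MEDIUM"},
    {"id": "EXAMPLE-0002", "severity": "CRITICAL"},
]

exit_code, blocking = gate(findings)
print(exit_code, [f["id"] for f in blocking])
```

In a pipeline step you would pass exit_code to sys.exit so that a non-zero value fails the job. Lowering fail_at to HIGH makes the gate stricter, which is the fail-closed tradeoff discussed in this week's decisions.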

Week Deliverable

A working pipeline with at least 4 stages, the Workload Identity Federation config documented, and a written incident report on the rollback exercise.

Decisions you will encounter in the real world

  • GitHub Actions vs Cloud Build vs Tekton: Actions is easy to start with and has the largest marketplace; Cloud Build is tightly integrated with GCP; Tekton runs in-cluster and is complex but portable
  • Environment-per-branch vs environment-per-PR: per-branch is simpler to manage; per-PR is more isolated but multiplies infrastructure cost
  • Fail open vs fail closed on security scans: fail closed blocks deployments on new CVEs, including CVEs in images you did not change. Plan for the false-positive rate.
  • Blue-green vs canary vs rolling: blue-green is safest to roll back; canary catches issues with a small blast radius; rolling is the default and has no rollback if state changes are involved

WEEK 07 Kubernetes Fundamentals
gke  ·  kubectl  ·  rbac

The platform under the platform

Kubernetes is a container orchestrator but it is more useful to think of it as a declarative API for distributed systems. The control loop concept, where the system perpetually reconciles desired state with actual state, is the idea that everything else builds on.
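
The control loop is simple enough to sketch as a toy. The function below runs one reconciliation pass over a list of pod names. It is an illustration of the idea only: real controllers watch the API server and act through it, and the pod naming scheme here is made up.

```python
def reconcile(desired_replicas: int, running: list[str], next_id: int):
    """One pass of a control loop: move the running set toward the desired count."""
    running = list(running)
    while len(running) < desired_replicas:
        running.append(f"pod-{next_id}")  # Start a replacement pod.
        next_id += 1
    while len(running) > desired_replicas:
        running.pop()  # Stop a surplus pod.
    return running, next_id


pods, next_id = reconcile(3, [], next_id=0)
print(pods)

# Simulate a failure: two pods disappear, then the loop runs again.
pods = pods[:1]
pods, next_id = reconcile(3, pods, next_id)
print(pods)
```

The loop never asks why pods are missing. It only compares desired with actual and closes the gap, which is the behaviour you will observe when you kill pods in this week's lab.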

  • Cluster architecture: control plane (API server, etcd, scheduler, controller-manager) vs worker nodes
  • Core objects: Pod, Deployment, Service, ConfigMap, Secret, Namespace, PersistentVolumeClaim
  • Scheduling: node selectors, affinity, taints and tolerations
  • Networking: ClusterIP vs NodePort vs LoadBalancer, Ingress, kube-dns
  • RBAC: Roles, ClusterRoles, RoleBindings, the relation to GKE Workload Identity
  • Resource requests and limits: what happens when a container exceeds memory vs CPU

What you will build

Deploy your containerised application from Week 4 onto GKE Autopilot. Operate it: scale it, break it, and recover it.

  • Write Deployment and Service manifests. Deploy via kubectl apply. Verify with kubectl rollout status.
  • Configure horizontal pod autoscaler. Load-test with hey or k6 and watch pods scale. Watch them scale back down.
  • Deliberately kill all pods. Observe the ReplicaSet recreate them. Record how long recovery takes.
  • Set up RBAC so a read-only service account can describe pods but cannot exec into them
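
When you watch the autoscaler during the load test, it helps to know the arithmetic behind it. The sketch below applies the scaling ratio described in the Kubernetes documentation, current replicas multiplied by current metric over target metric and rounded up, then clamps the result. It leaves out the tolerance band and stabilisation window. The bounds and metric values are example numbers.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Apply the documented HPA ratio, then clamp to the configured bounds."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# Load is 50% above target on 2 replicas: scale up.
print(desired_replicas(2, current_metric=90, target_metric=60))
# Load is a quarter of target on 4 replicas: scale down.
print(desired_replicas(4, current_metric=15, target_metric=60))
```

Predict the replica count with this function before the load test, then compare it with what the cluster actually did. The gap is a good topic for your load test report.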

Week Deliverable

All Kubernetes manifests in a dedicated k8s/ directory in your repo, a load test report showing autoscaler behaviour, and a written explanation of each RBAC binding and why it grants exactly that scope.

Decisions you will encounter in the real world

  • Autopilot vs Standard GKE: Autopilot manages nodes for you and is cheaper for intermittent workloads; Standard gives control over node pools, machine types, and GPU access
  • Helm vs raw manifests vs Kustomize: raw manifests are easiest to understand; Helm packages reusable charts but templating logic gets complex; Kustomize overlays without templating
  • Ingress vs Gateway API: Ingress is stable and understood; Gateway API is the successor and handles more routing patterns but tooling support varies
  • Namespaces for isolation: namespaces are soft boundaries, not hard security boundaries. Multi-tenant workloads with different trust levels need separate clusters.

WEEK 08 Observability, SRE and Capstone
prometheus  ·  cloud-monitoring  ·  slo

You cannot improve what you cannot measure

Observability is not dashboards. It is the property of a system that lets you ask arbitrary questions about its internal state from external outputs. This week you wire up your full stack so nothing can fail silently.

  • The three pillars: metrics, logs, and traces, and the class of question each one answers
  • SLI, SLO, and SLA: defining a service level indicator, writing a service level objective, and the error budget that follows from it
  • Cloud Monitoring: uptime checks, dashboards, alerting policies, notification channels
  • Cloud Logging: structured logs, log-based metrics, log sinks to BigQuery for analysis
  • Distributed tracing with Cloud Trace: how a trace spans multiple services
  • Incident management: the on-call rotation, incident commander role, post-mortem process
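
Structured logging, mentioned in the Cloud Logging item, can be done with the standard library alone. This is a minimal sketch assuming you log one JSON object per line. The field names (severity, message, logger) and the convention of passing extra fields under a "fields" key are choices made for this example, so check what your log backend expects.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Extra fields arrive via extra={"fields": {...}}, a convention of this sketch.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("app")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# An in-memory stream keeps the example self-contained; use sys.stdout in an app.
buffer = io.StringIO()
log = make_logger(buffer)
log.error("payment failed", extra={"fields": {"status": 502, "path": "/pay"}})
print(buffer.getvalue().strip())
```

Because every line is valid JSON, a log-based metric can filter on the severity or status field directly instead of matching text with a regex.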

What you will build (Capstone)

Instrument the full stack you built across Weeks 1 to 7. Write SLOs. Break things intentionally and prove your alerting catches it before a user does.

  • Add structured JSON logging to your application. Create a log-based metric for error rate. Alert when error rate exceeds 1% over 5 minutes.
  • Write two SLOs: a latency SLO (95% of requests under 500ms) and an availability SLO (99.5% uptime). Configure error budget burn rate alerts.
  • Conduct a chaos experiment: terminate pods at random using a script. Confirm your dashboards show the event. Write a post-mortem with timeline, root cause, and action items.
  • Add your CI/CD pipeline deployment events as annotations on your dashboards. Correlate a past deployment with a latency spike.
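
Burn rate alerts, from the SLO item above, rest on one ratio: the error rate you are seeing divided by the error rate the SLO allows. Here is a small sketch under the assumption of a request-based SLO. The request counts are invented for the example.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Ratio of the observed error rate to the error rate the SLO allows."""
    if total_events == 0:
        return 0.0  # No traffic means no budget was consumed.
    error_rate = bad_events / total_events
    allowed_error_rate = 1.0 - slo_target
    return error_rate / allowed_error_rate


# Hypothetical window: 10,000 requests, 100 of them failed, against a 99.5% SLO.
rate = burn_rate(bad_events=100, total_events=10_000, slo_target=0.995)
print(round(rate, 2))
```

A burn rate of 1 spends the error budget exactly over the SLO window. A rate of 2 spends it in half that time, which is why sustained values above 1 are worth an alert.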

Final Capstone Deliverable

A public GitHub portfolio repo containing: Terraform code, Kubernetes manifests, CI/CD pipeline, application code, monitoring dashboards (exported JSON), two defined SLOs, and a capstone post-mortem. This is your first production-grade portfolio project.

Decisions you will encounter in the real world

  • Cloud-native observability vs self-managed: Cloud Monitoring is zero-ops but costs money and locks you to GCP; Prometheus/Grafana/Loki is portable and customisable but you own the operations
  • Structured logs vs unstructured: structured (JSON) logs are queryable; unstructured logs require regex and are painful at scale. Default to structured from the start.
  • Alerting on symptoms vs causes: alert on high latency and high error rate (symptoms your users feel), not on CPU and memory (causes you investigate after being paged)
  • SLO strictness: a 99.9% SLO gives you about 43 minutes of allowed downtime per 30-day month. Every nine you add costs disproportionately in engineering and infrastructure spend.
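
The downtime figure in the last item is plain arithmetic, and it is worth running for a few targets to see how fast the budget shrinks. This sketch assumes a time-based availability SLO measured over a 30-day window.

```python
def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full downtime a time-based availability SLO permits per window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo_target)


for target in (0.99, 0.995, 0.999, 0.9999):
    print(f"{target:.2%} -> {allowed_downtime_minutes(target):.1f} minutes")
```

Each extra nine divides the budget by ten. At 99.99% there are only a few minutes left per month, less time than a person needs to notice and respond to a page.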

Everything You Need Costs Nothing

Every resource listed here is free. No paywalls, no trial traps. If it costs money it is not on this list.

Linux and CLI

Command Line Foundations

  • The Linux Command Line (William Shotts, free PDF)
  • OverTheWire: Bandit (Linux wargame)
  • explainshell.com (any command, explained)
  • tldr.sh (practical examples for every tool)

// free-tier VMs: GCP e2-micro, AWS t2.micro, Oracle Cloud ARM

Cloud and IAM

Cloud Fundamentals

  • Google Cloud Skills Boost (free tier courses)
  • AWS Cloud Practitioner Essentials (free)
  • GCP Free Tier ($300 credit for new accounts)
  • Cloud Security Alliance materials

// set billing alerts before you start. Always.

Containers and IaC

Docker and Terraform

  • Play with Docker (browser-based labs)
  • Terraform tutorials on developer.hashicorp.com
  • OpenTofu (open-source Terraform fork, free)
  • Container Security by Liz Rice (free PDF via O'Reilly)

// OpenTofu is fully compatible. Either works for this program.

Kubernetes

Orchestration

  • Kubernetes.io official docs and interactive tutorial
  • killer.sh CKA practice (free 2-session exam sim)
  • minikube / kind for local cluster practice
  • GKE Autopilot free tier (1 cluster, limited hours)

// CKA is one of the most respected cloud certifications. This program prepares you to take it.

Observability and SRE

Operating at Scale

  • Google SRE Book (free at sre.google/books)
  • Prometheus docs and alerting best practices
  • Grafana Labs learning portal (free)
  • OpenTelemetry docs (vendor-neutral instrumentation)

// the SRE book is essential reading. Read chapters 1-4 in week 1.

Community

Stay Connected

  • CNCF Slack (cloud-native community)
  • r/devops and r/kubernetes
  • DevOps Africa Community Discord
  • GitHub (your public portfolio starts now)

// your GitHub activity graph is part of your CV. Start committing from week 1.

Who Is Teaching You

Amina Lawal
Platform and Cloud Engineer

I built this program because the resources that existed were either too shallow or buried behind paywalls. I wanted something rigorous, free, and honest about how production systems actually work.

I work on platform engineering and cloud infrastructure. I have gone through the journey of learning this material without a roadmap, and I am here to give you the one I wish I had.

This is not a course. It is a mentorship. I will be in the sessions, in the code reviews, and in the post-mortems. You will not be learning alone.

GCP  ·  Kubernetes  ·  Terraform  ·  Platform Engineering  ·  SRE

Before You Apply

Answers to the questions people ask most. If yours is not here, reach out on X.

No. This program is designed for absolute beginners. The only prerequisites are a laptop, an internet connection, and the time to commit. You will install and configure every tool from scratch during the program itself.

Yes. No tuition, no upsell, no hidden fees. The only cost is cloud credits. GCP gives every new account $300 in free credit, which covers everything in this program comfortably.

A live session with Amina covering that week's topic, a hands-on lab you complete independently, and a deliverable you submit before the next session. Sessions are recorded. You are expected to show your work.

WAT (West Africa Time, UTC+1). Exact session times will be confirmed once the cohort is formed and everyone's availability is known.

This is a small cohort. The whole point is mentorship, not a lecture hall. Applications are reviewed individually. Commitment matters more than background.

No. You will get something more useful: a GitHub portfolio with real infrastructure projects and the skills to pass a technical interview. The CKA (Certified Kubernetes Administrator) exam prep is built into Week 7 for those who want a formal credential afterward.

Cohort 1 starts 29 June 2026. Applications are now closed. Join the waitlist to be first in line when Cohort 2 opens.

A laptop, a Gmail account to set up GCP, and Git installed on your machine. That is it. Everything else gets installed during Week 1.

Cohort 1  ·  Starting 29 Jun 2026  ·  Applications Closed

Cohort 1 is Full

Applications for Cohort 1 are now closed. If you want to be first in line when Cohort 2 opens, join the waitlist. The program is free and that will not change.

Join the Waitlist

You will be notified as soon as Cohort 2 applications open.