Open to SRE & platform roles · Bengaluru, IN

ASIF DRAXI

Site Reliability Engineer.

I build the platforms that keep production calm at 2 AM — and contribute upstream to open source along the way. 5+ years across SRE, DevOps, Cloud, and open source on GCP, AWS, and Azure.

SRE

Reliability, SLOs, on-call, postmortems

DevOps

CI/CD, GitOps, release engineering

Cloud

GCP · AWS · Azure platform engineering

Open Source

Ansible, Argo CD, Jenkins — upstream PRs

SRE ◆ Kubernetes ◆ Terraform ◆ Ansible ◆ GitHub Actions ◆ New Relic ◆ GCP ◆ AWS ◆ Azure ◆ Observability ◆ Auto-remediation ◆ GitOps ◆ Python ◆ Open source

01 · About

Reliability is a design discipline, not a hope.

Site Reliability Engineer and open source contributor with 5+ years on GCP, AWS, and Azure. I build Kubernetes platforms, Infrastructure as Code (Terraform, Ansible), and observability loops that turn 2 AM pages into self-healing systems — plus upstream PRs to Ansible, Argo CD, and Jenkins. Track record: 99.9% uptime SLAs, $250K+ quarterly GCP savings, and 40% less release toil.

The work I care most about lives at the boundary: where infrastructure becomes a product for engineers, where alerts become actions, and where upstream open source fixes make the next on-call shift quieter.

Years in SRE

99.9%

Uptime SLA

$250K+

Quarterly savings

40%

Release toil reduced

02 · Stack

What I work with.

Group 01

Cloud

GCP · AWS · Azure · VPC · IAM · Load Balancing

Group 02

Containers & Orchestration

Kubernetes · GKE · EKS · Docker · Helm

Group 03

Infrastructure as Code

Terraform · Ansible · GitOps

Group 04

CI/CD & Automation

GitHub Actions · Jenkins · Azure DevOps · Git

Group 05

Observability

New Relic · Dynatrace · Pingdom · PagerDuty · Opsgenie

Group 06

Languages

Python · Bash · JavaScript

03 · Experience

A timeline of the systems I've kept running.

BlackLine

Jun 2024 — Present

Site Reliability Engineer · Bengaluru, KA

› Operating high-availability production infrastructure with 99.9%+ uptime SLAs and on-call incident management.
› Led GCP cost optimization across right-sizing, resource auditing, and database upgrades — $250K+ saved in a single quarter.
› Migrated legacy release pipelines to GitHub Actions, reducing manual release intervention by 40%.
› Built an auto-healing system wiring New Relic telemetry → PagerDuty → GitHub Actions → Ansible runbooks for instant remediation of known failures.
› Led zero-downtime migration of Apache NiFi clusters from Chef to Ansible, then orchestrated automated multi-region deployments.

GCP · Kubernetes · Ansible · GitHub Actions · New Relic · PagerDuty

Liferay

Dec 2022 — Jun 2024

Associate Site Reliability Engineer · Bengaluru, KA

› Scaled GKE clusters for mission-critical portals — resource allocation, security compliance, fault tolerance.
› Deployed Dynatrace + Pingdom with proactive thresholds, catching degradation before customer impact.
› Restructured Jenkins pipelines, removing build-stage bottlenecks to stabilize daily release trains.
› Delivered a sustained 20% reduction in compute footprint through a node-pooling and auto-scaling strategy.
› Authored runbooks and custom Python/Bash automation, saving the team 10+ hours per week.

GKE · Dynatrace · Jenkins · Python · Bash

Capgemini

Jun 2019 — Aug 2021

Senior Software Engineer · Mumbai, MH

› Provisioned and secured enterprise AWS/Azure infrastructure — VPCs, EC2/VMs, auto-scaling groups for global clients.
› Troubleshot complex network and routing issues — Load Balancer algorithms, DNS resolution, TCP/IP bottlenecks.
› Introduced IaC practices with Git-managed ARM templates and scripting.
› Built Angular front-ends and integrated Azure DevOps CI/CD pipelines for cross-functional Agile teams.

AWS · Azure · Angular · Azure DevOps · ARM

04 · Selected work

Things I've shipped.

Featured

Auto-healing remediation loop

Production system at BlackLine: New Relic alerts trigger PagerDuty events, which fire GitHub Actions workflows running Ansible playbooks against known failure states. Cuts MTTR on common incidents to near-zero.

New Relic · PagerDuty · GitHub Actions · Ansible
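
A minimal sketch of one hop in a loop like this, not the production code: a small Python function that turns an incoming alert into the request body for GitHub's repository dispatch endpoint (`POST /repos/{owner}/{repo}/dispatches`, which accepts `event_type` and `client_payload`). The alert field names, the `RUNBOOKS` table, the playbook paths, and the `auto-remediate` event name are all hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hypothetical mapping from an alert condition to an Ansible playbook path.
RUNBOOKS = {
    "nifi-node-disconnected": "playbooks/restart_nifi_node.yml",
    "disk-usage-high": "playbooks/clean_tmp.yml",
}


def build_dispatch(alert: dict) -> Optional[dict]:
    """Return a repository_dispatch body for a known failure, else None."""
    playbook = RUNBOOKS.get(alert.get("condition", ""))
    # Only act on alerts that are still firing and that match a known runbook;
    # everything else is left for the human on call.
    if playbook is None or alert.get("status") != "triggered":
        return None
    return {
        "event_type": "auto-remediate",  # assumed name; a workflow would listen for it
        "client_payload": {
            "playbook": playbook,
            "target_host": alert.get("host", ""),
            "incident_id": alert.get("incident_id", ""),
        },
    }
```

A workflow triggered by `on: repository_dispatch` could then read the playbook path from the payload and run it with Ansible. Keeping the mapping as an explicit allow-list means an unrecognized alert never triggers automation by accident.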

NiFi multi-region orchestration

Migrated Apache NiFi clusters from Chef-based config to Ansible roles, then orchestrated zero-downtime multi-region deployments with consistent configuration drift detection.

Ansible · Apache NiFi · GCP · Migration

Enterprise K8s Automation Lab

Cloud-hybrid testing environment defined entirely in IaC. Terraform bootstraps the VPC and instances; Ansible configures a working Kubernetes cluster for microservice experimentation.

Terraform · Ansible · Kubernetes · GitOps
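
One way the Terraform-to-Ansible handoff in a lab like this could work, sketched under assumptions: Terraform exposes the instance addresses as outputs, and a short Python script reshapes the result of `terraform output -json` into the JSON structure Ansible expects from a dynamic inventory script. The output names `control_plane_ips` and `worker_ips` and the group names are hypothetical.

```python
def inventory_from_terraform(outputs: dict) -> dict:
    """Convert parsed `terraform output -json` data into an Ansible inventory dict.

    Each Terraform output is an object whose actual data sits under "value".
    """
    control = outputs["control_plane_ips"]["value"]  # hypothetical output name
    workers = outputs["worker_ips"]["value"]         # hypothetical output name
    return {
        "control_plane": {"hosts": list(control)},
        "workers": {"hosts": list(workers)},
        # Parent group so playbooks can target the whole cluster at once.
        "k8s_cluster": {"children": ["control_plane", "workers"]},
        # An empty hostvars map tells Ansible not to query each host separately.
        "_meta": {"hostvars": {}},
    }
```

Printing the returned dict with `json.dumps` from an executable script would let Ansible consume it through `-i`, so the inventory always reflects what Terraform actually created.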

Serverless cloud alerting engine

Python + AWS Lambda alert engine. Pulls log errors from S3/CloudWatch, categorizes by severity, dispatches via webhook. Reduced false-positive alert bloat.

Python · AWS Lambda · Observability
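
A rough sketch of the categorize-then-dispatch step, assuming the log lines have already been pulled from S3 or CloudWatch. The severity patterns and the digit-stripping fingerprint used to collapse near-duplicate alerts are illustrative assumptions, not the rules the real engine uses.

```python
import re

# Hypothetical severity rules, checked in order; the first match wins.
SEVERITY_RULES = [
    ("critical", re.compile(r"OutOfMemory|FATAL|panic", re.I)),
    ("warning", re.compile(r"timeout|retry|ERROR", re.I)),
]


def classify(line: str) -> str:
    """Return the severity of a log line, or "info" if no rule matches."""
    for severity, pattern in SEVERITY_RULES:
        if pattern.search(line):
            return severity
    return "info"


def build_alerts(lines: list) -> list:
    """Turn raw log lines into a de-duplicated list of alert dicts."""
    seen = set()
    alerts = []
    for line in lines:
        severity = classify(line)
        if severity == "info":
            continue  # not alert-worthy
        # Replace digit runs so lines differing only in numbers share a fingerprint.
        fingerprint = re.sub(r"\d+", "N", line)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        alerts.append({"severity": severity, "message": line})
    return alerts
```

The resulting list can be serialized as JSON and posted to whichever webhook receives the alerts. Collapsing repeats before dispatch is one plausible way to cut alert noise, since a burst of identical errors becomes a single notification.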

05 · Open source