Site Reliability Engineer

William Javier Acelas Granados

Specializing in AIOps and Automation for Resilient Systems

terminal — ssh william@sre.dev
william@sre.dev:~$ whoami

AIOps and Automation Specialist with 5+ years of experience

william@sre.dev:~$ cat skills.txt

GCP, Python, NextJS, PostgreSQL, FastAPI, Terraform, GitOps

william@sre.dev:~$ uptime

99.99% service availability achieved in 2023

william@sre.dev:~$ ./view_portfolio.sh

Loading portfolio content...

Blog & Portfolio

Where Knowledge Meets Practice

My blog and portfolio work together to showcase both my expertise and my approach to learning from mistakes.

The Blog

Learning from Mistakes

My blog focuses on the valuable lessons learned from common coding errors and challenges. I believe that mistakes are essential stepping stones to mastery.

Through detailed explanations and multi-level solutions, I help developers at all stages understand not just how to fix problems, but why they occur and how to prevent them.

The Portfolio

Showcasing Expertise

My portfolio demonstrates how the lessons from those mistakes have shaped my professional journey and technical expertise in SRE and DevOps.

It showcases the projects, skills, and experiences that have resulted from years of learning, problem-solving, and continuous improvement.

About Me

William Javier Acelas Granados

Site Reliability Engineer specializing in AIOps and automation for resilient systems. Passionate about transforming complex infrastructure into reliable, self-healing systems.

william.py

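class SRE:
    """Profile sketch; every step name is a placeholder for real tooling."""

    def __init__(self):
        self.name = "William Javier Acelas Granados"
        self.role = "AIOps Maestro & Automation Architect"
        self.skills = ["Python", "Kubernetes", "Terraform", "Prometheus"]
        self.mission = "Transforming chaos into harmony"

    def _run(self, *steps):
        # Print each step in order; a real system would call its tooling here.
        for step in steps:
            print(step)

    def optimize_systems(self, cycles=1):
        # Continuous improvement loop, bounded so the script terminates.
        for _ in range(cycles):
            self._run("monitor", "analyze", "plan", "execute", "learn")

    def handle_incident(self, alert):
        # Incident lifecycle for the given alert, from response to postmortem.
        print(f"alert: {alert}")
        self._run("respond_quickly", "mitigate_impact", "resolve_issue", "conduct_postmortem")


if __name__ == "__main__":
    william = SRE()
    william.optimize_systems()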

Professional Summary

I'm a Site Reliability Engineer with 5+ years of experience designing, implementing, and managing highly available, fault-tolerant systems. I specialize in cloud infrastructure, automation, and observability, with a focus on creating self-healing systems that require minimal human intervention.
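As a rough illustration of what "self-healing" can mean in practice, the sketch below restarts a failing service a limited number of times and only hands off to a human when that is not enough. The names `self_heal`, `check`, `restart`, and `max_restarts` are hypothetical and chosen for this example; they do not describe any specific system mentioned on this page.

```python
from typing import Callable


def self_heal(
    check: Callable[[], bool],
    restart: Callable[[], None],
    max_restarts: int = 3,
) -> bool:
    """Restart a service until its health check passes or attempts run out.

    Returns True if the service ends up healthy, False if a human should step in.
    `check` and `restart` are assumed callables supplied by the caller.
    """
    for attempt in range(max_restarts + 1):
        if check():
            return True
        # Restart only while the restart budget has not been used up.
        if attempt < max_restarts:
            restart()
    return False


if __name__ == "__main__":
    # Toy service that becomes healthy after two restarts.
    state = {"restarts": 0}
    healthy = self_heal(
        check=lambda: state["restarts"] >= 2,
        restart=lambda: state.update(restarts=state["restarts"] + 1),
    )
    print("healthy" if healthy else "escalate to on-call")
```

Capping the number of restarts is the design choice that keeps human intervention minimal without hiding a persistent failure: the loop fixes transient problems on its own and escalates the rest.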

Experience: 5+ years in SRE & DevOps

Projects: 20+ successful deliveries

Certifications

K8S

Certified Kubernetes Administrator (CKA)

Cloud Native Computing Foundation, 2022

AWS

AWS Certified Solutions Architect - Professional

Amazon Web Services, 2021

GCP

Google Cloud Professional Cloud Architect

Google Cloud, 2020

TF

HashiCorp Certified: Terraform Associate

HashiCorp, 2021

Skills & Expertise

Technical Proficiencies

My technical expertise spans infrastructure, observability, development, and security, all essential components for building reliable systems.

Infrastructure Expertise

Kubernetes: Expert
AWS: Advanced
GCP: Advanced
Terraform: Expert
Docker: Expert
Linux: Advanced

Skills at a Glance

Infrastructure: 6 skills
Observability: 6 skills
Development: 6 skills
Security: 6 skills
Databases: 4 skills
Automation: 3 skills

Recent Work

Project Portfolio

A selection of my recent projects, focusing on infrastructure automation, observability, and system reliability. Each project demonstrates my approach to solving complex technical challenges.

Cloud Migration Framework
Infrastructure

A comprehensive framework for migrating on-premises infrastructure to AWS with zero downtime and minimal risk.

Services Migrated: 200+
Downtime: 0
Cost Reduction: 35%

AWS, Terraform, Migration, Zero Downtime

Enterprise Kubernetes Platform
Infrastructure

A production-grade Kubernetes platform with built-in observability, security, and developer-friendly features.

Deployment Time: ↓ 80%
Resource Utilization: ↑ 40%
MTTR: ↓ 65%

Kubernetes, GitOps, Observability, Multi-tenant

Full-Stack Observability Solution
Observability

An integrated observability solution combining metrics, logs, and traces for complete system visibility.

Alert Noise: ↓ 70%
MTTR: ↓ 45%
Adoption: 15 teams

Prometheus, Grafana, Loki, Tempo

Infrastructure as Code Pipeline
Infrastructure

Automated infrastructure provisioning and management with comprehensive testing and validation.

Deployment Frequency: ↑ 300%
Failed Changes: ↓ 80%
Compliance Issues: ↓ 95%

Terraform, CI/CD, Testing, Automation

Disaster Recovery Automation
Reliability

Automated disaster recovery testing and failover procedures to ensure business continuity.

RTO: ↓ 85%
RPO: < 5 min
Test Coverage: 100%

DR, Automation, Testing, Resilience

Security Compliance Framework
Security

Automated security scanning and compliance reporting for cloud infrastructure and applications.

Compliance Rate: 99.8%
Mean Time to Remediate: ↓ 75%
Vulnerabilities: ↓ 60%

Security, Compliance, Automation, Reporting

Database Automation Platform
Databases

Self-service platform for database provisioning, scaling, and management with built-in best practices.

Provisioning Time: ↓ 95%
DBA Tickets: ↓ 80%
Uptime: 99.99%

PostgreSQL, MongoDB, Automation, Self-service

Professional Journey

Work Experience

My career path showcases a progression of increasing responsibility and impact in SRE and DevOps roles, with a focus on automation, reliability, and performance optimization.

Senior SRE

2021 - Present
8 engineers
TechCorp

Leading SRE initiatives across the organization, focusing on reliability, observability, and automation.

Deployment Time: ↓ 70%
MTTR: ↓ 45%
Uptime: 99.99%
Cost Reduction: 30%

Key Achievements

  • Led migration of 200+ services to Kubernetes, cutting deployment time by 70%
  • Implemented comprehensive monitoring with Prometheus and Grafana, reducing MTTR by 45%
  • Designed and implemented disaster recovery procedures, achieving 99.99% uptime
  • Mentored junior engineers and established SRE best practices across teams
  • Developed automated incident response playbooks, reducing manual intervention by 60%
  • Created a self-service platform for developers to manage their services
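To put the 99.99% uptime figure in concrete terms, the small sketch below converts an availability percentage into the downtime it allows over a period. The function name `allowed_downtime_minutes` and the choice of a 365-day year are assumptions made for this illustration, not a description of how the figure above was measured.

```python
# Illustrative only: converts an availability target into a downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(
    availability_pct: float,
    period_minutes: int = MINUTES_PER_YEAR,
) -> float:
    """Return the minutes of downtime permitted at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.99)
    print(f"99.99% uptime allows about {budget:.1f} minutes of downtime per year")
```

Under these assumptions, four nines leaves a budget of roughly 52.6 minutes per year, which is why automated response and recovery matter more than manual runbooks at this level.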

Technologies & Skills

Kubernetes, AWS, Terraform, Prometheus, Grafana, Python, Go, CI/CD

Key Projects

  • Kubernetes Migration
  • Observability Platform
  • Disaster Recovery Automation
  • CI/CD Pipeline Redesign

Get In Touch

Contact Me

I'm always open to discussing new projects, opportunities, or partnerships. Feel free to reach out!

Why Work With Me?

Reliability Expert

I build systems that don't just work, but work reliably under pressure and at scale.

Automation Advocate

I believe in automating everything that can be automated, freeing up time for innovation.

Problem Solver

I thrive on solving complex technical challenges with creative and efficient solutions.

Effective Communicator

I excel at translating complex technical concepts into clear, actionable information.