Job Description
As a Senior Cloud DevOps Engineer, you will be responsible for designing, deploying, and maintaining scalable cloud infrastructure solutions. This role requires working with container technologies and orchestration systems such as Kubernetes to ensure efficient application lifecycle management. You will collaborate with cross-functional teams to implement operational solutions that align with industry best practices, while continuously optimizing system performance and reliability. Key responsibilities include monitoring infrastructure at scale using tools like Grafana and Prometheus, conducting capacity and load testing for distributed applications, and analyzing performance trends to anticipate demand changes. The ideal candidate will also identify and resolve availability and performance issues across multiple layers of deployment, from hardware and operating systems to network configurations and application logic.
Key Responsibilities
- Design and implement cloud-native architectures using containers and container orchestration systems (e.g., Kubernetes) to ensure high availability, scalability, and fault tolerance
- Manage multi-cloud environments across AWS, Azure, and Google Cloud Platform, including infrastructure provisioning, configuration management, and security compliance
- Develop and maintain automated CI/CD pipelines for deploying microservices and distributed applications with zero-downtime updates
- Conduct comprehensive performance testing, benchmarking, and capacity planning to ensure systems meet SLA requirements
- Monitor infrastructure health and performance metrics in real time using observability tools (e.g., Grafana, Prometheus) and establish alerting strategies
- Collaborate with development teams to implement scalable application designs and troubleshoot production issues across all layers
- Document system architecture, operational procedures, and performance analysis findings for knowledge sharing
- Stay current with emerging cloud technologies and DevOps methodologies to continuously improve infrastructure efficiency
- Lead incident response and root cause analysis for critical system failures, implementing preventive measures
- Coordinate with security teams to ensure compliance with data protection regulations and cloud security best practices
Job Requirements
- Minimum 5 years of hands-on experience in cloud infrastructure operations and DevOps engineering
- Proven expertise in container orchestration systems (Kubernetes) and cloud platforms (AWS, Azure, GCP)
- Strong understanding of microservices architecture, API gateways, and distributed system design principles
- Technical proficiency in infrastructure automation tools (Terraform, Ansible, CloudFormation) and orchestration frameworks
- Experience with monitoring and observability solutions (Grafana, Prometheus, ELK stack) for large-scale deployments
- Knowledge of network protocols, load balancing, and DNS configurations for cloud environments
- Ability to analyze system performance metrics and identify optimization opportunities across infrastructure layers
- Excellent problem-solving skills with experience in troubleshooting complex cloud infrastructure issues
- Proficiency in scripting languages (Python, Bash) and cloud-native toolchains for automation
- Strong communication skills to collaborate with developers, operations teams, and stakeholders
- Preferred: AWS Certified DevOps Engineer - Professional or equivalent cloud certification
- Preferred: Experience with serverless architectures and cloud cost optimization strategies
- Preferred: Familiarity with security automation and with compliance frameworks and regulations (e.g., SOC 2, GDPR)
- Preferred: Leadership experience in managing cloud infrastructure teams and mentoring junior engineers