Job Description
We are seeking a skilled professional to design, implement, and manage monitoring solutions that keep our infrastructure and applications highly available and performant. The ideal candidate will work with cross-functional teams to integrate monitoring tools into our CI/CD pipeline and will lead incident response efforts.
Key Responsibilities
- Design, implement, and manage comprehensive monitoring solutions to ensure high availability and performance of our infrastructure and applications.
- Develop and maintain robust recording and alerting mechanisms to proactively identify and mitigate potential issues.
- Collaborate with the infrastructure team to integrate monitoring solutions into the CI/CD pipeline, ensuring seamless deployment and operation.
- Conduct performance analysis, capacity planning, and scalability testing to ensure the system meets current and future needs.
- Lead incident response and troubleshooting efforts, using monitoring data to resolve operational problems quickly.
Job Requirements
- Proven experience in designing and implementing monitoring solutions for complex infrastructure and applications.
- Strong understanding of CI/CD pipelines and experience integrating monitoring tools into them.
- Excellent problem-solving skills and ability to analyze performance metrics.
- Experience in incident response and troubleshooting using monitoring data.
- Ability to collaborate effectively with infrastructure and development teams.
- Knowledge of capacity planning and scalability testing methodologies.