Job Description
Key Responsibilities
- Deploy updates and fixes to keep our services stable and performant, managing version control, testing, and rollback procedures.
- Monitor system health and maintain high uptime by proactively identifying and mitigating potential risks.
- Provide Level 2 technical support to resolve escalated issues, and be on call to address urgent DevOps team needs during production outages.
- Develop and maintain tools that automate error detection, reduce manual intervention, and improve overall operational efficiency.
- Design and implement integration solutions for internal back-end systems, ensuring compatibility and data consistency across platforms.
- Conduct root cause analysis for production errors, document findings, and propose preventive measures to avoid recurrence.
- Investigate and resolve complex technical issues, including system configuration, network connectivity, and application performance bottlenecks.
- Create and refine scripts that automate visualization-related tasks, such as data processing, reporting, and dashboard generation.
- Establish standardized procedures for system troubleshooting, maintenance, and incident response to ensure consistency and scalability.
- Collaborate with cross-functional teams to align technical solutions with business objectives and user requirements.
- Continuously optimize system workflows and infrastructure to enhance reliability, security, and user experience.
- Stay up to date on emerging technologies and industry best practices to drive innovation in system management and automation.
Job Requirements
- Proven experience in DevOps operations, with a strong track record of maintaining high system uptime and resolving critical issues.
- Advanced knowledge of system administration, automation tools (e.g., Ansible, Puppet), and cloud platforms (e.g., AWS, Azure).
- Excellent problem-solving skills and the ability to analyze complex technical scenarios, identify root causes, and implement effective solutions.
- Proficiency in scripting languages (e.g., Python, Bash) for automation and visualization tasks, including API integration and data processing.
- Strong understanding of the software development lifecycle, with experience in integrating applications with internal back-end systems.
- Ability to design and document standardized procedures for system maintenance, troubleshooting, and incident management.
- Excellent communication skills to collaborate with teams and explain technical solutions to non-technical stakeholders.
- Preferred: Experience with CI/CD pipelines, containerization technologies (e.g., Docker, Kubernetes), and monitoring tools (e.g., Prometheus, Grafana).
- Ability to work independently and as part of a team, with a proactive approach to identifying opportunities for improvement.
- Strong attention to detail and commitment to delivering high-quality, reliable technical solutions that align with business goals.
- Preferred: Familiarity with ITIL frameworks and incident management best practices.
- Ability to adapt to evolving technologies and continuously improve system performance and security protocols.


