DevOps Engineer

Full Time
5 days ago
Employment Information
1. Operations team management:
- Lead the operations team and take responsibility for the stability and high availability of the platform, ensuring that all critical services run 24/7 without interruption.
- Develop and optimize operations and maintenance processes and standards to improve team efficiency and quality.

2. Platform stability management:
- Monitor and analyze platform performance, and respond quickly to resolve system failures, network problems, and other technical issues.
- Coordinate with the development, product, and technical support departments to ensure the stable operation of the platform.
- Develop a disaster recovery plan and conduct regular drills to ensure rapid service recovery in the event of system failure.

3. Technical architecture optimization:
- Participate in the design and optimization of the system architecture to ensure the scalability, availability, and security of the platform.
- Assist the product team in updating system functions, optimizing operational support, and enhancing the user experience.

4. Development and implementation of automation tools:
- Promote the development and implementation of automated operations and maintenance tools to reduce manual work and improve efficiency.
- Collaborate with the R&D team to drive the CI/CD process and raise the level of automation in continuous integration and deployment.

5. Emergency response and problem management:
- Respond quickly to sudden technical failures, coordinate all parties to troubleshoot and resolve problems, and ensure rapid recovery of services.
- Conduct post-incident reviews and continuously optimize emergency response processes and operations and maintenance systems to reduce future failures.

6. Monitoring and reporting:
- Build and maintain a monitoring system that tracks platform health and provides reports on system performance, operating status, and problem trends.
- Regularly report operations and maintenance status, key KPIs, and service availability to management, and propose improvements.
MyJob.one - Remote work. Real impact
