Operations and Maintenance Engineer at dappOS

Full Time
1 month ago
Employment Information
Job Description
As an Operations and Maintenance Engineer, you will play a critical role in designing, implementing, and maintaining scalable and secure infrastructure solutions. This position requires hands-on expertise in cloud computing platforms, including Amazon AWS and Alibaba Cloud, to ensure optimal system performance and reliability. You will be responsible for managing complex cloud resources such as VPC, CDN, S3, ECS, EKS, ELB, MySQL, Redis, and Elasticsearch, while also collaborating with cross-functional teams to align technical strategies with business objectives. The role involves continuous improvement of operational processes, automation of repetitive tasks, and proactive monitoring of system health to prevent downtime and ensure seamless user experiences.
Key Responsibilities
  • Lead the creation and management of cloud infrastructure resources across Amazon AWS and Alibaba Cloud, including designing and configuring VPC networks, optimizing CDN performance, and managing object storage solutions like S3. Implement container orchestration frameworks (ECS, EKS) and ensure efficient resource allocation for scalable applications.
  • Collaborate with development teams to streamline the code build process, ensuring efficient CI/CD pipelines and seamless integration with container orchestration tools. Develop and maintain automated container operations using Docker, Kubernetes, and orchestration platforms to reduce manual intervention and improve deployment efficiency.
  • Design and implement high availability solutions for critical systems, ensuring fault tolerance and minimal downtime. Establish comprehensive security monitoring mechanisms using tools like AWS CloudTrail, Alibaba Cloud Security Center, and SIEM platforms. Develop and execute fault recovery mechanisms, including disaster recovery plans and regular incident response drills to test system resilience.
  • Monitor system performance and security metrics in real time using tools such as Prometheus, Grafana, and the ELK stack. Analyze logs and alerts to identify potential issues and implement proactive measures to mitigate risks. Maintain documentation for infrastructure configurations, security protocols, and operational procedures to ensure knowledge sharing and compliance with industry standards.
  • Support incident management and troubleshooting efforts by coordinating with on-call teams and resolving critical issues during production outages. Conduct root cause analysis to identify system vulnerabilities and implement long-term solutions to prevent recurrence. Stay updated on emerging cloud technologies and industry best practices to continuously enhance operational capabilities.
Job Requirements
  • Proven experience (3+ years) in cloud operations, with expertise in Amazon AWS and Alibaba Cloud. Demonstrated ability to design and manage complex cloud architectures, including networking, storage, and database solutions.
  • Strong proficiency in containerization technologies (Docker, Kubernetes) and CI/CD pipeline development. Experience with automation tools such as Terraform, Ansible, and Jenkins to streamline infrastructure provisioning and deployment processes.
  • Deep understanding of high availability, disaster recovery, and security best practices. Familiarity with tools like AWS Auto Scaling, Alibaba Cloud Load Balancer, and security monitoring platforms (SIEM) to ensure system reliability and data protection.
  • Excellent problem-solving skills with the ability to troubleshoot complex system issues. Strong analytical mindset to identify performance bottlenecks and implement data-driven solutions for system optimization.
  • Ability to work in a fast-paced environment with minimal supervision. Strong communication skills to collaborate with developers, security teams, and stakeholders while documenting technical processes and presenting solutions.
  • Preferred qualifications include certifications in cloud computing (AWS Certified Solutions Architect, Alibaba Cloud ACA) and container orchestration (CKA, AWS Certified Kubernetes). Familiarity with DevOps practices and infrastructure-as-code (IaC) methodologies is highly advantageous.
MyJob.one - Remote work. Real impact
