Job Description
This position is responsible for the design, development, and optimization of a comprehensive data development platform. The role spans multiple subsystems, including data collection, job scheduling, data quality management, metadata handling, indicator systems, and data cleaning. The candidate will also build and maintain the service components of big data platforms, ensuring high availability, stability, and low latency in service delivery, and will conduct in-depth business analysis, using data visualization tools to present findings effectively. The individual will further participate in product and application development, establishing data access standards and protocols. Lastly, the position demands continuous research into emerging technologies to address business challenges and enhance data processing, analysis, and visualization methodologies.
Key Responsibilities
- Lead the design and development of a unified data platform, focusing on subsystems such as data collection, job scheduling, data quality, metadata management, indicator systems, and data cleaning. This includes defining technical specifications, coordinating cross-functional teams, and ensuring alignment with business objectives.
- Develop and maintain the core service components of big data platforms, optimizing existing technical frameworks for scalability, performance, and reliability. This involves implementing solutions to ensure high availability, stability, and low latency in service operations (see the retry/backoff sketch after this list).
- Perform business analysis on data sets to identify trends, patterns, and insights. Utilize visualization tools (e.g., Tableau, Power BI, or custom dashboards) to create intuitive, actionable reports for stakeholders (see the charting sketch after this list).
- Collaborate with product teams to design and implement data access standards, ensuring consistency, security, and efficiency across applications and services. This includes defining data governance policies and integration protocols.
- Conduct research on cutting-edge technologies and methodologies to solve real-world business problems. This includes evaluating tools such as Flink for streaming data processing (see the windowing sketch after this list), developing drag-and-drop reporting systems, and exploring new approaches to data analysis and visualization.
- Provide technical leadership in the development lifecycle, from requirements gathering to deployment and post-launch support. This includes mentoring junior developers, documenting processes, and ensuring compliance with industry best practices.
- Monitor and analyze system performance metrics to identify bottlenecks and areas for improvement (see the latency-percentile sketch after this list). Implement solutions that improve data processing efficiency, reduce latency, and ensure a seamless user experience.
- Engage in continuous learning to stay updated on emerging trends in data engineering, big data technologies, and analytics tools. Share knowledge within the team to foster innovation and technical growth.
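To make the availability responsibility concrete, here is a minimal sketch of retrying a flaky service call with exponential backoff and jitter, one common building block for stable, low-latency services. The function name `call_with_retries` and its parameters are illustrative, not part of any specific platform:

```python
import random
import time

def call_with_retries(fn, max_attempts=5, base_delay=0.1):
    """Call fn(), retrying transient failures with exponential backoff and jitter.

    fn, max_attempts, and base_delay are illustrative; a real service would
    also distinguish retryable from non-retryable errors.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:  # treat connection errors as transient
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Exponential backoff with jitter to avoid synchronized retries.
            delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            time.sleep(delay)
```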
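For the business-analysis responsibility, a short example of turning aggregated numbers into a report-ready chart with Matplotlib; the dataset is invented for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical daily order counts; real figures would come from the platform.
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
orders = [1200, 1350, 1280, 1500, 1620]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(days, orders)
ax.set_title("Orders per day (sample data)")
ax.set_xlabel("Day")
ax.set_ylabel("Order count")
fig.tight_layout()
fig.savefig("orders_per_day.png")  # embed in a report or dashboard
```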
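For the streaming research item, a plain-Python sketch of the tumbling-window aggregation that a framework such as Flink would perform at scale. This illustrates only the windowing semantics, not Flink's API, and the event data is made up:

```python
from collections import defaultdict

# Events as (timestamp_seconds, key, value); in production these would
# arrive from a source such as Kafka and be processed by a streaming engine.
events = [(1, "page_a", 1), (2, "page_b", 1), (61, "page_a", 1), (65, "page_a", 1)]

WINDOW = 60  # tumbling window size in seconds

counts = defaultdict(int)
for ts, key, value in events:
    window_start = (ts // WINDOW) * WINDOW  # align each event to its window
    counts[(window_start, key)] += value

for (window_start, key), total in sorted(counts.items()):
    print(f"window [{window_start}, {window_start + WINDOW}) {key}: {total}")
```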
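For the performance-monitoring responsibility, a small sketch of summarizing request latencies with nearest-rank percentiles, the kind of metric used to spot tail-latency bottlenecks; the sample latencies are hypothetical:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile; enough for a quick latency check."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # ordinal rank, 1-based
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds.
latencies_ms = [12, 15, 11, 240, 14, 13, 18, 16, 17, 950]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```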
Job Requirements
- Proven experience in designing and developing data platforms, with a strong background in subsystems such as data collection, job scheduling, data quality, metadata management, and data cleaning. Familiarity with ETL processes and data pipeline optimization is essential (see the validation sketch after this list).
- Expertise in big data technologies including Hadoop, Spark, Kafka, and cloud platforms (e.g., AWS, Azure, or GCP). Ability to build scalable and high-performance service components with a focus on reliability and fault tolerance.
- Strong proficiency in data visualization tools (e.g., Tableau, Power BI, or Python libraries like Matplotlib and Seaborn). Experience in creating interactive dashboards and reports to communicate complex data insights effectively.
- Deep knowledge of data processing frameworks and algorithms, particularly in streaming data (e.g., Apache Flink, Apache Storm) and batch processing. Ability to develop and optimize data workflows for real-time and historical data scenarios.
- Excellent analytical and problem-solving skills, with the ability to translate business requirements into technical solutions. Experience in working with diverse data sources and formats, including structured, semi-structured, and unstructured data.
- Strong understanding of data governance, security, and compliance standards. Ability to design data access policies that ensure data integrity, privacy, and regulatory adherence.
- Proficiency in programming languages such as Python, Java, or Scala. Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB) for data storage and retrieval (see the parameterized-query sketch after this list).
- Ability to work in a fast-paced, dynamic environment with tight deadlines. Strong organizational and time management skills to balance multiple projects and priorities.
- Excellent communication and collaboration skills to work with cross-functional teams, including data scientists, product managers, and DevOps engineers. Ability to present technical concepts to non-technical stakeholders clearly and concisely.
- Preferred qualifications include a bachelor’s or master’s degree in computer science, data science, or a related field. Experience with agile methodologies and CI/CD pipelines is advantageous.
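As a concrete illustration of the data-quality and ETL experience listed above, a minimal sketch of record validation at one pipeline stage; the field names and rules are hypothetical:

```python
# Minimal data-quality check for one stage of an ETL pipeline.
REQUIRED_FIELDS = {"user_id", "event_time", "amount"}

def validate(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record is clean."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        errors.append("amount must be non-negative")
    return errors

records = [
    {"user_id": 1, "event_time": "2024-01-01T00:00:00", "amount": 9.5},
    {"user_id": 2, "amount": -3.0},  # missing event_time and negative amount
]
clean = [r for r in records if not validate(r)]
rejected = [(r, validate(r)) for r in records if validate(r)]
print(f"{len(clean)} clean, {len(rejected)} rejected")
```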
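For the database requirement, a short parameterized-query example. In-memory SQLite stands in here for MySQL or PostgreSQL, and the schema is invented for illustration:

```python
import sqlite3

# In-memory SQLite stands in for a production database; table and column
# names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (name TEXT, day TEXT, value REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [("dau", "2024-01-01", 1200), ("dau", "2024-01-02", 1350)],
)

# Parameterized queries keep retrieval safe from SQL injection.
rows = conn.execute(
    "SELECT day, value FROM metrics WHERE name = ? ORDER BY day", ("dau",)
).fetchall()
for day, value in rows:
    print(day, value)
conn.close()
```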