Job Description
We are seeking a skilled Data Crawling Specialist to join our team. The ideal candidate will be responsible for developing and maintaining web crawlers to collect data from various sources, ensuring high-quality data extraction and storage.
Key Responsibilities
- Crawl data from static web pages, dynamic pages (JavaScript-rendered), API endpoints, and other sources.
- Counter anti-crawling measures such as User-Agent spoofing, proxy pools, CAPTCHA bypass, and encrypted cookies and request-body parameters, to improve crawl success rates.
- Parse web pages and extract information using techniques such as XPath, CSS selectors, and regular expressions.
- Store crawled data in databases such as MySQL, MongoDB, Redis, and SelectDB.
- Write data cleaning and deduplication code to improve data quality.
- Monitor crawler health, optimize crawling strategies, and keep data collection stable.
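As a rough illustration of the extraction work described above (not a prescribed implementation), the sketch below pulls fields from an HTML snippet using XPath-style paths plus a regular expression. The HTML, field names, and helper function are hypothetical; it uses only Python's standard library, whose `ElementTree` supports a limited XPath subset, whereas production crawlers would typically reach for lxml or parsel:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product listing as a crawler might fetch it.
HTML = """<html><body>
  <div class="item"><h2>Widget A</h2><span class="price">$19.99</span></div>
  <div class="item"><h2>Widget B</h2><span class="price">$5.00</span></div>
</body></html>"""

PRICE_RE = re.compile(r"\$([0-9]+(?:\.[0-9]{2})?)")

def extract_items(html: str) -> list[dict]:
    """Extract (name, price) pairs using ElementTree's limited XPath
    support, with a regular expression for the numeric price."""
    root = ET.fromstring(html)
    items = []
    for div in root.findall(".//div[@class='item']"):
        name = div.find("h2").text
        price_text = div.find("span[@class='price']").text
        m = PRICE_RE.search(price_text)
        items.append({"name": name, "price": float(m.group(1)) if m else None})
    return items
```

Real pages are rarely well-formed XML, which is why lenient HTML parsers (lxml.html, BeautifulSoup) are the usual choice; the structure of the extraction loop stays the same.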
Job Requirements
- Proven experience in web scraping and data crawling techniques.
- Strong knowledge of handling anti-crawling mechanisms and strategies.
- Proficiency in data extraction techniques like XPath, CSS selectors, and regular expressions.
- Experience with databases such as MySQL, MongoDB, Redis, or SelectDB.
- Ability to write efficient data cleaning and deduplication scripts.
- Strong problem-solving skills and attention to detail.
- Experience in monitoring and optimizing crawler performance is a plus.
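To make the cleaning-and-deduplication requirement concrete, here is a minimal, hedged sketch of the general technique: normalize each record, then fingerprint it with a stable hash so exact duplicates are dropped. The normalization rules and field names are placeholders, not a specification of this role's pipeline:

```python
import hashlib

def normalize(record: dict) -> dict:
    # Placeholder cleaning rules: trim whitespace, lowercase keys.
    return {k.strip().lower(): v.strip() if isinstance(v, str) else v
            for k, v in record.items()}

def fingerprint(record: dict) -> str:
    # Stable hash over sorted key=value pairs identifies duplicates.
    canon = "|".join(f"{k}={record[k]}" for k in sorted(record))
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()

def dedupe(records: list[dict]) -> list[dict]:
    seen, out = set(), []
    for rec in map(normalize, records):
        fp = fingerprint(rec)
        if fp not in seen:
            seen.add(fp)
            out.append(rec)
    return out
```

In a distributed crawler the `seen` set would usually live in a shared store such as Redis (one of the databases listed above) rather than in process memory.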