Job Description:
1. Key Highlights:
- Participate in building AI workflow orchestration and automation platforms from scratch (0→1) or scaling existing ones (1→N), covering RAG, tool invocation, Agent collaboration, and asynchronous orchestration.
- Implement Diffy or a similar "shadow traffic replay + response diff comparison" solution for model/version regression testing and gradual rollouts.
- Collaborate with multiple business lines (IM, customer service, marketing automation, data processing, etc.) to deliver real-world, production-grade closed-loop solutions.
2. Key Responsibilities:
2.1 AI/LLM Workflow Orchestration:
- Design and implement multi-step reasoning, Agent collaboration, tool invocation (Tool-Calling/Function-Calling), asynchronous task queues, and compensation mechanisms.
- Build and optimize RAG pipelines: data ingestion, chunking & vectorization, retrieval/reranking, context compression, caching, and cost reduction (see the pipeline sketch below).
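To make the scope concrete, here is a minimal, illustrative sketch of the chunk → embed → retrieve → rerank flow this responsibility covers; `embed_fn`, `vector_search`, and `rerank_fn` are hypothetical stand-ins for whichever embedding provider, vector DB (Milvus/Qdrant/pgvector), and reranker the team adopts.

```python
# Minimal RAG pipeline sketch (illustrative only): chunk -> embed -> retrieve -> rerank.
# embed_fn, vector_search, and rerank_fn are hypothetical stand-ins for the concrete
# provider (OpenAI/Anthropic/local) and vector DB (Milvus/Qdrant/pgvector) chosen later.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Chunk:
    doc_id: str
    text: str

def chunk_document(doc_id: str, text: str, size: int = 500, overlap: int = 50) -> List[Chunk]:
    """Fixed-size character chunking with overlap; real pipelines often chunk by tokens or sections."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(Chunk(doc_id, text[start:start + size]))
        start += size - overlap
    return chunks

def retrieve_and_rerank(
    query: str,
    embed_fn: Callable[[str], List[float]],                    # e.g. an embedding model client
    vector_search: Callable[[List[float], int], List[Chunk]],  # e.g. a Milvus/Qdrant/pgvector query
    rerank_fn: Callable[[str, List[Chunk]], List[Chunk]],      # e.g. a cross-encoder reranker
    top_k: int = 20,
    final_k: int = 5,
) -> List[Chunk]:
    """Dense retrieval followed by reranking and truncation to control context length and cost."""
    candidates = vector_search(embed_fn(query), top_k)
    return rerank_fn(query, candidates)[:final_k]
```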
2.2 Evaluation & Quality Assurance:
- Establish automated evaluation and alignment systems (benchmark sets, Ragas/G-Eval/custom metrics), integrating A/B testing and real-time monitoring.
- Leverage Diffy (or an equivalent) for shadow traffic replay and response diff analysis to identify regression risks in model/prompt/service upgrades; support canary releases and fast rollbacks (see the replay/diff sketch below).
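The replay/diff idea can be summarized with a short, illustrative sketch in the spirit of Diffy (not its actual API); `call_primary` and `call_candidate` are hypothetical clients for the live and candidate model/prompt versions, and the similarity threshold is a placeholder.

```python
# Illustrative shadow-replay + diff sketch: replay recorded requests against the current
# and candidate versions, then flag responses whose similarity drops below a threshold
# as potential regressions. call_primary/call_candidate are hypothetical clients.
import difflib
from typing import Callable, Dict, Iterable, List

def replay_and_diff(
    recorded_requests: Iterable[Dict],            # captured production traffic
    call_primary: Callable[[Dict], str],          # client for the live version
    call_candidate: Callable[[Dict], str],        # client for the new model/prompt/service
    threshold: float = 0.9,
) -> List[Dict]:
    """Return requests where the candidate output diverges from the primary beyond the threshold."""
    regressions = []
    for req in recorded_requests:
        primary_out = call_primary(req)
        candidate_out = call_candidate(req)
        similarity = difflib.SequenceMatcher(None, primary_out, candidate_out).ratio()
        if similarity < threshold:
            regressions.append({"request": req, "similarity": similarity,
                                "primary": primary_out, "candidate": candidate_out})
    return regressions
```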
2.3 Engineering & Observability:
- Develop model/prompt versioning, feature/data versioning, experiment tracking (MLflow/W&B), and audit logs.
- Implement end-to-end observability: latency, error rates, prompt/context length, hit rates, and cost monitoring (tokens and dollar spend); see the metrics sketch below.
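As an illustration of the expected metrics, a minimal sketch using the `prometheus_client` package follows; the metric names and the per-1K-token price are placeholders, not the team's actual configuration.

```python
# Illustrative observability hooks (assumes the prometheus_client package); metric names
# and the 0.03 USD per 1K tokens price are placeholders, not real configuration.
import time
from prometheus_client import Counter, Histogram

LLM_LATENCY = Histogram("llm_request_latency_seconds", "End-to-end LLM call latency")
LLM_ERRORS = Counter("llm_request_errors_total", "Failed LLM calls", ["provider"])
LLM_TOKENS = Counter("llm_tokens_total", "Tokens consumed", ["provider", "kind"])
LLM_COST = Counter("llm_cost_usd_total", "Estimated spend in USD", ["provider"])

def record_call(provider: str, prompt_tokens: int, completion_tokens: int,
                started_at: float, failed: bool = False,
                usd_per_1k_tokens: float = 0.03) -> None:
    """Record latency, errors, token usage, and an estimated dollar cost for one LLM call."""
    LLM_LATENCY.observe(time.time() - started_at)
    if failed:
        LLM_ERRORS.labels(provider=provider).inc()
    LLM_TOKENS.labels(provider=provider, kind="prompt").inc(prompt_tokens)
    LLM_TOKENS.labels(provider=provider, kind="completion").inc(completion_tokens)
    LLM_COST.labels(provider=provider).inc(
        (prompt_tokens + completion_tokens) / 1000 * usd_per_1k_tokens
    )
```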
2.4 Platform & Integration:
- Expose workflows via API/SDK/microservices; integrate with business backends (Go/PHP/Node), queues (Kafka/RabbitMQ), storage (Postgres/Redis/object storage), and vector DBs (Milvus/Qdrant/pgvector).
- Ensure security & compliance: anonymization, PII protection, auditing, rate limiting/quotas, and model governance (see the redaction sketch below).
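For the anonymization/PII piece, a minimal redaction sketch is shown below; a production system would rely on a vetted anonymization library and locale-aware patterns rather than these sample regexes.

```python
# Minimal PII-redaction sketch for prompts and logs (illustrative only; the regexes are
# simplified samples, not production-grade detection).
import re

_PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace detected emails/phone numbers with typed placeholders before logging or model calls."""
    for label, pattern in _PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}_REDACTED]", text)
    return text

# Example: redact_pii("Contact alice@example.com or +84 912 345 678")
# -> "Contact [EMAIL_REDACTED] or [PHONE_REDACTED]"
```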
Job Requirements:
3. Must-Have Qualifications:
- 3+ years of backend or data/platform engineering experience, with 1–2+ years in LLM/generative AI projects.
- Proficiency in LLM application engineering: prompt engineering, function/tool calling, dialogue state management, memory, structured output, alignment, and evaluation.
- Hands-on experience with at least one orchestration framework: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow, or custom DAG/state machine/compensation solutions.
- End-to-end RAG expertise: data cleaning → vectorization → retrieval → reranking → evaluation; familiarity with Milvus/Qdrant/pgvector.
- Experience with Diffy or equivalent traffic replay/diff comparison tools (shadow traffic, record/replay, regression output analysis, canary releases).
- Strong engineering fundamentals: Docker, CI/CD, Git workflows, logging/metrics (OpenTelemetry/Prometheus/Grafana).
- Proficiency in at least one primary language (Go/Python/TypeScript); ability to write reliable services and tests.
- Excellent remote collaboration and documentation skills; data-driven delivery and retrospectives.
4. Preferred Qualifications:
- Deep Diffy experience, or experience integrating shadow traffic/split testing with API gateways.
- LLMOps/evaluation platform experience (Arize Phoenix, Evidently, PromptLayer, OpenAI Evals, Ragas).
- Practical Agent framework implementations (LangGraph, AutoGen/CrewAI, GraphRAG, tool ecosystems).
- Security/compliance knowledge (anonymization, access control, PDPA/GDPR) and moderation tools (Llama Guard).
- Familiarity with IM/customer service/marketing automation domains or multilingual scenarios (Chinese/English/Vietnamese).
- Cost optimization skills: caching, retrieval compression, model routing/switching (OpenAI/Anthropic/Google/local models).
5. Tech Stack (Preferred):
- Orchestration/Agent: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow
- Models & Evaluation: OpenAI/Anthropic/Google, vLLM/Ollama, Ragas, G-Eval, MLflow/W&B
- Retrieval: Milvus, Qdrant, pgvector, Elasticsearch, rerankers (BGE, multilingual-E5)
- Services: Go/Python/TypeScript, gRPC/REST, Redis, Postgres, Kafka/RabbitMQ, Docker/K8s
- Observability: OpenTelemetry, Prometheus, Grafana, ELK/ClickHouse
- Diff/Replay: Twitter Diffy or equivalent shadow traffic/replay systems
Benefits:
- Competitive salary
- Career growth opportunities
- Collaborative team environment
- Fully remote work