Job Description:
1. Role Highlights:
- Participate in building AI workflow orchestration and automation platforms from scratch or scaling existing ones (RAG, tool invocation, Agent collaboration, asynchronous orchestration).
- Implement Diffy or similar "shadow traffic replay + response diff comparison" solutions for model/version regression testing and gradual rollout.
- Collaborate with multiple business lines (IM, customer service, marketing automation, data processing, etc.) to deliver production-grade, closed-loop solutions.
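The "shadow traffic replay + response diff comparison" idea above can be sketched in a few lines. This is a minimal illustration, not Diffy itself: `replay_and_diff`, `primary`, and `candidate` are hypothetical names, and the toy callables stand in for real model endpoints.

```python
import difflib

def replay_and_diff(requests, primary, candidate):
    """Replay each captured request against both versions; collect any diffs."""
    diffs = []
    for req in requests:
        old = primary(req)
        new = candidate(req)
        if old != new:
            # Unified diff makes regressions easy to eyeball in a report.
            diff = "\n".join(difflib.unified_diff(
                old.splitlines(), new.splitlines(),
                fromfile="primary", tofile="candidate", lineterm=""))
            diffs.append({"request": req, "diff": diff})
    return diffs

# Hypothetical stand-ins for two deployed model versions.
primary = lambda q: f"answer: {q.lower()}"
candidate = lambda q: f"answer: {q.lower()}"
```

In a real rollout, the request list would come from mirrored production traffic, and the diff report would gate promotion of the candidate version.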
2. Key Responsibilities:
- AI/LLM Workflow Orchestration: Design and implement multi-step reasoning, Agent collaboration, tool invocation (Tool-Calling/Function-Calling), asynchronous task queues, and compensation mechanisms.
- RAG Optimization: Build and optimize Retrieval-Augmented Generation pipelines including data ingestion, chunking, vectorization, recall/reranking, context compression, caching, and cost reduction.
- Evaluation & Quality Assurance: Establish automated evaluation and alignment systems (benchmark sets, Ragas/G-Eval/custom metrics), integrate A/B testing, and implement online monitoring.
- Engineering & Observability: Develop model/prompt version management, feature/data versioning, experiment tracking (MLflow/W&B), and audit logs.
- Platform Integration: Expose workflows via API/SDK/microservices; integrate with business backends (Go/PHP/Node), queues (Kafka/RabbitMQ), storage (Postgres/Redis/object storage), and vector databases (Milvus/Qdrant/pgvector).
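The RAG steps listed above (chunking, vectorization, recall) can be sketched end to end. This is a toy, assumption-heavy illustration: `chunk`, `embed`, and `recall` are hypothetical helpers, and the bag-of-words "embedding" stands in for a real embedding model and vector database.

```python
import math
from collections import Counter

def chunk(text, size=50):
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def recall(query, chunks, k=2):
    """Return the top-k chunks by similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

A production pipeline would swap `embed` for a model-backed encoder, store vectors in Milvus/Qdrant/pgvector, and add reranking, context compression, and caching on top of this recall step.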
Job Requirements:
3. Must-Have Qualifications:
- 3+ years of backend or data/platform engineering experience, with 1-2 years of hands-on LLM/generative AI project work.
- Proficiency in LLM application engineering: prompt engineering, function/tool calling, dialogue state management, memory, structured output, alignment and evaluation.
- Experience with workflow frameworks: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow, or custom DAG/state machine implementations.
- End-to-end RAG experience: data cleaning → vectorization → recall → reranking → evaluation.
- Familiarity with Diffy or equivalent traffic replay/diff comparison solutions.
- Strong engineering fundamentals: Docker, CI/CD, Git, observability tools (OpenTelemetry/Prometheus/Grafana).
- Proficiency in Go/Python/TypeScript with ability to write reliable services and tests.
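The "custom DAG/state machine" option mentioned above can be illustrated with a minimal dependency-ordered executor. This is a sketch only: `run_dag` is a hypothetical name, there is no cycle detection, and frameworks like Temporal or LangGraph add retries, persistence, and compensation on top of this core idea.

```python
def run_dag(tasks, deps):
    """Execute callables in dependency order via a simple depth-first walk.

    tasks: mapping of step name -> zero-arg callable
    deps:  mapping of step name -> list of step names it depends on
    Returns the execution order. Assumes the graph is acyclic.
    """
    done, order = set(), []

    def visit(name):
        if name in done:
            return
        for dep in deps.get(name, []):
            visit(dep)          # run prerequisites first
        done.add(name)
        order.append(name)
        tasks[name]()           # execute the step itself

    for name in tasks:
        visit(name)
    return order
```

For example, a three-step workflow (`load` → `embed` → `answer`) declared in any order still executes in dependency order.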
4. Preferred Qualifications:
- Deep Diffy implementation experience.
- LLMOps/evaluation platform experience (Arize Phoenix, Evidently, PromptLayer).
- Agent framework implementation (LangGraph, AutoGen/CrewAI).
- Security/compliance knowledge (data masking, GDPR).
- Domain experience in IM/customer service/marketing automation.
- Cost optimization techniques for LLM deployments.
Technical Stack:
- Orchestration: LangChain, LlamaIndex, Temporal
- Models: OpenAI, Anthropic, Google, vLLM
- Vector DBs: Milvus, Qdrant, pgvector
- Infra: Docker/K8s, gRPC, Kafka, Postgres
- Observability: Prometheus, Grafana, OpenTelemetry
Benefits:
Competitive compensation, career growth opportunities, collaborative team environment, full remote work flexibility.


