Job Description
This role offers the opportunity to help build cutting-edge AI workflow orchestration and automation platforms, focusing on RAG (Retrieval-Augmented Generation), tool invocation, Agent collaboration, and asynchronous orchestration. You will apply tools such as Diffy, or similar "shadow traffic replay + response differential comparison" approaches, to model/version regression testing and gradual rollout.
Key Responsibilities
- AI/LLM Workflow Orchestration: Design and implement multi-step reasoning, Agent collaboration, tool invocation (Tool-Calling/Function-Calling), asynchronous task queues, and compensation mechanisms. Optimize RAG pipelines including data ingestion, chunking, vectorization, retrieval/reranking, context compression, caching, and cost reduction.
- Evaluation & Quality Assurance: Establish automated evaluation and alignment systems (benchmark sets, Ragas/G-Eval/custom metrics), integrate A/B testing, and implement real-time monitoring. Utilize Diffy (or equivalent) for shadow traffic replay and response differential analysis to identify regression risks in models/prompts/service upgrades, supporting canary releases and rapid rollbacks.
- Engineering & Observability: Build version control for models/prompts, feature/data versioning, experiment tracking (MLflow/W&B), and audit logs. Implement end-to-end observability covering latency, error rates, prompt/context length, hit rates, and cost monitoring (tokens/$).
- Platform Integration: Expose workflows via API/SDK/microservices; integrate with business backends (Go/PHP/Node), queues (Kafka/RabbitMQ), storage (Postgres/Redis/object storage), and vector databases (Milvus/Qdrant/pgvector). Ensure security and compliance (data anonymization, PII protection, auditing, rate limits, quotas, and model governance).
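The shadow-traffic comparison workflow named in the responsibilities above can be sketched roughly as follows. This is a minimal illustration of the idea, not Diffy's actual API; the `call_primary`/`call_candidate` callables and the report shape are hypothetical stand-ins for the production and canary services.

```python
import difflib
from dataclasses import dataclass

@dataclass
class DiffReport:
    request_id: str
    identical: bool
    diff: str

def compare_responses(request_id: str, primary: str, candidate: str) -> DiffReport:
    """Diff the candidate service's response against the primary's."""
    if primary == candidate:
        return DiffReport(request_id, True, "")
    diff = "\n".join(difflib.unified_diff(
        primary.splitlines(), candidate.splitlines(),
        fromfile="primary", tofile="candidate", lineterm=""))
    return DiffReport(request_id, False, diff)

def replay_shadow_traffic(requests, call_primary, call_candidate):
    """Replay recorded (id, payload) pairs against both versions; collect mismatches."""
    regressions = []
    for req_id, payload in requests:
        primary = call_primary(payload)      # current production version
        candidate = call_candidate(payload)  # canary / new version
        report = compare_responses(req_id, primary, candidate)
        if not report.identical:
            regressions.append(report)
    return regressions
```

In practice the diff step also needs noise filtering (timestamps, request IDs, LLM nondeterminism) before a mismatch is treated as a regression signal.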
Job Requirements
- Mandatory: 3+ years in backend/data/platform engineering with 1–2 years of hands-on LLM/generative AI project experience.
- Proficiency in LLM application engineering: prompt engineering, function/tool calling, dialogue state management, memory, structured output, alignment, and evaluation.
- Familiarity with at least one orchestration framework: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow, or custom DAG/state machine/compensation solutions.
- End-to-end RAG experience (data cleaning → vectorization → retrieval → reranking → evaluation); knowledge of Milvus/Qdrant/pgvector.
- Experience with Diffy or equivalent traffic replay/differential analysis (shadow traffic, recording/replay, regression output comparison, canary releases).
- Strong engineering fundamentals: Docker, CI/CD, Git, logging/metrics (OpenTelemetry/Prometheus/Grafana).
- Proficiency in Go/Python/TypeScript; ability to develop reliable services and tests.
- Excellent remote collaboration and documentation skills; metrics-driven delivery.
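The retrieval step of the end-to-end RAG flow listed in the requirements (data cleaning → vectorization → retrieval → reranking) can be sketched with a toy bag-of-words similarity search. This is purely illustrative: a real pipeline would use an embedding model and a vector store such as Milvus/Qdrant/pgvector rather than word counts.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and return the top-k."""
    qv = vectorize(query)
    ranked = sorted(corpus, key=lambda d: cosine(qv, vectorize(d)), reverse=True)
    return ranked[:top_k]
```

A reranking stage would then re-score these top-k candidates with a heavier cross-encoder model before they are packed into the prompt context.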
Preferred Qualifications
- Deep Diffy expertise, or experience integrating API gateways for shadow traffic routing and response comparison.
- LLMOps/evaluation platform experience (Arize Phoenix, Evidently, PromptLayer, OpenAI Evals, Ragas).
- Hands-on experience building with Agent frameworks (LangGraph, AutoGen/CrewAI, GraphRAG, tool ecosystems).
- Security/compliance knowledge (anonymization, PDPA/GDPR, moderation tools like Llama Guard).
- Domain experience in IM/customer service/marketing automation or multilingual scenarios (Chinese/English/Vietnamese).
- Cost optimization techniques: caching, retrieval compression, model routing/multi-provider switching (OpenAI/Anthropic/Google/local models).
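The caching and model-routing techniques in the last bullet can be sketched as a small wrapper. This is a hedged illustration under stated assumptions: the `cheap_model`/`strong_model` callables and the length-based routing heuristic are hypothetical; real routers use quality/cost policies and semantic (not just exact-match) caching.

```python
import hashlib

class CachedRouter:
    """Route prompts to a cheap or strong model and cache responses."""

    def __init__(self, cheap_model, strong_model, complexity_threshold: int = 50):
        self.cheap_model = cheap_model      # e.g. a small local model
        self.strong_model = strong_model    # e.g. a frontier API model
        self.threshold = complexity_threshold
        self.cache: dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:  # exact-match cache hit: zero inference cost
            return self.cache[key]
        # Crude routing heuristic: short prompts go to the cheap model.
        model = self.cheap_model if len(prompt) < self.threshold else self.strong_model
        answer = model(prompt)
        self.cache[key] = answer
        return answer
```

Multi-provider switching (OpenAI/Anthropic/Google/local) fits the same shape: the router simply holds more candidate callables and a fallback order.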
Tech Stack (Partial)
- Orchestration/Agent: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow
- Models & Evaluation: OpenAI/Anthropic/Google, vLLM/Ollama, Ragas, G-Eval, MLflow/W&B
- Retrieval: Milvus, Qdrant, pgvector, Elasticsearch, rerankers (BGE, multilingual-E5)
- Services: Go/Python/TypeScript, gRPC/REST, Redis, Postgres, Kafka/RabbitMQ, Docker/K8s
- Observability: OpenTelemetry, Prometheus, Grafana, ELK/ClickHouse
- Diff/Replay: Twitter Diffy or equivalent shadow traffic/replay systems
Benefits
Fully remote work environment with a collaborative team culture and competitive compensation package.


