Deploying LLM Applications
Deploy LLM apps with Docker, keep secrets out of source control, and add retries, rate limiting, caching, and monitoring.
# Dockerfile for FastAPI LLM app
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt --no-cache-dir
COPY . .
# Exec-form CMD is parsed as a JSON array, so the strings must use double quotes
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
# requirements.txt
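fastapi==0.115.0
uvicorn==0.30.6  # ASGI server started by the Dockerfile CMD; this pin is an assumption, use the version you have tested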
anthropic==0.40.0
redis==5.0.0
chromadb==0.5.0
sentence-transformers==3.0.0
python-dotenv==1.0.0
# .env (NEVER commit to git!)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
REDIS_URL=redis://localhost:6379
PINECONE_API_KEY=...
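# Sketch: loading secrets from the environment at startup
A minimal sketch of how the app could read these values, assuming only ANTHROPIC_API_KEY and REDIS_URL are required. The `Settings` class and `load_settings` helper are hypothetical names, not part of any library. The idea is to fail fast with the names of missing variables and to keep the key out of `repr()` so it cannot leak into logs by accident.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # repr=False keeps the secret out of repr(), log lines, and tracebacks
    anthropic_api_key: str = field(repr=False)
    redis_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required values from the environment; raise if any is missing."""
    required = ("ANTHROPIC_API_KEY", "REDIS_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Report variable names only, never their values
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        redis_url=env["REDIS_URL"],
    )
```

For local development, python-dotenv's `load_dotenv()` can populate the environment from .env before `load_settings()` is called. In production the same variables would typically come from the container runtime or a secret manager.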
# Production checklist:
# 1. Secrets in env vars / secret manager (never in code)
# 2. Rate limiting (per user and global)
# 3. Retry with exponential backoff on API errors
# 4. Circuit breaker for LLM API outages
# 5. Semantic response caching
# 6. Async workers for long-running tasks (Celery)
# 7. PII detection and masking before logging
# 8. Audit log all LLM interactions for compliance
# 9. Content moderation pre and post LLM call
# 10. Cost alerts when spend exceeds threshold
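# Sketch for item 2: rate limiting (per user and global)
One possible way to implement item 2 is a token bucket per user plus one shared bucket. This is an in-process sketch using only the standard library. `TokenBucket` and `RateLimiter` are hypothetical names, and the rates are placeholders to be tuned.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """One bucket per user plus one global bucket (single process only)."""

    def __init__(self, user_rate, user_burst, global_rate, global_burst, clock=time.monotonic):
        self.clock = clock
        self.user_rate = user_rate
        self.user_burst = user_burst
        self.global_bucket = TokenBucket(global_rate, global_burst, clock)
        self.user_buckets = {}

    def allow(self, user_id):
        if user_id not in self.user_buckets:
            self.user_buckets[user_id] = TokenBucket(self.user_rate, self.user_burst, self.clock)
        # Check the user first so a throttled user does not drain the global budget
        return self.user_buckets[user_id].allow() and self.global_bucket.allow()
```

With several worker processes or containers the counters would need shared storage, for example the Redis instance behind REDIS_URL. That variant is not shown here.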
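# Sketch for item 3: retry with exponential backoff
Item 3 could look roughly like the helper below. `TransientAPIError` is a hypothetical stand-in for whatever retryable errors the client raises (timeouts, HTTP 429 or 5xx), and `call_with_retries` is an illustrative name. The delay doubles on each attempt, is capped at `max_delay`, and is randomized ("full jitter") so that many clients do not retry in lockstep.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for retryable failures (timeouts, HTTP 429/5xx)."""


def call_with_retries(fn, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on TransientAPIError back off exponentially and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Upper bound grows 0.5s, 1s, 2s, ... and is capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # full jitter
```

Only errors that are safe to retry should be mapped to `TransientAPIError`; authentication or validation errors should fail immediately. The anthropic SDK client also has its own `max_retries` option, so it is worth checking that the two layers do not multiply the number of attempts.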
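# Sketch for item 4: circuit breaker for LLM API outages
A minimal circuit breaker for item 4, assuming a single process and a simple consecutive-failure count. `CircuitBreaker` and `CircuitOpenError` are hypothetical names. After `failure_threshold` consecutive failures the breaker rejects calls without contacting the API. Once `reset_timeout` seconds have passed it lets one trial call through and closes again only if that call succeeds.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the API."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("LLM API circuit is open")
            # Timeout elapsed: half-open, let this one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

Callers can catch `CircuitOpenError` and return a cached or fallback response instead of waiting on an API that is known to be down.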
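# Sketch for item 7: PII masking before logging
For item 7, a rough illustration of masking text before it reaches a log line. The three regular expressions are simplified assumptions (emails, phone-like digit runs, and keys starting with `sk-`). They are not exhaustive, and a real deployment would likely rely on a dedicated PII detection tool. `mask_pii` is a hypothetical helper name.

```python
import re

# Simplified patterns for illustration only; they will miss many real-world formats
API_KEY_RE = re.compile(r"\bsk-[A-Za-z0-9_-]{8,}")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text):
    """Replace API keys, email addresses, and phone-like numbers with placeholders."""
    text = API_KEY_RE.sub("[API_KEY]", text)  # keys first, since they may contain digits
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


print(mask_pii("Mail jane.doe@example.com or call +1 415-555-0100, key sk-ant-abc12345xyz"))
# prints: Mail [EMAIL] or call [PHONE], key [API_KEY]
```

Applying the mask inside the logging path (for example in a log filter) rather than at each call site makes it harder to forget.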