feat: Add scheduled health check and auto-recovery

Major enhancements to Claude Router v1.1.0:

- Add APScheduler for automated Claude Pro health checks
- Schedule checks during the first five minutes of every hour (minutes 0-4) to detect quota recovery
- Implement intelligent auto-switch back to Claude Pro when available
- Add manual health check endpoint for immediate testing
- Enhance status monitoring with health check metrics
- Improve API compatibility with older Anthropic client versions
- Update documentation with new features and usage examples
- Configure Claude Code CLI integration with environment variables

The router now automatically detects when Claude Pro quota is restored
and switches back to prioritize the premium service.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Will Song
2025-07-14 19:03:02 -05:00
parent d0d797ef46
commit 77096edebd
4 changed files with 181 additions and 17 deletions


@@ -5,7 +5,8 @@
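## Features
- **Automatic failover**: automatically switches provider when a rate limit or usage limit is detected
- **Health checks**: monitors the status of each provider in real time
- **Scheduled health checks**: automatically probes for Claude Pro quota recovery during the first 5 minutes of every hour
- **Smart recovery**: automatically switches back to Claude Pro so the premium service is used first
- **Manual switching**: supports switching to a specific provider by hand
- **Claude Code CLI compatible**: fully compatible with the Anthropic API format
- **Dockerized deployment**: one-command deployment that works out of the box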
@@ -37,16 +38,21 @@ curl http://localhost:8000/v1/status
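### 3. Configure the Claude Code CLI
Modify the Claude Code CLI configuration to point the API endpoint at the router:
Set environment variables to point the Claude Code CLI at the router:
```bash
# Set environment variables
# Set the API endpoint to the router address
export ANTHROPIC_API_URL="http://localhost:8000"
export ANTHROPIC_API_KEY="your_claude_api_key"
# Or edit the Claude Code CLI configuration file
# Add to bashrc to make it permanent
echo 'export ANTHROPIC_API_URL="http://localhost:8000"' >> ~/.bashrc
# Test the configuration
echo "Hello Claude Router" | claude --print
```
**Note**: There is no need to change ANTHROPIC_API_KEY; the router handles API keys automatically.
## API Endpoints
### Main endpoints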
@@ -55,6 +61,7 @@ export ANTHROPIC_API_KEY="your_claude_api_key"
- `GET /health` - health check
- `GET /v1/status` - get the router status
- `POST /v1/switch-provider` - manually switch provider
- `POST /v1/health-check` - manually trigger a Claude Pro health check
### Example health check response
@@ -64,6 +71,8 @@ export ANTHROPIC_API_KEY="your_claude_api_key"
"current_provider": "claude_pro",
"failover_count": 0,
"last_failover": null,
"last_health_check": "2025-07-14T19:00:00.000Z",
"health_check_failures": 0,
"providers": {
"claude_pro": {"active": true},
"claude_api": {"active": true}
@@ -81,6 +90,13 @@ export ANTHROPIC_API_KEY="your_claude_api_key"
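- `MAX_RETRIES`: maximum number of retries (default: 3)
- `RETRY_DELAY`: retry delay (default: 1.0 seconds)
### Health check configuration
- `health_check_enabled`: whether scheduled health checks are enabled (default: true)
- `health_check_cron`: cron expression for the check schedule (default: "0-4 * * * *", i.e. minutes 0-4 of every hour)
- `health_check_message`: content of the test message (default: "ping")
- `health_check_model`: model used for the check (default: claude-3-haiku-20240307)
### Token file
The router automatically reads API keys from `/home/will/docker/tokens.txt`, so there is no need to configure environment variables manually.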
@@ -121,6 +137,16 @@ curl -X POST http://localhost:8000/v1/switch-provider \
-d '"claude_api"'
```
### Manual health check
```bash
# Check immediately whether Claude Pro is available
curl -X POST http://localhost:8000/v1/health-check
# View detailed status
curl http://localhost:8000/v1/status
```
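For callers that prefer Python over curl, the same manual check could be triggered roughly as follows. This is a minimal sketch using only the standard library; `ROUTER_URL`, `build_health_check_request`, and `trigger_health_check` are hypothetical names, and the base URL assumes the local deployment used in the curl examples.

```python
import json
import urllib.request

# Assumed local deployment, matching the curl examples above.
ROUTER_URL = "http://localhost:8000"


def build_health_check_request(base_url: str = ROUTER_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for a manual health check."""
    return urllib.request.Request(
        f"{base_url}/v1/health-check", data=b"", method="POST"
    )


def trigger_health_check(base_url: str = ROUTER_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body returned by the router."""
    request = build_health_check_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `trigger_health_check()` against a running router should return a dictionary containing `message`, `current_provider`, and `last_health_check`, per the endpoint added in this commit.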
## Development and debugging
### Local development
@@ -183,6 +209,21 @@ docker logs -f claude-router
- Python: 3.11+
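- Supported: Claude-3 family models
## Changelog
### v1.1.0 (2025-07-14)
- ✅ Added scheduled health checks
- ✅ Automatically probes for Claude Pro quota recovery during the first 5 minutes of every hour
- ✅ Smart automatic switch back to Claude Pro
- ✅ New manual health check API
- ✅ Improved logging and status monitoring
### v1.0.0 (2025-07-14)
- ✅ Basic router functionality
- ✅ Automatic failover from Claude Pro to Claude API
- ✅ Docker container deployment
- ✅ Claude Code CLI compatibility
## Planned future work
- [ ] Add DeepSeek API support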

138
app.py

@@ -9,6 +9,8 @@ import httpx
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import StreamingResponse, JSONResponse
from anthropic import Anthropic
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
from config import config
@@ -21,6 +23,9 @@ class ClaudeRouter:
        self.current_provider = "claude_pro"
        self.failover_count = 0
        self.last_failover = None
        self.last_health_check = None
        self.health_check_failures = 0
        self.scheduler = None
        self.providers = {
            "claude_pro": {
                "api_key": config.claude_pro_api_key,
@@ -98,13 +103,23 @@ class ClaudeRouter:
            logger.info(f"Making request with provider: {self.current_provider}")
            # Make the API call
            response = await asyncio.to_thread(
                client.messages.create,
                model=model,
                max_tokens=max_tokens,
                messages=messages,
                stream=stream
            )
            if hasattr(client, 'messages'):
                response = await asyncio.to_thread(
                    client.messages.create,
                    model=model,
                    max_tokens=max_tokens,
                    messages=messages,
                    stream=stream
                )
            else:
                # For older anthropic versions
                response = await asyncio.to_thread(
                    client.completions.create,
                    model=model,
                    max_tokens_to_sample=max_tokens,
                    prompt=f"Human: {messages[0]['content']}\n\nAssistant:",
                    stream=stream
                )
            return response
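The fallback branch in this hunk builds the legacy prompt from `messages[0]` only, so later turns of a conversation would not reach the older completions API. One possible way to flatten a whole message list is sketched below. `messages_to_legacy_prompt` is a hypothetical helper, not part of the commit; it assumes every `content` value is a plain string, and the two constants are defined locally to mirror the `\n\nHuman:` / `\n\nAssistant:` turn markers of the legacy text format.

```python
from typing import Dict, List

# Turn markers of the legacy text-completion prompt format (defined locally).
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"


def messages_to_legacy_prompt(messages: List[Dict[str, str]]) -> str:
    """Flatten chat-style messages into a single legacy prompt string.

    Assumes each message has a "role" of "user" or "assistant" and a
    string "content"; structured content blocks are not handled.
    """
    parts = []
    for message in messages:
        prefix = HUMAN_PROMPT if message["role"] == "user" else AI_PROMPT
        parts.append(f"{prefix} {message['content']}")
    # End with an open assistant turn so the model continues from there.
    parts.append(AI_PROMPT)
    return "".join(parts)
```

For a single user message this produces the same `Human: ... Assistant:` shape as the diff, with the leading blank line the legacy format expects.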
@@ -120,6 +135,81 @@ class ClaudeRouter:
raise HTTPException(status_code=500, detail=f"All providers failed. Last error: {str(e)}")
raise HTTPException(status_code=500, detail="No providers available")
    async def health_check_claude_pro(self):
        """Check if Claude Pro is available again"""
        # Only check if we're not currently using Claude Pro
        if self.current_provider == "claude_pro":
            logger.debug("Skipping health check - already using Claude Pro")
            return
        logger.info("Running Claude Pro health check...")
        self.last_health_check = datetime.now()
        try:
            client = Anthropic(
                api_key=config.claude_pro_api_key,
                base_url=config.claude_pro_base_url
            )
            # Send a minimal test message
            if hasattr(client, 'messages'):
                response = await asyncio.to_thread(
                    client.messages.create,
                    model=config.health_check_model,
                    max_tokens=10,
                    messages=[{"role": "user", "content": config.health_check_message}]
                )
            else:
                # For older anthropic versions
                response = await asyncio.to_thread(
                    client.completions.create,
                    model=config.health_check_model,
                    max_tokens_to_sample=10,
                    prompt=f"Human: {config.health_check_message}\n\nAssistant:"
                )
            # If successful, switch back to Claude Pro
            old_provider = self.current_provider
            self.current_provider = "claude_pro"
            self.health_check_failures = 0
            logger.info(f"Claude Pro health check successful! Switched from {old_provider} to claude_pro")
        except Exception as e:
            self.health_check_failures += 1
            error_str = str(e).lower()
            if any(indicator in error_str for indicator in ["rate_limit", "usage limit", "quota exceeded", "429", "too many requests", "limit reached"]):
                logger.info(f"Claude Pro still rate limited: {str(e)}")
            else:
                logger.warning(f"Claude Pro health check failed (attempt {self.health_check_failures}): {str(e)}")
    def start_scheduler(self):
        """Start the health check scheduler"""
        if not config.health_check_enabled:
            logger.info("Health check disabled in config")
            return
        self.scheduler = AsyncIOScheduler()
        # Schedule health check using cron expression
        self.scheduler.add_job(
            self.health_check_claude_pro,
            trigger=CronTrigger.from_crontab(config.health_check_cron),
            id="claude_pro_health_check",
            name="Claude Pro Health Check",
            misfire_grace_time=60
        )
        self.scheduler.start()
        logger.info(f"Health check scheduler started with cron: {config.health_check_cron}")
    def stop_scheduler(self):
        """Stop the health check scheduler"""
        if self.scheduler:
            self.scheduler.shutdown()
            logger.info("Health check scheduler stopped")
# Initialize router
router = ClaudeRouter()
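The recovery logic in this hunk can be read as a small state machine: skip while already on Claude Pro, otherwise probe, then either switch back or count a failure. The sketch below isolates that behavior so it can be exercised without network access. `RecoveryState`, `run_check`, `is_rate_limited`, and `RATE_LIMIT_INDICATORS` are hypothetical names; the indicator strings are copied from the diff, and `probe` stands in for the real Anthropic call.

```python
from typing import Callable

# Substrings the diff treats as "still rate limited" (copied from the hunk).
RATE_LIMIT_INDICATORS = (
    "rate_limit", "usage limit", "quota exceeded",
    "429", "too many requests", "limit reached",
)


def is_rate_limited(error: Exception) -> bool:
    """Return True if the error text contains any rate-limit indicator."""
    text = str(error).lower()
    return any(indicator in text for indicator in RATE_LIMIT_INDICATORS)


class RecoveryState:
    """Simplified stand-in for the router fields touched by the health check."""

    def __init__(self) -> None:
        self.current_provider = "claude_api"  # assume a failover already happened
        self.health_check_failures = 0

    def run_check(self, probe: Callable[[], None]) -> str:
        """Run one health check and report what happened."""
        if self.current_provider == "claude_pro":
            return "skipped"
        try:
            probe()  # placeholder for the minimal test message
        except Exception as exc:
            self.health_check_failures += 1
            return "rate_limited" if is_rate_limited(exc) else "failed"
        # The probe succeeded: switch back and reset the failure counter.
        self.current_provider = "claude_pro"
        self.health_check_failures = 0
        return "recovered"
```

As in the diff, every failed probe increments the counter regardless of whether it looks like a rate limit; only the log level differs between the two cases.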
@@ -128,7 +218,14 @@ router = ClaudeRouter()
async def lifespan(app: FastAPI):
    logger.info("Claude Router starting up...")
    logger.info(f"Current provider: {router.current_provider}")
    # Start health check scheduler
    router.start_scheduler()
    yield
    # Stop scheduler on shutdown
    router.stop_scheduler()
    logger.info("Claude Router shutting down...")
app = FastAPI(
@@ -147,9 +244,11 @@ async def health_check():
        "failover_count": router.failover_count,
        "last_failover": router.last_failover.isoformat() if router.last_failover else None,
        "providers": {
            name: {"active": config["active"]}
            for name, config in router.providers.items()
        }
            name: {"active": provider_config["active"]}
            for name, provider_config in router.providers.items()
        },
        "last_health_check": router.last_health_check.isoformat() if router.last_health_check else None,
        "health_check_failures": router.health_check_failures
    }
@app.post("/v1/messages")
@@ -189,8 +288,10 @@ async def create_message(request: Request):
        raise HTTPException(status_code=500, detail=str(e))
@app.post("/v1/switch-provider")
async def switch_provider(provider: str):
async def switch_provider(request: Request):
    """Manually switch to a specific provider"""
    provider = await request.json()
    if provider not in router.providers:
        raise HTTPException(status_code=400, detail=f"Unknown provider: {provider}")
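With this change the endpoint reads the raw JSON body, which is why the README example posts a bare JSON string (`-d '"claude_api"'`). The sketch below shows how such a body decodes and how it might be validated before the membership test. `parse_provider` and `KNOWN_PROVIDERS` are hypothetical; the explicit type check is an addition for illustration, since a JSON object body would otherwise reach `provider not in router.providers` as an unhashable dict.

```python
import json

# Provider names taken from the diff; the set itself is illustrative.
KNOWN_PROVIDERS = {"claude_pro", "claude_api"}


def parse_provider(raw_body: bytes) -> str:
    """Decode a request body that is expected to be a bare JSON string."""
    provider = json.loads(raw_body)  # b'"claude_api"' -> "claude_api"
    if not isinstance(provider, str):
        raise ValueError("request body must be a JSON string")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    return provider
```

Note that an unquoted body such as `claude_api` is not valid JSON and fails at the decoding step, before any provider lookup happens.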
@@ -214,9 +315,24 @@ async def get_status():
        "current_provider": router.current_provider,
        "failover_count": router.failover_count,
        "last_failover": router.last_failover.isoformat() if router.last_failover else None,
        "last_health_check": router.last_health_check.isoformat() if router.last_health_check else None,
        "health_check_failures": router.health_check_failures,
        "providers": router.providers
    }
@app.post("/v1/health-check")
async def manual_health_check():
    """Manually trigger Claude Pro health check"""
    try:
        await router.health_check_claude_pro()
        return {
            "message": "Health check completed",
            "current_provider": router.current_provider,
            "last_health_check": router.last_health_check.isoformat() if router.last_health_check else None
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Health check failed: {str(e)}")
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host=config.host, port=config.port)
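The README example in this commit shows `last_health_check` as `2025-07-14T19:00:00.000Z`, whereas `datetime.now().isoformat()` on a naive datetime produces local time with no zone suffix. If the documented format is the intended one, a formatter along these lines could produce it. This is a sketch only; `to_utc_timestamp` is a hypothetical helper and is not part of the commit.

```python
from datetime import datetime, timezone


def to_utc_timestamp(moment: datetime) -> str:
    """Format an aware datetime as ISO 8601 in UTC with milliseconds and 'Z'."""
    as_utc = moment.astimezone(timezone.utc)
    return as_utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# Capturing the time as an aware UTC value avoids ambiguity about the zone.
now_utc = datetime.now(timezone.utc)
formatted = to_utc_timestamp(now_utc)
```

The input must be timezone-aware; passing a naive datetime to `astimezone` would make Python assume the system's local zone.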


@@ -19,6 +19,12 @@ class Config(BaseModel):
    claude_pro_base_url: str = "https://api.anthropic.com"
    claude_api_base_url: str = "https://api.anthropic.com"
    # Health check settings
    health_check_enabled: bool = True
    health_check_cron: str = "0-4 * * * *"  # Every hour, first 5 minutes
    health_check_message: str = "ping"
    health_check_model: str = "claude-3-haiku-20240307"  # Use cheapest model for checks
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Load from environment or token file
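To make the default `health_check_cron` value concrete: the minute field `0-4` with `*` in every other field selects minutes 0 through 4 of each hour, so the router probes at most five times per hour while it is failed over. The sketch below expands such a minute field by hand. It is a deliberately simplified illustration, not APScheduler's parser, and `expand_minute_field` and `fires_at` are hypothetical names.

```python
from datetime import datetime
from typing import Set


def expand_minute_field(field: str) -> Set[int]:
    """Expand a cron minute field made of '*', single values, and ranges."""
    minutes: Set[int] = set()
    for part in field.split(","):
        if part == "*":
            minutes.update(range(60))
        elif "-" in part:
            start, end = part.split("-")
            minutes.update(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            minutes.add(int(part))
    return minutes


def fires_at(cron: str, moment: datetime) -> bool:
    """Check the minute field only; the other four fields must all be '*'."""
    minute_field, *rest = cron.split()
    if any(value != "*" for value in rest):
        raise ValueError("this sketch only handles '*' in the non-minute fields")
    return moment.minute in expand_minute_field(minute_field)
```

For example, `fires_at("0-4 * * * *", datetime(2025, 7, 14, 19, 3))` is true, while the same call at 19:05 is false.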


@@ -3,4 +3,5 @@ uvicorn==0.24.0
httpx==0.25.2
pydantic==2.5.0
anthropic==0.7.8
python-dotenv==1.0.0
python-dotenv==1.0.0
apscheduler==3.10.4