feat: Add scheduled health check and auto-recovery

Major enhancements to Claude Router v1.1.0:

- Add APScheduler for automated Claude Pro health checks
- Schedule checks at minutes 0-4 of every hour to detect quota recovery
- Implement intelligent auto-switch back to Claude Pro when available
- Add manual health check endpoint for immediate testing
- Enhance status monitoring with health check metrics
- Improve API compatibility with older Anthropic client versions
- Update documentation with new features and usage examples
- Configure Claude Code CLI integration with environment variables

The router now automatically detects when the Claude Pro quota is restored and switches back to prioritize the premium service.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-14 19:03:02 -05:00
parent d0d797ef46
commit 77096edebd
4 changed files with 181 additions and 17 deletions
--- a/README.md
+++ b/README.md
@@ -5,7 +5,8 @@
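 ## Features

 - **Automatic failover**: switches provider automatically when a rate limit or usage limit is detected
-- **Health checks**: real-time monitoring of each provider's status
+- **Scheduled health checks**: during the first 5 minutes of every hour, automatically checks whether the Claude Pro quota has recovered
+- **Smart recovery**: automatically switches back to Claude Pro so the premium service is preferred
 - **Manual switching**: supports switching to a specific provider by hand
 - **Claude Code CLI compatible**: fully compatible with the Anthropic API format
 - **Dockerized deployment**: one-command deployment that works out of the box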
@@ -37,16 +38,21 @@ curl http://localhost:8000/v1/status

 ### 3. Configure Claude Code CLI

-Modify the Claude Code CLI configuration so that the API endpoint points at the router:
+Set an environment variable to point Claude Code CLI at the router:

 ```bash
-# Set environment variables
+# Set the API endpoint to the router address
 export ANTHROPIC_API_URL="http://localhost:8000"
-export ANTHROPIC_API_KEY="your_claude_api_key"

-# Or modify the Claude Code CLI configuration file
+# Append to ~/.bashrc to make the setting permanent
+echo 'export ANTHROPIC_API_URL="http://localhost:8000"' >> ~/.bashrc
+
+# Test the configuration
+echo "Hello Claude Router" | claude --print
 ```

+**Note**: There is no need to change ANTHROPIC_API_KEY; the router handles API keys automatically.
+
 ## API endpoints

 ### Main endpoints
@@ -55,6 +61,7 @@ export ANTHROPIC_API_KEY="your_claude_api_key"
 - `GET /health` - health check
 - `GET /v1/status` - get the router status
 - `POST /v1/switch-provider` - manually switch provider
+- `POST /v1/health-check` - manually trigger a Claude Pro health check

 ### Health check response example

@@ -64,6 +71,8 @@ export ANTHROPIC_API_KEY="your_claude_api_key"
  "current_provider": "claude_pro", 
  "failover_count": 0,
  "last_failover": null,
+  "last_health_check": "2025-07-14T19:00:00.000Z",
+  "health_check_failures": 0,
  "providers": {
    "claude_pro": {"active": true},
    "claude_api": {"active": true}
@@ -81,6 +90,13 @@ export ANTHROPIC_API_KEY="your_claude_api_key"
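 - `MAX_RETRIES`: maximum number of retries (default: 3)
 - `RETRY_DELAY`: delay between retries (default: 1.0 seconds)

+### Health check configuration
+
+- `health_check_enabled`: whether scheduled health checks are enabled (default: true)
+- `health_check_cron`: cron expression for the check schedule (default: "0-4 * * * *", i.e. minutes 0-4 of every hour)
+- `health_check_message`: content of the test message (default: "ping")
+- `health_check_model`: model used for the check (default: claude-3-haiku-20240307)
+
 ### Token file

 The router automatically reads API keys from `/home/will/docker/tokens.txt`; no environment variables need to be configured by hand.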
@@ -121,6 +137,16 @@ curl -X POST http://localhost:8000/v1/switch-provider \
  -d '"claude_api"'
 ```

+### Manual health check
+
+```bash
+# Check right now whether Claude Pro is available
+curl -X POST http://localhost:8000/v1/health-check
+
+# View detailed status
+curl http://localhost:8000/v1/status
+```
+
 ## Development and debugging

 ### Local development
@@ -183,6 +209,21 @@ docker logs -f claude-router
 - Python: 3.11+
 - Supported: Claude 3 model family

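+## Changelog
+
+### v1.1.0 (2025-07-14)
+- ✅ Added scheduled health checks
+- ✅ Automatically checks for Claude Pro quota recovery during the first 5 minutes of every hour
+- ✅ Smart automatic switch back to Claude Pro
+- ✅ Added a manual health check API
+- ✅ Improved logging and status monitoring
+
+### v1.0.0 (2025-07-14)
+- ✅ Basic router functionality
+- ✅ Automatic failover from Claude Pro to Claude API
+- ✅ Docker containerized deployment
+- ✅ Claude Code CLI compatibility
+
 ## Roadmap

 - [ ] Add DeepSeek API support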
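
Note on the README change: the status payload now carries `last_health_check` and `health_check_failures` next to `current_provider`. As a rough illustration of how a caller might read those fields, here is a minimal sketch. The helper name `describe_status` is hypothetical and not part of this commit; the field names are taken from the response example in the README.

```python
from datetime import datetime
from typing import Any, Mapping


def describe_status(status: Mapping[str, Any]) -> str:
    """Summarize a status payload in one line (hypothetical helper)."""
    provider = status.get("current_provider", "unknown")
    failures = status.get("health_check_failures", 0)
    last_check = status.get("last_health_check")

    if provider == "claude_pro":
        # Nothing to recover from: the preferred provider is already active.
        return "on claude_pro"
    if last_check is None:
        return f"on {provider}; no health check has run yet"

    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    checked_at = datetime.fromisoformat(last_check.replace("Z", "+00:00"))
    return (
        f"on {provider}; last check {checked_at:%Y-%m-%d %H:%M}, "
        f"{failures} failed check(s)"
    )


sample = {
    "current_provider": "claude_api",
    "last_health_check": "2025-07-14T19:00:00.000Z",
    "health_check_failures": 0,
}
print(describe_status(sample))
```

The payload would normally come from `GET /v1/status`; the sample dictionary above only mirrors the fields shown in the README.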
--- a/app.py
+++ b/app.py
@@ -9,6 +9,8 @@ import httpx
 from fastapi import FastAPI, Request, HTTPException
 from fastapi.responses import StreamingResponse, JSONResponse
 from anthropic import Anthropic
+from apscheduler.schedulers.asyncio import AsyncIOScheduler
+from apscheduler.triggers.cron import CronTrigger

 from config import config

@@ -21,6 +23,9 @@ class ClaudeRouter:
        self.current_provider = "claude_pro"
        self.failover_count = 0
        self.last_failover = None
+        self.last_health_check = None
+        self.health_check_failures = 0
+        self.scheduler = None
        self.providers = {
            "claude_pro": {
                "api_key": config.claude_pro_api_key,
@@ -98,13 +103,23 @@ class ClaudeRouter:
                logger.info(f"Making request with provider: {self.current_provider}")
                
                # Make the API call
-                response = await asyncio.to_thread(
-                    client.messages.create,
-                    model=model,
-                    max_tokens=max_tokens,
-                    messages=messages,
-                    stream=stream
-                )
+                if hasattr(client, 'messages'):
+                    response = await asyncio.to_thread(
+                        client.messages.create,
+                        model=model,
+                        max_tokens=max_tokens,
+                        messages=messages,
+                        stream=stream
+                    )
+                else:
+                    # For older anthropic versions
+                    response = await asyncio.to_thread(
+                        client.completions.create,
+                        model=model,
+                        max_tokens_to_sample=max_tokens,
+                        prompt=f"\n\nHuman: {messages[0]['content']}\n\nAssistant:",
+                        stream=stream
+                    )
                
                return response
                
@@ -120,6 +135,81 @@ class ClaudeRouter:
                    raise HTTPException(status_code=500, detail=f"All providers failed. Last error: {str(e)}")
        
        raise HTTPException(status_code=500, detail="No providers available")
+    
+    async def health_check_claude_pro(self):
+        """Check if Claude Pro is available again"""
+        # Only check if we're not currently using Claude Pro
+        if self.current_provider == "claude_pro":
+            logger.debug("Skipping health check - already using Claude Pro")
+            return
+        
+        logger.info("Running Claude Pro health check...")
+        self.last_health_check = datetime.now()
+        
+        try:
+            client = Anthropic(
+                api_key=config.claude_pro_api_key,
+                base_url=config.claude_pro_base_url
+            )
+            
+            # Send a minimal test message
+            if hasattr(client, 'messages'):
+                response = await asyncio.to_thread(
+                    client.messages.create,
+                    model=config.health_check_model,
+                    max_tokens=10,
+                    messages=[{"role": "user", "content": config.health_check_message}]
+                )
+            else:
+                # For older anthropic versions
+                response = await asyncio.to_thread(
+                    client.completions.create,
+                    model=config.health_check_model,
+                    max_tokens_to_sample=10,
+                    prompt=f"\n\nHuman: {config.health_check_message}\n\nAssistant:"
+                )
+            
+            # If successful, switch back to Claude Pro
+            old_provider = self.current_provider
+            self.current_provider = "claude_pro"
+            self.health_check_failures = 0
+            
+            logger.info(f"Claude Pro health check successful! Switched from {old_provider} to claude_pro")
+            
+        except Exception as e:
+            self.health_check_failures += 1
+            error_str = str(e).lower()
+            
+            if any(indicator in error_str for indicator in ["rate_limit", "usage limit", "quota exceeded", "429", "too many requests", "limit reached"]):
+                logger.info(f"Claude Pro still rate limited: {str(e)}")
+            else:
+                logger.warning(f"Claude Pro health check failed (attempt {self.health_check_failures}): {str(e)}")
+    
+    def start_scheduler(self):
+        """Start the health check scheduler"""
+        if not config.health_check_enabled:
+            logger.info("Health check disabled in config")
+            return
+            
+        self.scheduler = AsyncIOScheduler()
+        
+        # Schedule health check using cron expression
+        self.scheduler.add_job(
+            self.health_check_claude_pro,
+            trigger=CronTrigger.from_crontab(config.health_check_cron),
+            id="claude_pro_health_check",
+            name="Claude Pro Health Check",
+            misfire_grace_time=60
+        )
+        
+        self.scheduler.start()
+        logger.info(f"Health check scheduler started with cron: {config.health_check_cron}")
+    
+    def stop_scheduler(self):
+        """Stop the health check scheduler"""
+        if self.scheduler:
+            self.scheduler.shutdown()
+            logger.info("Health check scheduler stopped")

 # Initialize router
 router = ClaudeRouter()
@@ -128,7 +218,14 @@ router = ClaudeRouter()
 async def lifespan(app: FastAPI):
    logger.info("Claude Router starting up...")
    logger.info(f"Current provider: {router.current_provider}")
+    
+    # Start health check scheduler
+    router.start_scheduler()
+    
    yield
+    
+    # Stop scheduler on shutdown
+    router.stop_scheduler()
    logger.info("Claude Router shutting down...")

 app = FastAPI(
@@ -147,9 +244,11 @@ async def health_check():
        "failover_count": router.failover_count,
        "last_failover": router.last_failover.isoformat() if router.last_failover else None,
        "providers": {
-            name: {"active": config["active"]} 
-            for name, config in router.providers.items()
-        }
+            name: {"active": provider_config["active"]} 
+            for name, provider_config in router.providers.items()
+        },
+        "last_health_check": router.last_health_check.isoformat() if router.last_health_check else None,
+        "health_check_failures": router.health_check_failures
    }

@app.post("/v1/messages")
@@ -189,8 +288,10 @@ async def create_message(request: Request):
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/v1/switch-provider")
-async def switch_provider(provider: str):
+async def switch_provider(request: Request):
    """Manually switch to a specific provider"""
+    provider = await request.json()
+    
    if provider not in router.providers:
        raise HTTPException(status_code=400, detail=f"Unknown provider: {provider}")
    
@@ -214,9 +315,24 @@ async def get_status():
        "current_provider": router.current_provider,
        "failover_count": router.failover_count,
        "last_failover": router.last_failover.isoformat() if router.last_failover else None,
+        "last_health_check": router.last_health_check.isoformat() if router.last_health_check else None,
+        "health_check_failures": router.health_check_failures,
        "providers": router.providers
    }

+@app.post("/v1/health-check")
+async def manual_health_check():
+    """Manually trigger Claude Pro health check"""
+    try:
+        await router.health_check_claude_pro()
+        return {
+            "message": "Health check completed",
+            "current_provider": router.current_provider,
+            "last_health_check": router.last_health_check.isoformat() if router.last_health_check else None
+        }
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Health check failed: {str(e)}")
+
 if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host=config.host, port=config.port)
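
Note on `health_check_claude_pro`: the `except` branch decides between "still rate limited" and "failed for another reason" by substring-matching the lowercased error text. The sketch below pulls that check into a standalone function so it can be tested in isolation. The names `RATE_LIMIT_INDICATORS` and `is_rate_limit_error` are hypothetical; the indicator strings are copied from the diff.

```python
# Indicator substrings copied from the health check's except branch.
RATE_LIMIT_INDICATORS = (
    "rate_limit",
    "usage limit",
    "quota exceeded",
    "429",
    "too many requests",
    "limit reached",
)


def is_rate_limit_error(error: Exception) -> bool:
    """Return True if the error text looks like a rate or usage limit."""
    message = str(error).lower()
    return any(indicator in message for indicator in RATE_LIMIT_INDICATORS)


print(is_rate_limit_error(Exception("Error code: 429 - Too Many Requests")))
print(is_rate_limit_error(Exception("Connection reset by peer")))
```

Substring matching is simple but loose: "429" would also match an unrelated message that happens to contain those digits, such as a request ID. Matching on the exception type or HTTP status code, where the client library exposes one, would likely be more robust.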
--- a/config.py
+++ b/config.py
@@ -19,6 +19,12 @@ class Config(BaseModel):
    claude_pro_base_url: str = "https://api.anthropic.com"
    claude_api_base_url: str = "https://api.anthropic.com"
    
+    # Health check settings
+    health_check_enabled: bool = True
+    health_check_cron: str = "0-4 * * * *"  # Minutes 0-4 of every hour (five runs per hour)
+    health_check_message: str = "ping"
+    health_check_model: str = "claude-3-haiku-20240307"  # Use cheapest model for checks
+    
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Load from environment or token file
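
Note on `health_check_cron`: the default `"0-4 * * * *"` does not mean one check per hour. Its minute field is the range 0-4, so the job fires at minutes 0, 1, 2, 3 and 4 of every hour. The sketch below expands a single crontab field to make that concrete. It is a simplified stand-in written for illustration, not APScheduler's parser, and `expand_cron_field` is a hypothetical name.

```python
def expand_cron_field(field: str, lo: int, hi: int) -> list[int]:
    """Expand one crontab field (lists, ranges, steps, '*') to sorted values.

    Simplified illustration only: no names (e.g. 'mon') and no validation.
    """
    values: set[int] = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            first, last = rng.split("-")
            start, end = int(first), int(last)
        else:
            start = int(rng)
            # "5/15" means "from 5 to the maximum, every 15"; a bare "5" is just 5.
            end = hi if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


minute_field = "0-4 * * * *".split()[0]
print(expand_cron_field(minute_field, 0, 59))  # [0, 1, 2, 3, 4]
```

Since `health_check_claude_pro` returns early whenever the router is already on `claude_pro`, the extra runs only reach the API while the router is failed over.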
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,4 +3,5 @@ uvicorn==0.24.0
 httpx==0.25.2
 pydantic==2.5.0
 anthropic==0.7.8
-python-dotenv==1.0.0
+python-dotenv==1.0.0
+apscheduler==3.10.4
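
Note on the pinned `anthropic==0.7.8` and the "older anthropic versions" branch in `app.py`: that fallback builds a legacy completion prompt from `messages[0]` only, so later turns are dropped. The legacy Text Completions format expects alternating `\n\nHuman:` and `\n\nAssistant:` turns, with the prompt starting on a Human turn and ending on an open Assistant turn. A minimal sketch of a multi-turn conversion follows. `messages_to_prompt` is a hypothetical helper, it assumes every `content` is a plain string and that the conversation ends with a user message, and the two constants are defined locally to mirror the SDK's `HUMAN_PROMPT` and `AI_PROMPT` values.

```python
# Local copies of the legacy turn markers (same values as the SDK constants).
HUMAN_PROMPT = "\n\nHuman:"
AI_PROMPT = "\n\nAssistant:"


def messages_to_prompt(messages: list[dict]) -> str:
    """Flatten Messages-style turns into a legacy completion prompt.

    Assumes string content and a conversation that ends with a user turn.
    """
    parts = []
    for message in messages:
        marker = HUMAN_PROMPT if message["role"] == "user" else AI_PROMPT
        parts.append(f"{marker} {message['content']}")
    # Leave an open Assistant turn for the model to complete.
    parts.append(AI_PROMPT)
    return "".join(parts)


print(repr(messages_to_prompt([{"role": "user", "content": "ping"}])))
```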