Monitoring¶
Monitor PulseStage in production to ensure reliability and performance.
Health Endpoints¶
API Health Check¶
Response from GET /health:
{
  "status": "ok",
  "timestamp": "2025-10-15T12:00:00.000Z",
  "uptime": "2d 4h 30m",
  "database": {
    "status": "connected",
    "latency": "5ms"
  },
  "redis": {
    "status": "connected",
    "latency": "2ms"
  },
  "rateLimiting": {
    "status": "active",
    "requestsInLastHour": 1234
  },
  "auth": {
    "mode": "production",
    "strategies": ["github", "google"]
  }
}
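A monitoring script can reduce this payload to a per-component verdict. The following is a minimal Python sketch, assuming the response has the shape shown above. `summarize_health` and `EXPECTED` are hypothetical names used for illustration and are not part of PulseStage; the `"down"` value in the sample is also made up.

```python
import json

# Expected "good" value for each component, taken from the sample response above.
EXPECTED = {"database": "connected", "redis": "connected", "rateLimiting": "active"}


def summarize_health(body: str) -> dict:
    """Map each component of a /health payload to True (healthy) or False."""
    payload = json.loads(body)
    summary = {"api": payload.get("status") == "ok"}
    for component, good in EXPECTED.items():
        # A missing component counts as unhealthy rather than being ignored.
        summary[component] = payload.get(component, {}).get("status") == good
    return summary


# Hypothetical degraded payload: Redis is down and rateLimiting is absent.
sample = '{"status": "ok", "database": {"status": "connected"}, "redis": {"status": "down"}}'
print(summarize_health(sample))
```

Treating a missing component as unhealthy is a deliberate choice here: a truncated or partial response should raise attention instead of passing silently.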
Frontend Health¶
A request to the frontend should return HTTP 200 OK.
Application Metrics¶
Built-in Health Dashboard¶
Visit /admin/health for real-time metrics:
- System Status: API, database, Redis connectivity
- Rate Limiting: Request counts and limits
- Auth Status: Enabled authentication modes
- Uptime: Service uptime
Access: Requires admin role
Key Metrics to Monitor¶
| Metric | Source | Target |
|---|---|---|
| API Response Time | /health | < 100ms |
| Database Latency | /health | < 10ms |
| Redis Latency | /health | < 5ms |
| Error Rate | Application logs | < 1% |
| Uptime | /health | > 99.9% |
Logging¶
Application Logs¶
Docker Compose:
Production:
Log Levels¶
- ERROR: Critical issues requiring immediate attention
- WARN: Potential problems
- INFO: General application flow
- DEBUG: Detailed debugging (development only)
Important Log Patterns¶
Session Issues:
Authentication:
Database:
Redis:
Audit Logging¶
PulseStage logs all admin actions for compliance:
- User role changes
- Team member additions/removals
- Question moderation (pin, freeze, delete)
- Bulk operations
- Settings changes
Access audit logs:
# Via Admin Panel
/admin/audit
# Via API (the URL is quoted so the shell does not treat "?" as a glob character)
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.yourdomain.com/admin/audit?limit=100"
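The same API call can be made from a script. This is a hedged Python sketch that mirrors the curl command above using only the standard library. `build_audit_request` and `fetch_audit_log` are hypothetical helper names, the host is the placeholder from the example, and the response format is not documented here, so the body is returned unparsed.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.yourdomain.com"  # placeholder host from the curl example


def build_audit_request(token: str, limit: int = 100) -> urllib.request.Request:
    """Build the same GET request as the curl command above."""
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(
        f"{BASE_URL}/admin/audit?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_audit_log(token: str, limit: int = 100) -> str:
    """Send the request and return the raw response body as text."""
    request = build_audit_request(token, limit)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

Separating request construction from sending keeps the URL and header logic easy to check without network access.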
Alerting¶
Set Up Alerts¶
Monitor these conditions:
- API Health Check Fails
  - Check: curl https://api.yourdomain.com/health
  - Alert: Response code ≠ 200
- High Error Rate
  - Check: Application logs
  - Alert: Error count > 10/minute
- Database Connection Lost
  - Check: /health endpoint
  - Alert: database.status ≠ "connected"
- Redis Connection Lost
  - Check: /health endpoint
  - Alert: redis.status ≠ "connected"
- High Response Time
  - Check: API response time
  - Alert: Response time > 1 second
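The conditions above can be expressed as one rule-evaluation function. This is a minimal Python sketch, assuming the caller has already collected the HTTP status code, the parsed /health payload, the measured response time, and an error count from the logs. `evaluate_alerts` and its parameters are hypothetical names, not a PulseStage API.

```python
def evaluate_alerts(status_code, payload, response_seconds, errors_last_minute):
    """Return the names of the alert conditions listed above that currently fire."""
    payload = payload or {}  # a failed health check may have no parsable body
    alerts = []
    if status_code != 200:
        alerts.append("API Health Check Fails")
    if errors_last_minute > 10:
        alerts.append("High Error Rate")
    if payload.get("database", {}).get("status") != "connected":
        alerts.append("Database Connection Lost")
    if payload.get("redis", {}).get("status") != "connected":
        alerts.append("Redis Connection Lost")
    if response_seconds > 1.0:
        alerts.append("High Response Time")
    return alerts


healthy = {"database": {"status": "connected"}, "redis": {"status": "connected"}}
print(evaluate_alerts(200, healthy, 0.04, 0))
```

Note that when the health check returns no body, the database and Redis alerts fire as well, because their status cannot be confirmed.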
Example: Uptime Monitoring¶
Using Uptime Robot (free):
- Add HTTP(s) Monitor
- URL: https://api.yourdomain.com/health
- Monitoring Interval: 5 minutes
- Alert Contacts: Email, SMS, Slack
Using Healthchecks.io:
Performance Monitoring¶
Database Performance¶
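# Check slow queries (requires the pg_stat_statements extension to be enabled)
# On PostgreSQL 12 and earlier the column is named mean_time instead of mean_exec_time
docker compose exec postgres psql -U postgres -d pulsestage -c "
  SELECT query, calls, mean_exec_time
  FROM pg_stat_statements
  ORDER BY mean_exec_time DESC
  LIMIT 10;
"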
Redis Performance¶
API Performance¶
Monitor response times for key endpoints:
- GET /questions - Should be < 200ms
- GET /teams - Should be < 100ms
- POST /questions - Should be < 300ms
- GET /health - Should be < 50ms
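These budgets can be checked automatically once timings are available. The sketch below is a hedged Python example: `BUDGETS_MS` copies the figures from the list above, while `over_budget` and `time_get` are hypothetical helpers. Timings taken from a client include network latency, so they will read higher than server-side measurements.

```python
import time
import urllib.request

# Budgets from the list above, in milliseconds, keyed by (method, path).
BUDGETS_MS = {
    ("GET", "/questions"): 200,
    ("GET", "/teams"): 100,
    ("POST", "/questions"): 300,
    ("GET", "/health"): 50,
}


def over_budget(timings_ms: dict) -> dict:
    """Return the endpoints whose measured time is not below its budget."""
    return {
        endpoint: elapsed
        for endpoint, elapsed in timings_ms.items()
        if elapsed >= BUDGETS_MS[endpoint]
    }


def time_get(url: str) -> float:
    """Measure one GET request from the client side, in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
    return (time.perf_counter() - start) * 1000.0
```

For example, `over_budget({("GET", "/health"): time_get("https://api.yourdomain.com/health")})` returns an empty dict when the health endpoint responds within its budget.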
Resource Usage¶
Docker Stats¶
Monitor:
- CPU usage: Should be < 80%
- Memory usage: Should be < 80%
- Network I/O: Watch for anomalies
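The CPU and memory thresholds can be applied to captured `docker stats` output. This is a minimal Python sketch, assuming the output was produced with `docker stats --no-stream --format "{{.Name}} {{.CPUPerc}} {{.MemPerc}}"`, which prints one line per container. `containers_over_limit` is a hypothetical helper, and the container names in the sample are illustrative.

```python
LIMIT_PERCENT = 80.0  # threshold from the list above


def containers_over_limit(stats_output: str) -> list:
    """Return names of containers whose CPU or memory usage is at or above the limit."""
    flagged = []
    for line in stats_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, cpu, mem = parts
        cpu_percent = float(cpu.rstrip("%"))
        mem_percent = float(mem.rstrip("%"))
        if cpu_percent >= LIMIT_PERCENT or mem_percent >= LIMIT_PERCENT:
            flagged.append(name)
    return flagged


# Illustrative sample in the assumed "name cpu% mem%" format.
sample_stats = "api 12.50% 40.10%\npostgres 91.00% 30.00%\nredis 1.00% 85.20%\n"
print(containers_over_limit(sample_stats))
```

Docker reports CPU relative to a single core, so a container on a multi-core host can legitimately exceed 100%; adjust the limit accordingly.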
Disk Usage¶
# Check database size
docker compose exec postgres psql -U postgres -c "
SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database;
"
# Check disk space
df -h
Backup Verification¶
Regularly verify backups work:
# Test database backup
docker compose exec postgres pg_dump -U postgres pulsestage > test-backup.sql
# Test restore to separate database
docker compose exec postgres createdb -U postgres test_restore
docker compose exec -T postgres psql -U postgres test_restore < test-backup.sql
Security Monitoring¶
Failed Login Attempts¶
Monitor for brute force attacks:
Unusual Activity¶
Watch for:
- Bulk operations outside business hours
- Repeated 401/403 errors
- Unusual rate limit hits
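Repeated 401/403 errors can be detected by counting failures per client. The following Python sketch assumes access logs have already been parsed into `(client, status_code)` pairs; the log format itself is not specified here. `repeated_auth_failures` and the default threshold of 5 are illustrative assumptions, not PulseStage settings.

```python
from collections import Counter


def repeated_auth_failures(events, threshold=5):
    """Return clients with at least `threshold` 401/403 responses, sorted by name.

    `events` is an iterable of (client, status_code) pairs.
    """
    failures = Counter(client for client, status in events if status in (401, 403))
    return sorted(client for client, count in failures.items() if count >= threshold)


# Illustrative events: one client fails repeatedly, another only once.
events = [("10.0.0.5", 401)] * 4 + [("10.0.0.5", 403), ("10.0.0.9", 401), ("10.0.0.9", 200)]
print(repeated_auth_failures(events))
```

Run this over a fixed time window (for example, the last few minutes of logs) so that occasional failures spread over a long period do not trigger it.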
Rate Limiting¶
Troubleshooting¶
High CPU Usage¶
- Check slow queries
- Review recent changes
- Check for infinite loops in logs
High Memory Usage¶
- Check for memory leaks
- Review Redis memory usage
- Check database connections
Slow Response Times¶
- Check database performance
- Check Redis performance
- Review recent deployments
- Check resource utilization
See Also¶
- Production Runbook - Operational procedures
- Troubleshooting Guide - Common issues
- Health Dashboard - Admin interface