commit 9db163745b90059fabacdb017d987981ef42b5c5
Author: sm4640
Date:   Wed Jan 21 16:45:55 2026 +0900

    Docs: [main] Write organization README

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..fd48750
--- /dev/null
+++ b/README.md
@@ -0,0 +1,101 @@
+# 2025-capstone
+Written by: AI / Edited by: nkey
+
+Organization repository for a demo of a Raspberry Pi k3s cluster (master/worker) with monitoring/alerting and automated remediation (including approval and verification).
+(Connects k3s, the Oracle server (n8n/WMS/DB), and the GPU server (Ollama/llama3) over a Tailscale VPN.)
+
+## What’s Inside
+- **k3s cluster**
+  - **master node**: cluster management and manifest application (demo/monitoring K8s resources)
+  - **worker service node**: node that runs services (target of the disk-fill demo Pod exec)
+  - **worker monitor node**: monitoring stack (Prometheus/Alertmanager/Loki/Grafana + auto action module)
+- **Oracle Server**
+  - Web Monitoring Service (WMS)
+  - n8n hosting
+  - PostgreSQL
+- **GPU Server**
+  - Ollama + llama3 (solution generation/verification)
+
+## Architecture
+
+### High-level (summary)
+![high-level-architecture](docs/assets/system_architecture.png)
+
+### End-to-end Flow (detailed)
+![end-to-end-flow](docs/assets/data_flow_architecture.png)
+
+## Repositories
+- `rpi-master-node`
+  - k3s manifests (e.g. `k3s-manifests/disk-fill-demo.yaml`, `k3s-manifests/aam-deploy.yaml`)
+  - K8s-based monitoring stack manifests (`k3s-monitoring/*`)
+  - kube-state-metrics manifests (`kube-state-metrics/*`)
+- `rpi-worker-monitor-node`
+  - Runs Prometheus/Alertmanager/Loki/Grafana from `monitoring/docker-compose.yml` (worker monitor node)
+- `rpi-worker-service-node`
+  - Service worker node (this repo may be empty)
+  - **Important:** For the disk-fill demo, **deploy `disk-fill-demo.yaml` from the master node repo**, then **exec into the Pod on the service node** to run the script.
+
+## Quickstart (Demo: DiskAlmostFull alert → automated remediation → verification)
+### 1) (Master) Deploy the disk-fill demo
+```bash
+# On the master node
+git clone https://nkeystudy.site/gitea/2025-capstone/rpi-master-node.git
+cd rpi-master-node
+
+kubectl apply -f k3s-manifests/disk-fill-demo.yaml
+kubectl get pod -n alert-service -l app=disk-fill-demo
+kubectl get svc -n alert-service
+```
+
+### 2) (Master -> Service Worker) Fill / clean up the disk via Pod exec
+> The commands below assume they are run from the **master node**.
+```bash
+# Look up the Pod name
+POD=$(sudo kubectl get pod -n alert-service -l app=disk-fill-demo -o jsonpath='{.items[0].metadata.name}')
+
+# Run with the absolute path
+sudo kubectl exec -n alert-service "$POD" -- /usr/local/bin/fill_disk_safe.sh /tmp/disk-fill-demo 90 1024
+
+# Run by command name (resolved via the container's PATH)
+sudo kubectl exec -n alert-service "$POD" -- fill_disk_safe.sh /tmp/disk-fill-demo 90 1024
+
+# Manual fix (run the cleanup shell script)
+sudo kubectl exec -n alert-service "$POD" -- cleanup_disk.sh /tmp/disk-fill-demo
+```
+
+![1_cause_disk_full_demo](docs/assets/1_cause_disk_full_demo.gif)
+
+
+
+### 3) (UI) Check on the web pages
+- Check the disk-full alert in Prometheus (2_check_disk_full_demo_result_in_prometheus)
+![2_check_disk_full_demo_result_in_prometheus](docs/assets/2_check_disk_full_demo_result_in_prometheus.gif)
+
+- Check that the solution-generation workflow runs (3_check_make_solution_n8n_workflow)
+![3_check_make_solution_n8n_workflow](docs/assets/3_check_make_solution_n8n_workflow.gif)
+
+- Reject a solution and check the new one (4_reject_solution_and_check_other_solution)
+![4_reject_solution_and_check_other_solution](docs/assets/4_reject_solution_and_check_other_solution.gif)
+
+- Verify the solution (5_verify_solution)
+![5_verify_solution](docs/assets/5_verify_solution.gif)
+
+- Apply the solution and check the result (6_apply_solution_and_check_result)
+![6_apply_solution_and_check_result](docs/assets/6_apply_solution_and_check_result.gif)
+
+
+### 4) (Logs) Check the auto action module logs
+- Check the flow in which the module connects to the target host over SSH, runs the precheck/action, and sends the result to the Alert API
+  - check_apply_solution_in_monitor_node (node that applies the solution):
+    ![7_check_apply_solution_in_monitor_node](docs/assets/7_check_apply_solution_in_monitor_node.gif)
+
+  - check_apply_solution_in_service_node (node the solution is applied to):
+    ![8_check_apply_solution_in_service_node](docs/assets/8_check_apply_solution_in_service_node.gif)
+
+
+## Docs
+- For detailed setup and run instructions, see the README of each repository.
+  - `rpi-master-node/README.md`
+  - `rpi-worker-monitor-node/README.md`
+  - `rpi-worker-service-node/README.md`
+
diff --git a/docs/assets/1_cause_disk_full_demo.gif b/docs/assets/1_cause_disk_full_demo.gif
new file mode 100644
index 0000000..d3a071f
Binary files /dev/null and b/docs/assets/1_cause_disk_full_demo.gif differ
diff --git a/docs/assets/2_check_disk_full_demo_result_in_prometheus.gif b/docs/assets/2_check_disk_full_demo_result_in_prometheus.gif
new file mode 100644
index 0000000..5c9dbea
Binary files /dev/null and b/docs/assets/2_check_disk_full_demo_result_in_prometheus.gif differ
diff --git a/docs/assets/3_check_make_solution_n8n_workflow.gif b/docs/assets/3_check_make_solution_n8n_workflow.gif
new file mode 100644
index 0000000..a2cd6b4
Binary files /dev/null and b/docs/assets/3_check_make_solution_n8n_workflow.gif differ
diff --git a/docs/assets/4_reject_solution_and_check_other_solution.gif b/docs/assets/4_reject_solution_and_check_other_solution.gif
new file mode 100644
index 0000000..8f016ec
Binary files /dev/null and b/docs/assets/4_reject_solution_and_check_other_solution.gif differ
diff --git a/docs/assets/5_verify_solution.gif b/docs/assets/5_verify_solution.gif
new file mode 100644
index 0000000..b1cf585
Binary files /dev/null and b/docs/assets/5_verify_solution.gif differ
diff --git a/docs/assets/6_apply_solution_and_check_result.gif b/docs/assets/6_apply_solution_and_check_result.gif
new file mode 100644
index 0000000..5caec1f
Binary files /dev/null and b/docs/assets/6_apply_solution_and_check_result.gif differ
diff --git a/docs/assets/7_check_apply_solution_in_monitor_node.gif b/docs/assets/7_check_apply_solution_in_monitor_node.gif
new file mode 100644
index 0000000..8155746
Binary files /dev/null and b/docs/assets/7_check_apply_solution_in_monitor_node.gif differ
diff --git a/docs/assets/8_check_apply_solution_in_service_node.gif b/docs/assets/8_check_apply_solution_in_service_node.gif
new file mode 100644
index 0000000..25203b5
Binary files /dev/null and b/docs/assets/8_check_apply_solution_in_service_node.gif differ
diff --git a/docs/assets/data_flow_architecture.png b/docs/assets/data_flow_architecture.png
new file mode 100644
index 0000000..5b8d4f0
Binary files /dev/null and b/docs/assets/data_flow_architecture.png differ
diff --git a/docs/assets/system_architecture.png b/docs/assets/system_architecture.png
new file mode 100644
index 0000000..670a395
Binary files /dev/null and b/docs/assets/system_architecture.png differ