Feat: [main] Exclude md file changes from CI/CD

sm4640
2026-03-26 00:38:52 +09:00
parent ca460453af
commit aa6c3f8e37
2 changed files with 72 additions and 0 deletions

CLAUDE.md Normal file

@@ -0,0 +1,70 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Commands
**Run tests:**
```bash
python -m pytest
```
**Run a single test:**
```bash
python -m pytest tests/test_service.py::test_crawl_service_bootstrap_saves_posts_without_returning_them
```
**Run the app locally:**
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```
**Docker build:**
```bash
docker build -t your-dockerhub-id/hufs-notice-crawler:latest .
```
**Setup (first time):**
```bash
python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
```
## Architecture
A FastAPI web service that crawls three HUFS Computer Science Department notice boards and returns only the posts that are new since the last crawl. Crawl state is persisted in PostgreSQL.
**Request flow:** n8n (scheduler) → `POST /api/v1/crawl` → `CrawlService` → `HufsCrawler` → PostgreSQL
**Layer responsibilities:**
- `app/crawler.py` — HTTP + BeautifulSoup scraping. No DB access. Returns raw `PostStub` and `PostDetail` objects. Encodes post URLs into the user-facing `subview.do?enc=...` format.
- `app/service.py` — Orchestration. Compares scraped `article_id`s against DB to find new posts, fetches details only for new ones, persists results, handles bootstrap mode.
- `app/main.py` — FastAPI entrypoint. Two routes: `GET /health`, `POST /api/v1/crawl`. Auto-creates tables on startup via lifespan.
- `app/models.py` / `app/db.py` — SQLAlchemy ORM + session management.
**Bootstrap mode:** On first run (empty `scraped_posts` table), the service saves all found posts but returns `new_posts: []` to prevent flooding Discord/n8n notifications with old posts. Subsequent runs return only genuinely new posts.
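The new-post detection and bootstrap rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `app/service.py`: `select_new_posts`, the `seen_ids` set, and the dict-shaped posts are hypothetical stand-ins for the real ORM queries and `PostStub` objects, and the sketch treats "no known IDs at all" as the bootstrap condition.

```python
from __future__ import annotations

from typing import Iterable


def select_new_posts(
    scraped: Iterable[dict], seen_ids: set[str]
) -> tuple[list[dict], list[dict]]:
    """Return (to_save, to_return) for a single crawl.

    `seen_ids` stands in for the article_ids already stored in the
    `scraped_posts` table (hypothetical helper, not the real service API).
    """
    # An empty table means this is the first run (bootstrap mode).
    bootstrap = not seen_ids

    # Only posts whose article_id is not yet stored count as new.
    to_save = [post for post in scraped if post["article_id"] not in seen_ids]

    # In bootstrap mode everything is saved but nothing is reported,
    # so old posts do not flood downstream notifications.
    to_return = [] if bootstrap else list(to_save)
    return to_save, to_return
```

On the first run every scraped post ends up in `to_save` while `to_return` stays empty; on later runs the two lists hold the same posts.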
**Three boards crawled:**
| Key | Name | Board ID |
|-----|------|----------|
| `notice` | 공지사항 | 1926 |
| `archive` | 자료실 | 1927 |
| `jobs` | 취업정보 | 1929 |
## Tests
Tests use an in-memory SQLite DB (`conftest.py`) and a `FakeCrawler` stub — no real HTTP calls or PostgreSQL required.
- `test_api.py` — endpoint shape/status tests (service is mocked)
- `test_service.py` — new-post detection logic, bootstrap mode, zero-new-posts path
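The testing approach described above (an in-memory database plus a crawler stub) might look roughly like the following. It is a sketch only: it uses the standard-library `sqlite3` module instead of the project's SQLAlchemy setup, and `fetch_posts`, `crawl_once`, `make_memory_db`, and the two-column table layout are assumptions made for illustration, not the repository's actual fixtures.

```python
import sqlite3


class FakeCrawler:
    """Hypothetical stand-in for HufsCrawler that returns canned posts."""

    def __init__(self, posts):
        self._posts = posts

    def fetch_posts(self):  # method name is an assumption
        return list(self._posts)


def make_memory_db():
    """Create a throwaway in-memory database with a minimal table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scraped_posts (article_id TEXT PRIMARY KEY, title TEXT)"
    )
    return conn


def crawl_once(conn, crawler):
    """Store unseen posts and return their ids (bootstrap mode omitted)."""
    seen = {row[0] for row in conn.execute("SELECT article_id FROM scraped_posts")}
    new = [p for p in crawler.fetch_posts() if p["article_id"] not in seen]
    conn.executemany(
        "INSERT INTO scraped_posts (article_id, title) VALUES (?, ?)",
        [(p["article_id"], p["title"]) for p in new],
    )
    return [p["article_id"] for p in new]
```

Because the stub never touches the network and the database lives in memory, a test can call `crawl_once` twice with the same posts and assert that the second call reports nothing new.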
## CI/CD
GitHub Actions (`.github/workflows/deploy.yml`) triggers on push to `main`:
1. SSH into Gitea, clone repo
2. Build and push Docker image to DockerHub (tagged `latest` + optional `[x.y.z]` version from commit message)
3. Deploy via `docker compose -p nkeys-apps -f /nkeysworld/compose.apps.yml pull hufs-notice-crawler`
4. Notify Discord via webhook
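The optional `[x.y.z]` version tag in step 2 could be derived from a commit message along these lines. The workflow file itself is not shown here, so this is only a Python sketch of the idea: `image_tags` and the regular expression are hypothetical, and it assumes the brackets are stripped before tagging.

```python
from __future__ import annotations

import re

# Matches a bracketed semantic version such as "[1.2.3]" (assumed format).
VERSION_TAG = re.compile(r"\[(\d+\.\d+\.\d+)\]")


def image_tags(commit_message: str) -> list[str]:
    """Return the Docker tags to push for a given commit message."""
    tags = ["latest"]  # always pushed
    match = VERSION_TAG.search(commit_message)
    if match:
        # Add the version without its surrounding brackets.
        tags.append(match.group(1))
    return tags
```

A message like `Feat: [main] ...` yields only `latest`, because `[main]` does not match the numeric pattern.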
Required secrets: `NKEY_SSH_PRIVATE_KEY`, `DOCKERHUB_USERNAME`, `DOCKERHUB_TOKEN`, `DISCORD_WEBHOOK`
The app runs on an internal Docker network (`nkeysworld-network`) with no exposed ports — n8n calls it as `http://hufs-notice-crawler:8000`.
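For reference, a client on the same Docker network could trigger a crawl as sketched below. This assumes the endpoint accepts a `POST` with no request body, which is not confirmed by this file; `build_crawl_request` is a hypothetical helper, and only the host name, port, and path are taken from the text above.

```python
import urllib.request

# Service address as seen from inside the internal Docker network.
BASE_URL = "http://hufs-notice-crawler:8000"


def build_crawl_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a crawl."""
    return urllib.request.Request(f"{base_url}/api/v1/crawl", method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the request; that only works from a container attached to `nkeysworld-network`, since no ports are exposed.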