# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

**Run tests:**

```bash
python -m pytest
```

**Run a single test:**

```bash
python -m pytest tests/test_service.py::test_crawl_service_bootstrap_saves_posts_without_returning_them
```

**Run the app locally:**

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

**Docker build:**

```bash
docker build -t your-dockerhub-id/hufs-notice-crawler:latest .
```

**Setup (first time):**

```bash
python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
```

## Architecture

FastAPI web service that crawls three HUFS Computer Science Department notice boards and returns only the posts that are new since the last crawl. State is persisted in PostgreSQL.

**Request flow:** n8n (scheduler) → `POST /api/v1/crawl` → `CrawlService` → `HufsCrawler` → PostgreSQL
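
From the caller's side, the flow above is a single scheduled POST. A minimal stdlib sketch of building that request — the route and method are from this doc, while `build_crawl_request` and the base URL are illustrative, not part of the repo:

```python
from urllib.request import Request, urlopen

def build_crawl_request(base_url: str) -> Request:
    # n8n POSTs to this route on a schedule; the path comes from this doc,
    # base_url is whatever host the service is reachable on.
    return Request(f"{base_url}/api/v1/crawl", method="POST")

# Usage (only works from a host that can reach the service):
# with urlopen(build_crawl_request("http://hufs-notice-crawler:8000")) as resp:
#     print(resp.read())
```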

**Layer responsibilities:**

- `app/crawler.py` — HTTP + BeautifulSoup scraping. No DB access. Returns raw `PostStub` and `PostDetail` objects. Handles URL encoding to the user-facing `subview.do?enc=...` format.
- `app/service.py` — Orchestration. Compares scraped `article_id`s against the DB to find new posts, fetches details only for new ones, persists results, and handles bootstrap mode.
- `app/main.py` — FastAPI entrypoint. Two routes: `GET /health` and `POST /api/v1/crawl`. Auto-creates tables on startup via lifespan.
- `app/models.py` / `app/db.py` — SQLAlchemy ORM + session management.
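
The crawler layer's job — turn listing HTML into `PostStub` objects with no DB access — can be sketched with the stdlib alone. The real crawler uses BeautifulSoup, and the `data-article-id` attribute here is an assumption about the markup, not a fact from the repo:

```python
from dataclasses import dataclass
from html.parser import HTMLParser

@dataclass
class PostStub:
    article_id: str
    title: str

class NoticeListParser(HTMLParser):
    """Collects PostStub objects from anchor tags in a board listing.

    Hypothetical markup: each post row is an <a> tag carrying a
    data-article-id attribute, with the title as its text content.
    """
    def __init__(self):
        super().__init__()
        self.stubs = []
        self._current_id = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            if "data-article-id" in attr_map:
                self._current_id = attr_map["data-article-id"]
                self._buf = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            self.stubs.append(PostStub(self._current_id, "".join(self._buf).strip()))
            self._current_id = None
```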

**Bootstrap mode:** On first run (empty `scraped_posts` table), the service saves all found posts but returns `new_posts: []` to prevent flooding Discord/n8n notifications with old posts. Subsequent runs return only genuinely new posts.
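
The detection-plus-bootstrap rule boils down to a set difference and an emptiness check. A sketch under stated assumptions — `find_new_posts` and the plain sets standing in for the `scraped_posts` table are illustrative, not the repo's actual API:

```python
def find_new_posts(scraped_ids, known_ids):
    """Split one crawl's article IDs into (to_save, to_report).

    scraped_ids: IDs found in this crawl; known_ids: IDs already in the DB.
    """
    new_ids = set(scraped_ids) - set(known_ids)
    bootstrap = len(known_ids) == 0  # empty table => first ever crawl
    # On bootstrap, persist everything but report nothing, so old posts
    # don't flood Discord/n8n notifications.
    to_report = set() if bootstrap else new_ids
    return new_ids, to_report
```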

**Three boards crawled:**

| Key | Name | Board ID |
|-----|------|----------|
| `notice` | 공지사항 (Notices) | 1926 |
| `archive` | 자료실 (Archive) | 1927 |
| `jobs` | 취업정보 (Job Postings) | 1929 |
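
The table maps naturally to a small registry. The keys, names, and IDs come from this doc; the `BOARDS` constant itself and its shape are hypothetical, not the crawler's actual config:

```python
# Hypothetical board registry mirroring the table above.
BOARDS = {
    "notice":  {"name": "공지사항", "board_id": 1926},
    "archive": {"name": "자료실", "board_id": 1927},
    "jobs":    {"name": "취업정보", "board_id": 1929},
}
```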

## Tests

Tests use an in-memory SQLite DB (`conftest.py`) and a `FakeCrawler` stub — no real HTTP calls or PostgreSQL required.

- `test_api.py` — endpoint shape/status tests (service is mocked)
- `test_service.py` — new-post detection logic, bootstrap mode, zero-new-posts path
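
A `FakeCrawler`-style stub can be as small as this. A sketch only — the repo's actual fixture and its method names (`fetch_stubs` here) are assumptions:

```python
class FakeCrawler:
    """Stands in for HufsCrawler in tests: no HTTP, only canned data."""

    def __init__(self, stubs_by_board):
        self._stubs = stubs_by_board

    def fetch_stubs(self, board_key):
        # Return a copy so tests can't mutate the canned data in place.
        return list(self._stubs.get(board_key, []))

def test_fake_crawler_returns_canned_stubs():
    crawler = FakeCrawler({"notice": [{"article_id": "1", "title": "hi"}]})
    assert crawler.fetch_stubs("notice")[0]["article_id"] == "1"
    assert crawler.fetch_stubs("jobs") == []  # unknown board => empty list
```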

## CI/CD

GitHub Actions (`.github/workflows/deploy.yml`) triggers on push to `main`:

1. SSH into the Gitea host and clone the repo
2. Build and push the Docker image to DockerHub (tagged `latest`, plus an optional `[x.y.z]` version from the commit message)
3. Deploy via `docker compose -p nkeys-apps -f /nkeysworld/compose.apps.yml pull hufs-notice-crawler`
4. Notify Discord via webhook

Required secrets: `NKEY_SSH_PRIVATE_KEY`, `DOCKERHUB_USERNAME`, `DOCKERHUB_TOKEN`, `DISCORD_WEBHOOK`

The app runs on an internal Docker network (`nkeysworld-network`) with no exposed ports — n8n calls it as `http://hufs-notice-crawler:8000`.