hufs-notice-crawler/CLAUDE.md
sm4640 aa6c3f8e37
Feat: [main] Exclude md file changes when running CI/CD
2026-03-26 00:38:52 +09:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

Run tests:

python -m pytest

Run a single test:

python -m pytest tests/test_service.py::test_crawl_service_bootstrap_saves_posts_without_returning_them

Run the app locally:

uvicorn app.main:app --host 0.0.0.0 --port 8000

Docker build:

docker build -t your-dockerhub-id/hufs-notice-crawler:latest .

Setup (first time):

python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt

Architecture

FastAPI web service that crawls three HUFS Computer Science Department notice boards and returns only new posts since the last crawl. The state is persisted in PostgreSQL.

Request flow: n8n (scheduler) → POST /api/v1/crawl → CrawlService → HufsCrawler → PostgreSQL

Layer responsibilities:

  • app/crawler.py — HTTP + BeautifulSoup scraping. No DB access. Returns raw PostStub and PostDetail objects. Handles URL encoding to user-facing subview.do?enc=... format.
  • app/service.py — Orchestration. Compares scraped article_ids against DB to find new posts, fetches details only for new ones, persists results, handles bootstrap mode.
  • app/main.py — FastAPI entrypoint. Two routes: GET /health, POST /api/v1/crawl. Auto-creates tables on startup via lifespan.
  • app/models.py / app/db.py — SQLAlchemy ORM + session management.

Bootstrap mode: On first run (empty scraped_posts table), the service saves all found posts but returns new_posts: [] to prevent flooding Discord/n8n notifications with old posts. Subsequent runs return only genuinely new posts.
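The new-post detection and bootstrap rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in app/service.py: the PostStub fields, the detect_new_posts name, and the choice to key posts by (board, article_id) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostStub:
    # Hypothetical fields; the real PostStub in app/crawler.py may differ.
    board: str
    article_id: int
    title: str


def detect_new_posts(scraped, known_ids):
    """Return (to_save, to_return) for one crawl.

    scraped:   PostStub objects found on the boards during this crawl
    known_ids: set of (board, article_id) pairs already stored in the DB
    """
    bootstrap = not known_ids  # an empty table means this is the first run
    to_save = [p for p in scraped if (p.board, p.article_id) not in known_ids]
    # In bootstrap mode every post is saved, but none is reported.
    to_return = [] if bootstrap else list(to_save)
    return to_save, to_return


if __name__ == "__main__":
    scraped = [PostStub("notice", 101, "A"), PostStub("notice", 102, "B")]
    saved, returned = detect_new_posts(scraped, set())
    print(len(saved), len(returned))
    saved, returned = detect_new_posts(scraped, {("notice", 101)})
    print([p.article_id for p in returned])
```

Keeping "what to save" separate from "what to return" is what lets the first run fill the table without producing any notifications.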

Three boards crawled:

Key     | Name                        | Board ID
notice  | 공지사항 (Notices)           | 1926
archive | 자료실 (Archive)             | 1927
jobs    | 취업정보 (Job information)   | 1929

Tests

Tests use an in-memory SQLite DB (conftest.py) and a FakeCrawler stub — no real HTTP calls or PostgreSQL required.

  • test_api.py — endpoint shape/status tests (service is mocked)
  • test_service.py — new-post detection logic, bootstrap mode, zero-new-posts path
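The testing pattern above (an in-memory SQLite database plus a crawler stub) might look roughly like this. It is a sketch using the standard-library sqlite3 module rather than the project's SQLAlchemy fixtures; fetch_article_ids, crawl_once, make_db, and the one-column table layout are hypothetical stand-ins, not the real interfaces.

```python
import sqlite3


class FakeCrawler:
    """Stand-in for the real crawler: returns canned article ids, no HTTP."""

    def __init__(self, article_ids):
        self.article_ids = list(article_ids)

    def fetch_article_ids(self):  # hypothetical method name
        return list(self.article_ids)


def make_db():
    # A fresh in-memory database per test, so tests cannot affect each other.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scraped_posts (article_id INTEGER PRIMARY KEY)")
    return conn


def crawl_once(conn, crawler):
    """Simplified stand-in for the service: save unseen ids, return new ones."""
    known = {row[0] for row in conn.execute("SELECT article_id FROM scraped_posts")}
    fresh = [a for a in crawler.fetch_article_ids() if a not in known]
    conn.executemany(
        "INSERT INTO scraped_posts (article_id) VALUES (?)", [(a,) for a in fresh]
    )
    conn.commit()
    return [] if not known else fresh  # the bootstrap run reports nothing


def test_bootstrap_saves_posts_without_returning_them():
    conn = make_db()
    assert crawl_once(conn, FakeCrawler([1, 2])) == []
    assert conn.execute("SELECT COUNT(*) FROM scraped_posts").fetchone()[0] == 2


def test_second_run_returns_only_new_posts():
    conn = make_db()
    crawl_once(conn, FakeCrawler([1, 2]))
    assert crawl_once(conn, FakeCrawler([1, 2, 3])) == [3]
```

Because the stub is passed in rather than constructed inside the service, the same logic runs unchanged against canned data and against the live boards.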

CI/CD

GitHub Actions (.github/workflows/deploy.yml) triggers on push to main:

  1. SSH into Gitea, clone repo
  2. Build and push Docker image to DockerHub (tagged latest + optional [x.y.z] version from commit message)
  3. Deploy via docker compose -p nkeys-apps -f /nkeysworld/compose.apps.yml pull hufs-notice-crawler
  4. Notify Discord via webhook
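The optional version tag in step 2 can be illustrated with a small sketch. The workflow itself most likely does this in shell; the Python below only shows the idea, and the regular expression and the image_tags helper are assumptions rather than the contents of deploy.yml.

```python
import re

# Matches a bracketed semantic version such as "[1.2.3]".
# A bracketed word like "[main]" does not match.
VERSION_RE = re.compile(r"\[(\d+\.\d+\.\d+)\]")


def image_tags(repo, commit_message):
    """Return the image tags to push: always latest, plus x.y.z if present."""
    tags = [f"{repo}:latest"]
    match = VERSION_RE.search(commit_message)
    if match:
        tags.append(f"{repo}:{match.group(1)}")
    return tags


if __name__ == "__main__":
    repo = "your-dockerhub-id/hufs-notice-crawler"
    print(image_tags(repo, "Feat: [1.2.3] add jobs board"))
    print(image_tags(repo, "Feat: [main] update docs"))
```

Requiring three dot-separated numbers keeps branch markers such as [main] in commit messages from being mistaken for a version.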

Required secrets: NKEY_SSH_PRIVATE_KEY, DOCKERHUB_USERNAME, DOCKERHUB_TOKEN, DISCORD_WEBHOOK

The app runs on an internal Docker network (nkeysworld-network) with no exposed ports — n8n calls it as http://hufs-notice-crawler:8000.
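For reference, a call equivalent to the one n8n makes could be built like this from another container on the same network. It is a sketch under assumptions: the empty request body and the helper names are illustrative, and only the host, port, route, and the new_posts field come from the description above.

```python
import json
import urllib.request

# Service name and port as seen from inside nkeysworld-network.
CRAWL_URL = "http://hufs-notice-crawler:8000/api/v1/crawl"


def build_crawl_request(url=CRAWL_URL):
    """Build (but do not send) the POST request that triggers a crawl."""
    # Assumption: the endpoint needs no request body.
    return urllib.request.Request(url, data=b"", method="POST")


def parse_new_posts(body):
    """Extract the new_posts list from a JSON response body."""
    payload = json.loads(body)
    return payload.get("new_posts", [])


if __name__ == "__main__":
    request = build_crawl_request()
    print(request.get_method(), request.full_url)
    # Sending only works from inside the Docker network:
    # with urllib.request.urlopen(request, timeout=60) as response:
    #     print(parse_new_posts(response.read()))
```

Since no ports are published, the hostname resolves only for containers attached to nkeysworld-network; from the host machine the request would fail.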