# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Commands

Run tests:

```bash
python -m pytest
```

Run a single test:

```bash
python -m pytest tests/test_service.py::test_crawl_service_bootstrap_saves_posts_without_returning_them
```

Run the app locally:

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

Docker build:

```bash
docker build -t your-dockerhub-id/hufs-notice-crawler:latest .
```

Setup (first time):

```bash
python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
```
## Architecture

FastAPI web service that crawls three HUFS Computer Science Department notice boards and returns only the posts that are new since the last crawl. Crawl state is persisted in PostgreSQL.

Request flow: n8n (scheduler) → `POST /api/v1/crawl` → `CrawlService` → `HufsCrawler` → PostgreSQL
Layer responsibilities:
- `app/crawler.py` — HTTP + BeautifulSoup scraping. No DB access. Returns raw `PostStub` and `PostDetail` objects. Handles URL encoding to the user-facing `subview.do?enc=...` format.
- `app/service.py` — Orchestration. Compares scraped `article_id`s against the DB to find new posts, fetches details only for new ones, persists results, and handles bootstrap mode.
- `app/main.py` — FastAPI entrypoint. Two routes: `GET /health` and `POST /api/v1/crawl`. Auto-creates tables on startup via lifespan.
- `app/models.py` / `app/db.py` — SQLAlchemy ORM + session management.
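The stub-versus-detail split the crawler returns can be pictured with a minimal sketch. This is illustrative only: the actual fields are defined in `app/crawler.py`, and everything beyond `article_id` and the `subview.do?enc=...` URL mentioned above is an assumption.

```python
from dataclasses import dataclass


# Illustrative shapes only -- the real PostStub/PostDetail live in
# app/crawler.py. The point is the split: the cheap list-page stub
# versus the full detail fetched only for new posts.
@dataclass(frozen=True)
class PostStub:
    article_id: str  # key the service compares against the DB
    title: str       # assumed field, for illustration


@dataclass(frozen=True)
class PostDetail:
    article_id: str
    title: str
    url: str   # user-facing subview.do?enc=... link
    body: str  # assumed field, for illustration
```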
Bootstrap mode: On first run (empty `scraped_posts` table), the service saves all found posts but returns `new_posts: []` to prevent flooding Discord/n8n notifications with old posts. Subsequent runs return only genuinely new posts.
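That decision can be sketched as a pure function, assuming the service keys posts by `article_id` (the function and parameter names here are hypothetical, not the real `CrawlService` API):

```python
def select_posts_to_report(
    scraped_ids: list[str], known_ids: set[str]
) -> tuple[list[str], bool]:
    """Return (ids to report, bootstrap flag) -- illustrative sketch.

    Bootstrap: the DB is empty, so everything gets saved elsewhere but
    nothing is reported, preventing a flood of old-post notifications.
    """
    bootstrap = not known_ids
    new_ids = [i for i in scraped_ids if i not in known_ids]
    return ([] if bootstrap else new_ids), bootstrap
```

For example, `select_posts_to_report(["a", "b"], {"a"})` yields `(["b"], False)`, while an empty `known_ids` yields `([], True)`.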
Three boards crawled:

| Key | Name | Board ID |
|---|---|---|
| `notice` | 공지사항 (Notices) | 1926 |
| `archive` | 자료실 (Resources) | 1927 |
| `jobs` | 취업정보 (Job information) | 1929 |
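One plausible way to express that mapping as configuration, using only the keys and board IDs from the table (the dict name and structure are assumptions, not the repo's actual code):

```python
# Board keys and IDs from the table above; the variable name is illustrative.
BOARDS: dict[str, int] = {
    "notice": 1926,   # 공지사항 (Notices)
    "archive": 1927,  # 자료실 (Resources)
    "jobs": 1929,     # 취업정보 (Job information)
}
```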
## Tests
Tests use an in-memory SQLite DB (`conftest.py`) and a `FakeCrawler` stub — no real HTTP calls or PostgreSQL required.
- `test_api.py` — endpoint shape/status tests (service is mocked)
- `test_service.py` — new-post detection logic, bootstrap mode, zero-new-posts path
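A `FakeCrawler` in this style might look like the following sketch. The class and method names are assumptions for illustration; the real stub lives in the test suite.

```python
# Hypothetical stand-in for HufsCrawler used by the tests: it replays
# canned article ids instead of making HTTP requests.
class FakeCrawler:
    def __init__(self, posts_by_board: dict[str, list[str]]):
        self.posts_by_board = posts_by_board

    def fetch_article_ids(self, board_key: str) -> list[str]:
        # No network access: return a copy of the canned data.
        return list(self.posts_by_board.get(board_key, []))
```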
## CI/CD

GitHub Actions (`.github/workflows/deploy.yml`) triggers on push to `main`:
- SSH into Gitea, clone the repo
- Build and push the Docker image to DockerHub (tagged `latest`, plus an optional `x.y.z` version from the commit message)
- Deploy via `docker compose -p nkeys-apps -f /nkeysworld/compose.apps.yml pull hufs-notice-crawler`
- Notify Discord via webhook
Required secrets: `NKEY_SSH_PRIVATE_KEY`, `DOCKERHUB_USERNAME`, `DOCKERHUB_TOKEN`, `DISCORD_WEBHOOK`
The app runs on an internal Docker network (`nkeysworld-network`) with no exposed ports — n8n calls it as `http://hufs-notice-crawler:8000`.
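The call n8n makes can be sketched in Python using only the endpoint and hostname given above. The helper name is hypothetical, and the hostname resolves only inside `nkeysworld-network`, so actually sending the request works only from a container on that network.

```python
import urllib.request


# Builds the POST request n8n would send on the internal Docker network.
def build_crawl_request(
    base: str = "http://hufs-notice-crawler:8000",
) -> urllib.request.Request:
    return urllib.request.Request(f"{base}/api/v1/crawl", method="POST")


# Sending it (only works inside nkeysworld-network):
# with urllib.request.urlopen(build_crawl_request(), timeout=30) as resp:
#     payload = resp.read()
```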