news-summary-bot/app/transcript.py
sm4640 aaf5bd8d05
All checks were successful
news-summary-bot-cicd / build_push_deploy (push) Successful in 9m6s
Fix: [2.0.2] Replace yt-dlp with youtube-transcript-api
yt-dlp was blocked on the OCI server by YouTube's bot detection.
Switched to a transcript-only library so that it also works from cloud IP environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 14:23:52 +09:00


from youtube_transcript_api import YouTubeTranscriptApi


def extract_video_id(url: str) -> str:
    """Extract the video ID from a YouTube URL."""
    # Short links: https://youtu.be/<id>?...
    if "youtu.be/" in url:
        return url.split("youtu.be/")[1].split("?")[0]
    # Watch links: https://www.youtube.com/watch?v=<id>&...
    if "v=" in url:
        return url.split("v=")[1].split("&")[0]
    raise ValueError(f"Invalid YouTube URL: {url}")


def fetch_transcript(video_id: str) -> str:
    """Fetch the YouTube transcript and return it as plain text."""
    ytt_api = YouTubeTranscriptApi()
    # Prefer Korean, fall back to English.
    transcript = ytt_api.fetch(video_id, languages=["ko", "en"])
    # Skip empty snippets and join the rest into one string.
    texts = [entry.text for entry in transcript if entry.text.strip()]
    return " ".join(texts)
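
The string splitting in extract_video_id handles the two common URL shapes, but it matches "v=" anywhere in the URL. As a minimal sketch of a stricter alternative (not part of this commit), the hypothetical helper below, extract_video_id_strict, parses the URL with the standard library instead. The function name and the set of accepted hosts are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id_strict(url: str) -> str:
    """Hypothetical stricter variant of extract_video_id (illustration only)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        if video_id:
            return video_id
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        if values:
            return values[0]

    raise ValueError(f"Invalid YouTube URL: {url}")
```

With this variant, a watch URL such as https://www.youtube.com/watch?nv=1&v=abc123 yields abc123, whereas splitting on "v=" would pick up the value of the earlier nv parameter. URLs on other hosts raise ValueError even if they contain a v parameter.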