
Long-form video is hard to skim. Users want accurate takeaways (key points, timestamps, action items, terms) without watching the entire video. The summarizer should:
Constraint: Summaries must remain faithful; hallucinations should be minimized by grounding in transcript and metadata.
tldr: 5–7 bullets, ≤120 words.detailed_summary: sections with headings mirroring topics/chapters.timeline: list of {start_time, end_time, title, key_points[]}.qa: array of {question, answer, supporting_timestamps[]}.glossary: {term, definition, first_mention_time}.th this.