Files
X-Agents/account/admin/skills/bilibili-watcher/SKILL.md
2026-03-11 16:26:22 +08:00

485 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: openakita/skills@bilibili-watcher
description: Extract subtitles and transcripts from Bilibili and YouTube videos. Use when the user wants to get subtitles from B站 (Bilibili) or YouTube, extract Chinese/Japanese video transcripts, watch member-only Bilibili content, or perform Q&A on video content. Supports dual-platform subtitle extraction with yt-dlp.
license: MIT
metadata:
author: openakita
version: "1.0.0"
based_on: openclaw/skills/bilibili-youtube-watcher
---
# Bilibili & YouTube Watcher — 双平台字幕提取
从 BilibiliB站和 YouTube 视频中提取字幕/转录文本,支持多语言、会员视频和内容问答。
## When to Use This Skill
- User shares a Bilibili link and wants subtitles or a summary
- User shares a YouTube link and wants transcript extraction via yt-dlp
- User needs subtitles from member-only (大会员) Bilibili videos
- User wants to search or query content within a video's transcript
- User wants to compare subtitles across languages
- User needs to extract subtitles from videos with hardcoded subs (OCR not included — only soft subs)
- User wants batch subtitle extraction from a playlist or series
## Prerequisites
### Install yt-dlp
yt-dlp is a feature-rich command-line audio/video downloader that also extracts subtitles.
**Via pip (recommended):**
```bash
pip install yt-dlp
```
**Via package manager:**
```bash
# macOS
brew install yt-dlp
# Ubuntu/Debian
sudo apt install yt-dlp
# Windows (scoop)
scoop install yt-dlp
```
**Verify installation:**
```bash
yt-dlp --version
```
### Optional: ffmpeg
Some subtitle formats require ffmpeg for conversion:
```bash
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows (scoop)
scoop install ffmpeg
```
### Cookie Setup for Bilibili Member Videos
Bilibili member-only (大会员) content requires authentication cookies.
**Method 1: Export cookies from browser**
Install a browser extension like "Get cookies.txt LOCALLY" and export cookies for `bilibili.com`:
```bash
# Use the exported cookies file
yt-dlp --cookies cookies.txt "https://www.bilibili.com/video/BV..."
```
**Method 2: Use browser cookies directly**
```bash
# yt-dlp can read cookies from your browser
yt-dlp --cookies-from-browser chrome "https://www.bilibili.com/video/BV..."
yt-dlp --cookies-from-browser firefox "https://www.bilibili.com/video/BV..."
yt-dlp --cookies-from-browser edge "https://www.bilibili.com/video/BV..."
```
> **Security note**: Cookie files contain your login session. Do not share them or commit them to version control.
---
## Instructions
### Step 1: Identify the Platform and URL Format
#### Bilibili URL Formats
| Format | Example |
|---|---|
| Standard BV | `https://www.bilibili.com/video/BV1xx411c7mD` |
| With page | `https://www.bilibili.com/video/BV1xx411c7mD?p=2` |
| Short link | `https://b23.tv/aBcDeFg` |
| Bangumi | `https://www.bilibili.com/bangumi/play/ep12345` |
| Old AV format | `https://www.bilibili.com/video/av12345` |
| Mobile | `https://m.bilibili.com/video/BV1xx411c7mD` |
#### YouTube URL Formats
| Format | Example |
|---|---|
| Standard | `https://www.youtube.com/watch?v=VIDEO_ID` |
| Short | `https://youtu.be/VIDEO_ID` |
| Embed | `https://www.youtube.com/embed/VIDEO_ID` |
| Shorts | `https://www.youtube.com/shorts/VIDEO_ID` |
### Step 2: Extract Subtitles
#### Bilibili Subtitle Extraction
**List available subtitles:**
```bash
yt-dlp --list-subs "https://www.bilibili.com/video/BV..."
```
**Download subtitles only (no video):**
```bash
# Download all available subtitles
yt-dlp --write-sub --skip-download "https://www.bilibili.com/video/BV..."
# Download auto-generated subtitles as well
yt-dlp --write-sub --write-auto-sub --skip-download "https://www.bilibili.com/video/BV..."
# Download specific language (Chinese)
yt-dlp --write-sub --sub-lang zh-CN --skip-download "https://www.bilibili.com/video/BV..."
# Convert to SRT format
yt-dlp --write-sub --sub-lang zh-CN --convert-subs srt --skip-download "https://www.bilibili.com/video/BV..."
```
**Member-only videos (require cookies):**
```bash
yt-dlp --cookies-from-browser chrome --write-sub --skip-download "https://www.bilibili.com/video/BV..."
```
#### YouTube Subtitle Extraction
**List available subtitles:**
```bash
yt-dlp --list-subs "https://www.youtube.com/watch?v=VIDEO_ID"
```
**Download subtitles:**
```bash
# English subtitles
yt-dlp --write-sub --sub-lang en --skip-download "https://www.youtube.com/watch?v=VIDEO_ID"
# Auto-generated subtitles
yt-dlp --write-auto-sub --sub-lang en --skip-download "https://www.youtube.com/watch?v=VIDEO_ID"
# Multiple languages
yt-dlp --write-sub --sub-lang "en,zh-Hans,ja" --skip-download "URL"
# Convert to plain text (SRT format)
yt-dlp --write-auto-sub --sub-lang en --convert-subs srt --skip-download "URL"
```
### Step 3: Parse Subtitle Files
yt-dlp downloads subtitles in various formats. Here's how to parse common ones:
```python
import re
import json
def parse_srt(filepath: str) -> list[dict]:
"""Parse SRT subtitle file into structured segments."""
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read()
segments = []
blocks = content.strip().split('\n\n')
for block in blocks:
lines = block.strip().split('\n')
if len(lines) >= 3:
time_match = re.match(
r'(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})',
lines[1]
)
if time_match:
h, m, s = int(time_match[1]), int(time_match[2]), int(time_match[3])
start_sec = h * 3600 + m * 60 + s
text = ' '.join(lines[2:]).strip()
# Remove HTML tags from auto-generated subs
text = re.sub(r'<[^>]+>', '', text)
if text:
segments.append({
'start': start_sec,
'text': text
})
return segments
def parse_json3(filepath: str) -> list[dict]:
"""Parse YouTube JSON3 subtitle format."""
with open(filepath, 'r', encoding='utf-8') as f:
data = json.load(f)
segments = []
for event in data.get('events', []):
start_ms = event.get('tStartMs', 0)
segs = event.get('segs', [])
text = ''.join(s.get('utf8', '') for s in segs).strip()
if text and text != '\n':
segments.append({
'start': start_ms / 1000,
'text': text
})
return segments
def segments_to_text(segments: list[dict]) -> str:
"""Convert segments to plain text with timestamps."""
lines = []
for seg in segments:
minutes = int(seg['start'] // 60)
seconds = int(seg['start'] % 60)
lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text']}")
return '\n'.join(lines)
```
### Step 4: Summarize or Query the Content
Once you have the transcript text, generate summaries or answer questions about the content.
**For summarization**, combine all subtitle text and apply a structured prompt:
```
Based on the following video transcript, provide:
1. **概要** (Executive Summary): 2-3 sentences in the video's language
2. **要点** (Key Points): Bulleted list with timestamps [MM:SS]
3. **详细笔记** (Detailed Notes): Organized by topic sections
4. **问答** (Q&A): Answer any specific questions the user has
Transcript:
{full_transcript_text}
```
**For Q&A**, search the transcript for relevant segments first, then answer based on context.
---
## Workflows
### Workflow 1: Quick Bilibili Subtitle Extraction
User says: "提取这个B站视频的字幕: https://www.bilibili.com/video/BV..."
1. Run `yt-dlp --list-subs` to check available subtitles
2. Download Chinese subtitles: `yt-dlp --write-sub --sub-lang zh-CN --convert-subs srt --skip-download URL`
3. Parse the SRT file
4. Present the clean transcript to the user
### Workflow 2: Bilibili Member Video
User says: "这是大会员视频,帮我提取字幕"
1. Inform user that cookies are needed
2. Use `--cookies-from-browser chrome` (or user's preferred browser)
3. Extract subtitles with authentication
4. If cookies fail, guide user to export cookies.txt manually
### Workflow 3: YouTube Multi-Language
User says: "Get both English and Chinese subtitles from this YouTube video"
1. List available subtitle languages
2. Download both `en` and `zh-Hans` subtitles
3. Parse both files
4. Present side-by-side or merged view
### Workflow 4: Video Content Q&A
User says: "视频里有没有提到关于 X 的内容?"
1. Extract full transcript
2. Search for keywords related to X
3. Return matching segments with timestamps
4. Provide a concise answer based on the matching content
### Workflow 5: Batch Playlist Extraction
User provides a playlist or series URL:
1. Use `yt-dlp --flat-playlist` to list all videos
2. Extract subtitles from each video sequentially
3. Save each transcript as a separate file
4. Generate a combined index with video titles and file paths
### Workflow 6: Bilibili Bangumi (番剧) Subtitles
User shares a bangumi URL:
1. Bangumi often has official multi-language subtitles
2. Use `--list-subs` to show all available languages
3. Download preferred language(s)
4. Note: Some bangumi require 大会员 cookies
---
## Output Format
### Transcript Output
```markdown
# 📝 Subtitles: [Video Title]
**Platform**: Bilibili / YouTube
**Language**: 中文 (zh-CN)
**Duration**: ~XX minutes
**Subtitle Type**: Manual / Auto-generated
---
[00:00] 大家好,欢迎来到今天的视频
[00:05] 今天我们要讨论的话题是...
[00:12] 首先我们来看一下背景
...
```
### Summary Output
```markdown
# 📋 Video Summary: [Title]
## 概要
[2-3 sentence summary in the video's language]
## 要点
- **[00:00]** 开场介绍和主题说明
- **[02:15]** 第一个核心观点
- **[08:30]** 关键论据和数据
- **[15:00]** 实际演示
- **[22:45]** 总结与下一步
## 详细笔记
### 第一部分: [主题] (00:00 - 05:30)
[详细内容笔记]
### 第二部分: [主题] (05:30 - 12:00)
[详细内容笔记]
```
---
## Common Pitfalls
### 1. Bilibili Geo-Restrictions
**Problem**: Some Bilibili content is restricted to mainland China.
**Solutions**:
- Use a proxy or VPN with a Chinese IP: `yt-dlp --proxy socks5://127.0.0.1:1080 URL`
- Set the `--geo-bypass` flag: `yt-dlp --geo-bypass URL`
- For persistent issues, use `--geo-bypass-country CN`
```bash
yt-dlp --geo-bypass-country CN --write-sub --skip-download "URL"
```
### 2. Member-Only Content Without Cookies
**Problem**: `yt-dlp` returns an error or empty subtitles for 大会员 videos.
**Solution**: Always check if the video requires 大会员 access. If so, cookies are mandatory:
```bash
# If this fails:
yt-dlp --list-subs "URL"
# ERROR: This video requires premium membership
# Try with cookies:
yt-dlp --cookies-from-browser chrome --list-subs "URL"
```
If browser cookie extraction fails (common on Linux), export cookies manually to a `cookies.txt` file.
### 3. No Subtitles Available
**Problem**: Many Bilibili videos, especially older ones or user-generated content, have no subtitles at all.
**Solution**: Inform the user clearly. Unlike YouTube, Bilibili does not always generate auto-subtitles. The video may only have hardcoded (burned-in) subtitles which require OCR — beyond the scope of this skill.
### 4. yt-dlp Version Issues
**Problem**: Bilibili frequently changes its API, causing older yt-dlp versions to fail.
**Solution**: Always ensure yt-dlp is up to date:
```bash
pip install -U yt-dlp
# or
yt-dlp -U
```
### 5. Subtitle Format Inconsistencies
**Problem**: Different videos return subtitles in different formats (SRT, VTT, JSON3, ASS).
**Solution**: Use `--convert-subs srt` to normalize all subtitles to SRT format:
```bash
yt-dlp --write-sub --convert-subs srt --skip-download "URL"
```
### 6. Rate Limiting on Bilibili
**Problem**: Rapid successive requests may get temporarily blocked.
**Solutions**:
- Add delays between batch requests: `--sleep-requests 2`
- Use `--sleep-interval 5` for playlists
- Limit concurrent downloads: `--max-downloads 10`
### 7. Short Link Resolution
**Problem**: Bilibili short links (`b23.tv`) may not resolve properly.
**Solution**: yt-dlp handles most redirects automatically, but if it fails:
```bash
# Manually resolve the short link first
curl -sI "https://b23.tv/aBcDeFg" | grep -i location
```
---
## Multi-Language Subtitle Support
### Common Language Codes
| Platform | Language | Code |
|---|---|---|
| Bilibili | 中文 | `zh-CN`, `zh` |
| Bilibili | English | `en` |
| Bilibili | 日本語 | `ja` |
| YouTube | English | `en` |
| YouTube | 中文(简体) | `zh-Hans` |
| YouTube | 中文(繁体) | `zh-Hant` |
| YouTube | 日本語 | `ja` |
| YouTube | 한국어 | `ko` |
### Checking Available Languages
```bash
# Bilibili
yt-dlp --list-subs "https://www.bilibili.com/video/BV..."
# YouTube
yt-dlp --list-subs "https://www.youtube.com/watch?v=..."
```
The output shows both manual and auto-generated subtitle tracks with their language codes.
---
## Platform Comparison
| Feature | Bilibili | YouTube |
|---|---|---|
| Auto-generated subtitles | Rare | Common |
| Manual subtitles | Common (CC) | Common |
| Multi-language subs | Some (bangumi) | Many |
| Cookie auth needed | 大会员 content | Age-restricted |
| Geo-restrictions | Some content CN-only | Varies |
| Subtitle formats | SRT, JSON | SRT, VTT, JSON3 |
| Playlist support | Yes (multi-page) | Yes |
| Rate limiting | Moderate | Moderate |