Complete 7-step walkthrough for extracting vocabulary from YouTube videos. From transcript extraction through deployment, with detailed explanations and troubleshooting.
Before running any Python-based transcript extraction commands, you MUST:
Run the conda-bash alias so that python commands work in scripts. The base environment will then be activated, so you can run conda commands and activate the ags-dictionary environment.
Activate conda bash profile:
source ~/miniconda3/etc/profile.d/conda.sh
Activate the Python environment:
conda activate ags-dictionary
Then run your commands:
python -m youtube_transcript_api VIDEO_ID --format text
OR use the automated script (which handles this for you):
./scripts/dictionary/dict-step1-transcript.sh VIDEO_URL TOPIC_NAME
Easy Way - Using Automated Script:
# From project root
./scripts/dictionary/dict-step1-transcript.sh "https://www.youtube.com/watch?v=9M_dq_0ljsc" capitalism
What it does:
Saves the transcript to .prompts/dictionary/transcript/TOPIC-transcript.txt
Manual Way (if you prefer):
# 1. Activate conda
source ~/miniconda3/etc/profile.d/conda.sh
conda activate ags-dictionary

# 2. Extract transcript
python -m youtube_transcript_api VIDEO_ID --format text > .prompts/dictionary/transcript/TOPIC-transcript.txt

# 3. Check word count
wc -w .prompts/dictionary/transcript/TOPIC-transcript.txt
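The automated script takes a full VIDEO_URL, while the manual command expects a bare VIDEO_ID. A minimal sketch of deriving one from the other is shown here, assuming standard `watch?v=` and `youtu.be` link shapes; `extract_video_id` is a hypothetical helper and is not part of the project's scripts.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Return the video ID from a YouTube URL (hypothetical helper)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    else:
        # Standard watch links carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    if not video_id:
        raise ValueError(f"No video ID found in {url!r}")
    return video_id


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=9M_dq_0ljsc"))
```

The printed value can be substituted for VIDEO_ID in the manual command. Other link shapes (embeds, playlists) are not handled by this sketch.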
Now that you have the transcript, use GitHub Copilot to extract vocabulary.
Open Copilot Chat (Ctrl+Shift+I or Cmd+Shift+I)
Use this prompt (copy-paste ready):
I need help extracting vocabulary from a video transcript for my English learning dictionary.

**Task**: Analyze this video transcript about "capitalism and economic systems" and extract 25 challenging English words suitable for an advanced ESL learner (first language: Urdu).

**Selection Criteria**:
- Difficulty level: 6-9 out of 10
- Important for understanding economics/politics/systems
- Not common everyday words
- Academic or domain-specific terms
- Useful for intellectual discussions

**For each word, provide**:
1. word: the English word (lowercase)
2. part_of_speech: (noun, verb, adjective, adverb, phrase, technical-term, etc.)
3. urdu_meaning: Urdu translation in Urdu script
4. example_en: A clear example sentence from the transcript or similar context
5. example_ur: Natural, contextual Urdu translation of the example
6. additional_example_ur: (optional) Another Urdu example showing different usage

**Output Format**: Valid YAML array only, no markdown formatting, no explanations.

**Example Entry**:
```yaml
- word: accumulate
  part_of_speech: verb
  urdu_meaning: جمع کرنا، اکٹھا کرنا
  example_en: Capitalism encourages us to accumulate wealth and resources.
  example_ur: سرمایہ داری ہمیں دولت اور وسائل جمع کرنے کی ترغیب دیتی ہے۔
  additional_example_ur: وقت کے ساتھ ساتھ دولت جمع ہوتی جاتی ہے۔
```
Transcript:
[OPEN FILE...]

Please extract the vocabulary now in YAML format.
Replace [OPEN FILE...] with the actual transcript content.
Create the data directory and file:
# Create directory
mkdir -p data/dictionary/capitalism

# Save YAML (paste Copilot output)
cat > data/dictionary/capitalism/vocabulary.yaml
# Paste the YAML here, then press Ctrl+D
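The prompt asks for YAML only, but a chat assistant may still wrap its reply in a Markdown code fence. A minimal sketch of trimming such a wrapper before saving is shown here, assuming that failure mode; `strip_code_fence` is a hypothetical helper, not part of the project's scripts.

```python
FENCE = "`" * 3  # a Markdown code-fence marker, built here to avoid nesting fences


def strip_code_fence(reply: str) -> str:
    """Remove one surrounding Markdown code fence from a reply, if present."""
    lines = reply.strip().splitlines()
    if lines and lines[0].lstrip().startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag such as "yaml".
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        # Drop the closing fence line.
        lines = lines[:-1]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    wrapped = FENCE + "yaml\n- word: accumulate\n  part_of_speech: verb\n" + FENCE
    print(strip_code_fence(wrapped), end="")
```

Replies that are already plain YAML pass through unchanged apart from trimmed outer whitespace and a single trailing newline.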
Or use VS Code:
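data/dictionary/capitalism/vocabulary.yaml
# Check YAML is valid (the file is parsed first, so the success message only prints if loading works)
python -c "import yaml; data = yaml.safe_load(open('data/dictionary/capitalism/vocabulary.yaml', encoding='utf-8')); print('✅ Valid YAML'); print(f'Entries: {len(data)}')"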
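A syntax check alone does not confirm that every entry carries the fields the prompt requests. A minimal sketch of a field check follows, assuming the five non-optional fields from the prompt are all required; `find_entry_problems` and `REQUIRED_FIELDS` are hypothetical names, and the function expects data that has already been loaded (for example with `yaml.safe_load`).

```python
# Fields the prompt requests for every entry; additional_example_ur is optional.
REQUIRED_FIELDS = ("word", "part_of_speech", "urdu_meaning", "example_en", "example_ur")


def find_entry_problems(entries):
    """Return a list of problems; an empty list means every entry looks complete."""
    if not isinstance(entries, list):
        return ["top level must be a list of entries"]
    problems = []
    for index, entry in enumerate(entries, start=1):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected a mapping")
            continue
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: '{field}' is missing, empty, or not text")
    return problems


if __name__ == "__main__":
    sample = [{"word": "accumulate", "part_of_speech": "verb"}]
    for problem in find_entry_problems(sample):
        print(problem)
```

Checking for text values also catches words that YAML parses as another type, such as an unquoted `no` being read as a boolean.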
# Create page from template
hugo new content/docs/dictionary/capitalism/index.md --kind dictionary
Then edit the file to:
3. Update shortcode reference: {{< vocabulary-accordion "dictionary.capitalism.vocabulary" >}}
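The shortcode argument appears to mirror the data file's location: Hugo exposes files under data/ by directory and file name, so data/dictionary/capitalism/vocabulary.yaml corresponds to dictionary.capitalism.vocabulary. A minimal sketch of that mapping follows; `data_key` is a hypothetical helper, and how the vocabulary-accordion shortcode actually resolves its argument is an assumption based on the paths in this guide.

```python
from pathlib import PurePosixPath


def data_key(data_file: str, data_root: str = "data") -> str:
    """Turn a path under the data directory into a dotted key (hypothetical helper)."""
    # Drop the leading data directory, then the file extension.
    relative = PurePosixPath(data_file).relative_to(data_root)
    return ".".join(relative.with_suffix("").parts)


if __name__ == "__main__":
    print(data_key("data/dictionary/capitalism/vocabulary.yaml"))
```

For a new topic, only the middle segment changes, so the same pattern gives the key to place in that topic's page.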
# Start dev server
npm run dev:memory

# Open browser to:
# http://localhost:1313/docs/dictionary/capitalism/
git add -A
git commit -m "message"
scripts/dictionary/dict-step1-transcript.sh ⭐ NEW
./scripts/dictionary/dict-step1-transcript.sh VIDEO_URL TOPIC_NAME
scripts/dictionary/dict-extract.sh
./scripts/dictionary/dict-extract.sh --video-url "URL" --topic "topic"
# Step 1: Extract transcript (automated)
./scripts/dictionary/dict-step1-transcript.sh "https://www.youtube.com/watch?v=9M_dq_0ljsc" capitalism

# Step 2: Open transcript file
code .prompts/dictionary/transcript/capitalism-transcript.txt

# Step 3: Use Copilot Chat to extract vocabulary (see prompt above)

# Step 4: Save YAML
mkdir -p data/dictionary/capitalism
cat > data/dictionary/capitalism/vocabulary.yaml
# Paste YAML, Ctrl+D

# Step 5: Create Hugo page
hugo new content/docs/dictionary/capitalism/index.md --kind dictionary

# Step 6: Edit Hugo page to reference "dictionary.capitalism.vocabulary"

# Step 7: Test
npm run dev:memory

# Step 8: Commit
git add data/dictionary/capitalism/ content/docs/dictionary/capitalism/
git commit -m "Add capitalism vocabulary"
Total time: 15-20 minutes
Cost: $0 (using Copilot)
Solution: Conda path might be different. Try:
source ~/anaconda3/etc/profile.d/conda.sh # If using Anaconda
# or
source ~/opt/miniconda3/etc/profile.d/conda.sh # Alternative location
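Rather than trying each location by hand, the candidate paths can be probed in order. A minimal sketch follows, assuming only the three locations named in this guide; `find_conda_sh` is a hypothetical helper, and other installs may keep conda.sh elsewhere.

```python
from pathlib import Path

# Locations mentioned in this guide, checked in order.
CANDIDATES = (
    "~/miniconda3/etc/profile.d/conda.sh",
    "~/anaconda3/etc/profile.d/conda.sh",
    "~/opt/miniconda3/etc/profile.d/conda.sh",
)


def find_conda_sh(candidates=CANDIDATES):
    """Return the first existing conda.sh path, or None if none is found."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_conda_sh()
    print(f"source {found}" if found else "conda.sh not found in the known locations")
```

When a path is found, the printed line is the `source` command to run before `conda activate ags-dictionary`.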
Solution: Create the environment:
conda create -n ags-dictionary python=3.11
conda activate ags-dictionary
pip install youtube-transcript-api PyYAML openai python-dotenv
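After creating the environment, it can help to confirm that the installed packages are importable from the active interpreter. A minimal sketch using the standard library follows; `missing_packages` is a hypothetical helper, and the mapping pairs each pip package name from the command above with the module name it installs.

```python
import importlib.util

# pip package name -> importable module name
REQUIRED_MODULES = {
    "youtube-transcript-api": "youtube_transcript_api",
    "PyYAML": "yaml",
    "openai": "openai",
    "python-dotenv": "dotenv",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the ags-dictionary environment active; running it from the base environment reports on base instead.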
Reasons:
In your Copilot prompt, you can customize:
Focus on words related to:
- Economic systems
- Political theory
- Social structures
- Academic discourse

Difficulty level: 7-9 (very advanced)
Word count: 30 (more words)
Ask Copilot to:
Use formal Urdu for academic terms
Use contemporary vocabulary
Avoid overly archaic expressions
Include context in examples
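When the same customizations are reused across topics, the extra prompt text can be generated instead of retyped. A minimal sketch follows, assuming the layout of the customization example above; `build_customization` and its parameters are hypothetical names chosen for illustration.

```python
def build_customization(focus_areas, min_difficulty=7, max_difficulty=9, word_count=30):
    """Build the customization text to append to the Copilot prompt."""
    if not 1 <= min_difficulty <= max_difficulty <= 10:
        raise ValueError("difficulty must satisfy 1 <= min <= max <= 10")
    lines = ["Focus on words related to:"]
    lines += [f"- {area}" for area in focus_areas]
    lines += [
        "",
        f"Difficulty level: {min_difficulty}-{max_difficulty}",
        f"Word count: {word_count}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_customization(["Economic systems", "Political theory"]))
```

The translation-style instructions (formal Urdu, contemporary vocabulary) can be appended to the returned text in the same way.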
Created: 2026-05-05
Status: Production Ready
Recommended: Use Step 1 script + Copilot for best experience