Copilot Prompt - Extract Vocabulary from Transcript


Ready-to-use prompt for extracting vocabulary using GitHub Copilot

Copilot Vocabulary Extraction Prompt

## 🎯 How to Use This Prompt

1. Get a video transcript (copy it from YouTube, or use youtube-transcript-api)
2. Copy the prompt template below
3. Replace [TOPIC] and the transcript placeholder with your content
4. Paste into GitHub Copilot Chat
5. Review and save the generated YAML
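
Steps 2 and 3 can also be scripted. The following is a minimal Python sketch, not part of the original workflow: `fill_prompt` is a hypothetical helper name, and the placeholder spellings it looks for are the ones that appear in the templates on this page.

```python
def fill_prompt(template: str, topic: str, transcript: str) -> str:
    """Substitute the topic and transcript placeholders in a prompt template."""
    filled = template.replace("[TOPIC]", topic)
    # The templates on this page mark the transcript slot in more than one way,
    # so every known spelling is replaced.
    for placeholder in (
        "[TRANSCRIPT]",
        "[PASTE YOUR TRANSCRIPT HERE]",
        "[PASTE TRANSCRIPT]",
    ):
        filled = filled.replace(placeholder, transcript.strip())
    return filled


if __name__ == "__main__":
    template = (
        "Analyze this video transcript about [TOPIC].\n\n"
        "Transcript: [PASTE YOUR TRANSCRIPT HERE]"
    )
    print(fill_prompt(template, "RAG and Vector Databases", "Retrieval comes first."))
```

The topic is substituted before the transcript so that a transcript which happens to contain the text `[TOPIC]` is left untouched.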

## 📋 Prompt Template

I need help extracting vocabulary from a video transcript for my English learning dictionary.

**Task**: Analyze this video transcript about [TOPIC] and extract 25 challenging English words suitable for an advanced ESL learner (first language: Urdu).

**Selection Criteria**:
- Difficulty level: 6-9 out of 10
- Important for understanding the topic
- Not common everyday words
- Technical or domain-specific terms
- Useful for academic/professional contexts

**For each word, provide**:
1. word: the English word (lowercase)
2. part_of_speech: (noun, verb, adjective, adverb, phrase, technical-term, etc.)
3. urdu_meaning: Urdu translation in Urdu script
4. example_en: A clear example sentence from the transcript or similar context
5. example_ur: Natural, contextual Urdu translation of the example
6. additional_example_ur: (optional) Another Urdu example showing different usage

**Output Format**: Valid YAML array only, no markdown formatting, no explanations.

**Example Entry**:
```yaml
- word: retrieval
  part_of_speech: noun
  urdu_meaning: بازیافت، واپس لانا
  example_en: Efficient retrieval of information is crucial for RAG systems.
  example_ur: RAG سسٹمز کے لیے معلومات کی موثر بازیافت بہت اہم ہے۔
  additional_example_ur: ڈیٹا بیس سے دستاویزات کی بازیافت تیزی سے ہونی چاہیے۔
```

Transcript: [PASTE YOUR TRANSCRIPT HERE]

Please extract the vocabulary now in YAML format.
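
Even when the prompt asks for no markdown formatting, a chat assistant may still wrap its answer in a code fence. As a hedge, the sketch below removes one surrounding fence before the YAML is saved. `strip_code_fence` is a hypothetical helper, and it assumes the response contains nothing but the (possibly fenced) YAML.

```python
def strip_code_fence(response: str) -> str:
    """Return the response without a surrounding Markdown code fence, if any."""
    lines = response.strip().splitlines()
    # Drop an opening fence such as ```yaml or ```.
    if lines and lines[0].lstrip().startswith("```"):
        lines = lines[1:]
    # Drop a closing fence.
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fenced = "```yaml\n- word: retrieval\n  part_of_speech: noun\n```"
    print(strip_code_fence(fenced), end="")
```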


---

## 📝 Alternative: Shorter Version for Quick Extraction

Extract 20 difficult English words from this video transcript about [TOPIC] for an Urdu-speaking advanced learner.

For each word, provide these fields in YAML format:

word: (lowercase)
part_of_speech: (noun/verb/adjective/etc)
urdu_meaning: (Urdu script)
example_en: (clear example)
example_ur: (Urdu translation)

Transcript: [PASTE TRANSCRIPT]

Output only a valid YAML array.


---

## 🎨 Customization Options

### Adjust Difficulty Level

Focus on words with difficulty level [LEVEL] out of 10:

- Level 5-6: Upper intermediate
- Level 6-7: Advanced
- Level 7-8: Proficient
- Level 8-10: Native/Expert


### Focus on Specific Word Types

Prioritize:

- Technical terms only
- Academic vocabulary
- Business English
- Phrasal verbs
- Idioms and expressions
- Formal vs informal language


### Adjust Number of Words

Extract [NUMBER] words:

- 10-15: Short video or focused topic
- 20-25: Standard (recommended)
- 30-40: Long lecture or comprehensive
- 50+: Entire course (batch process)
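
If you reuse these options often, the extra prompt lines can be generated instead of typed. This is only a sketch under assumed names: `customization_lines` and its parameters are hypothetical, and the wording of each line simply mirrors the snippets in this section.

```python
def customization_lines(number=None, level=None, word_types=()):
    """Build the optional customization lines to append to the main prompt."""
    lines = []
    if number is not None:
        lines.append(f"Extract {number} words.")
    if level is not None:
        lines.append(f"Focus on words with difficulty level {level} out of 10.")
    if word_types:
        lines.append("Prioritize: " + ", ".join(word_types) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    print(customization_lines(number=15, level="7-8", word_types=["phrasal verbs", "idioms"]))
```

Options left at their defaults produce no line, so the main prompt's own settings stay in effect.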


---

## 💡 Pro Tips

### 1. Provide Context

Include this in your prompt for better results:

Context: I’m learning [FIELD/DOMAIN] and my current English level is [LEVEL]. I already know common words like [EXAMPLES], so focus on more advanced terms.


### 2. Request Specific Translation Style

For Urdu translations:

- Use modern, conversational Urdu
- Include technical terms when appropriate
- Maintain formal tone for academic content
- Use simple Urdu for better comprehension


### 3. Ask for Additional Information

Also provide for each word:

- Timestamp (if identifiable from context)
- Usage frequency in the video
- Related words or synonyms
- Common collocations


### 4. Iterative Refinement

If results aren't perfect:

Please refine the translations:

- Make the Urdu more natural
- Add more context to examples
- Focus on [specific aspect]
- Simplify/formalize the language


---

## 🔧 Post-Processing Checklist

After Copilot generates the YAML:

- [ ] Copy the output
- [ ] Validate YAML syntax (use yamllint.com or Python)
- [ ] Review Urdu translations for accuracy
- [ ] Check example sentences for clarity
- [ ] Verify all required fields are present
- [ ] Remove any duplicates
- [ ] Save to `data/dictionary/[topic]/vocabulary.yaml`
- [ ] Create/update Hugo content page
- [ ] Test locally with `npm run dev:memory`
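
The "required fields" and "duplicates" items on the checklist can be checked mechanically. The sketch below works on entries that have already been parsed into a list of dictionaries, for example with `yaml.safe_load` from PyYAML. `check_entries` is a hypothetical helper, and the required field list is taken from the prompt template, with `additional_example_ur` treated as optional.

```python
REQUIRED_FIELDS = ("word", "part_of_speech", "urdu_meaning", "example_en", "example_ur")


def check_entries(entries):
    """Return a list of human-readable problems found in parsed vocabulary entries."""
    problems = []
    seen = set()
    for index, entry in enumerate(entries, start=1):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: not a mapping")
            continue
        # A field counts as missing when it is absent, empty, or only whitespace.
        for field in REQUIRED_FIELDS:
            if not str(entry.get(field) or "").strip():
                problems.append(f"entry {index}: missing {field}")
        word = str(entry.get("word") or "").strip()
        if word:
            if word != word.lower():
                problems.append(f"entry {index}: word is not lowercase")
            if word.lower() in seen:
                problems.append(f"entry {index}: duplicate word '{word.lower()}'")
            seen.add(word.lower())
    return problems


if __name__ == "__main__":
    sample = [
        {
            "word": "retrieval",
            "part_of_speech": "noun",
            "urdu_meaning": "بازیافت",
            "example_en": "Efficient retrieval of information is crucial.",
            "example_ur": "معلومات کی موثر بازیافت بہت اہم ہے۔",
        },
        {"word": "Retrieval", "part_of_speech": "noun"},
    ]
    for problem in check_entries(sample):
        print(problem)
```

An empty result means the structural checks passed. The accuracy of the Urdu translations still needs a human review.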

---

## 📊 Example Workflow

### Step 1: Get Transcript

```bash
# For YouTube videos (the youtube-transcript-api package installs its CLI as youtube_transcript_api)
youtube_transcript_api VIDEO_ID --format text > transcript.txt
```

### Step 2: Copy Prompt

Copy the main prompt template above and fill in:

- [TOPIC] → “RAG and Vector Databases”
- [TRANSCRIPT] → Paste your transcript content

### Step 3: Get Output from Copilot

Paste into Copilot Chat and wait for the response.

### Step 4: Save Output

```bash
# Save to file
cat > data/dictionary/rag-course/vocabulary.yaml << 'EOF'
[PASTE COPILOT OUTPUT HERE]
EOF
```

### Step 5: Validate

```bash
# Validate YAML and print the number of entries (requires PyYAML)
python -c "import yaml; print(len(yaml.safe_load(open('data/dictionary/rag-course/vocabulary.yaml', encoding='utf-8'))))"
```

### Step 6: Create Hugo Page

Use the archetype or create the page manually:

```bash
hugo new content/docs/dictionary/rag-course/index.md --kind dictionary
```

---

## 🚀 Advanced: Batch Processing with Copilot

For multiple videos in a series:

I have 5 video transcripts from a course about [TOPIC]. I'll provide them one at a time.

For EACH transcript, extract 15-20 unique words (don't repeat words from previous transcripts).
Maintain consistent Urdu translation style across all outputs.

Transcript 1:
[PASTE TRANSCRIPT 1]

[Wait for response, then continue with next transcript]
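
Copilot may still repeat a word across transcripts despite the instruction above, so it is worth deduplicating when the batches are combined. A minimal sketch, assuming each batch has already been parsed into a list of dictionaries; `merge_batches` is a hypothetical helper that keeps the first entry seen for each word.

```python
def merge_batches(batches):
    """Combine per-transcript entry lists, keeping the first entry for each word."""
    merged = []
    seen = set()
    for batch in batches:
        for entry in batch:
            # Compare case-insensitively so "Retrieval" and "retrieval" collide.
            key = str(entry.get("word", "")).strip().lower()
            if not key or key in seen:
                continue
            seen.add(key)
            merged.append(entry)
    return merged


if __name__ == "__main__":
    batch_1 = [{"word": "retrieval"}, {"word": "embedding"}]
    batch_2 = [{"word": "Retrieval"}, {"word": "latency"}]
    print([entry["word"] for entry in merge_batches([batch_1, batch_2])])
```

Keeping the first occurrence means the earliest transcript's translation wins; entries without a `word` value are skipped.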

---

## 🎓 Learning from Copilot

After using this several times, you’ll notice:

- Which prompts give better results
- How to refine Urdu translations
- What difficulty levels work for you
- How to adjust for different topics

Save your own improved prompts in this file!

- video-to-vocab-howto.md - Complete how-to guide
- workflow-improvements.md - System improvements
- complete-guide.md - Dictionary system guide
- quick-reference.md - Quick reference card

- Last Updated: 2026-05-05
- Status: Ready to Use
- Tested With: GitHub Copilot Chat, ChatGPT-4, Claude 3
