# Copilot Prompt - Extract Vocabulary from Transcript

Ready-to-use prompt for extracting vocabulary using GitHub Copilot.

## 🎯 How to Use This Prompt

  1. Get a video transcript (copy from YouTube, or use youtube-transcript-api)
  2. Copy the prompt template below
  3. Replace [TOPIC] and the transcript placeholder ([PASTE YOUR TRANSCRIPT HERE]) with your content
  4. Paste into GitHub Copilot Chat
  5. Review and save the generated YAML

## 📋 Prompt Template

I need help extracting vocabulary from a video transcript for my English learning dictionary.

**Task**: Analyze this video transcript about [TOPIC] and extract 25 challenging English words suitable for an advanced ESL learner (first language: Urdu).

**Selection Criteria**:
- Difficulty level: 6-9 out of 10
- Important for understanding the topic
- Not common everyday words
- Technical or domain-specific terms
- Useful for academic/professional contexts

**For each word, provide**:
1. word: the English word (lowercase)
2. part_of_speech: (noun, verb, adjective, adverb, phrase, technical-term, etc.)
3. urdu_meaning: Urdu translation in Urdu script
4. example_en: A clear example sentence from the transcript or similar context
5. example_ur: Natural, contextual Urdu translation of the example
6. additional_example_ur: (optional) Another Urdu example showing different usage

**Output Format**: Valid YAML array only, no markdown formatting, no explanations.

**Example Entry**:
```yaml
- word: retrieval
  part_of_speech: noun
  urdu_meaning: بازیافت، واپس لانا
  example_en: Efficient retrieval of information is crucial for RAG systems.
  example_ur: RAG سسٹمز کے لیے معلومات کی موثر بازیافت بہت اہم ہے۔
  additional_example_ur: ڈیٹا بیس سے دستاویزات کی بازیافت تیزی سے ہونی چاہیے۔
```

Transcript: [PASTE YOUR TRANSCRIPT HERE]

Please extract the vocabulary now in YAML format.


---

## 📝 Alternative: Shorter Version for Quick Extraction

Extract 20 difficult English words from this video transcript about [TOPIC] for an Urdu-speaking advanced learner.

For each word provide YAML format:

  • word: (lowercase)
  • part_of_speech: (noun/verb/adjective/etc)
  • urdu_meaning: (Urdu script)
  • example_en: (clear example)
  • example_ur: (Urdu translation)

Transcript: [PASTE TRANSCRIPT]

Output only valid YAML array.


---

## 🎨 Customization Options

### Adjust Difficulty Level

Focus on words with difficulty level [LEVEL] out of 10:

  • Level 5-6: Upper intermediate
  • Level 6-7: Advanced
  • Level 7-8: Proficient
  • Level 8-10: Native/Expert

### Focus on Specific Word Types

Prioritize:

  • Technical terms only
  • Academic vocabulary
  • Business English
  • Phrasal verbs
  • Idioms and expressions
  • Formal vs informal language

### Adjust Number of Words

Extract [NUMBER] words:

  • 10-15: Short video or focused topic
  • 20-25: Standard (recommended)
  • 30-40: Long lecture or comprehensive
  • 50+: Entire course (batch process)
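
If you change these options often, the customization lines can be assembled programmatically. The sketch below is only an illustration, not part of any Copilot API: `PromptOptions` and `render_options` are hypothetical names, and the defaults simply mirror the values used in the main template.

```python
from dataclasses import dataclass, field


@dataclass
class PromptOptions:
    """Hypothetical container for the customization options in this section."""
    difficulty: str = "6-9"   # difficulty range out of 10
    num_words: int = 25       # 20-25 is the standard range
    word_types: list[str] = field(default_factory=list)  # e.g. ["Phrasal verbs"]


def render_options(opts: PromptOptions) -> str:
    """Return extra prompt lines to append to the main template."""
    lines = [
        f"Focus on words with difficulty level {opts.difficulty} out of 10.",
        f"Extract {opts.num_words} words.",
    ]
    # The "Prioritize" line is only emitted when at least one type is given.
    if opts.word_types:
        lines.append("Prioritize: " + ", ".join(opts.word_types) + ".")
    return "\n".join(lines)


print(render_options(PromptOptions(difficulty="7-8", num_words=15,
                                   word_types=["Technical terms only", "Phrasal verbs"])))
```

Append the returned text to the end of the prompt before pasting it into Copilot Chat.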

---

## 💡 Pro Tips

### 1. Provide Context

Include this in your prompt for better results:

Context: I’m learning [FIELD/DOMAIN] and my current English level is [LEVEL]. I already know common words like [EXAMPLES], so focus on more advanced terms.


### 2. Request Specific Translation Style

For Urdu translations:

  • Use modern, conversational Urdu
  • Include technical terms when appropriate
  • Maintain formal tone for academic content
  • Use simple Urdu for better comprehension

### 3. Ask for Additional Information

Also provide for each word:

  • Timestamp (if identifiable from context)
  • Usage frequency in the video
  • Related words or synonyms
  • Common collocations

### 4. Iterative Refinement

If results aren't perfect:

Please refine the translations:

  1. Make the Urdu more natural
  2. Add more context to examples
  3. Focus on [specific aspect]
  4. Simplify/formalize the language

---

## 🔧 Post-Processing Checklist

After Copilot generates the YAML:

- [ ] Copy the output
- [ ] Validate YAML syntax (use yamllint.com or Python)
- [ ] Review Urdu translations for accuracy
- [ ] Check example sentences for clarity
- [ ] Verify all required fields are present
- [ ] Remove any duplicates
- [ ] Save to `data/dictionary/[topic]/vocabulary.yaml`
- [ ] Create/update Hugo content page
- [ ] Test locally with `npm run dev:memory`
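
Several checklist items (valid syntax, required fields, duplicates) can be checked mechanically. The following is a minimal sketch, assuming PyYAML is installed and that the required fields are the five non-optional ones from the prompt template; `load_entries` and `check_entries` are hypothetical helper names, not part of this project.

```python
REQUIRED_FIELDS = ("word", "part_of_speech", "urdu_meaning", "example_en", "example_ur")


def load_entries(path):
    """Parse the YAML file. Requires PyYAML (pip install pyyaml)."""
    import yaml  # imported lazily so check_entries works without PyYAML

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def _text(entry, name):
    """Return the field as stripped text, or '' if it is absent or null."""
    value = entry.get(name)
    return "" if value is None else str(value).strip()


def check_entries(entries):
    """Return a list of problems; an empty list means every check passed."""
    if not isinstance(entries, list):
        return ["top-level YAML value is not a list"]
    problems = []
    seen = set()
    for i, entry in enumerate(entries, start=1):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not a mapping")
            continue
        for name in REQUIRED_FIELDS:
            if not _text(entry, name):
                problems.append(f"entry {i}: missing or empty '{name}'")
        word = _text(entry, "word")
        if word and word != word.lower():
            problems.append(f"entry {i}: '{word}' is not lowercase")
        key = word.lower()
        if key and key in seen:
            problems.append(f"entry {i}: duplicate word '{key}'")
        seen.add(key)
    return problems


# Example (path taken from the workflow in this file):
# problems = check_entries(load_entries("data/dictionary/rag-course/vocabulary.yaml"))
```

This only covers the structural checks. Translation accuracy and example clarity still need a human review.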

---

## 📊 Example Workflow

### Step 1: Get Transcript

```bash
# For YouTube videos (the youtube-transcript-api package installs a CLI named youtube_transcript_api)
youtube_transcript_api VIDEO_ID --format text > transcript.txt
```

### Step 2: Copy Prompt

Copy the main prompt template above and fill in:

  • [TOPIC] → “RAG and Vector Databases”
  • [PASTE YOUR TRANSCRIPT HERE] → Paste your transcript content
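
The substitution in this step can also be scripted. The following is a minimal sketch, not a Copilot feature: `fill_prompt` is a hypothetical helper, and the placeholder strings are passed in by the caller so they can match whichever template is used.

```python
import re


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each placeholder key in `values` with its text.

    Raises ValueError if a placeholder does not occur in the template,
    which usually means it was misspelled or already replaced.
    """
    if not values:
        return template
    missing = [key for key in values if key not in template]
    if missing:
        raise ValueError(f"placeholders not found in template: {missing}")
    # Substitute in a single pass so that text inserted for one placeholder
    # (e.g. a transcript that happens to contain "[TOPIC]") is never rescanned.
    pattern = re.compile("|".join(re.escape(key) for key in values))
    return pattern.sub(lambda m: values[m.group(0)], template)


# Shortened stand-in for the main template, for illustration only.
template = "Analyze this video transcript about [TOPIC].\n\nTranscript: [PASTE YOUR TRANSCRIPT HERE]"
prompt = fill_prompt(template, {
    "[TOPIC]": "RAG and Vector Databases",
    "[PASTE YOUR TRANSCRIPT HERE]": "...transcript text...",
})
print(prompt)
```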

### Step 3: Get Output from Copilot

Paste the filled-in prompt into Copilot Chat and wait for the response.

### Step 4: Save Output

```bash
# Save to file
cat > data/dictionary/rag-course/vocabulary.yaml << 'EOF'
[PASTE COPILOT OUTPUT HERE]
EOF
```
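
The prompt asks for YAML with no markdown formatting, but a chat response may still arrive wrapped in a code fence. As a hedged alternative to the heredoc above, the sketch below strips one surrounding fence before writing the file. `strip_code_fence` and `save_yaml` are hypothetical helpers, and the assumption that the whole response is a single fenced block may not hold for every reply.

```python
from pathlib import Path

FENCE = "`" * 3  # a Markdown code fence marker, built without typing it literally


def strip_code_fence(text: str) -> str:
    """Remove one surrounding Markdown code fence, if present."""
    lines = text.splitlines()
    # Drop blank lines at both ends without touching YAML indentation.
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    if len(lines) >= 2 and lines[0].lstrip().startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    return "\n".join(lines) + "\n"


def save_yaml(raw: str, path: str) -> None:
    """Write the cleaned response to `path`, creating parent folders as needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(strip_code_fence(raw), encoding="utf-8")


# Example: save_yaml(copilot_response, "data/dictionary/rag-course/vocabulary.yaml")
```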

### Step 5: Validate

```bash
# Validate YAML
python -c "import yaml; print(len(yaml.safe_load(open('data/dictionary/rag-course/vocabulary.yaml'))))"
```

### Step 6: Create Hugo Page

Use the archetype, or create the page manually:

```bash
hugo new content/docs/dictionary/rag-course/index.md --kind dictionary
```

---

## 🚀 Advanced: Batch Processing with Copilot

For multiple videos in a series:

I have 5 video transcripts from a course about [TOPIC]. I'll provide them one at a time.

For EACH transcript, extract 15-20 unique words (don't repeat words from previous transcripts).
Maintain consistent Urdu translation style across all outputs.

Transcript 1:
[PASTE TRANSCRIPT 1]

[Wait for response, then continue with next transcript]
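
Even with the instruction not to repeat words, a later response may include a word that an earlier one already covered. A minimal sketch for merging the per-transcript results, assuming each response has already been parsed into a list of entry dictionaries; `merge_batches` is a hypothetical helper name.

```python
def merge_batches(batches):
    """Combine per-transcript entry lists, keeping the first occurrence of each word.

    Words are compared case-insensitively. Entries without a usable
    'word' field are skipped rather than merged.
    """
    merged, seen = [], set()
    for entries in batches:
        for entry in entries:
            key = str(entry.get("word") or "").strip().lower()
            if not key or key in seen:
                continue
            seen.add(key)
            merged.append(entry)
    return merged


batch_1 = [{"word": "retrieval"}, {"word": "embedding"}]
batch_2 = [{"word": "Embedding"}, {"word": "latency"}]
print([entry["word"] for entry in merge_batches([batch_1, batch_2])])
```

Keeping the first occurrence means the entry from the earliest transcript wins. Reverse the batch order if later translations should take precedence.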

---

## 🎓 Learning from Copilot

After using this several times, you’ll notice:

  • Which prompts give better results
  • How to refine Urdu translations
  • What difficulty levels work for you
  • How to adjust for different topics

Save your own improved prompts in this file!



---

**Last Updated**: 2026-05-05

**Status**: Ready to Use

**Tested With**: GitHub Copilot Chat, ChatGPT-4, Claude 3