This document provides a practical walkthrough of debugging Python exceptions using the PDB debugger, demonstrating how to analyze KeyError exceptions, investigate variable contents, identify UTF-8 Byte Order Mark (BOM) encoding issues, and implement fixes for CSV file processing.
While C and C++ programs commonly crash with segmentation faults, Python applications typically fail with unexpected exceptions. Understanding how to debug these exceptions using Python’s PDB debugger is essential for diagnosing and fixing runtime errors in Python code.
A script updates product descriptions in a database by importing data from CSV files. The script works correctly for most files but fails when processing files generated by a specific user.
| Component | Description |
|---|---|
| Input | CSV file with product codes and descriptions |
| Process | Read CSV and update database records |
| Output | Updated database entries |
| Failure Mode | KeyError exception on specific files |
Python tracebacks provide information about exception locations and call chains:
```text
Traceback (most recent call last):
  File "update_products.py", line 25, in <module>
    main()
  File "update_products.py", line 20, in main
    update_data(row)
  File "update_products.py", line 10, in update_data
    code = row['product_code']
KeyError: 'product_code'
```
| Element | Information Provided |
|---|---|
| Exception Type | KeyError, ValueError, TypeError, etc. |
| Exception Message | Specific key, value, or context |
| File and Line | Location where exception occurred |
| Function Stack | Call chain leading to the exception, outermost call first and failing call last |
Python tracebacks list function calls in the reverse order of GDB backtraces: Python prints the most recent call last, whereas GDB prints the innermost frame (#0) first.
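The ordering can be checked programmatically. The following is a minimal sketch using the standard `traceback` module; `update_data` and `main` here are simplified stand-ins for the script's functions, and `call_chain` is a hypothetical helper introduced only for illustration.

```python
import traceback


def update_data(row):
    # Stand-in for the failing lookup in the import script.
    return row['product_code']


def main():
    # The row deliberately lacks the 'product_code' key.
    update_data({'description': 'Widget A'})


def call_chain(func):
    """Run func and return the function names from its traceback, in order."""
    try:
        func()
    except Exception as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


chain = call_chain(main)
print(chain)  # outermost call first, failing call last
```

The last entry is the function that raised, which matches the bottom of a printed traceback.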
The first step is to inspect the problematic CSV file:
```shell
# View file contents
cat new_products.csv

# Or use less for larger files
less new_products.csv

# Check file encoding
file -i new_products.csv
```
```shell
# Execute the script
python3 update_products.py new_products.csv

# Output:
# Traceback (most recent call last):
#   File "update_products.py", line 25, in <module>
#     main()
#   File "update_products.py", line 20, in main
#     update_data(row)
#   File "update_products.py", line 10, in update_data
#     code = row['product_code']
# KeyError: 'product_code'
```
Launch the debugger with the script and its arguments:
```shell
# Syntax: pdb3 script.py [arguments]
pdb3 update_products.py new_products.csv

# Or use Python 3's pdb module
python3 -m pdb update_products.py new_products.csv
```
When PDB starts, it stops at the first line of the script and waits for commands:
```text
> /path/to/update_products.py(1)<module>()
-> import csv
(Pdb)
```
| Command | Shortcut | Purpose |
|---|---|---|
| continue | c | Run until exception or completion |
| next | n | Execute next line (don’t enter functions) |
| step | s | Execute next line (enter functions) |
| print(<expr>) | p <expr> | Evaluate and print expression |
| list | l | Show source code around current line |
| where | w | Show stack trace |
| up | u | Move up one stack frame |
| down | d | Move down one stack frame |
| quit | q | Exit debugger |
Instead of stepping through each line, use continue to run until the crash:
```text
(Pdb) continue
Traceback (most recent call last):
  File "update_products.py", line 25, in <module>
    main()
  File "update_products.py", line 20, in main
    update_data(row)
  File "update_products.py", line 10, in update_data
    code = row['product_code']
KeyError: 'product_code'
```
After the uncaught exception, PDB enters post-mortem mode in the frame that raised it, so the variables involved can be examined with print:
```text
(Pdb) print(row)
{'\ufeffproduct_code': 'PROD001', 'description': 'Widget A'}
```
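Notice the unexpected \ufeff before product_code. This escape sequence stands for a single character, U+FEFF, the Byte Order Mark (BOM).
The Byte Order Mark is a special Unicode character placed at the start of a text stream to signal the byte order of UTF-16 and UTF-32 data or, in UTF-8, to mark the data as UTF-8. Its encoded form depends on the encoding: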
| Encoding | BOM Bytes | Purpose |
|---|---|---|
| UTF-8 | EF BB BF | Optional, indicates UTF-8 |
| UTF-16 LE | FF FE | Little-endian byte order |
| UTF-16 BE | FE FF | Big-endian byte order |
| UTF-32 LE | FF FE 00 00 | Little-endian 32-bit |
| UTF-32 BE | 00 00 FE FF | Big-endian 32-bit |
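The byte sequences in the table are available as constants in Python's standard `codecs` module. The following minimal sketch checks the start of a byte string against them; `detect_bom` and the returned labels are hypothetical names chosen for illustration.

```python
import codecs

# Longer signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


def detect_bom(data):
    """Return a label for the BOM at the start of data, or None if absent."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(b'\xef\xbb\xbfproduct_code,description'))
print(detect_bom(b'product_code,description'))
```

A missing BOM does not prove that a file is not UTF-8, since the UTF-8 BOM is optional.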
When a UTF-8 file starts with a BOM and is decoded as plain utf-8, the BOM becomes part of the first field name:
```python
# Without BOM
row = {'product_code': 'PROD001', 'description': 'Widget A'}

# With BOM
row = {'\ufeffproduct_code': 'PROD001', 'description': 'Widget A'}

# Accessing 'product_code' fails because key is '\ufeffproduct_code'
```
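The effect can be reproduced without any files. The following is a minimal sketch that builds the CSV bytes in memory; the sample data and the helper `first_row` are assumptions made for illustration, not part of the original script.

```python
import csv
import io

# Bytes of a small CSV as a BOM-writing tool would save it (sample data).
raw = 'product_code,description\nPROD001,Widget A\n'.encode('utf-8-sig')


def first_row(data, encoding):
    """Decode data with the given encoding and return the first CSV row."""
    text = io.StringIO(data.decode(encoding), newline='')
    return next(csv.DictReader(text))


plain = first_row(raw, 'utf-8')
print(list(plain))              # the first key carries the BOM character
print('product_code' in plain)  # the expected key is absent
```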
Python provides the utf-8-sig encoding, which strips a leading BOM when reading and otherwise behaves like utf-8, so it handles files with and without a BOM.
Modify the file opening code to use utf-8-sig encoding:
```python
# Before (fails with BOM)
import csv

with open('new_products.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        update_data(row)

# After (handles BOM correctly)
import csv

with open('new_products.csv', 'r', encoding='utf-8-sig') as file:
    reader = csv.DictReader(file)
    for row in reader:
        update_data(row)
```
```python
import csv
import sqlite3
import sys


def update_data(row):
    """Update database with product information."""
    code = row['product_code']
    description = row['description']

    # Update database (simplified)
    conn = sqlite3.connect('products.db')
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE products SET description = ? WHERE code = ?",
        (description, code)
    )
    conn.commit()
    conn.close()


def main():
    """Main function to process CSV file."""
    if len(sys.argv) < 2:
        print("Usage: update_products.py <csv_file>")
        sys.exit(1)

    filename = sys.argv[1]

    # Fix: Use utf-8-sig encoding to handle BOM
    with open(filename, 'r', encoding='utf-8-sig') as file:
        reader = csv.DictReader(file)
        for row in reader:
            update_data(row)

    print("Database updated successfully")


if __name__ == '__main__':
    main()
```
```shell
# Test with file containing BOM
python3 update_products.py new_products.csv
# Database updated successfully

# Test with file without BOM
python3 update_products.py standard_products.csv
# Database updated successfully
```
| File Type | BOM Present | Result with utf-8 | Result with utf-8-sig |
|---|---|---|---|
| UTF-8 with BOM | Yes | KeyError | Success |
| UTF-8 without BOM | No | Success | Success |
| ASCII | No | Success | Success |
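The utf-8-sig column of the table can be checked with temporary files. The following is a minimal sketch under stated assumptions: `read_codes` is a hypothetical helper that only reads product codes (no database involved), and the CSV content is sample data.

```python
import csv
import os
import tempfile

CSV_TEXT = 'product_code,description\nPROD001,Widget A\n'


def read_codes(path):
    """Read product codes from a CSV file, tolerating an optional UTF-8 BOM."""
    with open(path, 'r', encoding='utf-8-sig', newline='') as file:
        return [row['product_code'] for row in csv.DictReader(file)]


results = {}
with tempfile.TemporaryDirectory() as tmp:
    # Write the same content three ways: with BOM, without BOM, plain ASCII.
    for label, encoding in [('bom', 'utf-8-sig'), ('no_bom', 'utf-8'), ('ascii', 'ascii')]:
        path = os.path.join(tmp, label + '.csv')
        with open(path, 'w', encoding=encoding, newline='') as file:
            file.write(CSV_TEXT)
        results[label] = read_codes(path)

print(results)
```

All three variants yield the same product codes, because utf-8-sig only differs from utf-8 when a BOM is present.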
Set breakpoints to pause execution at specific lines:
```python
# In code: Add breakpoint()
def update_data(row):
    breakpoint()  # Python 3.7+
    code = row['product_code']
    # ...
```
```text
# In PDB: Set breakpoint at line number
(Pdb) break 10
Breakpoint 1 at /path/to/script.py:10

# Set breakpoint in function
(Pdb) break update_data
Breakpoint 2 at /path/to/script.py:5

# List breakpoints
(Pdb) break
```
```text
# Break when condition is true
(Pdb) break 10, row['product_code'] == 'PROD001'

# Show the expression's value whenever execution stops and it has changed
(Pdb) display row['product_code']
```
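A condition can also be placed in the code itself, so the debugger opens only for the rows of interest. The following is a minimal sketch; this `update_data` is a simplified stand-in that returns the code instead of updating the database.

```python
def update_data(row):
    """Return the product code, pausing in the debugger if the key is missing."""
    # Pause only for rows that would fail, instead of on every call.
    if 'product_code' not in row:
        breakpoint()  # Python 3.7+; dispatches to sys.breakpointhook
    return row.get('product_code')
```

Setting the environment variable `PYTHONBREAKPOINT=0` disables `breakpoint()` calls, which is useful if such a line is left in by mistake.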
| Command | Behavior |
|---|---|
| step | Step into function calls |
| next | Execute line without entering functions |
| return | Continue until current function returns |
| until | Continue until line greater than current |
```text
# Show arguments of current function
(Pdb) args

# Execute Python code
(Pdb) !variable = new_value

# Show all variables in current scope
(Pdb) pp locals()

# Show source code
(Pdb) list 1, 20

# Jump to different line (use with caution)
(Pdb) jump 15
```
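Post-mortem inspection can also be started from inside a script rather than from the command line. The following is a minimal sketch built on the standard `pdb.post_mortem` function; `run_with_post_mortem` and its `debugger` parameter are hypothetical names, and the parameter exists only so that the debugger can be swapped out in automated tests.

```python
import pdb
import sys


def run_with_post_mortem(func, *args, debugger=pdb.post_mortem):
    """Call func(*args); on an exception, open the debugger on its traceback."""
    try:
        return func(*args)
    except Exception:
        # sys.exc_info()[2] is the traceback of the exception being handled.
        debugger(sys.exc_info()[2])
        raise
```

A script could call `run_with_post_mortem(main)` in place of `main()`; after the debugging session ends, the exception is re-raised so the failure is still reported.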
| Step | Action |
|---|---|
| 1 | Read exception message and traceback |
| 2 | Identify function and line where exception occurred |
| 3 | Launch PDB with script and arguments |
| 4 | Use continue to run until exception |
| 5 | Examine variables with print |
| 6 | Investigate unexpected values |
| 7 | Research error patterns online |
| 8 | Implement fix |
| 9 | Test with original failing case |
| 10 | Test with other cases to ensure no regression |
```python
# Add print statements (quick debugging)
def update_data(row):
    print(f"DEBUG: row = {row}")
    print(f"DEBUG: keys = {list(row.keys())}")
    code = row['product_code']

# Use logging module (production debugging)
import logging
logging.basicConfig(level=logging.DEBUG)

def update_data(row):
    logging.debug("Processing row: %s", row)
    code = row['product_code']

# Use try-except for better error messages
def update_data(row):
    try:
        code = row['product_code']
    except KeyError:
        print(f"KeyError: Available keys are {list(row.keys())}")
        raise
```
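A related option is to validate the CSV header once, before any row is processed. The following is a minimal sketch; `check_header` and `REQUIRED_COLUMNS` are hypothetical names, and the sample input simulates a file that was decoded without BOM handling.

```python
import csv
import io

REQUIRED_COLUMNS = ('product_code', 'description')


def check_header(reader, required=REQUIRED_COLUMNS):
    """Raise ValueError naming missing columns and showing the header as read."""
    fieldnames = reader.fieldnames or []
    missing = [name for name in required if name not in fieldnames]
    if missing:
        # repr() makes invisible characters such as '\ufeff' visible.
        raise ValueError(f"missing columns {missing}; header read as {fieldnames!r}")


# Sample input whose first header field still carries the BOM character.
sample = io.StringIO('\ufeffproduct_code,description\nPROD001,Widget A\n')
message = ''
try:
    check_header(csv.DictReader(sample))
except ValueError as exc:
    message = str(exc)
print(message)
```

Because the message includes the header exactly as it was read, the stray BOM shows up in the error itself instead of requiring a debugger session.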
```shell
# Check file encoding with file command
file -i data.csv
# data.csv: text/csv; charset=utf-8

# View raw bytes of file start
hexdump -C data.csv | head
# Look for EF BB BF (UTF-8 BOM)

# Print the first 10 raw bytes using Python
python3 -c "print(open('data.csv', 'rb').read()[:10])"
```
| Encoding | When to Use |
|---|---|
| utf-8 | Standard UTF-8 files without BOM |
| utf-8-sig | UTF-8 files that may have BOM |
| latin-1 | Western European legacy files |
| cp1252 | Windows Western European files |
| iso-8859-1 | Unix/Linux Western European files (same codec as latin-1 in Python) |
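When the encoding of incoming files is unknown, one possible approach is to try candidate encodings in order. The following is a minimal sketch; `decode_with_fallback` is a hypothetical helper and the default order (utf-8-sig, then cp1252) is an assumption that should be adapted to the actual data sources.

```python
def decode_with_fallback(data, encodings=('utf-8-sig', 'cp1252')):
    """Return (text, encoding) for the first encoding that decodes data."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode the data")


print(decode_with_fallback(b'\xef\xbb\xbfproduct_code'))
print(decode_with_fallback('café'.encode('cp1252')))
```

Single-byte legacy encodings accept almost any input, so a successful decode does not guarantee that the text is correct; the strictest encoding should come first.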
Important
When processing UTF-8 CSV files from multiple sources, prefer the utf-8-sig encoding, which handles files both with and without a BOM gracefully.
Debugging Python exceptions involves analyzing tracebacks to identify error locations, using the PDB debugger to inspect variable contents at crash time, and investigating unexpected values that cause exceptions. The Byte Order Mark (BOM) in UTF-8 files can cause KeyError exceptions when parsing CSV files. Python’s utf-8-sig encoding strips the BOM when it is present, making code compatible with files from various sources. PDB features such as breakpoints, conditional breakpoints, and display expressions support more targeted debugging of complex issues.