Python Crash Debugging

November 13, 2025

This document provides a practical walkthrough of debugging Python exceptions using the PDB debugger, demonstrating how to analyze KeyError exceptions, investigate variable contents, identify UTF-8 Byte Order Mark (BOM) encoding issues, and implement fixes for CSV file processing.

Introduction

While C and C++ programs commonly crash with segmentation faults, Python applications typically fail with unexpected exceptions. Understanding how to debug these exceptions using Python’s PDB debugger is essential for diagnosing and fixing runtime errors in Python code.

Case Study: Database Import Script

Problem Description

A script updates product descriptions in a database by importing data from CSV files. The script works correctly for most files but fails when processing files generated by a specific user.

Script Purpose

Component	Description
Input	CSV file with product codes and descriptions
Process	Read CSV and update database records
Output	Updated database entries
Failure Mode	KeyError exception on specific files

Understanding Python Tracebacks

Traceback Structure

Python tracebacks provide information about exception locations and call chains:

Traceback (most recent call last):
  File "update_products.py", line 25, in <module>
    main()
  File "update_products.py", line 20, in main
    update_data(row)
  File "update_products.py", line 10, in update_data
    code = row['product_code']
KeyError: 'product_code'

Reading Tracebacks

Element	Information Provided
Exception Type	KeyError, ValueError, TypeError, etc.
Exception Message	Specific key, value, or context
File and Line	Location where exception occurred
Function Stack	Call chain leading to the exception (outermost call first, failing call last)

Traceback Order

Python tracebacks list frames in the opposite order from GDB backtraces, which put the innermost frame (#0) first. In a Python traceback, the most recent call comes last:

Top: Module-level or entry point
Middle: Intermediate function calls
Bottom: Function where exception occurred
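
The ordering can be checked programmatically. The following is a minimal sketch using the standard traceback module; the functions are stand-ins that mirror the case study's call chain, not the original script, and frame_names is a hypothetical helper added for illustration.

```python
import traceback

def update_data(row):
    # Raises KeyError because the sample row below has no 'product_code' key
    return row['product_code']

def main():
    update_data({'description': 'Widget A'})

def frame_names():
    """Return the function names in the traceback, in traceback order."""
    try:
        main()
    except KeyError as exc:
        # extract_tb yields entries outermost first, failing call last
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []

print(frame_names())
```

The first entry is the function that caught the exception and the last entry is update_data, the function where it was raised, which matches the top-to-bottom order described above.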

Initial Investigation

Examining the Input File

The first step is to inspect the problematic CSV file:

# View file contents
cat new_products.csv

# Or use less for larger files
less new_products.csv

# Check file encoding
file -i new_products.csv

Running the Script

# Execute the script
python3 update_products.py new_products.csv

# Output:
# Traceback (most recent call last):
#   File "update_products.py", line 25, in <module>
#     main()
#   File "update_products.py", line 20, in main
#     update_data(row)
#   File "update_products.py", line 10, in update_data
#     code = row['product_code']
# KeyError: 'product_code'

Using the PDB Debugger

Starting PDB

Launch the debugger with the script and its arguments:

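# Syntax: pdb3 script.py [arguments]
# (pdb3 is a wrapper script that some distributions install; it may not exist on every system)
pdb3 update_products.py new_products.csv

# Or run the pdb module directly, which works wherever Python 3 is installed
python3 -m pdb update_products.py new_products.csv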

Initial PDB State

When PDB starts, it stops at the first line of the script and waits for commands:

> /path/to/update_products.py(1)<module>()
-> import csv
(Pdb)

Basic PDB Commands

Essential Commands

Command	Shortcut	Purpose
`continue`	`c`	Run until a breakpoint, an uncaught exception, or completion
`next`	`n`	Execute next line (don’t enter functions)
`step`	`s`	Execute next line (enter functions)
`p <expr>`	(none)	Evaluate and print an expression; `print(<expr>)` also works as an ordinary Python call
`list`	`l`	Show source code around current line
`where`	`w`	Show stack trace
`up`	`u`	Move up one stack frame
`down`	`d`	Move down one stack frame
`quit`	`q`	Exit debugger

Running Until Exception

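Instead of stepping through each line, use continue to run until the crash. When the script is started under PDB and an exception goes uncaught, the debugger prints the traceback and enters post-mortem mode at the frame that raised it: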

(Pdb) continue
Traceback (most recent call last):
  File "update_products.py", line 25, in <module>
    main()
  File "update_products.py", line 20, in main
    update_data(row)
  File "update_products.py", line 10, in update_data
    code = row['product_code']
KeyError: 'product_code'

Investigating the Exception

Examining Variables

Once the exception occurs, use print to examine variable contents:

(Pdb) print(row)
{'\ufeffproduct_code': 'PROD001', 'description': 'Widget A'}
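
The same inspection can be done without an interactive session by reading the local variables of the frame that raised the exception, which is roughly the frame post-mortem debugging starts in. This is a minimal sketch, assuming a stand-in update_data function and a hypothetical helper named locals_at_crash; neither comes from the original script.

```python
def update_data(row):
    code = row['product_code']
    return code

def locals_at_crash(exc):
    """Walk to the innermost traceback entry and return that frame's locals."""
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    return tb.tb_frame.f_locals

try:
    update_data({'\ufeffproduct_code': 'PROD001', 'description': 'Widget A'})
except KeyError as exc:
    crash_locals = locals_at_crash(exc)
    # Printing the list shows the repr of each key, so hidden characters appear as escapes
    print(list(crash_locals['row'].keys()))
```

Printing the keys through repr matters here: the BOM is invisible when a string is printed directly, but appears as an escape sequence in its repr.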

Identifying the Issue

Notice the \ufeff escape before product_code. It stands for a single invisible character, U+FEFF, the Byte Order Mark (BOM).
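
When an unfamiliar escape shows up in a key, the standard unicodedata module can name it. A minimal sketch, assuming the key value seen in the debugger output above:

```python
import unicodedata

key = '\ufeffproduct_code'

# Name the first few characters of the key; the first one is the hidden character
for ch in key[:3]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, '<unnamed>'))

# Stripping the character by hand recovers the expected field name
cleaned = key.lstrip('\ufeff')
print(cleaned == 'product_code')
```

Stripping the character by hand confirms the diagnosis, although fixing the decoding step is usually the cleaner remedy.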

Understanding Byte Order Mark (BOM)

What is BOM

The Byte Order Mark is a special Unicode character with these properties:

Code point U+FEFF
Indicates byte order (endianness) at the start of UTF-16 and UTF-32 files
Sometimes written at the start of UTF-8 files, where it is not required and serves only as an encoding signature

BOM in Different Encodings

Encoding	BOM Bytes	Purpose
UTF-8	EF BB BF	Optional, indicates UTF-8
UTF-16 LE	FF FE	Little-endian byte order
UTF-16 BE	FE FF	Big-endian byte order
UTF-32 LE	FF FE 00 00	Little-endian 32-bit
UTF-32 BE	00 00 FE FF	Big-endian 32-bit
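
The byte signatures in the table are available as constants in the standard codecs module. The sketch below checks the start of a byte string against them; detect_bom is a hypothetical helper name, and the returned labels are illustrative rather than a recommendation of which codec to open the file with.

```python
import codecs

# Longer signatures go first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOM_SIGNATURES = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]

def detect_bom(data):
    """Return a label for the BOM at the start of data, or None if there is none."""
    for signature, label in BOM_SIGNATURES:
        if data.startswith(signature):
            return label
    return None

print(detect_bom(b'\xef\xbb\xbfproduct_code,description'))
print(detect_bom(b'product_code,description'))
```

A match is only a hint. For example, a UTF-16 LE file whose first character after the BOM is U+0000 starts with the same four bytes as the UTF-32 LE signature.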

BOM Impact on CSV Parsing

When a UTF-8 file that starts with a BOM is decoded as plain utf-8, the BOM is kept as the character U+FEFF and becomes part of the first field name:

# Without BOM
row = {'product_code': 'PROD001', 'description': 'Widget A'}

# With BOM
row = {'\ufeffproduct_code': 'PROD001', 'description': 'Widget A'}

# Accessing 'product_code' fails because key is '\ufeffproduct_code'
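
The failure can be reproduced without the original file. This is a minimal sketch, assuming an in-memory byte string in place of new_products.csv; first_row is a hypothetical helper that decodes the bytes with the given encoding and returns the first parsed row.

```python
import csv
import io

# Sample file contents: UTF-8 BOM, header line, one data row
raw = b'\xef\xbb\xbfproduct_code,description\r\nPROD001,Widget A\r\n'

def first_row(data, encoding):
    """Decode data with the given encoding and return the first csv.DictReader row."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline='')
    return next(csv.DictReader(text))

plain = first_row(raw, 'utf-8')
print(list(plain.keys()))
print('product_code' in plain)
```

The first key carries the BOM, so the lookup for 'product_code' fails exactly as in the traceback.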

The Solution: UTF-8-sig Encoding

Python’s UTF-8-sig Encoding

Python provides the utf-8-sig encoding, which handles an optional BOM automatically when decoding:

With BOM: Removes BOM when reading file
Without BOM: Behaves like standard UTF-8
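
Both behaviors can be seen at the codec level. A minimal sketch, using short byte strings as stand-ins for file contents:

```python
data_with_bom = b'\xef\xbb\xbfproduct_code'
data_without_bom = b'product_code'

# Plain utf-8 keeps the BOM as a leading U+FEFF character
kept = data_with_bom.decode('utf-8')

# utf-8-sig drops a leading BOM and otherwise decodes like utf-8
stripped = data_with_bom.decode('utf-8-sig')
unchanged = data_without_bom.decode('utf-8-sig')

# When encoding, utf-8-sig writes a BOM at the start of the output
written = 'product_code'.encode('utf-8-sig')

print(repr(kept), repr(stripped), repr(unchanged), written)
```

The last line is worth noting when writing files: utf-8-sig adds a BOM on output, so plain utf-8 is usually the better choice for writing unless the consumer expects the signature.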

Implementing the Fix

Modify the file opening code to use utf-8-sig encoding:

# Before (fails with BOM)
import csv

with open('new_products.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        update_data(row)

# After (handles BOM correctly)
import csv

with open('new_products.csv', 'r', encoding='utf-8-sig') as file:
    reader = csv.DictReader(file)
    for row in reader:
        update_data(row)

Complete Fixed Script

import csv
import sqlite3
import sys

def update_data(row):
    """Update database with product information."""
    code = row['product_code']
    description = row['description']

    # Update database (simplified: opens a new connection for every row)
    conn = sqlite3.connect('products.db')
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE products SET description = ? WHERE code = ?",
        (description, code)
    )
    conn.commit()
    conn.close()

def main():
    """Main function to process CSV file."""
    if len(sys.argv) < 2:
        print("Usage: update_products.py <csv_file>")
        sys.exit(1)

    filename = sys.argv[1]

    # Fix: Use utf-8-sig encoding to handle BOM.
    # newline='' is what the csv module documentation recommends for file objects.
    with open(filename, 'r', encoding='utf-8-sig', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            update_data(row)

    print("Database updated successfully")

if __name__ == '__main__':
    main()

Testing the Fix

Verification Steps

# Test with file containing BOM
python3 update_products.py new_products.csv
# Database updated successfully

# Test with file without BOM
python3 update_products.py standard_products.csv
# Database updated successfully

Testing Matrix

File Type	BOM Present	Result with utf-8	Result with utf-8-sig
UTF-8 with BOM	Yes	KeyError	Success
UTF-8 without BOM	No	Success	Success
ASCII	No	Success	Success
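
The matrix can also be checked automatically. The sketch below writes three temporary files and reads each with both encodings; the sample contents and the helper names first_column_name and run_matrix are assumptions made for illustration, and the check looks only at the first header name rather than running the database update.

```python
import csv
import os
import tempfile

SAMPLE = 'product_code,description\r\nPROD001,Widget A\r\n'

def first_column_name(path, encoding):
    """Return the first header name as csv.DictReader sees it."""
    with open(path, 'r', encoding=encoding, newline='') as handle:
        return csv.DictReader(handle).fieldnames[0]

def run_matrix():
    """Map (file type, read encoding) to whether 'product_code' is found."""
    results = {}
    file_types = [
        ('UTF-8 with BOM', 'utf-8-sig'),   # writing with utf-8-sig adds the BOM
        ('UTF-8 without BOM', 'utf-8'),
        ('ASCII', 'ascii'),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for label, write_encoding in file_types:
            path = os.path.join(tmp, write_encoding + '.csv')
            with open(path, 'w', encoding=write_encoding, newline='') as handle:
                handle.write(SAMPLE)
            for read_encoding in ('utf-8', 'utf-8-sig'):
                found = first_column_name(path, read_encoding) == 'product_code'
                results[(label, read_encoding)] = found
    return results

results = run_matrix()
for (label, read_encoding), found in results.items():
    print(f"{label:18} read as {read_encoding:9} ->", 'found' if found else 'not found')
```

Only the combination of a BOM file read as plain utf-8 should report the column as not found, which corresponds to the KeyError cell in the table.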

Advanced PDB Features

Breakpoints

Set breakpoints to pause execution at specific lines:

# In code: Add breakpoint()
def update_data(row):
    breakpoint()  # Python 3.7+
    code = row['product_code']
    # ...

# In PDB: Set breakpoint at line number
(Pdb) break 10
Breakpoint 1 at /path/to/script.py:10

# Set breakpoint in function
(Pdb) break update_data
Breakpoint 2 at /path/to/script.py:5

# List breakpoints
(Pdb) break
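
The built-in breakpoint() does not call PDB directly; it calls sys.breakpointhook, which starts PDB by default. The sketch below swaps in a recording function so the mechanism can be observed without an interactive session; recording_hook and the stand-in update_data are hypothetical names used only for illustration.

```python
import sys

calls = []

def recording_hook(*args, **kwargs):
    """Stand-in for the debugger: record that a breakpoint was reached."""
    calls.append('breakpoint reached')

def update_data(row):
    breakpoint()  # dispatches to sys.breakpointhook
    return row.get('product_code')

original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    result = update_data({'product_code': 'PROD001'})
finally:
    # Restore the default hook so later breakpoint() calls start PDB again
    sys.breakpointhook = original_hook

print(calls, result)
```

With the default hook in place, setting the environment variable PYTHONBREAKPOINT=0 turns breakpoint() calls into no-ops, which is a convenient way to leave them in the code temporarily without stopping every run.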

Conditional Breakpoints and Display Expressions

# Break at line 10 only when the condition is true
(Pdb) break 10, row['product_code'] == 'PROD001'

# Show the expression's value each time execution stops in this frame, if it has changed
(Pdb) display row['product_code']

Stepping Through Code

Command	Behavior
`step`	Step into function calls
`next`	Execute line without entering functions
`return`	Continue until current function returns
`until`	Continue until a line with a number greater than the current one is reached

Additional PDB Commands

# Show arguments of current function
(Pdb) args

# Execute Python code
(Pdb) !variable = new_value

# Show all variables in current scope
(Pdb) pp locals()

# Show source code
(Pdb) list 1, 20

# Jump to different line (use with caution)
(Pdb) jump 15

Debugging Best Practices

When to Use PDB

Exception message is unclear
Need to inspect variable contents at crash time
Intermittent failures requiring step-by-step analysis
Complex logic with multiple conditional paths

Debugging Workflow

Step	Action
1	Read exception message and traceback
2	Identify function and line where exception occurred
3	Launch PDB with script and arguments
4	Use `continue` to run until exception
5	Examine variables with `p` or `print()`
6	Investigate unexpected values
7	Research error patterns online
8	Implement fix
9	Test with original failing case
10	Test with other cases to ensure no regression

Alternative Debugging Approaches

# Add print statements (quick debugging)
def update_data(row):
    print(f"DEBUG: row = {row}")
    print(f"DEBUG: keys = {list(row.keys())}")
    code = row['product_code']

# Use logging module (production debugging)
import logging
logging.basicConfig(level=logging.DEBUG)

def update_data(row):
    # %s formatting defers building the message until it is actually logged
    logging.debug("Processing row: %s", row)
    code = row['product_code']

# Use try-except for better error messages
def update_data(row):
    try:
        code = row['product_code']
    except KeyError:
        print(f"KeyError: Available keys are {list(row.keys())}")
        raise
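
Another option in the same spirit is to validate the header once, before any row is processed, so that the error names the columns that are missing. This is a minimal sketch; REQUIRED_COLUMNS and check_header are hypothetical names, and the sample data is held in memory instead of a file.

```python
import csv
import io

REQUIRED_COLUMNS = {'product_code', 'description'}

def check_header(reader):
    """Raise ValueError if the CSV header lacks any required column."""
    fieldnames = reader.fieldnames or []
    missing = REQUIRED_COLUMNS - set(fieldnames)
    if missing:
        # !r shows the repr of each header name, so a hidden BOM appears as \ufeff
        raise ValueError(
            f"missing columns {sorted(missing)}; header was {fieldnames!r}"
        )

bad = csv.DictReader(io.StringIO('\ufeffproduct_code,description\nPROD001,Widget A\n'))
try:
    check_header(bad)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Because the message includes the repr of the header, the stray \ufeff is visible immediately, without opening a debugger.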

Common CSV Encoding Issues

Detecting Encoding Problems

# Check file encoding with file command
file -i data.csv
# data.csv: text/csv; charset=utf-8

# View raw bytes of file start
hexdump -C data.csv | head
# Look for EF BB BF (UTF-8 BOM)

# Print the first 10 raw bytes from Python; a UTF-8 BOM shows up as b'\xef\xbb\xbf'
python3 -c "print(open('data.csv', 'rb').read()[:10])"

Common Encodings for CSV Files

Encoding	When to Use
`utf-8`	Standard UTF-8 files without BOM
`utf-8-sig`	UTF-8 files that may have BOM
`latin-1` (alias `iso-8859-1`)	Western European legacy files; decodes any byte sequence without error
`cp1252`	Windows Western European files
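
When the source encoding is unknown, one pragmatic approach is to try candidates in order of strictness. The sketch below is an assumption about how that could look, not a method from the case study; decode_with_fallback and CANDIDATE_ENCODINGS are hypothetical names, and the candidate order is a judgment call.

```python
CANDIDATE_ENCODINGS = ('utf-8-sig', 'cp1252', 'latin-1')

def decode_with_fallback(data, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('no candidate encoding could decode the data')

print(decode_with_fallback(b'\xef\xbb\xbfcaf\xc3\xa9'))  # UTF-8 with BOM
print(decode_with_fallback(b'caf\xe9'))                   # not valid UTF-8
```

A successful decode does not prove the guess is correct: latin-1 accepts every byte sequence, so it works only as a last resort, and the result should be checked by eye.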

Important
When reading UTF-8 CSV files from multiple sources, prefer the utf-8-sig encoding, which handles files both with and without a BOM. Files in other encodings, such as cp1252, still need their own codec.

Conclusion

Debugging Python exceptions involves analyzing tracebacks to identify error locations, using the PDB debugger to inspect variable contents at crash time, and investigating unexpected values that cause exceptions. The Byte Order Mark (BOM) in UTF-8 files can cause KeyError exceptions when parsing CSV files. Reading with Python’s utf-8-sig encoding removes the BOM when it is present, making code compatible with UTF-8 files from various sources. Advanced PDB features such as breakpoints, conditional breakpoints, and display expressions help with more complex issues.
