Python Crash Debugging

This document walks through debugging a Python exception with the PDB debugger, using a database import script as a case study: reading the traceback, inspecting variable contents after a KeyError, identifying a UTF-8 Byte Order Mark (BOM) in a CSV file, and fixing the file-reading code.


Introduction

While C and C++ programs commonly crash with segmentation faults, Python applications typically fail with unexpected exceptions. Understanding how to debug these exceptions using Python’s PDB debugger is essential for diagnosing and fixing runtime errors in Python code.


Case Study: Database Import Script

Problem Description

A script updates product descriptions in a database by importing data from CSV files. The script works correctly for most files but fails when processing files generated by a specific user.

Script Purpose

| Component | Description |
|---|---|
| Input | CSV file with product codes and descriptions |
| Process | Read CSV and update database records |
| Output | Updated database entries |
| Failure mode | KeyError exception on specific files |

Understanding Python Tracebacks

Traceback Structure

Python tracebacks provide information about exception locations and call chains:

```
Traceback (most recent call last):
  File "update_products.py", line 25, in <module>
    main()
  File "update_products.py", line 20, in main
    update_data(row)
  File "update_products.py", line 10, in update_data
    code = row['product_code']
KeyError: 'product_code'
```

Reading Tracebacks

| Element | Information provided |
|---|---|
| Exception type | KeyError, ValueError, TypeError, etc. |
| Exception message | The specific key, value, or context |
| File and line | Location where the exception occurred |
| Function stack | Call chain leading to the exception, outermost call first |

Traceback Order

Python tracebacks list frames in the opposite order to GDB backtraces. GDB prints the innermost frame first, while Python prints the most recent call last:

  • Top: Module-level or entry point
  • Middle: Intermediate function calls
  • Bottom: Function where exception occurred
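The same top-to-bottom order is available programmatically through the standard traceback module. The sketch below uses a hypothetical two-function reproduction of the case study (the function bodies are assumptions, not the real script): the first extracted frame is the outermost call and the last one is where the KeyError was raised.

```python
import traceback

# Hypothetical stand-ins for the functions named in the traceback
def update_data(row):
    return row["product_code"]

def main():
    # Simulated row whose first key carries an invisible prefix
    update_data({"\ufeffproduct_code": "PROD001", "description": "Widget A"})

try:
    main()
except KeyError as exc:
    # extract_tb() returns frames in printed order: outermost call first
    frames = traceback.extract_tb(exc.__traceback__)
    frame_names = [frame.name for frame in frames]
    failing = frames[-1]  # bottom of the traceback: where the exception was raised
    print(f"{type(exc).__name__}: {exc}")
    print("call chain:", " -> ".join(frame_names))
    print(f"raised in {failing.name}() at line {failing.lineno}")
```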

Initial Investigation

Examining the Input File

The first step is to inspect the problematic CSV file:

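```shell
# View file contents
cat new_products.csv

# Or use less for larger files
less new_products.csv

# Check file encoding (MIME type and charset)
file -i new_products.csv

# Human-readable description; mentions "with BOM" when a UTF-8 BOM is present
file new_products.csv
```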
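In a UTF-8 terminal, cat typically shows nothing for a BOM, because U+FEFF is a zero-width character. A more direct check is to compare the first three bytes of the file with codecs.BOM_UTF8. The helper name has_utf8_bom and the temporary sample file below are illustrative assumptions:

```python
import codecs
import os
import tempfile

def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

# Demonstration on a throwaway file with hypothetical content
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + b"product_code,description\nPROD001,Widget A\n")
    sample_path = tmp.name

try:
    sample_has_bom = has_utf8_bom(sample_path)
    print(sample_has_bom)  # True
finally:
    os.remove(sample_path)
```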

Running the Script

```shell
# Execute the script
python3 update_products.py new_products.csv

# Output:
# Traceback (most recent call last):
#   File "update_products.py", line 25, in <module>
#     main()
#   File "update_products.py", line 20, in main
#     update_data(row)
#   File "update_products.py", line 10, in update_data
#     code = row['product_code']
# KeyError: 'product_code'
```

Using the PDB Debugger

Starting PDB

Launch the debugger with the script and its arguments:

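```shell
# Syntax: pdb3 script.py [arguments]
# (pdb3 is a wrapper shipped by some Linux distributions)
pdb3 update_products.py new_products.csv

# Or use Python 3's pdb module
python3 -m pdb update_products.py new_products.csv

# Run straight to the exception without typing continue (Python 3.2+)
python3 -m pdb -c continue update_products.py new_products.csv
```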

Initial PDB State

When PDB starts, it stops before the first line of the script and waits for commands:

```
> /path/to/update_products.py(1)<module>()
-> import csv
(Pdb)
```

Basic PDB Commands

Essential Commands

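| Command | Shortcut | Purpose |
|---|---|---|
| continue | c | Run until a breakpoint, an exception, or completion |
| next | n | Execute the next line without entering called functions |
| step | s | Execute the next line, entering called functions |
| p <expr> | (none) | Evaluate and print an expression; print(<expr>) also works because PDB runs it as a Python statement |
| list | l | Show source code around the current line |
| where | w, bt | Show the stack trace |
| up | u | Move up one stack frame (toward the caller) |
| down | d | Move down one stack frame |
| quit | q | Exit the debugger |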

Running Until Exception

Instead of stepping through each line, use continue to run until the crash:

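```
(Pdb) continue
Traceback (most recent call last):
  File "update_products.py", line 25, in <module>
    main()
  File "update_products.py", line 20, in main
    update_data(row)
  File "update_products.py", line 10, in update_data
    code = row['product_code']
KeyError: 'product_code'
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /path/to/update_products.py(10)update_data()
-> code = row['product_code']
(Pdb)
```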
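An uncaught exception under python3 -m pdb drops into post-mortem mode at the failing frame. A similar effect can be had from inside a script with pdb.post_mortem(), which helps when the script is normally launched by some other tool. The wrapper below is a minimal sketch; run_with_postmortem is a hypothetical helper name, and the commented usage assumes the case study's main():

```python
import pdb
import sys

def run_with_postmortem(func, *args, **kwargs):
    """Call func; on an uncaught exception, open pdb at the failing frame."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # sys.exc_info()[2] is the traceback of the exception being handled;
        # post_mortem() starts pdb at its innermost (failing) frame.
        pdb.post_mortem(sys.exc_info()[2])
        raise  # re-raise after the debugging session ends

# Hypothetical usage with the case-study entry point:
# run_with_postmortem(main)
```

Re-raising after the session keeps the normal traceback and non-zero exit status, so the wrapper does not hide the failure.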

Investigating the Exception

Examining Variables

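Because the exception is not caught, running the script under python3 -m pdb enters post-mortem debugging with the failing frame (update_data) selected, so its local variables are available. Use p, or a Python print() call, to examine them: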

```
(Pdb) print(row)
{'\ufeffproduct_code': 'PROD001', 'description': 'Widget A'}
```

Identifying the Issue

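Notice the \ufeff escape in front of product_code. It is how Python displays a single invisible character, U+FEFF, the Byte Order Mark (BOM). The dictionary key is therefore '\ufeffproduct_code' rather than 'product_code', which is why the lookup raises KeyError.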
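The following checks confirm that the key holds one extra, invisible character rather than a typo. They can be typed at the (Pdb) prompt with p in front, or run in any interpreter; the key value is copied from the output above:

```python
import unicodedata

# First key as printed in the debugging session
key = "\ufeffproduct_code"

print(len(key), len("product_code"))          # 13 12
print(hex(ord(key[0])))                        # 0xfeff
print(unicodedata.name(key[0]))                # ZERO WIDTH NO-BREAK SPACE
print(key == "product_code")                   # False
print(key.lstrip("\ufeff") == "product_code")  # True
```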


Understanding the Byte Order Mark (BOM)

What Is a BOM

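The Byte Order Mark is the Unicode character U+FEFF placed at the very start of a text stream:

  • In UTF-16 and UTF-32 files it indicates the byte order (endianness)
  • In UTF-8 files it is optional, since UTF-8 has no byte order; some editors and spreadsheet exports, particularly on Windows, still write it to mark the file as UTF-8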

BOM in Different Encodings

| Encoding | BOM bytes | Purpose |
|---|---|---|
| UTF-8 | EF BB BF | Optional, indicates UTF-8 |
| UTF-16 LE | FF FE | Little-endian byte order |
| UTF-16 BE | FE FF | Big-endian byte order |
| UTF-32 LE | FF FE 00 00 | Little-endian 32-bit |
| UTF-32 BE | 00 00 FE FF | Big-endian 32-bit |
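The byte sequences in the table are available as constants in Python's codecs module. The sketch below uses them to name the encoding indicated by a leading BOM; detect_bom is a hypothetical helper, and the check order is the important design point:

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) starts with the
# UTF-16 LE BOM (FF FE), so checking UTF-16 first would misreport it.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]

def detect_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

# Print each BOM as hex bytes, matching the table
for bom, name in BOMS:
    print(f"{name:<10} {bom.hex(' ')}")
```

A UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE; that ambiguity is inherent in the byte patterns, not in the sketch.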

BOM Impact on CSV Parsing

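When a UTF-8 file that starts with a BOM is opened with plain UTF-8 decoding, the decoder keeps U+FEFF as the first character of the text, so csv.DictReader makes it part of the first field name: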

```python
# Without BOM
row = {'product_code': 'PROD001', 'description': 'Widget A'}

# With BOM
row = {'\ufeffproduct_code': 'PROD001', 'description': 'Widget A'}

# Accessing 'product_code' fails because key is '\ufeffproduct_code'
```
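The failure can be reproduced without any file on disk by wrapping hypothetical CSV bytes in an in-memory text stream decoded with plain UTF-8, which is what open(path, encoding="utf-8") does:

```python
import csv
import io

# Hypothetical CSV content: UTF-8 BOM bytes followed by a normal header
raw = b"\xef\xbb\xbfproduct_code,description\r\nPROD001,Widget A\r\n"

# Decode with plain UTF-8; newline="" as the csv module expects
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
reader = csv.DictReader(text)

print(reader.fieldnames)       # ['\ufeffproduct_code', 'description']
row = next(reader)
print("product_code" in row)   # False
```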

The Solution: UTF-8-sig Encoding

Python’s UTF-8-sig Encoding

Python provides the utf-8-sig codec, which handles the BOM automatically:

  • Reading data with a BOM: the leading BOM is skipped
  • Reading data without a BOM: behaves exactly like utf-8
  • Writing: a BOM is written at the start of the output
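The codec's behaviour can be checked directly on byte strings. The sample bytes below are hypothetical; besides the two reading cases, the sketch shows that encoding with utf-8-sig prepends a BOM, which matters when the script also writes files:

```python
with_bom = b"\xef\xbb\xbfproduct_code"
without_bom = b"product_code"

# Plain utf-8 keeps the BOM as a leading U+FEFF character
print(repr(with_bom.decode("utf-8")))         # '\ufeffproduct_code'
# utf-8-sig drops a leading BOM and otherwise decodes like utf-8
print(repr(with_bom.decode("utf-8-sig")))     # 'product_code'
print(repr(without_bom.decode("utf-8-sig")))  # 'product_code'
# Encoding with utf-8-sig adds a BOM, so plain utf-8 is usually the better
# choice for output files unless the consumer expects a BOM
print("product_code".encode("utf-8-sig"))     # b'\xef\xbb\xbfproduct_code'
```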

Implementing the Fix

Modify the file opening code to use utf-8-sig encoding:

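```python
# Before: no encoding given, so Python uses the locale's default encoding
# (UTF-8 on most Linux systems) and a BOM ends up in the first key
import csv

with open('new_products.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        update_data(row)

# After: utf-8-sig strips a leading BOM if present; newline='' is what
# the csv module documentation recommends for files given to csv readers
import csv

with open('new_products.csv', 'r', encoding='utf-8-sig', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        update_data(row)
```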

Complete Fixed Script

```python
import csv
import sqlite3
import sys


def update_data(row):
    """Update database with product information."""
    code = row['product_code']
    description = row['description']

    # Update database (simplified: opens one connection per row)
    conn = sqlite3.connect('products.db')
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE products SET description = ? WHERE code = ?",
        (description, code)
    )
    conn.commit()
    conn.close()


def main():
    """Main function to process CSV file."""
    if len(sys.argv) < 2:
        print("Usage: update_products.py <csv_file>", file=sys.stderr)
        sys.exit(1)

    filename = sys.argv[1]

    # Fix: Use utf-8-sig encoding to handle BOM
    with open(filename, 'r', encoding='utf-8-sig', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            update_data(row)

    print("Database updated successfully")


if __name__ == '__main__':
    main()
```
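The utf-8-sig change fixes this particular cause. A complementary, optional hardening step is to validate the header once, before any row reaches update_data(), so that a future header mismatch fails with a message showing the actual column names. check_header and REQUIRED_COLUMNS are hypothetical names, not part of the original script; the call would go right after csv.DictReader(file) in main():

```python
import csv
import io

# Assumption: these are the columns update_data() reads
REQUIRED_COLUMNS = {"product_code", "description"}

def check_header(reader):
    """Raise ValueError naming missing columns before any row is processed."""
    found = reader.fieldnames or []
    missing = REQUIRED_COLUMNS - set(found)
    if missing:
        # !r shows invisible characters such as \ufeff as escapes
        raise ValueError(f"missing columns {sorted(missing)}; header is {found!r}")

# Example: a header that still carries a BOM (hypothetical data)
bad = csv.DictReader(io.StringIO("\ufeffproduct_code,description\nPROD001,Widget A\n"))
try:
    check_header(bad)
except ValueError as err:
    header_error = str(err)
    print(header_error)
```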

Testing the Fix

Verification Steps

```shell
# Test with file containing BOM
python3 update_products.py new_products.csv
# Database updated successfully

# Test with file without BOM
python3 update_products.py standard_products.csv
# Database updated successfully
```

Testing Matrix

| File type | BOM present | Result with utf-8 | Result with utf-8-sig |
|---|---|---|---|
| UTF-8 with BOM | Yes | KeyError | Success |
| UTF-8 without BOM | No | Success | Success |
| ASCII | No | Success | Success |
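The matrix can be reproduced automatically. The sketch below writes one hypothetical sample file per row, opens each one with both encodings the way the script does, and records whether the 'product_code' key exists:

```python
import csv
import os
import tempfile

CSV_TEXT = "product_code,description\nPROD001,Widget A\n"

# Hypothetical sample files, one per row of the matrix
SAMPLES = {
    "UTF-8 with BOM": CSV_TEXT.encode("utf-8-sig"),
    "UTF-8 without BOM": CSV_TEXT.encode("utf-8"),
    "ASCII": CSV_TEXT.encode("ascii"),
}

def lookup_result(path, encoding):
    """Open like the script does and report whether 'product_code' is a key."""
    with open(path, encoding=encoding, newline="") as f:
        row = next(csv.DictReader(f))
    return "Success" if "product_code" in row else "KeyError"

results = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for label, data in SAMPLES.items():
        path = os.path.join(tmpdir, "sample.csv")
        with open(path, "wb") as f:
            f.write(data)
        results[label] = (lookup_result(path, "utf-8"), lookup_result(path, "utf-8-sig"))

for label, (plain, sig) in results.items():
    print(f"{label:<18} utf-8: {plain:<8} utf-8-sig: {sig}")
```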

Advanced PDB Features

Breakpoints

Set breakpoints to pause execution at specific lines:

```python
# In code: add a breakpoint() call
def update_data(row):
    breakpoint()  # Python 3.7+
    code = row['product_code']
    # ...
```
```
# In PDB: set a breakpoint at a line number
(Pdb) break 10
Breakpoint 1 at /path/to/script.py:10

# Set a breakpoint in a function
(Pdb) break update_data
Breakpoint 2 at /path/to/script.py:5

# List breakpoints
(Pdb) break
```
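A breakpoint() call can also be guarded by a condition in the code itself, so the debugger opens only for suspicious rows. breakpoint() delegates to sys.breakpointhook(), which reads the PYTHONBREAKPOINT environment variable on each call (PEP 553); the value 0 turns every call into a no-op, which is useful for letting a batch run finish. The update_data variant below is hypothetical, and the sketch sets the variable in-process only so that it runs unattended; normally it would be set in the shell, as in PYTHONBREAKPOINT=0 python3 update_products.py new_products.csv:

```python
import os

# Demo only: disable breakpoint() for this process so the sketch runs
# unattended. Remove this line to actually stop in pdb.
os.environ["PYTHONBREAKPOINT"] = "0"

def update_data(row):
    # Hypothetical variant: pause only for rows missing the expected key
    if "product_code" not in row:
        breakpoint()
    return row.get("product_code")

print(update_data({"product_code": "PROD001", "description": "Widget A"}))
print(update_data({"\ufeffproduct_code": "PROD001", "description": "Widget A"}))
```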

Conditional Breakpoints and Display Expressions

```
# Stop at line 10 only when the condition is true
(Pdb) break 10, 'product_code' not in row

# Print the expression's value whenever it has changed at a stop in the current frame
(Pdb) display list(row)
```

Stepping Through Code

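| Command | Behavior |
|---|---|
| step | Execute the current line, stopping inside any called function |
| next | Execute the current line without stopping inside called functions |
| return | Continue until the current function returns |
| until | Continue until a line with a higher number than the current one is reached (useful for leaving a loop) |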

Additional PDB Commands

```
# Show arguments of current function
(Pdb) args

# Execute Python code
(Pdb) !variable = new_value

# Show all variables in current scope
(Pdb) pp locals()

# Show source code
(Pdb) list 1, 20

# Jump to different line (use with caution)
(Pdb) jump 15
```

Debugging Best Practices

When to Use PDB

  • Exception message is unclear
  • Need to inspect variable contents at crash time
  • Intermittent failures requiring step-by-step analysis
  • Complex logic with multiple conditional paths

Debugging Workflow

| Step | Action |
|---|---|
| 1 | Read exception message and traceback |
| 2 | Identify function and line where exception occurred |
| 3 | Launch PDB with script and arguments |
| 4 | Use continue to run until exception |
| 5 | Examine variables with print |
| 6 | Investigate unexpected values |
| 7 | Research error patterns online |
| 8 | Implement fix |
| 9 | Test with original failing case |
| 10 | Test with other cases to ensure no regression |

Alternative Debugging Approaches

```python
# Add print statements (quick debugging)
def update_data(row):
    print(f"DEBUG: row = {row}")
    print(f"DEBUG: keys = {list(row.keys())}")
    code = row['product_code']

# Use logging module (production debugging)
import logging
logging.basicConfig(level=logging.DEBUG)

def update_data(row):
    # Lazy %r formatting: the string is only built if DEBUG is enabled,
    # and repr shows invisible characters such as \ufeff as escapes
    logging.debug("Processing row: %r", row)
    code = row['product_code']

# Use try-except for better error messages
def update_data(row):
    try:
        code = row['product_code']
    except KeyError:
        print(f"KeyError: Available keys are {list(row.keys())}")
        raise
```

Common CSV Encoding Issues

Detecting Encoding Problems

```shell
# Check file encoding with file command
file -i data.csv
# data.csv: text/csv; charset=utf-8
# (charset=utf-8 is reported whether or not a BOM is present)

# View raw bytes of file start
hexdump -C data.csv | head
# Look for EF BB BF (UTF-8 BOM) at offset 0

# Print the first 10 raw bytes from Python
python3 -c "print(open('data.csv', 'rb').read(10))"
```
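When the encoding is unknown, one rough approach is to try candidate encodings in order and keep the first that decodes without error. This is a heuristic sketch, not a detector: guess_encoding and the candidate list are assumptions, and the order matters because latin-1 maps every byte to a character and therefore never fails:

```python
# Order matters: utf-8-sig rejects most non-UTF-8 data, cp1252 rejects only
# a few undefined byte values, and latin-1 accepts every byte, so it goes last.
CANDIDATES = ("utf-8-sig", "cp1252", "latin-1")

def guess_encoding(data, candidates=CANDIDATES):
    """Return the first candidate that decodes `data` without error."""
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None

print(guess_encoding("café".encode("utf-8")))   # utf-8-sig
print(guess_encoding("café".encode("cp1252")))  # cp1252
print(guess_encoding(b"\x81"))                   # latin-1
```

A clean decode is evidence, not proof: UTF-8 text also decodes under cp1252 (as mojibake), which is why UTF-8 is tried first. Pass the whole file content, for example open(path, 'rb').read(), because a truncated sample can cut a multi-byte UTF-8 sequence in half.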

Common Encodings for CSV Files

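| Encoding | When to use |
|---|---|
| utf-8 | Standard UTF-8 files without a BOM |
| utf-8-sig | UTF-8 files that may have a BOM |
| latin-1 | Western European legacy files; decodes any byte sequence without error |
| cp1252 | Windows Western European files |
| iso-8859-1 | Same codec as latin-1 (Python treats the two names as aliases) |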

Conclusion

Debugging Python exceptions involves reading the traceback to locate the error, using the PDB debugger to inspect variable contents at the moment of the crash, and investigating any unexpected values. In this case study, a UTF-8 Byte Order Mark (BOM) at the start of a CSV file became part of the first column name and caused a KeyError. Opening such files with Python’s utf-8-sig codec removes a leading BOM when one is present and otherwise behaves like utf-8, so the same code handles files from various sources. For problems that a single post-mortem inspection cannot explain, PDB also offers breakpoints, conditional breakpoints, and display expressions.


FAQ