Dealing With Memory Leaks

November 11, 2025 · 17 min read

This document demonstrates practical memory leak diagnosis and resolution through real-world examples using Python memory profilers. It covers identifying memory consumption patterns in applications, analyzing memory usage with tools like memory_profiler, and fixing code that unnecessarily retains data in memory causing resource exhaustion.

This document provides hands-on investigation of memory leaks through practical examples, demonstrating how applications exhaust memory through excessive allocations. It walks through using monitoring tools like top to identify memory growth patterns, applying Python's memory_profiler to pinpoint problematic code lines, and resolving issues where programs unnecessarily retain full data structures instead of minimal references.

Understanding Memory Requests

Why Applications Request Memory

There are many reasons why an application may request a lot of memory. Sometimes it’s what’s needed for the program to complete its task. Sometimes it’s caused by a part of the software misbehaving.

Legitimate vs problematic memory usage:

Memory Usage Type	Cause	Characteristics	Action Required
Legitimate	Task requirements	Stable or predictable growth	None (accept or optimize)
Cache accumulation	Performance optimization	Growing but bounded	Monitor limits
Memory leak	Programming error	Continuous unbounded growth	Debug and fix
Excessive retention	Poor design	Holds more than needed	Refactor code

Common memory request reasons:

Reason	Example	Typical Size	Management
Data processing	Loading file into memory	MB to GB	Release after processing
Caching	Store frequently accessed data	MB	Implement eviction policy
Buffering	Network or file buffers	KB to MB	Auto-managed usually
Collections growth	Lists, dictionaries growing	Variable	Monitor size
Object retention	Keeping unnecessary references	Accumulates	Explicit cleanup

Demonstrating Memory Leak: Terminal Scroll Buffer

Triggering the Misbehavior

To show what this looks like, the first step is to trigger the misbehavior deliberately. The terminal used here is uxterm, configured with a very long scroll buffer.

Terminal scroll buffer mechanics:

Aspect	Description	Memory Impact
Scroll buffer	Stores command history and output	Grows with content
Storage location	RAM	Continuous allocation
Default size	Usually limited (1000-10000 lines)	Manageable
Configured size	Can be unlimited	Potentially dangerous
Content retention	Kept until buffer full or cleared	Persistent memory use

The scroll buffer is the feature that makes it possible to scroll up and see previously executed commands and their output. The contents of the buffer are kept in memory.
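The difference between a limited and an unlimited buffer can be sketched in Python. This is only a toy model, not how uxterm actually stores its scrollback: the `ScrollBuffer` class and its `max_lines` parameter are hypothetical names, and `collections.deque` with `maxlen` stands in for a bounded buffer that discards its oldest lines.

```python
from collections import deque


class ScrollBuffer:
    """Toy model of a terminal scroll buffer (hypothetical, for illustration)."""

    def __init__(self, max_lines=None):
        # max_lines=None models an unlimited buffer; a number models a bounded one.
        self.lines = deque(maxlen=max_lines)

    def write(self, line):
        # With maxlen set, appending to a full deque drops the oldest line.
        self.lines.append(line)


bounded = ScrollBuffer(max_lines=1000)
unbounded = ScrollBuffer()

for i in range(50_000):
    line = f"output line {i}"
    bounded.write(line)
    unbounded.write(line)

print(len(bounded.lines))    # stays at the configured limit
print(len(unbounded.lines))  # grows with every line written
```

The bounded buffer keeps only the most recent lines, so its memory use levels off, while the unbounded one holds every line ever written.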

Creating Rapid Memory Growth

If the buffer is made very long and something manages to fill it, the computer will eventually run out of memory. With normal use, that might take ages. A command that keeps generating a lot of output, however, can fill the buffer quickly.

Output generation command:

# Generate continuous random output
od -cx /dev/urandom

Command breakdown:

Component	Purpose	Result
`od`	Octal dump utility	Formats binary data
`-c`	Display as characters	Character representation
`-x`	Display as hexadecimal	Hex representation
`/dev/urandom`	Random number device	Infinite random data source

This command will take the random numbers generated by the urandom device and show them as both characters and hexadecimal numbers. Since the urandom device keeps giving more and more random numbers, it will just keep going.

Memory consumption pattern:

Time Elapsed	Lines Generated	Memory Used	Status
10 seconds	~10,000	~10 MB	Normal
1 minute	~60,000	~60 MB	Noticeable
5 minutes	~300,000	~300 MB	High
10 minutes	~600,000	~600 MB	Critical

The command is filling up the scroll buffer, making the computer require more and more memory.

Monitoring Memory with top

Opening top for Analysis

The next step is to open top in a different terminal to check what's going on.

top command basics:

# Launch top
top

# Launch with specific refresh rate (2 seconds)
top -d 2

# Launch sorted by memory
top -o %MEM

Sorting by Memory Usage

Pressing Shift+M tells top to order the processes by how much memory they are using.

top keyboard shortcuts for memory analysis:

Key	Action	Purpose
Shift+M	Sort by memory	Find memory hogs
Shift+P	Sort by CPU	Find CPU hogs
k	Kill process	Terminate problematic process
r	Renice process	Change priority
1	Toggle CPU cores	Per-core view
m	Toggle memory display	Memory graphs

The percentage of memory used by xterm climbs very quickly.

Observing memory growth:

Observation Interval	Memory %	Growth Rate	Interpretation
Initial	2%	Baseline	Normal startup
After 10s	5%	+3% in 10s	Rapid growth
After 30s	15%	+10% in 20s	Accelerating
After 60s	35%	+20% in 30s	Critical leak
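The growth rates in the table can be computed from the raw observations. The sketch below uses the sample values from the table; `growth_rates` is a hypothetical helper name, and the "grew in every interval" check is a rough heuristic rather than a definitive leak test.

```python
def growth_rates(samples):
    """Return the average growth per second between consecutive samples.

    samples is a list of (elapsed_seconds, memory_percent) pairs.
    """
    rates = []
    for (t0, m0), (t1, m1) in zip(samples, samples[1:]):
        rates.append((m1 - m0) / (t1 - t0))
    return rates


# Values copied from the observation table above.
samples = [(0, 2.0), (10, 5.0), (30, 15.0), (60, 35.0)]
rates = growth_rates(samples)

for (t0, _), (t1, _), rate in zip(samples, samples[1:], rates):
    print(f"{t0:>2}s to {t1:>2}s: {rate:.2f} percentage points per second")

# Growth that never pauses across observations is the warning sign.
still_growing = all(rate > 0 for rate in rates)
print("memory grew in every interval:", still_growing)
```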

Stopping the Process

The next action is to stop the process that's filling up the buffer by pressing Ctrl+C.

Process control:

Signal	Key Combination	Effect	Memory Impact
SIGINT	Ctrl+C	Interrupt process	Stops growth
SIGTERM	`kill PID`	Terminate gracefully	May release memory
SIGKILL	`kill -9 PID`	Force kill	Immediate release

With that, the command that was filling the buffer is stopped. But the terminal still has that memory allocated, storing all the lines in the scroll buffer.

Important
Stopping a process that generates output doesn’t immediately free memory consumed by scroll buffers. The terminal application still holds all previously generated content in RAM until the buffer is cleared or the terminal is closed.

Understanding top Memory Columns

Detailed Column Analysis

Looking at the output of top in more detail reveals several columns with data about each process.

top memory column definitions:

Column	Full Name	Meaning	Typical Values
RES	Resident Memory	Physical RAM used by process	MB to GB
SHR	Shared Memory	Memory shared with other processes	KB to MB
VIRT	Virtual Memory	Total virtual memory allocated	Often very large
%MEM	Memory Percentage	Percentage of total RAM	0.1% to 90%

RES Column

The column labeled RES shows the resident memory: the physical RAM currently held by that specific process.

RES memory characteristics:

Aspect	Description	Monitoring Priority
Definition	Actual physical RAM in use	High
Excludes	Swapped memory	Focus on this first
Problem indicator	Continuously growing	Memory leak
Normal behavior	Stable or bounded fluctuation	Healthy

SHR Column

The one labeled SHR is for memory that’s shared across processes.

SHR memory examples:

Shared Resource	Type	Typical Size	Purpose
System libraries	Code	10-100 MB	Common functions
Shared buffers	Data	Variable	IPC communication
Memory-mapped files	Files	File size	Efficient file access
Database caches	Data	GB	Performance

VIRT Column

The one labeled VIRT lists all the virtual memory allocated for each process. This includes process-specific memory, shared memory, and other shared resources that are stored on disk but mapped into the memory of the process.

VIRT memory components:

Component	Description	Disk-backed	Critical if High
Process memory	Private allocations	No	Yes
Shared libraries	System libraries	Yes	No
Memory-mapped files	Files accessed as memory	Yes	No
Swap space	Paged out memory	Yes	Sometimes

It’s usually fine for a process to have a high value in the VIRT column. The one that usually indicates a problem is the RES column.

Memory troubleshooting priority:

Column	Problem Indicator	Action Priority	Reason
RES growing	Yes	High	Actual RAM exhaustion
VIRT growing	Maybe	Medium	Could be memory-mapped files
SHR growing	Rarely	Low	Usually system-managed
%MEM high	Yes	High	Impacts entire system
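On Linux, similar figures can be read from `/proc/<pid>/status`, which makes it possible to log them over time from a script. The sketch below is an approximation under stated assumptions: it maps `VmSize` to VIRT, `VmRSS` to RES, and the sum of `RssFile` and `RssShmem` to SHR, and it parses made-up sample text so it can run anywhere. The function names are hypothetical.

```python
def parse_status_memory(status_text):
    """Extract memory fields (in KiB) from the text of /proc/<pid>/status."""
    wanted = {"VmSize", "VmRSS", "RssFile", "RssShmem"}
    fields = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in wanted:
            # Lines look like "VmRSS:     51234 kB"; the first token is the number.
            fields[name] = int(rest.split()[0])
    return fields


def top_like_columns(fields):
    """Map the /proc fields onto the column names used by top (approximate)."""
    return {
        "VIRT": fields.get("VmSize", 0),
        "RES": fields.get("VmRSS", 0),
        "SHR": fields.get("RssFile", 0) + fields.get("RssShmem", 0),
    }


# Made-up sample text in the format of /proc/<pid>/status.
sample_status = """Name:\tuxterm
VmSize:\t  812340 kB
VmRSS:\t  402112 kB
RssAnon:\t  390000 kB
RssFile:\t   12000 kB
RssShmem:\t     112 kB
"""

columns = top_like_columns(parse_status_memory(sample_status))
print(columns)

# On a Linux system, the current process can be inspected the same way:
# with open("/proc/self/status") as handle:
#     print(top_like_columns(parse_status_memory(handle.read())))
```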

Releasing Memory

Closing the other terminal releases all the memory it had reserved.

Memory release mechanisms:

Action	Memory Released	Speed	Completeness
Stop process	Command buffers	Immediate	Partial
Clear buffer	Scroll history	Immediate	Partial
Close terminal	All terminal memory	Immediate	Complete
OS reclaim	Process exits	Automatic	Complete

Example Analysis Summary

This example demonstrated what a program that keeps requesting more and more memory looks like. It was an extreme case; most memory leaks don't happen at this speed.

Memory leak speed comparison:

Leak Type	Growth Rate	Detection Time	Example
Extreme (demo)	MB per second	Minutes	Terminal scroll buffer
Fast	MB per hour	Hours to days	Logging without rotation
Moderate	MB per day	Days to weeks	Cache without limits
Slow	MB per week	Weeks to months	Small object accumulation

It can take a long while to notice that a program is using more memory than it should, and it can be hard to tell the difference between memory that's actually needed and memory that's being wasted.

Distinguishing legitimate vs wasted memory:

Indicator	Legitimate	Wasted (Leak)
Growth pattern	Plateaus at some point	Continuous unbounded growth
Relationship to work	Proportional to data size	Independent of work done
After completion	Releases most memory	Retains high memory
Restart behavior	Similar pattern	Repeats growth pattern

Even so, comparing the current output of top with what it showed a while back is usually how an investigation into a memory leak starts.

Word Frequency Script Example

The Problem Scenario

As a different example, consider a script that analyzes the frequency of words in web pages. The script works fine with just a few web pages, but when it is given all of Wikipedia's content, it starts using up all the memory.

Script behavior by scale:

Input Scale	Articles	Memory Usage	Status
Small test	5-10	<100 MB	Works fine
Medium test	100	~500 MB	Acceptable
Large test	1,000	~5 GB	Slow but works
Full Wikipedia	Millions	>32 GB	Runs out of memory

The script is running, and it will take a long while to finish, since it is processing a huge number of articles.

Multiprocessing Observation

While this is running, the output of top in a different terminal shows a number of content stats processes running.

Multiprocessing memory pattern:

Aspect	Observation	Implication
Process count	Multiple workers	Using multiprocessing
Individual memory	Each process growing	Not shared data problem
Total memory	Sum of all processes	Multiplied impact
One process growing fast	Specific worker	Leak in worker code

That’s because the script uses multiprocessing to parallelize the processing of the information and get the results as fast as possible.

Multiprocessing memory characteristics:

Single Process	Multiprocess (4 workers)
5 GB leak	4 × 5 GB = 20 GB leak
Easier to profile	Harder to profile
Single RES column	Multiple RES columns
Simpler debugging	Complex debugging

These processes appear to be using a lot of memory, so the next step is to sort by memory to see the details. The memory used by one of the processes in particular keeps growing.

Analyzing the Issue

The application is processing a bunch of data and generating a dictionary with it, so it’s expected that it will use some memory but not this much.

Expected vs actual memory usage:

Component	Expected Memory	Actual Memory	Ratio
Article text	10 KB	10 KB	1×
Word dictionary	100 KB	100 KB	1×
Article storage	0 (should be discarded)	10 KB × articles	∞
Total (1000 articles)	~100 MB	~10 GB	100×

This looks like the program is storing more than it should in memory.

Warning
When processing large datasets, failing to release intermediate data structures after use can cause memory consumption to grow proportionally with input size rather than remaining constant, leading to inevitable exhaustion on sufficiently large inputs.
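One way to keep memory roughly constant is to process one item at a time and retain only the aggregated result. The sketch below illustrates that idea with `collections.Counter`; it is not the script discussed here, and `iter_article_texts` is a hypothetical stand-in for whatever yields article text.

```python
from collections import Counter


def iter_article_texts():
    """Hypothetical stand-in for a source that yields one article text at a time."""
    yield "memory leaks grow over time"
    yield "profilers show where memory goes"
    yield "leaks hide in long running programs"


def count_words(texts):
    """Count words across all texts while holding only one text at a time."""
    totals = Counter()
    for text in texts:
        # Only the counts survive this iteration; the text itself is not stored.
        totals.update(text.lower().split())
    return totals


totals = count_words(iter_article_texts())
print(totals["memory"], totals["leaks"], totals["profilers"])
```

Because the generator hands over one text at a time and nothing keeps a reference to it afterwards, the memory needed depends on the number of distinct words rather than on the number of articles.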

Using Memory Profiler

Why Use a Profiler

This program is fairly complex, so a memory profiler is needed to figure out where the problem is.

When to use memory profilers:

Scenario	Manual Analysis	Memory Profiler
Simple script	Feasible	Overkill
Complex application	Difficult	Recommended
Multiprocess app	Very hard	Required (simplified version)
Production system	Limited info	Detailed insights

Profiling the memory of a multiprocess application is especially hard. Handling just a few articles instead of all of them makes it possible to check the memory consumption quickly.

Simplified Script Setup

Profiling strategy:

Original Script	Simplified for Profiling
Multiprocessing	Single process
All Wikipedia	50 articles
Hours to complete	Minutes to complete
Hard to profile	Easy to profile
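Limiting the input for a profiling run can be as simple as taking the first few items from the source. The following is a minimal sketch, assuming the articles come from a lazy iterator; `all_articles` and `take_sample` are hypothetical names, not part of the original script.

```python
from itertools import islice


def all_articles():
    """Hypothetical stand-in for the full article source (lazy and very large)."""
    article_id = 0
    while True:
        yield f"article-{article_id}"
        article_id += 1


def take_sample(articles, limit=50):
    """Return only the first `limit` items without consuming the rest."""
    return list(islice(articles, limit))


sample = take_sample(all_articles(), limit=50)
print(len(sample), sample[0], sample[-1])
```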

The next step is to open the simplified script. It uses the memory_profiler module, one of the many memory profilers available for Python.

Python memory profiler alternatives:

Profiler	Granularity	Overhead	Best For
memory_profiler	Line-by-line	High	Development debugging
tracemalloc	Snapshot-based	Low	Production monitoring
pympler	Object-level	Medium	Object tracking
objgraph	Reference tracking	Medium	Circular reference detection
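For comparison, here is what the snapshot-based approach looks like with `tracemalloc` from the standard library. This is a small illustrative sketch, not part of the workflow described in this document: it takes a snapshot before and after a simulated retention and lists the source lines whose allocations grew the most.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Simulated retention: a list that keeps every generated string alive.
retained = [f"line {i} " * 10 for i in range(20_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line and show the largest changes first.
top_changes = after.compare_to(before, "lineno")
for stat in top_changes[:3]:
    print(stat)

largest = top_changes[0]
print(f"largest increase: {largest.size_diff / 1024:.0f} KiB")
```

The first entry should point at the line that builds `retained`, since that is where nearly all new memory was allocated between the two snapshots.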

Adding the Decorator

A @profile label has been added before the main function definition to tell the profiler to analyze that function's memory consumption.

Decorator implementation:

from memory_profiler import profile

@profile
def main():
    """Process articles and analyze word frequency"""
    articles = fetch_articles(limit=50)
    word_freq = {}
    article_refs = []  # Problem: stores full articles

    for article in articles:
        text = article.get_text()
        words = text.split()

        for word in words:
            if word not in word_freq:
                word_freq[word] = []
            # BUG: appends the entire article object instead of a small identifier
            word_freq[word].append(article)

    return word_freq

if __name__ == '__main__':
    main()

Decorator characteristics:

Aspect	Description	Impact
Type	Python decorator	Adds behavior without code modification
Syntax	`@profile`	Applied above function
Purpose	Mark function for profiling	Tells profiler what to analyze
Scope	Specific functions	Can profile multiple functions

This type of label is called a decorator. Python uses decorators to add extra behavior to a function without modifying the function's own code. In this case, the extra behavior is measuring the use of memory.
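To make the decorator idea concrete, here is a minimal sketch of a decorator that measures memory. It is not how memory_profiler is implemented: `report_peak_memory` is a hypothetical name, and it uses the standard library's `tracemalloc` to report the peak traced memory of each call rather than line-by-line usage.

```python
import functools
import tracemalloc


def report_peak_memory(func):
    """Decorator that prints the peak traced memory of each call (illustrative)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            result = func(*args, **kwargs)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        wrapper.last_peak_bytes = peak
        print(f"{func.__name__}: peak {peak / 1024:.1f} KiB")
        return result

    wrapper.last_peak_bytes = None
    return wrapper


@report_peak_memory
def build_squares(count):
    # The function body is unchanged; measurement happens in the wrapper.
    return [n * n for n in range(count)]


squares = build_squares(100_000)
```

The decorated function is called exactly as before; the measurement is added around it by the wrapper.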

Script Simplification

The rest of the code is basically the same as the original one. It just uses a single process and is limited to 50 articles instead of the thousands of articles that the other script was going through.

Running the Memory Profiler

Execution

The next step is to run the script with the memory profiler enabled.

Memory profiler execution:

# Install memory_profiler if needed
pip install memory_profiler

# Run script with profiling
python -m memory_profiler content_stats_simplified.py

# Alternative: using mprof for more features
mprof run content_stats_simplified.py
mprof plot  # Generate graph

Execution characteristics:

Aspect	Normal Execution	With Profiler
Speed	Fast	10-100× slower
Memory usage	Normal	Slightly higher
Output	Program results	+ Memory statistics
Purpose	Production use	Development debugging

The script only reads through 50 articles, but it takes a while because memory profiling makes it considerably slower.

Profiler Output

Once the program finishes, the memory profiler gives information about which lines are adding or removing data from the memory used by the program.

Memory profiler output format:

Line #    Mem usage    Increment  Occurrences   Line Contents
============================================================
     5     45.2 MiB     45.2 MiB           1   @profile
     6                                         def main():
     7     45.3 MiB      0.1 MiB           1       articles = fetch_articles(limit=50)
     8     45.3 MiB      0.0 MiB           1       word_freq = {}
     9     45.3 MiB      0.0 MiB           1       article_refs = []
    10
    11     49.5 MiB      4.2 MiB          51       for article in articles:
    12     52.8 MiB      3.3 MiB          50           text = article.get_text()
    13     53.1 MiB      0.3 MiB          50           words = text.split()
    14
    15    130.5 MiB     77.4 MiB       50000           for word in words:
    16    130.5 MiB      0.0 MiB       45000               if word not in word_freq:
    17    130.5 MiB      0.0 MiB       12000                   word_freq[word] = []
    18    130.5 MiB      0.0 MiB       50000               word_freq[word].append(article)

Output column explanation:

Column	Meaning	Use
Line #	Source code line number	Locate code
Mem usage	Total memory after line executes	Track growth
Increment	Memory added by this line	Find culprits
Occurrences	How many times line ran	Understand loops
Line Contents	The actual code	See what it does

The Mem usage column shows the amount of memory in use after each line is executed. The Increment column shows the increase in memory attributed to that specific line.

Identifying the Problem

After going through just 50 articles, the program is already using 130 megabytes. No wonder memory ran out when trying to process all the articles.

Memory consumption analysis:

Metric	Value	Projection for All Wikipedia
Articles processed	50	~6,000,000
Memory used	130 MB	~15,600 GB (15.6 TB)
Per article	2.6 MB	Same ratio
Expected per article	~100 KB	26× too much
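The projection in the table is a straightforward linear extrapolation, which can be checked with a few lines of arithmetic. The sketch assumes memory grows in direct proportion to the number of articles and uses decimal units (1 TB = 1,000,000 MB), as the table does.

```python
# Figures taken from the profiler run and the table above.
articles_profiled = 50
memory_used_mb = 130
total_articles = 6_000_000  # approximate article count used in the table

per_article_mb = memory_used_mb / articles_profiled
projected_mb = per_article_mb * total_articles
projected_tb = projected_mb / 1_000_000  # decimal units: 1 TB = 1,000,000 MB

print(f"per article: {per_article_mb:.1f} MB")
print(f"projected:   {projected_tb:.1f} TB")
```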

The variables that require the most memory are article and text with about four and three megabytes respectively.

Variable memory breakdown:

Variable	Memory per Instance	Instances	Total	Should Release?
`article`	4 MB	1 (current)	4 MB	After processing
`text`	3 MB	1 (current)	3 MB	After processing
`word_freq`	Variable	1	Growing	Keep (needed)
`article_refs`	4 MB × 50	50	200 MB	BUG: Should not keep

Those are the articles being processed, and it’s fine for them to take space while counting the words in the article. But once processing one article is done, that memory shouldn’t be kept around.

Spotting and Fixing the Bug

The Critical Question

Where is the problem? Right at the end of the loop, the code stores the article to keep a reference to it, but what it stores is the whole article object, so the full contents of every article stay in memory.

Problematic code pattern:

# BAD: Storing entire article
for word in words:
    if word not in word_freq:
        word_freq[word] = []
    word_freq[word].append(article)  # Stores full 4 MB article object

What’s being stored:

What’s Stored	Size	Necessary?	Issue
Full article object	4 MB	No	Excessive memory
Article title	50 bytes	Yes	Minimal impact
Article ID	8 bytes	Yes	Minimal impact
Article index	4 bytes	Yes	Minimal impact

The Solution

If the goal is to keep track of all the articles that include a word, titles or index entries can be stored instead, but definitely not the whole contents.

Corrected code:

# GOOD: Storing only article title/ID
for word in words:
    if word not in word_freq:
        word_freq[word] = []
    word_freq[word].append(article.title)  # Stores only ~50 byte string

Before vs after comparison:

Approach	Memory per Reference	50 Articles	1000 Articles	1M Articles
Store full article	4 MB	200 MB	4 GB	4 TB
Store title only	50 bytes	2.5 KB	50 KB	50 MB
Savings	99.999%	99.999%	99.999%	99.999%
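The effect of the fix can be observed directly by checking whether article objects are still alive once the index is built. The sketch below is a self-contained illustration, not the original script: the `Article` class, `build_index`, and `articles_still_alive` are hypothetical stand-ins, and `weakref` is used only to probe whether each object has been freed.

```python
import gc
import weakref


class Article:
    """Hypothetical stand-in for the article objects used by the script."""

    def __init__(self, title, text):
        self.title = title
        self.text = text

    def get_text(self):
        return self.text


def build_index(articles, keep_full_article):
    """Map each word to the articles containing it (objects or titles)."""
    word_freq = {}
    for article in articles:
        for word in article.get_text().split():
            ref = article if keep_full_article else article.title
            word_freq.setdefault(word, []).append(ref)
    return word_freq


def articles_still_alive(keep_full_article):
    """Build an index, drop the article list, and count surviving articles."""
    articles = [Article(f"Title {i}", "alpha beta gamma " * 1000) for i in range(5)]
    probes = [weakref.ref(a) for a in articles]  # weak refs do not keep objects alive
    index = build_index(articles, keep_full_article)
    del articles
    gc.collect()
    alive = sum(1 for probe in probes if probe() is not None)
    return alive, index


alive_with_objects, _ = articles_still_alive(keep_full_article=True)
alive_with_titles, _ = articles_still_alive(keep_full_article=False)
print("alive when storing objects:", alive_with_objects)
print("alive when storing titles: ", alive_with_titles)
```

When the index holds the objects themselves, every article stays in memory for as long as the index exists; when it holds only titles, the articles can be freed as soon as nothing else refers to them.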

Alternative reference strategies:

Reference Type	Size	Retrieval	Best For
Title string	20-100 bytes	Lookup by title	Human-readable
Numeric ID	4-8 bytes	Database query	Memory efficiency
Index number	4 bytes	Array access	Fast lookup
URL	50-200 bytes	HTTP fetch	Distributed systems

Note
When building reference structures like dictionaries or indices, store only the minimal identifier needed to retrieve the full object later, not the complete object itself. This keeps memory consumption tied to the size of the identifiers rather than to the size of the objects they refer to.

Key Takeaways from the Examples

Lessons Learned

Memory leak investigation process:

Step	Tool	What to Look For	Action
1. Detect	top, system monitor	Growing RES memory	Note which process
2. Confirm	Multiple observations	Continuous growth	Verify it’s a leak
3. Simplify	Code modification	Reduce to test case	Make profileable
4. Profile	memory_profiler	Line-by-line usage	Find hotspots
5. Analyze	Profiler output	Largest increments	Identify cause
6. Fix	Code changes	Remove retention	Test improvement
7. Verify	Re-profile	Stable memory	Confirm fix

Common Memory Retention Patterns

Patterns that cause excessive memory retention:

Pattern	Problem	Fix
Storing full objects in indices	References keep objects alive	Store IDs/titles only
Never clearing collections	Lists/dicts grow indefinitely	Implement size limits
Caching without eviction	Cache grows forever	Use LRU cache
Keeping processed data	Don’t release after use	Delete when done
Circular references	Objects reference each other	Break references explicitly
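As an illustration of the "use LRU cache" fix, the standard library's `functools.lru_cache` bounds a cache by evicting the least recently used entries once `maxsize` is reached. The function `load_summary` below is a hypothetical placeholder for an expensive lookup.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def load_summary(article_id):
    """Hypothetical expensive lookup; the result is cached per article_id."""
    return f"summary for article {article_id}"


# Request far more distinct items than the cache can hold.
for article_id in range(1000):
    load_summary(article_id)

info = load_summary.cache_info()
print(info)
```

Even after a thousand distinct requests, the cache never holds more than `maxsize` entries, so its memory use stays bounded.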

Profiling Considerations

Memory profiling trade-offs:

Consideration	Impact	Mitigation
Performance overhead	10-100× slower	Profile simplified version
Multiprocess complexity	Very hard to profile	Use single-process version
Production profiling	Risk to live system	Profile in development/staging
Sample size	Too small misses issue	Balance size vs. profiling time

Conclusion

Applications request memory for many reasons, ranging from legitimate task requirements and performance caching to programming errors that cause memory leaks. The critical distinction is whether memory usage plateaus at a reasonable level for the work being done or keeps growing without bound regardless of workload.

The terminal scroll buffer example demonstrated extreme leak behavior: running od -cx /dev/urandom generated continuous output that filled an unlimited buffer, and the memory consumption visible in top’s RES column grew from normal levels to critical exhaustion in minutes. Most memory leaks develop much more slowly, over days or weeks, but the investigation pattern is the same: monitor memory over time and compare it to historical baselines.

Understanding top’s memory columns is essential. RES shows the physical RAM actually in use and is the primary problem indicator. SHR shows memory shared across processes, which is typically system-managed. VIRT lists total virtual memory, including disk-backed resources, and high values there are usually acceptable. That makes RES the metric to focus on for leak detection.

The word frequency script example showed how a program that works fine with small inputs can exhaust memory on large datasets. It used multiprocessing, which multiplied the impact of the leak across workers, and one process showed continuous growth because the application stored full 4 MB article objects in the word frequency dictionary instead of minimal title or ID references. Legitimate data structure memory can be dwarfed by improper retention of intermediate processing objects.

Using Python’s memory_profiler with the @profile decorator on a simplified single-process version, limited to 50 articles instead of millions, allowed line-by-line analysis. It showed 130 MB of consumption, which would project to terabytes for all of Wikipedia. The article and text variables legitimately consumed about 4 MB and 3 MB during processing; the bug was storing entire article objects in the word_freq dictionary values rather than just titles. Changing from 4 MB per reference to about 50 bytes saves 99.999% of that memory and turns the scaling from terabytes into megabytes for large datasets.
