Dealing With Memory Leaks

This document demonstrates practical memory leak diagnosis and resolution through hands-on examples. It covers how applications exhaust memory through excessive allocations, how to spot memory growth patterns with monitoring tools like top, how to pinpoint problematic lines with Python's memory_profiler, and how to fix code that retains full data structures in memory when minimal references would do.


Understanding Memory Requests

Why Applications Request Memory

There are many reasons why an application may request a lot of memory. Sometimes it’s what’s needed for the program to complete its task. Sometimes it’s caused by a part of the software misbehaving.

Legitimate vs problematic memory usage:

Memory Usage Type | Cause | Characteristics | Action Required
Legitimate | Task requirements | Stable or predictable growth | None (accept or optimize)
Cache accumulation | Performance optimization | Growing but bounded | Monitor limits
Memory leak | Programming error | Continuous unbounded growth | Debug and fix
Excessive retention | Poor design | Holds more than needed | Refactor code

Common memory request reasons:

Reason | Example | Typical Size | Management
Data processing | Loading file into memory | MB to GB | Release after processing
Caching | Store frequently accessed data | MB | Implement eviction policy
Buffering | Network or file buffers | KB to MB | Auto-managed usually
Collections growth | Lists, dictionaries growing | Variable | Monitor size
Object retention | Keeping unnecessary references | Accumulates | Explicit cleanup

Demonstrating Memory Leak: Terminal Scroll Buffer

Triggering the Misbehavior

To see what this misbehavior looks like, it helps to trigger it deliberately. A terminal called uxterm will be used for that, configured with a very long scroll buffer.

Terminal scroll buffer mechanics:

Aspect | Description | Memory Impact
Scroll buffer | Stores command history and output | Grows with content
Storage location | RAM | Continuous allocation
Default size | Usually limited (1000-10000 lines) | Manageable
Configured size | Can be unlimited | Potentially dangerous
Content retention | Kept until buffer full or cleared | Persistent memory use

The scroll buffer is the feature that makes it possible to scroll up and see previously executed commands and their output. The contents of the buffer are kept in memory.

Creating Rapid Memory Growth

If the buffer is configured to be very long and something manages to fill it, the computer will run out of memory. With normal use, that might take ages. But a command that keeps generating a lot of output can fill the buffer quickly.
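The difference between a limited and an effectively unlimited scroll buffer can be illustrated with a small Python sketch. This is not how uxterm is implemented; the names output_lines and fill are made up for the illustration, and the fake output stream only stands in for a command that never stops printing.

```python
from collections import deque
from itertools import count, islice


def output_lines():
    """Endless stream of fake output lines (a stand-in for a chatty command)."""
    for n in count():
        yield f"line {n}"


def fill(buffer, num_lines):
    """Append num_lines lines of output to the buffer and return its length."""
    for line in islice(output_lines(), num_lines):
        buffer.append(line)
    return len(buffer)


unbounded = []                # like a scroll buffer with no effective limit
bounded = deque(maxlen=1000)  # like a scroll buffer limited to 1000 lines

unbounded_len = fill(unbounded, 50_000)
bounded_len = fill(bounded, 50_000)

print(unbounded_len)  # keeps growing with every line produced
print(bounded_len)    # capped at maxlen; the oldest lines are dropped
```

The bounded buffer stays at a fixed size no matter how long the command runs, which is why a default-sized scroll buffer is harmless and a huge one is not.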

Output generation command:

# Generate continuous random output
od -cx /dev/urandom

Command breakdown:

Component | Purpose | Result
od | Octal dump utility | Formats binary data
-c | Display as characters | Character representation
-x | Display as hexadecimal | Hex representation
/dev/urandom | Random number device | Infinite random data source

This command will take the random numbers generated by the urandom device and show them as both characters and hexadecimal numbers. Since the urandom device keeps giving more and more random numbers, it will just keep going.

Memory consumption pattern:

Time Elapsed | Lines Generated | Memory Used | Status
10 seconds | ~10,000 | ~10 MB | Normal
1 minute | ~60,000 | ~60 MB | Noticeable
5 minutes | ~300,000 | ~300 MB | High
10 minutes | ~600,000 | ~600 MB | Critical

The command is filling up the scroll buffer, making the computer require more and more memory.


Monitoring Memory with top

Opening top for Analysis

The next step is to open top in a different terminal and check what is going on.

top command basics:

# Launch top
top

# Launch with specific refresh rate (2 seconds)
top -d 2

# Launch sorted by memory
top -o %MEM

Sorting by Memory Usage

Pressing Shift+M tells top to order the programs by how much memory they are using.

top keyboard shortcuts for memory analysis:

Key | Action | Purpose
Shift+M | Sort by memory | Find memory hogs
Shift+P | Sort by CPU | Find CPU hogs
k | Kill process | Terminate problematic process
r | Renice process | Change priority
1 | Toggle CPU cores | Per-core view
m | Toggle memory display | Memory graphs

The percentage of memory used by xterm can be seen going up super quickly.

Observing memory growth:

Observation Interval | Memory % | Growth Rate | Interpretation
Initial | 2% | Baseline | Normal startup
After 10s | 5% | +3% in 10s | Rapid growth
After 30s | 15% | +10% in 20s | Accelerating
After 60s | 35% | +20% in 30s | Critical leak
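The growth rates in the table above come from comparing consecutive readings. As a rough sketch, assuming the %MEM values were noted by hand at the listed times, the per-second rate between readings can be computed like this (growth_rates is a name made up for the illustration):

```python
def growth_rates(samples):
    """Return %MEM growth per second between consecutive (seconds, percent) samples."""
    rates = []
    for (t0, m0), (t1, m1) in zip(samples, samples[1:]):
        rates.append(round((m1 - m0) / (t1 - t0), 2))
    return rates


# (elapsed seconds, %MEM) pairs taken from the table above
samples = [(0, 2), (10, 5), (30, 15), (60, 35)]
rates = growth_rates(samples)
print(rates)
```

A rate that keeps increasing from one interval to the next, as it does here, is what the table labels as accelerating growth.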

Stopping the Process

The next action is to stop the process that is filling up the buffer by pressing Control+C.

Process control:

Signal | Key Combination | Effect | Memory Impact
SIGINT | Ctrl+C | Interrupt process | Stops growth
SIGTERM | kill PID | Terminate gracefully | May release memory
SIGKILL | kill -9 PID | Force kill | Immediate release

With that, the command that was filling the buffer is stopped. But the terminal still has that memory allocated, storing all the lines in the scroll buffer.


Understanding top Memory Columns

Detailed Column Analysis

Looking at the output of top in a bit more detail reveals a bunch of different columns with data about each process.

top memory column definitions:

Column | Full Name | Meaning | Typical Values
RES | Resident Memory | Physical RAM used by process | MB to GB
SHR | Shared Memory | Memory shared with other processes | KB to MB
VIRT | Virtual Memory | Total virtual memory allocated | Often very large
%MEM | Memory Percentage | Percentage of total RAM | 0.1% to 90%

RES Column

The column labeled RES shows the resident memory: the physical RAM, not swapped out, that the specific process is currently using.

RES memory characteristics:

Aspect | Description | Monitoring Priority
Definition | Actual physical RAM in use | High
Excludes | Swapped memory | Focus on this first
Problem indicator | Continuously growing | Memory leak
Normal behavior | Stable or bounded fluctuation | Healthy

SHR Column

The one labeled SHR is for memory that’s shared across processes.

SHR memory examples:

Shared Resource | Type | Typical Size | Purpose
System libraries | Code | 10-100 MB | Common functions
Shared buffers | Data | Variable | IPC communication
Memory-mapped files | Files | File size | Efficient file access
Database caches | Data | GB | Performance

VIRT Column

The one labeled VIRT lists all the virtual memory allocated for each process. This includes process-specific memory, shared memory, and other shared resources that are stored on disk but mapped into the memory of the process.

VIRT memory components:

Component | Description | Disk-backed | Critical if High
Process memory | Private allocations | No | Yes
Shared libraries | System libraries | Yes | No
Memory-mapped files | Files accessed as memory | Yes | No
Swap space | Paged out memory | Yes | Sometimes

It’s usually fine for a process to have a high value in the VIRT column. The one that usually indicates a problem is the RES column.

Memory troubleshooting priority:

Column | Problem Indicator | Action Priority | Reason
RES growing | Yes | High | Actual RAM exhaustion
VIRT growing | Maybe | Medium | Could be memory-mapped files
SHR growing | Rarely | Low | Usually system-managed
%MEM high | Yes | High | Impacts entire system
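On Linux, the same two numbers can also be read without top. The sketch below assumes a Linux system, where /proc/<pid>/status exposes VmSize (the virtual size that top reports as VIRT) and VmRSS (the resident size that top reports as RES). The function name parse_status and the sample text are made up for the illustration.

```python
import os


def parse_status(text):
    """Extract VmSize and VmRSS (both in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(value.split()[0])  # value looks like "  45000 kB"
    return fields


# Hand-written sample in the same layout as the real file
sample = "Name:\tpython3\nVmSize:\t  250000 kB\nVmRSS:\t   45000 kB\n"
parsed = parse_status(sample)
print(parsed)

# On Linux, the current process can inspect itself the same way
status_path = "/proc/self/status"
if os.path.exists(status_path):
    with open(status_path) as handle:
        print(parse_status(handle.read()))
```

Logging VmRSS at regular intervals gives the same signal as watching the RES column, in a form that is easier to compare against older readings.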

Releasing Memory

Closing the other terminal releases all the memory that it reserved.

Memory release mechanisms:

Action | Memory Released | Speed | Completeness
Stop process | Command buffers | Immediate | Partial
Clear buffer | Scroll history | Immediate | Partial
Close terminal | All terminal memory | Immediate | Complete
OS reclaim | Process exits | Automatic | Complete

Example Analysis Summary

This example demonstrated what a program that keeps requesting more and more memory looks like. It was an extreme case. Most memory leaks don’t happen at this speed.

Memory leak speed comparison:

Leak Type | Growth Rate | Detection Time | Example
Extreme (demo) | MB per second | Minutes | Terminal scroll buffer
Fast | MB per hour | Hours to days | Logging without rotation
Moderate | MB per day | Days to weeks | Cache without limits
Slow | MB per week | Weeks to months | Small object accumulation

It can take a long while to notice that a program is taking more memory than it should. It can also be hard to tell the difference between memory that’s actually needed and memory that’s being wasted.

Distinguishing legitimate vs wasted memory:

Indicator | Legitimate | Wasted (Leak)
Growth pattern | Plateaus at some point | Continuous unbounded growth
Relationship to work | Proportional to data size | Independent of work done
After completion | Releases most memory | Retains high memory
Restart behavior | Similar pattern | Repeats growth pattern

But looking at the output of top and comparing it to what it used to be a while back is usually how any investigation into a memory leak starts.
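Comparing readings over time can be reduced to a simple check: does the memory eventually level off, or is it still climbing? The sketch below is only a heuristic under stated assumptions: the readings are RES values in MB noted at regular intervals, and the function name still_growing and the 5% tolerance are arbitrary choices for the illustration.

```python
def still_growing(samples, tolerance=0.05):
    """Return True if memory keeps growing across the later half of the samples.

    samples: memory readings (for example RES in MB) taken at regular intervals.
    tolerance: relative growth across the later half that is treated as noise.
    """
    later_half = samples[len(samples) // 2:]
    start, end = later_half[0], later_half[-1]
    return (end - start) > tolerance * start


plateau = [100, 180, 240, 250, 251, 250, 252, 251]  # levels off: likely legitimate
leak = [100, 150, 200, 250, 300, 350, 400, 450]     # never levels off: suspicious

print(still_growing(plateau))
print(still_growing(leak))
```

A check like this only flags a candidate. Whether the growth is proportional to the work being done still has to be judged by someone who knows the program.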


Word Frequency Script Example

The Problem Scenario

In a different example, a script analyzes the frequency of words in web pages. The script works fine with just a few web pages, but when it is given all of Wikipedia’s content, it starts using up all the memory.

Script behavior by scale:

Input Scale | Articles | Memory Usage | Status
Small test | 5-10 | <100 MB | Works fine
Medium test | 100 | ~500 MB | Acceptable
Large test | 1,000 | ~5 GB | Slow but works
Full Wikipedia | Millions | >32 GB | Runs out of memory

The script is running, and it will take a long while to finish, since it has a huge number of articles to process.

Multiprocessing Observation

While this is running, the output of top in a different terminal shows a number of different content stats processes running.

Multiprocessing memory pattern:

Aspect | Observation | Implication
Process count | Multiple workers | Using multiprocessing
Individual memory | Each process growing | Not shared data problem
Total memory | Sum of all processes | Multiplied impact
One process growing fast | Specific worker | Leak in worker code

That’s because the script uses multiprocessing to parallelize the processing of the information and get the results as fast as possible.

Multiprocessing memory characteristics:

Single Process | Multiprocess (4 workers)
5 GB leak | 4 × 5 GB = 20 GB leak
Easier to profile | Harder to profile
Single RES column | Multiple RES columns
Simpler debugging | Complex debugging

These processes seem to be taking a lot of memory, so the next step is sorting by memory to see the details. The memory used by one of the processes in particular keeps growing and growing.

Analyzing the Issue

The application is processing a bunch of data and generating a dictionary with it, so it’s expected that it will use some memory but not this much.

Expected vs actual memory usage:

Component | Expected Memory | Actual Memory | Ratio
Article text | 10 KB | 10 KB | -
Word dictionary | 100 KB | 100 KB | -
Article storage | 0 (should be discarded) | 10 KB × articles | -
Total (1000 articles) | ~100 MB | ~10 GB | 100×

This looks like the program is storing more than it should in memory.


Using Memory Profiler

Why Use a Profiler

This program is pretty complex, so the help of a memory profiler is needed to figure out what the problem is.

When to use memory profilers:

Scenario | Manual Analysis | Memory Profiler
Simple script | Feasible | Overkill
Complex application | Difficult | Recommended
Multiprocess app | Very hard | Required (simplified version)
Production system | Limited info | Detailed insights

Profiling the memory of a multiprocess application is extra hard. Instead of processing all the articles, just handling a few allows checking out the memory consumption quickly.

Simplified Script Setup

Profiling strategy:

Original Script | Simplified for Profiling
Multiprocessing | Single process
All Wikipedia | 50 articles
Hours to complete | Minutes to complete
Hard to profile | Easy to profile

The next step is to open the simplified script for examination. It uses the memory_profiler module, one of the many memory profilers available for Python.

Python memory profiler alternatives:

Profiler | Granularity | Overhead | Best For
memory_profiler | Line-by-line | High | Development debugging
tracemalloc | Snapshot-based | Low | Production monitoring
pympler | Object-level | Medium | Object tracking
objgraph | Reference tracking | Medium | Circular reference detection
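Of the alternatives listed above, tracemalloc ships with the Python standard library, so it needs no installation. The sketch below shows the snapshot-based approach on a deliberately simple allocation; the variable names are made up, and the list of strings only stands in for data that a real program would retain.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Allocate something noticeable: 100,000 distinct strings kept alive in a list.
retained = [f"word-{n}" for n in range(100_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Compare the snapshots line by line; the biggest differences are listed first.
stats = after.compare_to(before, "lineno")
total_growth = sum(stat.size_diff for stat in stats)

for stat in stats[:3]:
    print(stat)
print(f"total growth: {total_growth / 1024:.0f} KiB")
```

The first entries of the comparison point at the source lines responsible for most of the growth, which is the same question memory_profiler answers with its line-by-line table.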

Adding the Decorator

A @profile label has been added before the main function definition to tell the profiler to analyze that function's memory consumption.

Decorator implementation:

from memory_profiler import profile

# fetch_articles() is defined elsewhere in the original script (not shown here).


@profile
def main():
    """Process articles and analyze word frequency."""
    articles = fetch_articles(limit=50)
    word_freq = {}
    article_refs = []  # Never filled here; the articles end up retained via word_freq

    for article in articles:
        text = article.get_text()
        words = text.split()

        for word in words:
            if word not in word_freq:
                word_freq[word] = []
            # BUG: appending the article object keeps the whole article alive
            word_freq[word].append(article)

    return word_freq


if __name__ == '__main__':
    main()

Decorator characteristics:

Aspect | Description | Impact
Type | Python decorator | Adds behavior without code modification
Syntax | @profile | Applied above function
Purpose | Mark function for profiling | Tells profiler what to analyze
Scope | Specific functions | Can profile multiple functions

This type of label is called a decorator, and it’s used in Python to add extra behavior to functions without having to modify the code. In this case, the extra behavior is measuring the use of memory.
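To make the decorator idea concrete, here is a minimal sketch of a decorator that measures memory around a function call. It is not how memory_profiler's @profile works internally; it uses the standard library's tracemalloc instead, and the names report_peak_memory, last_peak_bytes, and build_squares are made up for the illustration.

```python
import functools
import tracemalloc


def report_peak_memory(func):
    """Decorator that records the peak traced memory of each call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs after func finishes, whether it returned or raised.
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            wrapper.last_peak_bytes = peak
            print(f"{func.__name__}: peak {peak / 1024:.0f} KiB")
    wrapper.last_peak_bytes = None
    return wrapper


@report_peak_memory
def build_squares(count):
    """Ordinary function body: the measurement is added from the outside."""
    return [n * n for n in range(count)]


squares = build_squares(10_000)
```

The body of build_squares contains no measurement code at all. Removing the @report_peak_memory line restores the original behavior, which is what makes decorators convenient for temporary instrumentation.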

Script Simplification

The rest of the code is basically the same as the original one. It just uses a single process and is limited to 50 articles instead of the thousands of articles that the other script was going through.


Running the Memory Profiler

Execution

The next step is running the script with the memory profiler enabled.

Memory profiler execution:

# Install memory_profiler if needed
pip install memory_profiler

# Run script with profiling
python -m memory_profiler content_stats_simplified.py

# Alternative: using mprof for more features
mprof run content_stats_simplified.py
mprof plot  # Generate graph

Execution characteristics:

Aspect | Normal Execution | With Profiler
Speed | Fast | 10-100× slower
Memory usage | Normal | Slightly higher
Output | Program results | + Memory statistics
Purpose | Production use | Development debugging

This is just reading through 50 articles, but it takes a bunch of time because all that memory profiling makes the script slower.

Profiler Output

Once the program finishes, the memory profiler gives information about which lines are adding or removing data from the memory used by the program.

Memory profiler output format:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5     45.2 MiB     45.2 MiB           1   @profile
     6                                         def main():
     7     45.3 MiB      0.1 MiB           1       articles = fetch_articles(limit=50)
     8     45.3 MiB      0.0 MiB           1       word_freq = {}
     9     45.3 MiB      0.0 MiB           1       article_refs = []
    10
    11     49.5 MiB      4.2 MiB          51       for article in articles:
    12     52.8 MiB      3.3 MiB          50           text = article.get_text()
    13     53.1 MiB      0.3 MiB          50           words = text.split()
    14
    15    130.5 MiB     77.4 MiB       50000           for word in words:
    16    130.5 MiB      0.0 MiB       45000               if word not in word_freq:
    17    130.5 MiB      0.0 MiB       12000                   word_freq[word] = []
    18    130.5 MiB      0.0 MiB       50000               word_freq[word].append(article)

Output column explanation:

Column | Meaning | Use
Line # | Source code line number | Locate code
Mem usage | Total memory after line executes | Track growth
Increment | Memory added by this line | Find culprits
Occurrences | How many times line ran | Understand loops
Line Contents | The actual code | See what it does

The Mem usage column shows the amount of memory in use once each line has executed. The Increment column shows how much that line added to, or removed from, the memory in use.
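The relationship between the two columns can be sketched as a running difference. The readings below are the distinct Mem usage values from the output above, in order; treating each increment as the difference from the previous reading is a simplification for illustration, not a description of memory_profiler's exact bookkeeping inside loops.

```python
# Mem usage readings (MiB) taken from the profiler output, top to bottom
mem_usage_mib = [45.2, 45.3, 45.3, 45.3, 49.5, 52.8, 53.1, 130.5]


def increments(readings):
    """Difference between each reading and the one before it, rounded to 0.1 MiB."""
    return [round(later - earlier, 1) for earlier, later in zip(readings, readings[1:])]


deltas = increments(mem_usage_mib)
print(deltas)
print(max(deltas))  # the largest jump marks the line to investigate first
```

Sorting by the largest increment is usually the fastest way to narrow the search to one or two lines.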

Identifying the Problem

After going through just 50 articles, the program has already taken 130 megabytes. No wonder memory ran out when trying to process all the articles.

Memory consumption analysis:

Metric | Value | Projection for All Wikipedia
Articles processed | 50 | ~6,000,000
Memory used | 130 MB | ~15,600 GB (15.6 TB)
Per article | 2.6 MB | Same ratio
Expected per article | ~100 KB | 26× too much
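The projection in the table above is simple proportional scaling, which assumes memory keeps growing at the same per-article rate. The arithmetic can be checked with a few lines, using the table's own figures (the constant names are made up for the illustration):

```python
ARTICLES_PROFILED = 50
MEMORY_USED_MB = 130
TOTAL_ARTICLES = 6_000_000      # approximate article count used in the table
EXPECTED_PER_ARTICLE_KB = 100   # expected footprint per article from the table

# Integer arithmetic avoids floating-point noise in the intermediate values.
per_article_kb = MEMORY_USED_MB * 1000 // ARTICLES_PROFILED
projected_mb = MEMORY_USED_MB * TOTAL_ARTICLES // ARTICLES_PROFILED
projected_tb = projected_mb / 1_000_000
excess_ratio = per_article_kb / EXPECTED_PER_ARTICLE_KB

print(f"per article: {per_article_kb / 1000} MB")
print(f"projected:   {projected_mb / 1000:,.0f} GB ({projected_tb} TB)")
print(f"excess:      {excess_ratio:.0f}x the expected footprint")
```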

The variables that require the most memory are article and text with about four and three megabytes respectively.

Variable memory breakdown:

Variable | Memory per Instance | Instances | Total | Should Release?
article | 4 MB | 1 (current) | 4 MB | After processing
text | 3 MB | 1 (current) | 3 MB | After processing
word_freq | Variable | 1 | Growing | Keep (needed)
Articles retained through word_freq | 4 MB × 50 | 50 | Up to 200 MB | BUG: Should not keep

Those are the articles being processed, and it’s fine for them to take space while counting the words in the article. But once processing one article is done, that memory shouldn’t be kept around.


Spotting and Fixing the Bug

The Critical Question

Can the problem be spotted? Right at the end of the loop, the code stores the article to keep a reference to it, but that reference keeps the whole article object, text included, alive in memory.

Problematic code pattern:

# BAD: Storing entire article
for word in words:
    if word not in word_freq:
        word_freq[word] = []
    word_freq[word].append(article)  # Keeps the full 4 MB article object alive

What’s being stored:

What’s Stored | Size | Necessary? | Issue
Full article object | 4 MB | No | Excessive memory
Article title | 50 bytes | Yes | Minimal impact
Article ID | 8 bytes | Yes | Minimal impact
Article index | 4 bytes | Yes | Minimal impact

The Solution

If a reference to every article that includes a word is needed, titles or index entries could be stored instead, definitely not the whole contents.

Corrected code:

# GOOD: Storing only article title/ID
for word in words:
    if word not in word_freq:
        word_freq[word] = []
    word_freq[word].append(article.title)  # Stores only a ~50 byte string
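Why the two versions behave so differently can be demonstrated with a weak reference, which observes an object without keeping it alive. In the sketch below, the Article class and the helper names are hypothetical stand-ins for the script's real objects; only the storing strategy mirrors the code above.

```python
import gc
import weakref


class Article:
    """Hypothetical stand-in for the script's article objects."""
    def __init__(self, title, text):
        self.title = title
        self.text = text


def index_article(index, article, store_title):
    """Record, for each word, which article it appears in."""
    for word in article.text.split():
        entry = article.title if store_title else article
        index.setdefault(word, []).append(entry)


def article_survives(store_title):
    """Return True if the article is still in memory after it has been indexed."""
    index = {}
    article = Article("Memory leak", "memory leak memory " * 1000)
    watcher = weakref.ref(article)  # observes the article without retaining it
    index_article(index, article, store_title)
    del article                     # the loop has moved on to the next article
    gc.collect()
    return watcher() is not None


print(article_survives(store_title=False))  # index holds the article objects
print(article_survives(store_title=True))   # index holds only title strings
```

When the index stores the article itself, the object cannot be freed for as long as the index exists. When it stores only the title, the article and its text are released as soon as the loop moves on.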

Before vs after comparison:

Approach | Memory per Reference | 50 Articles | 1000 Articles | 1M Articles
Store full article | 4 MB | 200 MB | 4 GB | 4 TB
Store title only | 50 bytes | 2.5 KB | 50 KB | 50 MB
Savings | 99.999% | 99.999% | 99.999% | 99.999%

Alternative reference strategies:

Reference Type | Size | Retrieval | Best For
Title string | 20-100 bytes | Lookup by title | Human-readable
Numeric ID | 4-8 bytes | Database query | Memory efficiency
Index number | 4 bytes | Array access | Fast lookup
URL | 50-200 bytes | HTTP fetch | Distributed systems

Key Takeaways from the Examples

Lessons Learned

Memory leak investigation process:

Step | Tool | What to Look For | Action
1. Detect | top, system monitor | Growing RES memory | Note which process
2. Confirm | Multiple observations | Continuous growth | Verify it’s a leak
3. Simplify | Code modification | Reduce to test case | Make profileable
4. Profile | memory_profiler | Line-by-line usage | Find hotspots
5. Analyze | Profiler output | Largest increments | Identify cause
6. Fix | Code changes | Remove retention | Test improvement
7. Verify | Re-profile | Stable memory | Confirm fix

Common Memory Retention Patterns

Patterns that cause excessive memory retention:

Pattern | Problem | Fix
Storing full objects in indices | References keep objects alive | Store IDs/titles only
Never clearing collections | Lists/dicts grow indefinitely | Implement size limits
Caching without eviction | Cache grows forever | Use LRU cache
Keeping processed data | Don’t release after use | Delete when done
Circular references | Objects reference each other | Break references explicitly
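For the "caching without eviction" pattern, Python's standard library already provides a bounded cache. The sketch below uses functools.lru_cache with an arbitrary limit of 128 entries; the word_count function is a made-up example of a computation worth caching.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def word_count(text):
    """Made-up example of a computation whose results are worth caching."""
    return len(text.split())


# Ask for far more distinct inputs than the cache is allowed to hold.
for n in range(10_000):
    word_count(f"article number {n}")

info = word_count.cache_info()
print(info)  # currsize never exceeds maxsize; older entries were evicted
```

With maxsize=None the cache would keep every one of the 10,000 results, which is exactly the unbounded growth the table warns about.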

Profiling Considerations

Memory profiling trade-offs:

Consideration | Impact | Mitigation
Performance overhead | 10-100× slower | Profile simplified version
Multiprocess complexity | Very hard to profile | Use single-process version
Production profiling | Risk to live system | Profile in development/staging
Sample size | Too small misses issue | Balance size vs. profiling time

Conclusion

Applications request memory for various reasons, ranging from legitimate task requirements and performance caching to programming errors that cause memory leaks. The critical distinction is whether memory usage plateaus at a reasonable level for the work being done or keeps growing unboundedly regardless of workload.

The terminal scroll buffer example demonstrated extreme memory leak behavior. Running od -cx /dev/urandom generated continuous output that filled an unlimited buffer, and the memory consumption visible in top’s RES column grew from normal levels to critical exhaustion in minutes. Most memory leaks occur much more slowly, over days or weeks, but the investigation pattern remains the same: monitor memory over time and compare it to historical baselines.

Understanding top’s memory columns is essential. RES shows the actual physical RAM in use and is the primary problem indicator. SHR displays memory shared across processes, which is typically system-managed. VIRT lists total virtual memory, including disk-backed resources, where high values are usually acceptable. That makes RES the metric to focus on for leak detection.

The word frequency script example revealed how a program that works fine with small inputs can exhaust memory on large datasets. The script used multiprocessing, which multiplied the impact of the leak across workers, and one process showed continuous growth because the application stored full 4 MB article objects in the word frequency dictionary instead of minimal title or ID references. Legitimate data structure memory can be dwarfed by improper retention of intermediate processing objects.

Using Python’s memory_profiler with the @profile decorator on a simplified single-process version, processing just 50 articles instead of millions, allowed line-by-line analysis. It showed 130 MB of consumption, which would project to terabytes for all of Wikipedia. The article and text variables legitimately consumed about 4 MB and 3 MB during processing; the bug was storing entire article objects in the word_freq dictionary values rather than just titles. Changing from 4 MB per reference to about 50 bytes saves 99.999% of that memory and turns the scaling for large datasets from terabytes into megabytes.

