Managing Disk Space

November 11, 2025 18 min read Troubleshooting Resources Systems Docs Automation-With-Python Disk-Management Storage Logs Cleanup

This document addresses disk space management challenges in IT systems, covering common causes of disk exhaustion from logs to temporary files. It explores diagnostic techniques for identifying space usage patterns, handling deleted-but-open files, and implementing preventive strategies to avoid disk-related performance degradation and data loss.


This document examines disk space management as a critical system resource, exploring how programs consume storage through binaries, data, caches, logs, and temporary files. It covers diagnostic approaches for identifying space usage patterns, understanding performance degradation as disks fill up, and implementing strategies to prevent disk exhaustion that can cause application crashes and potential data loss.

Understanding Disk Space Usage

Why Programs Need Disk Space

Disk space is another resource that may need attention. Programs need disk space for many different reasons.

Common disk space consumers:

Type	Purpose	Growth Pattern	Cleanup Frequency
Installed binaries	Application executables	Stable	On uninstall
Libraries	Shared code dependencies	Stable	On uninstall
Application data	User and system data	Growing	User-driven
Cache information	Performance optimization	Growing/stable	Periodic
Logs	System and application events	Continuously growing	Rotation-based
Temporary files	Intermediate processing	Varies	Should be automatic
Backups	Data redundancy	Growing	Retention policy

Potential Causes of Space Exhaustion

If a computer is running out of space, it may simply be trying to store too much data in too little space.

Space exhaustion scenarios:

Scenario	Cause	Likelihood	Solution Type
Legitimate growth	Too many applications or large files	Common	Add storage capacity
Program misbehavior	Temporary files not cleaned	Very common	Fix cleanup logic
Log overflow	Excessive logging without rotation	Common	Configure log rotation
Cache accumulation	No cache eviction policy	Moderate	Implement cache limits
Backup retention	Old backups never deleted	Moderate	Set retention policy

Maybe there are too many applications installed, or too many large files stored on the drive.

Program Misuse of Disk Space

But it’s also possible that programs are misusing the space allotted to them, for example by keeping temporary files or cached information that doesn’t get cleaned up quickly enough, or at all.

Misuse patterns:

Misuse Type	Behavior	Impact Timeline	Detection Method
Temporary file retention	Never deleting temp files	Days to weeks	Directory size monitoring
Unbounded caching	Cache grows indefinitely	Weeks to months	Cache directory analysis
Excessive logging	High-frequency log writes	Hours to days	Log file growth rate
Failed cleanup	Crash prevents deletion	Per crash	Orphaned file detection

Performance Impact of Disk Exhaustion

System-Wide Performance Degradation

Overall system performance commonly decreases as available disk space shrinks.

Performance degradation stages:

Disk Usage	Available Space	Performance Impact	User Experience
0-50%	Plenty free	Normal	Fast operations
50-80%	Moderate free	Slight slowdown	Barely noticeable
80-95%	Low free	Noticeable slowdown	Delays apparent
95-100%	Critical/none	Severe degradation	Very slow/crashes
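The stages above can be checked programmatically. The following is a minimal Python sketch, assuming the same percentage bands as the table; the `STAGES` list and the `usage_stage`/`check_path` helpers are illustrative names, not a standard API. It relies on `shutil.disk_usage`, which returns the total, used, and free bytes for the filesystem containing a path.

```python
import shutil

# (upper bound in percent, stage) using the bands from the table above
STAGES = [
    (50, "Normal"),
    (80, "Slight slowdown"),
    (95, "Noticeable slowdown"),
    (100, "Severe degradation"),
]


def usage_stage(percent_used):
    """Map a disk-usage percentage to one of the table's performance stages."""
    for upper, stage in STAGES:
        if percent_used < upper:
            return stage
    return STAGES[-1][1]  # 100% (or more) is still the last stage


def check_path(path="/"):
    """Return (percent used, stage) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # Same ratio df reports as Use%: blocks reserved for root are left out
    percent = usage.used / (usage.used + usage.free) * 100
    return percent, usage_stage(percent)


if __name__ == "__main__":
    percent, stage = check_path("/")
    print(f"/ is {percent:.1f}% full: {stage}")
```

The thresholds are rules of thumb from the table, not hard limits; the right bands depend on the workload and the filesystem.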

Data Fragmentation

As free space runs low, the filesystem has fewer large contiguous regions to allocate from, so data starts getting fragmented across the disk and operations become slower.

Fragmentation effects:

Aspect	Unfragmented Disk	Fragmented Disk	Impact
File location	Contiguous blocks	Scattered blocks	Read time increases
Seek operations	Minimal	Many	Head movement delays
Write efficiency	Sequential	Random	Slower writes
Free space	Large contiguous blocks	Many small gaps	Allocation overhead

Fragmentation performance comparison (rough figures for a spinning hard drive; SSDs have no seek penalty and are much less affected):

Operation	Contiguous File	Fragmented File (10 pieces)	Slowdown Factor
Sequential read	1 seek + read	10 seeks + reads	5-10× slower
Random access	Direct access	Multiple seeks	3-5× slower
File opening	Fast	Slow (map fragments)	2-4× slower
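To see where slowdown factors like these come from, here is a back-of-the-envelope model in Python. It assumes (illustrative values, not measurements) about 10 ms per seek and 150 MB/s of sequential throughput for a spinning disk, and charges one seek per fragment. The `read_time_ms` and `slowdown` helpers are hypothetical names for this sketch.

```python
def read_time_ms(size_mb, fragments, seek_ms=10.0, throughput_mb_s=150.0):
    """Estimated HDD read time: one seek per fragment plus sequential transfer.

    seek_ms and throughput_mb_s are assumed, illustrative spinning-disk values.
    """
    transfer_ms = size_mb / throughput_mb_s * 1000
    return fragments * seek_ms + transfer_ms


def slowdown(size_mb, fragments, **kwargs):
    """Ratio of fragmented to contiguous read time for the same file."""
    return read_time_ms(size_mb, fragments, **kwargs) / read_time_ms(size_mb, 1, **kwargs)


for size in (1, 10, 100):
    print(f"{size:>4} MB file, 10 fragments: {slowdown(size, 10):.1f}x slower")
```

Under these assumptions, a 1 MB file in 10 fragments reads about 6.4× slower, while a 100 MB file reads only about 1.1× slower: once the transfer time dominates, a few extra seeks matter much less. Small, heavily fragmented files suffer the most.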

Application Crashes

When a hard drive is full, programs may suddenly crash while trying to write something into disk and finding out that they can’t.

Write failure scenarios:

Operation	Expected Behavior	Full Disk Behavior	Result
Log write	Append to file	Write fails	Application crash
Save document	Update file	No space error	Work lost
Create temp file	Allocate space	Allocation fails	Process terminates
Database commit	Write transaction	Commit fails	Data inconsistency

Risk of Data Loss

A full hard drive might even lead to data loss: some programs truncate a file before writing an updated version of it, and if writing the new content then fails, all the data previously stored in the file is lost.

Data loss patterns:

Update Pattern	Step 1	Step 2	Full Disk Result
Safe (atomic)	Write to temp file	Rename over original	Temp write fails, original intact
Unsafe (truncate)	Truncate original	Write new data	Truncate succeeds, write fails, data lost
In-place	Seek to position	Overwrite data	Partial write, corrupted file
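The safe (atomic) pattern from the first row can be sketched in Python with the standard library. This is a minimal sketch, not a drop-in library function: `atomic_write` is a hypothetical helper name. The temporary file is created in the same directory as the target so the final `os.replace` is a rename within one filesystem, which is atomic on POSIX.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the contents of `path` without ever leaving it truncated.

    The new content goes to a temporary file in the same directory, is flushed
    to disk, and only then renamed over the original. If the disk fills up
    during the write, the exception propagates and the original is untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes really reached the disk
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        # Remove the partial temp file so a failed write doesn't leak space
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

One side effect to be aware of: `mkstemp` creates the file with mode 0600, so the replaced file does not keep the original's permissions unless they are copied over explicitly.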

Warning
A full disk can cause catastrophic data loss when applications truncate files before writing updates. If the truncation succeeds but the write fails due to no space, all original data is permanently lost. Always monitor disk space to prevent reaching this critical state.

Error Messages

If it gets to this point, errors such as “No space left on device” will probably appear when running applications or in the logs.

Common disk full errors:

Error Message	Context	Severity
“No space left on device”	Linux/Unix systems	Critical
“Disk full”	General error	Critical
“ENOSPC: no space left”	Node.js/JavaScript	Critical
“OSError: [Errno 28] No space left on device”	Python 3 (IOError in Python 2)	Critical
“Insufficient disk space”	Windows	Critical
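In Python, error number 28 is `errno.ENOSPC` on Linux, and it is better to compare against the named constant than against a hard-coded 28. The sketch below uses a hypothetical `append_log_line` helper; dropping the line and returning `False` is an illustrative policy, and a real application might instead raise an alert, free space, or shut down cleanly.

```python
import errno


def append_log_line(path, line):
    """Append a line to a log file, reporting a full disk instead of crashing.

    Returns True on success and False if the write failed because the disk
    is full. Any other OSError is re-raised unchanged.
    """
    try:
        # The error can surface on write() or when close() flushes the buffer;
        # both happen inside this try block.
        with open(path, "a") as f:
            f.write(line + "\n")
    except OSError as exc:
        if exc.errno == errno.ENOSPC:  # "No space left on device"
            return False
        raise
    return True
```

On Linux, every write to the special file `/dev/full` fails with ENOSPC, which makes it handy for exercising this code path without actually filling a disk.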

Diagnosing Disk Space Issues

User Machine Solutions

So what should be done if a computer runs out of disk space? If it’s a user machine, it might be easily fixed by uninstalling applications that aren’t used, or cleaning up old data that isn’t needed anymore.

User machine cleanup approaches:

Cleanup Type	Target	Impact	Difficulty
Uninstall apps	Unused applications	High (GB)	Easy
Delete downloads	Old download files	Moderate (GB)	Easy
Clear caches	Browser, app caches	Moderate (MB-GB)	Easy
Remove duplicates	Duplicate files	Varies	Moderate
Archive old files	Old documents, photos	High (GB)	Moderate

Server Investigations

But if it’s a server, a closer look at what’s going on might be needed. Is the issue that an extra drive needs to be added to the server to have more available space, or is it that some application is misbehaving and filling the disk with useless data?

Server diagnostic questions:

Question	Indicates	Action Required
Is growth expected?	Legitimate data increase	Add storage capacity
Is one directory dominant?	Concentrated issue	Investigate specific application
Are files temporary/logs?	Cleanup problem	Fix cleanup processes
Is growth rate abnormal?	Application misbehavior	Debug application
Are backups accumulating?	Retention issue	Adjust backup policy

Space Usage Analysis

To figure this out, examine how the space is being used and which directories take up the most of it, then drill down until it’s clear whether large chunks of space hold valid information or files that should be purged.

Analysis workflow:

Step	Command Example	Purpose	Output
1. Top-level overview	`df -h`	Show filesystem usage	Total/used/available per mount
2. Directory breakdown	`du -sh /*`	Identify large directories	Size of top-level dirs
3. Drill down	`du -sh /var/*`	Investigate suspect dir	Subdirectory sizes
4. Find large files	`find / -size +1G`	Locate specific culprits	Files over threshold
5. Sort by size	`du -h | sort -rh | head -20`	Rank consumers	Top 20 space users

Common disk usage commands:

```bash
# Check overall disk usage
df -h

# Find directories using most space (2>/dev/null hides permission errors)
du -sh /* 2>/dev/null | sort -rh | head -10

# Find large files over 100MB
find / -type f -size +100M -exec ls -lh {} + 2>/dev/null

# Check disk usage by directory, sorted
du -h /var | sort -rh | head -20

# Find files modified in last 7 days, largest first
find /var/log -mtime -7 -type f -exec du -sh {} \; | sort -rh
```
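When this drill-down needs to run from an automation script rather than by hand, a rough Python equivalent of `du -sh /var/* | sort -rh` can be sketched as below. The helper names are hypothetical, and the size accounting only approximates `du`: it uses allocated blocks (`st_blocks`, which Linux reports in 512-byte units), counts hard-linked files once, does not follow symlinks, and ignores the space used by directory entries themselves.

```python
import os


def dir_size(path):
    """Approximate disk space used by the files under `path`, similar to du."""
    seen = set()
    total = 0
    for root, dirs, files in os.walk(path):  # unreadable dirs are skipped silently
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # vanished or permission denied
            key = (st.st_dev, st.st_ino)
            if key in seen:  # hard link to a file already counted
                continue
            seen.add(key)
            total += st.st_blocks * 512
    return total


def largest_subdirs(path, top=10):
    """Return up to `top` (bytes, path) pairs for the biggest subdirectories of `path`."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.path))
    return sorted(sizes, reverse=True)[:top]


def human(n):
    """Format a byte count with binary units, like du -h."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}P"


if __name__ == "__main__":
    for size, path in largest_subdirs("/var"):
        print(f"{human(size):>8}  {path}")
```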

Expected vs Anomalous Usage

For example, on a database server, it’s expected that the bulk of the disk space is going to be used by the data stored in the database. On a mail server, it’s going to be the mailboxes of the users of that service.

Expected space usage by server type:

Server Type	Expected Primary Consumer	Typical Size	Anomaly Threshold
Database	Database files (/var/lib/mysql)	50-90% of disk	Logs >10%
Mail	User mailboxes (/var/mail)	60-90% of disk	Temp files >5%
Web	Static content (/var/www)	30-60% of disk	Logs >20%
File	Shared files (/shares)	70-95% of disk	System >5%
Application	Application data	40-70% of disk	Logs >15%

But if most of the data is found to be stored in logs or in temporary files, something has gone wrong.

Anomalous usage indicators:

Directory	Normal Size	Anomalous Size	Likely Issue
/var/log	<5% disk	>20% disk	Log rotation failure
/tmp	<2% disk	>10% disk	Temp file cleanup failure
/var/cache	<10% disk	>30% disk	Cache eviction not working
/var/spool	<5% disk	>15% disk	Queue processing stuck
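Once directory sizes are known (for example, collected with `du`), the thresholds above can be applied automatically. This Python sketch copies the "Anomalous Size" column into a dictionary; `ANOMALY_THRESHOLDS` and `find_anomalies` are hypothetical names, and the percentages are the table's rules of thumb rather than universal limits.

```python
# Anomaly thresholds from the table above, as a fraction of total disk size
ANOMALY_THRESHOLDS = {
    "/var/log": (0.20, "Log rotation failure"),
    "/tmp": (0.10, "Temp file cleanup failure"),
    "/var/cache": (0.30, "Cache eviction not working"),
    "/var/spool": (0.15, "Queue processing stuck"),
}


def find_anomalies(dir_sizes, disk_total):
    """Return (directory, share of disk, likely issue) for each directory over its threshold.

    dir_sizes maps a directory to the bytes it uses; missing entries count as 0.
    """
    findings = []
    for directory, (limit, issue) in ANOMALY_THRESHOLDS.items():
        share = dir_sizes.get(directory, 0) / disk_total
        if share > limit:
            findings.append((directory, round(share, 3), issue))
    return findings


GB = 1024 ** 3
sizes = {"/var/log": 30 * GB, "/tmp": 2 * GB, "/var/cache": 5 * GB}
print(find_anomalies(sizes, disk_total=100 * GB))
# [('/var/log', 0.3, 'Log rotation failure')]
```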

Common Misbehavior Patterns

Excessive Error Logging

One common pattern of misbehavior is a program that keeps logging error messages to the system log over and over. This can happen for lots of different reasons.

Excessive logging scenarios:

Cause	Frequency	Growth Rate	Example
Configuration error	Continuous retries	MB to GB per hour	Service fails to start
Network timeout	Per request	GB per day	API endpoint down
Permission denied	Per access attempt	MB per hour	File access failure
Dependency failure	Per health check	GB per day	Database unreachable

OS Retry Loops

For example, the OS might keep trying to start a program that fails because of a configuration problem. This will generate a new log entry with every retry and can take up a lot of space if there are several retries per second.

Retry pattern impact:

Retry Rate	Log Entry Size	Space Used Per Hour	Space Used Per Day
1 per second	200 bytes	~700 KB	~17 MB
10 per second	200 bytes	~7 MB	~173 MB
100 per second	200 bytes	~70 MB	~1.7 GB
1000 per second	200 bytes	~700 MB	~17 GB
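The figures in this table are plain multiplication: retries per second × bytes per entry × 3,600 seconds per hour, then × 24 for a day. A small Python helper (a hypothetical name, using decimal units where 1 MB = 10^6 bytes) makes the arithmetic explicit:

```python
def log_growth(retries_per_second, entry_bytes=200):
    """Bytes of log written per hour and per day by a steady retry loop."""
    per_hour = retries_per_second * entry_bytes * 3600
    return per_hour, per_hour * 24


for rate in (1, 10, 100, 1000):
    hour, day = log_growth(rate)
    print(f"{rate:>5}/s: {hour / 1e6:8.1f} MB/hour {day / 1e9:7.2f} GB/day")
```

Real log lines are often longer than 200 bytes, so the actual growth can be several times higher.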

Example error log loop:

```text
Nov 11 10:15:01 server systemd[1]: Starting myapp.service...
Nov 11 10:15:01 server myapp[1234]: Configuration file not found: /etc/myapp/config.yml
Nov 11 10:15:01 server systemd[1]: myapp.service: Main process exited, code=exited, status=1/FAILURE
Nov 11 10:15:01 server systemd[1]: myapp.service: Failed with result 'exit-code'.
Nov 11 10:15:02 server systemd[1]: Starting myapp.service...
Nov 11 10:15:02 server myapp[1235]: Configuration file not found: /etc/myapp/config.yml
# ... repeats thousands of times ...
```
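A loop like this stands out once timestamps and PIDs are stripped and identical messages are counted. The following Python sketch assumes the classic syslog line format shown above (`Mon DD HH:MM:SS host program[pid]: message`); some newer rsyslog configurations use ISO timestamps and would need a different pattern. `top_repeated` is a hypothetical helper name.

```python
import re
from collections import Counter

# Classic syslog line: "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+(?P<prog>[^\[:]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)


def top_repeated(lines, top=5):
    """Count (program, message) pairs, ignoring timestamps and PIDs."""
    counts = Counter()
    for line in lines:
        m = SYSLOG_LINE.match(line)
        if m:
            counts[(m.group("prog"), m.group("msg"))] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # /var/log/syslog on Debian/Ubuntu; /var/log/messages on many RHEL-like systems
    with open("/var/log/syslog", errors="replace") as f:
        for (prog, msg), n in top_repeated(f):
            print(f"{n:>8}  {prog}: {msg}")
```

A single message repeated tens of thousands of times in one file is a strong hint of a retry loop rather than genuine activity.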

High-Volume Legitimate Logging

Or it could be that the server has a lot of activity and the logs are real, but there are just too many of them.

High-activity logging management:

Activity Level	Logs Per Day	Rotation Strategy	Retention Period
Low	<100 MB	Weekly rotation	30 days
Moderate	100 MB - 1 GB	Daily rotation	7-14 days
High	1-10 GB	Hourly rotation	3-7 days
Very high	>10 GB	Continuous/size-based	1-3 days

In that case, the log rotation tools may need to be configured to rotate more frequently (and keep fewer old files), to make sure that only what’s needed is being kept.

Log rotation configuration strategies:

Strategy	Configuration	Benefit	Trade-off
Size-based	Rotate when >100MB	Predictable disk usage	Uneven time periods
Time-based	Rotate daily at midnight	Regular schedule	Variable file sizes
Compression	Gzip old logs	Save 80-90% space	CPU overhead
Remote shipping	Send to log server	Local disk protected	Network dependency
Reduced verbosity	Lower log level	Less data written	Less debugging info
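For applications that write their own log files in Python, the standard library implements size-based rotation in `logging.handlers.RotatingFileHandler` (and time-based rotation in `TimedRotatingFileHandler`). A minimal sketch follows; the `make_logger` helper and the `"myapp"` logger name are illustrative assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_logger(path, max_bytes=100 * 1024 * 1024, backups=7, name="myapp"):
    """Logger whose file rotates at max_bytes, keeping `backups` old files.

    Disk usage stays bounded at roughly max_bytes * (backups + 1).
    Call once per process: each call adds another handler to the logger.
    """
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

When the current file would exceed `maxBytes`, it is renamed to `path.1` (older ones shift to `.2`, `.3`, ...) and the oldest beyond `backupCount` is deleted, so the total space used is capped no matter how much the application logs.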

Example logrotate configuration:

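```text
/var/log/myapp/*.log {
    # logrotate only treats lines starting with # as comments,
    # so each comment goes on its own line
    # Rotate daily
    daily
    # Keep 7 rotated logs
    rotate 7
    # Compress old logs
    compress
    # Don't compress the most recent rotated log
    delaycompress
    # Don't error if the log is missing
    missingok
    # Don't rotate if empty
    notifempty
    # Create the new log file with these permissions
    create 0644 root root
    # Also rotate early if a log exceeds 100MB ("size" would replace daily
    # rotation instead); checked only when logrotate itself runs
    maxsize 100M
    # Run postrotate once for all matched logs, not once per file
    sharedscripts
    postrotate
        systemctl reload myapp
    endscript
}
```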

Temporary File Issues

Uncleaned Temporary Files

In other cases, the disk might get full due to a program generating large temporary files and then failing to clean those up.

Temporary file problems:

Problem Type	Cause	Accumulation Rate	Detection
Crash cleanup failure	Process killed unexpectedly	Per crash	Growing /tmp directory
Programming error	Missing cleanup code	Continuous	Temp files with old timestamps
Failed cleanup logic	Error in cleanup routine	Varies	Files matching temp pattern
Partial processing	Job interrupted	Per failed job	Incomplete file sets

Cleanup on Normal vs Abnormal Exit

For example, an application might clean up temporary files when shutting down cleanly, but leave them behind if it crashes.

Cleanup behavior comparison:

Exit Type	Cleanup Trigger	Temp Files	Result
Normal shutdown	Exit handler called	Deleted	Clean /tmp
Graceful signal (SIGTERM)	Signal handler	Deleted	Clean /tmp
Kill signal (SIGKILL)	None	Remain	/tmp accumulates
Crash/exception	May not execute	Remain	/tmp accumulates
Power loss	None	Remain	/tmp accumulates
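The SIGTERM row only holds if the program actually handles the signal: Python's default SIGTERM action terminates the process without running `finally` blocks or context-manager cleanup. A minimal sketch, assuming a hypothetical `run_job` wrapper and a caller-supplied `work(file, path)` callback, converts SIGTERM into a normal `SystemExit` so the cleanup runs. SIGKILL and power loss still cannot be caught.

```python
import os
import signal
import sys
import tempfile


def _exit_on_sigterm(signum, frame):
    # Turn SIGTERM into SystemExit so finally blocks and `with` cleanup run
    sys.exit(128 + signum)


def install_sigterm_cleanup():
    """Install the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, _exit_on_sigterm)


def run_job(work):
    """Run work(file, path) on a temp file that is removed however the job ends."""
    fd, path = tempfile.mkstemp(prefix="processing_")
    try:
        with os.fdopen(fd, "w") as f:
            work(f, path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Exiting with 128 plus the signal number mirrors the status shells report for a process killed by that signal.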

Programming Errors

Or it could simply be a programming error of creating temporary files and never cleaning them up.

Programming error patterns:

```python
import os
import tempfile
import time


def process(path):
    """Hypothetical processing step: here it just counts characters in the file."""
    with open(path) as f:
        return len(f.read())


# Bad: Temporary file never cleaned up
def process_data(data):
    temp_file = "/tmp/processing_" + str(time.time())
    with open(temp_file, "w") as f:
        f.write(data)
    result = process(temp_file)
    # File left behind forever
    return result


# Good: Explicit cleanup that runs even if process() raises
def process_data_better(data):
    temp_file = "/tmp/processing_" + str(time.time())
    try:
        with open(temp_file, "w") as f:
            f.write(data)
        result = process(temp_file)
    finally:
        if os.path.exists(temp_file):
            os.remove(temp_file)
    return result


# Best: Use the tempfile module (unique, unpredictable name; automatic deletion)
def process_data_best(data):
    with tempfile.NamedTemporaryFile(mode="w", delete=True) as temp_file:
        temp_file.write(data)
        temp_file.flush()
        # Reopening by name while the file is open works on Unix; Windows may refuse
        result = process(temp_file.name)
    # File automatically deleted when context exits
    return result
```

Solutions for Temporary File Problems

In a case like this, the ideal fix is to correct the program so that it deletes those files properly.

Temporary file solutions:

Solution	Approach	Permanence	Effort
Fix application	Correct cleanup code	Permanent	High
Add signal handlers	Catch SIGTERM	Permanent	Medium
Use proper temp APIs	tempfile module/mktemp	Permanent	Medium
Cleanup script	Scheduled deletion	Workaround	Low
tmpfs mount	RAM-based /tmp	System reboot cleans	Medium

But if that’s not possible, writing a custom script that gets rid of them might be needed.

Cleanup script example:

```bash
#!/bin/bash
# cleanup_temp_files.sh - Remove old temporary files

# Find and delete temp files older than 7 days
find /tmp -type f -name "processing_*" -mtime +7 -delete

# Find and delete empty directories older than 7 days
# (-mindepth 1 protects /tmp itself; the age check avoids removing
# directories another program has only just created)
find /tmp -mindepth 1 -type d -empty -mtime +7 -delete

# Log cleanup action
echo "$(date): Cleaned up old temporary files" >> /var/log/temp_cleanup.log
```

Cron job for automated cleanup:

```text
# Run cleanup script daily at 2 AM
0 2 * * * /usr/local/bin/cleanup_temp_files.sh
```
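The same kind of age-based cleanup can be sketched in Python for environments that prefer one automation language. `remove_old_files` is a hypothetical helper; unlike the `find` command above, it scans only the top level of the directory, and its age test (strictly older than `max_age_days` × 24 hours) only approximates `find -mtime +7`, which rounds ages down to whole days. The `dry_run` flag lists what would be deleted without deleting it.

```python
import os
import time


def remove_old_files(directory, prefix, max_age_days=7, dry_run=False):
    """Delete regular files in `directory` whose names start with `prefix`
    and that were last modified more than max_age_days ago.

    Returns the paths removed (or that would be removed when dry_run is True).
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix) or not entry.is_file(follow_symlinks=False):
                continue
            try:
                if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                    if not dry_run:
                        os.remove(entry.path)
                    removed.append(entry.path)
            except FileNotFoundError:
                pass  # another process removed it first
    return removed
```

Running it once with `dry_run=True` is a cheap way to check that the prefix and age match only the intended files before scheduling it.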

Deleted But Open Files

The Tricky Debugging Situation

A situation that might be tricky to debug is when the files taking up the space are deleted files.

How deleted files consume space:

File State	Visible in Listing	Disk Space Used	Process Access
Normal open file	Yes	Yes	Yes
Deleted but open	No	Yes (still allocated)	Yes (via file descriptor)
Closed & deleted	No	No (freed)	No

If a program opens a file, the OS lets that program read and write in the file regardless of whether the file is marked as deleted or not.

Intentional Deletion Pattern

So lots of programs delete the temporary files they create right after opening them, to avoid issues with failing to clean them up later.

Temporary file lifecycle with immediate deletion:

Step	Action	File Visible	Disk Space	Process Access
1. Create	`f = open('/tmp/work', 'w+')`	Yes	Allocated	Yes
2. Delete	`os.unlink('/tmp/work')`	No	Still allocated	Yes (via open file)
3. Use	Read/write via the open file object	No	Grows as written	Yes
4. Close	`f.close()` or process exits	No	Freed	No

That way, the process can read from and write to the file while the file is open. Then when the process finishes, the file gets closed and actually deleted.

Benefits of immediate deletion pattern:

Benefit	Description	Protection Against
Guaranteed cleanup	File auto-deleted when closed	Orphaned files
Crash resilience	OS cleans up on process death	Failed cleanup code
No manual deletion	No cleanup code needed	Programming errors
Namespace freed	Filename immediately reusable	Name conflicts
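The lifecycle above can be reproduced in a few lines of Python on Linux or another POSIX system (Windows refuses to delete a file that is still open). `demo_unlinked_file` is a hypothetical name for this sketch. The standard `tempfile.TemporaryFile` relies on the same idea on POSIX platforms, and on Linux it may use `O_TMPFILE` so the file never has a name at all.

```python
import os
import tempfile


def demo_unlinked_file(size=1024 * 1024):
    """Create a file, delete its name, and keep using it through the open handle."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as f:
        os.unlink(path)                          # the name is gone from the directory...
        f.write(b"\0" * size)                    # ...but the file is still writable
        f.flush()
        visible = os.path.exists(path)           # False: invisible to ls and du
        size_via_handle = os.fstat(f.fileno()).st_size
        f.seek(0)
        readable = len(f.read())                 # and still readable
    # Closing the handle is what finally releases the space
    return visible, size_via_handle, readable
```

While the `with` block is open, `df` counts the space as used even though no directory lists the file.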

When Things Go Wrong

Now, this system is widely used and works fine for most processes. But if for some reason one of these deleted-but-open files grows very large, it can end up taking up all the available disk space.

Deleted-but-open file problems:

Scenario	File Size	Impact	Visibility
Normal temp usage	<100 MB	None	Not listed
Large processing	1-10 GB	Disk space reduced	Not listed
Runaway process	>50 GB	Disk exhaustion	Not listed, hard to debug
Multiple processes	Many large files	System-wide impact	Very confusing

If that happens, it can be confusing to figure out where the space went: the deleted files don’t appear in any directory listing, so df reports the disk as full while du finds much less data.

Important
Deleted but open files consume disk space without appearing in directory listings, making them extremely difficult to diagnose. The df command shows disk usage, but du cannot account for the space because the files don’t exist in the filesystem namespace anymore.

Detecting Deleted Open Files

To check for this condition, list the currently open files and look for the ones marked as deleted.

Detection commands:

```bash
# Linux: List open files marked as deleted
lsof -nP | grep '(deleted)'

# Alternative: let lsof select files whose link count is below 1 (unlinked)
lsof -nP +L1

# Alternative without lsof: Check /proc for deleted files
find /proc/*/fd -ls 2>/dev/null | grep '(deleted)'

# Show processes with deleted files and their sizes
# (assumes the default column layout: $2 = PID, $7 = SIZE/OFF, $9 = NAME)
lsof -nP | grep '(deleted)' | awk '{print $2, $9, $7}' | \
  while read -r pid name size; do
    echo "PID $pid: $name ($size bytes)"
  done

# Find large deleted files (>100MB)
lsof -nP | grep '(deleted)' | \
  awk '$7 > 104857600 {print $2, $9, $7/1048576 " MB"}'
```

lsof output interpretation:

```text
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
myapp    1234 root   3w   REG  253,0 10737418240 1234 /tmp/bigfile (deleted)
```

Column	Value	Meaning
COMMAND	myapp	Process name
PID	1234	Process ID
FD	3w	File descriptor 3, open for writing
SIZE/OFF	10737418240	File size: ~10 GB
NAME	/tmp/bigfile (deleted)	File path and deleted status

Resolution strategies:

Strategy	Command	Effect	Risk
Truncate file	`> /proc/PID/fd/FD`	Free space, process continues	May crash process
Kill process	`kill -KILL PID`	File freed on exit	Process terminated without cleanup
Graceful shutdown	`kill -TERM PID` (the default signal for `kill`)	Clean shutdown	May take time
Wait for completion	Monitor process	Natural cleanup	May be too slow
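The `/proc` check can also be scripted in Python on Linux, which makes it easy to sort the hidden consumers by size. This is a sketch with a hypothetical `find_deleted_open_files` helper: it reads each `/proc/PID/fd/N` link, keeps those whose target ends in `(deleted)`, and calls `os.stat` on the link, which follows it to the still-open file and reports its size. Processes owned by other users are only visible when run as root.

```python
import os


def find_deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, target, size_bytes) for each deleted-but-open file, largest first."""
    results = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited or permission denied
            continue
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(fd_path)
                if not target.endswith(" (deleted)"):
                    continue
                # stat() follows the /proc link to the still-open file
                size = os.stat(fd_path).st_size
            except OSError:  # fd closed or process exited meanwhile
                continue
            results.append((int(pid), int(fd), target, size))
    return sorted(results, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for pid, fd, target, size in find_deleted_open_files()[:20]:
        print(f"PID {pid} fd {fd}: {target} ({size / 1048576:.1f} MB)")
```

The PID and fd it reports are exactly what the truncate and kill strategies in the table above need.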

General Troubleshooting Approach

Consistent Problem-Solving Process

Of course, there are all kinds of other reasons why the disk may be getting too full. Whatever the cause, the troubleshooting process remains the same.

Universal disk troubleshooting workflow:

Phase	Activities	Tools/Commands	Goal
1. Investigation	Check disk usage patterns	`df`, `du`, `lsof`	Identify what’s using space
2. Classification	Determine expected vs anomaly	Server role knowledge	Legitimate vs problem
3. Resolution	Fix the issue	Various	Reclaim space
4. Prevention	Implement safeguards	Monitoring, automation	Avoid recurrence

Investigation Phase

First, spend time looking into what’s using the disk.

Investigation checklist:

Check	Command	What to Look For
Overall usage	`df -h`	Which filesystems are full
Directory sizes	`du -sh /*`	Top-level space consumers
Large files	`find / -size +1G`	Individual large files
Recent growth	`find / -mtime -1 -size +100M`	Recently created large files
Open deleted files	`lsof | grep deleted`	Hidden space consumers
Log files	`du -sh /var/log/*`	Log accumulation
Temp directories	`du -sh /tmp /var/tmp`	Temporary file buildup

Classification Phase

Check to see if it’s expected or an anomaly.

Expected vs anomaly decision tree:

Observation	Expected?	Action
Database files large	Yes (DB server)	Monitor growth rate
User files large	Yes (file server)	Confirm within quotas
Logs very large	Maybe	Check rotation settings
Temp files old	No	Clean up
Deleted files open	No	Investigate processes
Cache unbounded	No	Implement limits

Resolution Phase

Figure out how to solve it.

Resolution strategies by problem type:

Problem Type	Immediate Action	Long-Term Fix
Legitimate growth	Add storage	Capacity planning
Log overflow	Compress/delete old logs	Configure rotation
Temp file accumulation	Delete old temps	Fix cleanup code
Cache bloat	Clear cache	Implement eviction
Deleted open files	Kill/truncate	Fix application
Backup retention	Delete old backups	Set retention policy

Prevention Phase

Most important of all, work out how to prevent it from happening again.

Prevention strategies:

Strategy	Implementation	Monitoring	Alerting
Disk monitoring	Prometheus, Nagios	Check every 5 min	Alert at 80%
Log rotation	logrotate configuration	Daily checks	Alert on rotation failure
Cleanup automation	Cron jobs for temp files	Verify execution	Alert on missed runs
Quota enforcement	Filesystem quotas	Track per-user usage	Alert on approach
Capacity planning	Track growth trends	Weekly reports	Forecast exhaustion
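For the capacity-planning row, even a crude forecast beats none. This Python sketch is deliberately simple: it draws a straight line through the first and last usage samples (for example, daily readings of `shutil.disk_usage(path).used`) and estimates the days left until the filesystem is full. `days_until_full` is a hypothetical helper; a least-squares fit over all samples would be less sensitive to one-off spikes.

```python
from datetime import date


def days_until_full(samples, capacity_bytes):
    """Linear forecast of how many days remain until a filesystem fills up.

    samples: list of (date, used_bytes) measurements, oldest first.
    Returns None if usage is flat or shrinking.
    """
    (d0, used0), (d1, used1) = samples[0], samples[-1]
    elapsed_days = (d1 - d0).days
    if elapsed_days <= 0:
        raise ValueError("need samples from at least two different days")
    growth_per_day = (used1 - used0) / elapsed_days
    if growth_per_day <= 0:
        return None
    return (capacity_bytes - used1) / growth_per_day


GB = 10 ** 9
samples = [(date(2025, 11, 1), 400 * GB), (date(2025, 11, 11), 450 * GB)]
print(days_until_full(samples, capacity_bytes=500 * GB))
```

With 50 GB of growth over 10 days and 50 GB left, the forecast is 10 days, which is the kind of number a weekly report can alert on well before the disk reaches 80%.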
Application fixes	Code review, testing	Monitor temp dirs	Alert on anomalies

Monitoring script example:

```bash
#!/bin/bash
# disk_monitor.sh - Alert when disk usage exceeds threshold

THRESHOLD=80
EMAIL="admin@example.com"

# -P (POSIX format) keeps each filesystem on one line, so long device
# names don't shift the columns that awk reads
df -P | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $5 " " $1 }' | \
while read -r usage partition; do
  usage=${usage%\%}   # strip the trailing % sign

  if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "ALERT: Partition $partition is ${usage}% full" | \
      mail -s "Disk Space Alert on $(hostname)" "$EMAIL"
  fi
done
```

Conclusion

Disk space is a critical resource. Programs consume it through installed binaries and libraries, application data, caches, logs, temporary files, and backups. A disk can fill up because of legitimate data growth that calls for more capacity, or because programs misbehave and fail to clean up temporary files and logs.

As free space shrinks, overall system performance degrades and data fragmentation slows operations down. A full disk makes applications crash when their writes fail, and it can cause catastrophic data loss when a program truncates a file before writing an update that then fails, typically accompanied by “No space left on device” errors.

On a user machine, simple cleanup such as uninstalling unused applications and deleting old data is often enough. On a server, a closer investigation is needed to decide whether more storage is required or an application is filling the disk with useless data. Commands like df and du show where the space goes and whether large chunks hold legitimate data, such as databases and mailboxes, or anomalies, such as excessive logs and temporary files.

Common misbehavior patterns include programs that log the same error over and over while repeatedly failing to start because of a configuration problem (potentially gigabytes per day), legitimate high-activity logging that needs more frequent rotation, and temporary files left behind either because a crash prevented cleanup or because the program never deleted them at all.

Deleted-but-open files are a particularly tricky case. Programs often delete temporary files right after creating them to guarantee cleanup, but the OS keeps the space allocated for as long as the file stays open. That space is invisible in directory listings, so lsof is needed to find these hidden consumers.

Whatever the cause, the troubleshooting approach stays the same: investigate what is using the disk, classify the usage as expected for the server’s role or anomalous, resolve the immediate issue through cleanup or added capacity, and, most importantly, prevent recurrence with disk monitoring that alerts at around 80% usage, proper log rotation, automated cleanup of temporary files, and capacity planning that forecasts exhaustion before it happens.
