<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Systems on Ghafoor's Personal Blog</title><link>http://ghafoorsblog.com/categories/systems/</link><description>Recent content in Systems on Ghafoor's Personal Blog</description><generator>Hugo</generator><language>en</language><managingEditor>noreply@example.com (AG Sayyed)</managingEditor><webMaster>noreply@example.com (AG Sayyed)</webMaster><copyright>Copyright © 2024-2026 AG Sayyed. All Rights Reserved.</copyright><lastBuildDate>Sat, 16 May 2026 17:42:12 +0100</lastBuildDate><atom:link href="http://ghafoorsblog.com/categories/systems/index.xml" rel="self" type="application/rss+xml"/><item><title>Debugging Complex Systems</title><link>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/04-module/009-complex-system/</link><pubDate>Thu, 13 Nov 2025 16:47:58 +0000</pubDate><author>noreply@example.com (AG Sayyed)</author><guid>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/04-module/009-complex-system/</guid><description>&lt;p class="lead text-primary"&gt;
This document explores techniques for debugging distributed systems made up of multiple interacting services. It covers systematic log analysis across service boundaries, identifying what changed between working and failing states, rollback strategies, load balancer troubleshooting, removing faulty servers from pools, and managing cloud-based infrastructure with resource limits and automated deployment pipelines.
&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Troubleshooting a problem on a single computer differs significantly from debugging a complex system with many interacting services. When multiple computers and services work together to provide functionality, problems can arise in any individual component or in the interactions between components. Effective debugging requires understanding the bigger picture, analyzing logs across services, identifying what changed, and managing infrastructure at scale.&lt;/p&gt;</description></item><item><title>Monitoring and Long-Term Solutions</title><link>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/015-future-planning/</link><pubDate>Wed, 12 Nov 2025 19:39:54 +0000</pubDate><author>noreply@example.com (AG Sayyed)</author><guid>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/015-future-planning/</guid><description>&lt;p class="lead text-primary"&gt;
This document covers the trade-off between quick workarounds and long-term solutions, establishing monitoring systems to track resource usage and detect issues early, setting up effective alerting rules, best practices for bug reporting, implementing tests to prevent regressions, and documenting solutions so that future incidents can be resolved faster.
&lt;/p&gt;
&lt;h2 id="quick-workarounds-vs-long-term-solutions"&gt;Quick Workarounds vs Long-Term Solutions&lt;/h2&gt;
&lt;p&gt;When a system encounters an issue, immediate action is needed to restore service. Addressing the symptoms does not complete the troubleshooting process, however; a permanent solution must follow.&lt;/p&gt;</description></item><item><title>Planning Future Resource Usage</title><link>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/013-planning-resources/</link><pubDate>Tue, 11 Nov 2025 18:30:10 +0000</pubDate><author>noreply@example.com (AG Sayyed)</author><guid>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/013-planning-resources/</guid><description>&lt;p class="lead text-primary"&gt;
This document describes forecasting and planning for future resource usage, covering growth estimation, monitoring, cleanup strategies, cloud migration trade-offs, and mixed-workload placement to maximize utilization and avoid urgent capacity crises.
&lt;/p&gt;
&lt;h2 id="forecasting-resource-growth"&gt;Forecasting Resource Growth&lt;/h2&gt;
&lt;p&gt;Planning ahead prevents scrambling when storage, CPU, memory, or network capacity is exhausted. Forecasting requires measuring current consumption, estimating growth rates, and calculating time-to-exhaustion.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Resource&lt;/th&gt;
 &lt;th style="text-align: right"&gt;Current Free Capacity&lt;/th&gt;
 &lt;th style="text-align: right"&gt;Expected Growth&lt;/th&gt;
 &lt;th style="text-align: right"&gt;Time to Exhaustion&lt;/th&gt;
 &lt;th&gt;Notes&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Disk&lt;/td&gt;
 &lt;td style="text-align: right"&gt;500 MB&lt;/td&gt;
 &lt;td style="text-align: right"&gt;1 MB/day&lt;/td&gt;
 &lt;td style="text-align: right"&gt;~500 days&lt;/td&gt;
 &lt;td&gt;Low risk if growth stays steady&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Disk&lt;/td&gt;
 &lt;td style="text-align: right"&gt;500 MB&lt;/td&gt;
 &lt;td style="text-align: right"&gt;10 MB/day&lt;/td&gt;
 &lt;td style="text-align: right"&gt;~50 days&lt;/td&gt;
 &lt;td&gt;Action required: clean up or expand&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Network&lt;/td&gt;
 &lt;td style="text-align: right"&gt;Baseline throughput&lt;/td&gt;
 &lt;td style="text-align: right"&gt;% growth per month&lt;/td&gt;
 &lt;td style="text-align: right"&gt;Scale accordingly&lt;/td&gt;
 &lt;td&gt;Monitor spikes and trends&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Memory&lt;/td&gt;
 &lt;td style="text-align: right"&gt;Free memory headroom&lt;/td&gt;
 &lt;td style="text-align: right"&gt;Per-process growth&lt;/td&gt;
 &lt;td style="text-align: right"&gt;Assess swap and usage patterns&lt;/td&gt;
 &lt;td&gt;Watch for leaks&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="documenting-usage-and-assumptions"&gt;Documenting Usage and Assumptions&lt;/h2&gt;
&lt;p&gt;Record measurements and assumptions so forecasts can be validated and revised.&lt;/p&gt;</description></item><item><title>Proactive Practices</title><link>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/012-proactive-practices/</link><pubDate>Tue, 11 Nov 2025 18:21:25 +0000</pubDate><author>noreply@example.com (AG Sayyed)</author><guid>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/012-proactive-practices/</guid><description>&lt;p class="lead text-primary"&gt;
This document describes proactive practices to reduce incidents and simplify troubleshooting: automated testing and CI, test environments and canary deployments, centralized logging and monitoring, ticket automation, documentation, and capacity planning.
&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="why-proactive-practices-matter"&gt;Why Proactive Practices Matter&lt;/h2&gt;
&lt;p&gt;Bugs and failures are unavoidable. Proactive practices reduce their frequency and impact by catching issues early and providing better diagnostic information when problems occur.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Problem Area&lt;/th&gt;
 &lt;th&gt;Proactive Practice&lt;/th&gt;
 &lt;th&gt;Benefit&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Code regressions&lt;/td&gt;
 &lt;td&gt;Unit and integration tests + CI&lt;/td&gt;
 &lt;td&gt;Detects bugs before deployment&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Deployment risk&lt;/td&gt;
 &lt;td&gt;Test environments and canary releases&lt;/td&gt;
 &lt;td&gt;Limits the blast radius of a bad release&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Incident diagnosis&lt;/td&gt;
 &lt;td&gt;Centralized logging&lt;/td&gt;
 &lt;td&gt;Faster root-cause analysis&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Silent failures&lt;/td&gt;
 &lt;td&gt;Monitoring and alerting&lt;/td&gt;
 &lt;td&gt;Detects issues before users report them&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Repetitive requests&lt;/td&gt;
 &lt;td&gt;Ticket templates and automation&lt;/td&gt;
 &lt;td&gt;Saves triage time&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Knowledge gaps&lt;/td&gt;
 &lt;td&gt;Documentation and runbooks&lt;/td&gt;
 &lt;td&gt;Consistent on-call response&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="automated-testing-and-continuous-integration"&gt;Automated Testing and Continuous Integration&lt;/h2&gt;
&lt;p&gt;Automated tests serve as a safety net that catches regressions early. Continuous integration (CI) runs the tests on every change, giving developers fast feedback.&lt;/p&gt;</description></item><item><title>Communicating With Users</title><link>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/009-communicating/</link><pubDate>Tue, 11 Nov 2025 16:53:21 +0000</pubDate><author>noreply@example.com (AG Sayyed)</author><guid>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/009-communicating/</guid><description>&lt;p class="lead text-primary"&gt;
This document explores essential communication strategies for IT support professionals, covering expectation management, priority handling, ticket tracking systems, and practical shortcuts to improve response times and user satisfaction during incident response.
&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="understanding-user-expectations"&gt;Understanding User Expectations&lt;/h2&gt;
&lt;p&gt;When dealing with issues affecting one or more users, the pressure to meet expectations can be intense. Users develop implicit expectations about resolution times based on the perceived complexity of their problems. Understanding and managing these expectations is crucial for successful interactions.&lt;/p&gt;</description></item><item><title>Managing Disk Space</title><link>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/003-managing-disk-space/</link><pubDate>Tue, 11 Nov 2025 01:51:36 +0000</pubDate><author>noreply@example.com (AG Sayyed)</author><guid>http://ghafoorsblog.com/courses/google/it-automation-content/it-automation-python-pcert/04-troubleshooting-debugging/05-module/003-managing-disk-space/</guid><description>&lt;p class="lead text-primary"&gt;
This document examines disk space as a critical system resource, exploring how programs consume storage through binaries, data, caches, logs, and temporary files. It covers diagnostic approaches for identifying space usage patterns, how performance degrades as disks fill up, and strategies to prevent disk exhaustion, which can cause application crashes and potential data loss.
&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="understanding-disk-space-usage"&gt;Understanding Disk Space Usage&lt;/h2&gt;
&lt;h3 id="why-programs-need-disk-space"&gt;Why Programs Need Disk Space&lt;/h3&gt;
&lt;p&gt;Disk space is another resource that may need attention. Programs need disk space for many different reasons.&lt;/p&gt;</description></item><item><title>Agent Usage</title><link>http://ghafoorsblog.com/courses/ibm/ai-developer-content/ai-developer-pcert/02-introduction-to-ai/03-module/002-agent-usage/</link><pubDate>Fri, 11 Jul 2025 11:35:48 +0000</pubDate><author>noreply@example.com (AG Sayyed)</author><guid>http://ghafoorsblog.com/courses/ibm/ai-developer-content/ai-developer-pcert/02-introduction-to-ai/03-module/002-agent-usage/</guid><description>&lt;p class="lead text-primary"&gt;
This document explains the evolution from monolithic AI models to compound AI systems, demonstrating how combining models with programmatic components and external data sources enables more accurate, adaptable, and context-aware solutions for complex tasks.
&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="from-monolithic-models-to-compound-ai-systems"&gt;From Monolithic Models to Compound AI Systems&lt;/h2&gt;
&lt;p&gt;Traditional AI models are limited by the data they were trained on and are difficult to adapt to new tasks or information. Adapting such models requires significant investment in data and resources. For example, a language model cannot answer a personalized query, such as how many vacation days a specific user has left, without access to external data.&lt;/p&gt;</description></item></channel></rss>