Comprehensive guide to cloud object storage, explaining its architecture, API-based access, cost efficiency, and ideal use cases for unstructured data, static files, and long-term archival.
This document provides an overview of object storage in cloud computing, explaining its benefits, use cases, and considerations for provisioning. Object storage is used to store static files or objects, such as text files, audio and video files, IoT data, virtual machine images, backup files, and data archives. It is not suitable for running operating systems or databases. Objects are stored in buckets, which do not require predefined sizes and can hold varying amounts of data. Different types of buckets are available with varying charges based on resilience, availability, and access frequency. Object storage is typically less expensive but slower than file or block storage and is accessed using APIs, with many providers offering S3-compatible APIs. It also supports automatic archiving to cheaper storage tiers for infrequently accessed data, making it an effective solution for backup and disaster recovery.
Object storage is a type of storage where data is stored as objects, rather than in a file or block structure. Unlike traditional storage types, you do not connect object storage to a particular compute node. Instead, you provision an object storage service instance and use an API (application programming interface) to upload, download, and manage your data. This means you can use object storage directly with anything that can call an API, without needing an underlying compute node.
Object storage is typically less expensive than other cloud storage options. Its per-gigabyte cost is usually a few US cents per month, and sometimes even less, depending on the storage tier used. Another key feature of object storage is that its capacity is effectively infinite. With file and block storage, you specify the size of the storage you want and pay a fee based on that size, whether or not you fill it. With object storage, you consume the storage you need and pay per gigabyte only for what you use. You can keep uploading files without running out of storage.
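To make the billing difference concrete, the following is a minimal sketch comparing the two models. The prices and sizes are made-up illustrative values, not any provider's actual rates, and the function names are hypothetical.

```python
# Illustrative comparison of provisioned vs. pay-per-use billing.
# All prices are hypothetical placeholders, not real provider rates.

BLOCK_PRICE_PER_GB = 0.10   # assumed monthly price per provisioned GB
OBJECT_PRICE_PER_GB = 0.02  # assumed monthly price per stored GB


def provisioned_monthly_cost(provisioned_gb: float) -> float:
    """File/block style: you pay for the size you reserved, used or not."""
    return provisioned_gb * BLOCK_PRICE_PER_GB


def object_monthly_cost(used_gb: float) -> float:
    """Object style: you pay only for the gigabytes actually stored."""
    return used_gb * OBJECT_PRICE_PER_GB


if __name__ == "__main__":
    # A 500 GB volume holding 120 GB of data vs. 120 GB stored as objects.
    print(f"block:  ${provisioned_monthly_cost(500):.2f}")
    print(f"object: ${object_monthly_cost(120):.2f}")
```

Under these assumed prices the gap comes from two sources: the lower per-gigabyte rate and the fact that unused capacity is never billed.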
Object storage is ideal for storing large amounts of unstructured data. In this context, unstructured means that the data is not organized into a hierarchical folder or directory structure. Instead, object storage uses buckets, and objects are stored within these buckets in a flat structure. A bucket is similar to a folder in that you can give it a meaningful name and have different buckets for different object types, but you cannot place a bucket within another bucket.
When an object is placed in a bucket, it also has metadata, which is data about the data, such as an object ID. This metadata helps applications locate and access the object and provides information on when the data was stored or last accessed. When you create a bucket, you do not need to define any sizing information. The bucket will hold the data you place inside it, and the service provider ensures sufficient storage capacity is available. Buckets can hold as little as a few bytes of data up to multiple petabytes, and you can adjust the amount of data stored as needed.
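The flat bucket structure and per-object metadata described above can be pictured with a small in-memory model. This is only a teaching sketch: the `Bucket` and `StoredObject` classes and their field names are hypothetical and do not correspond to any provider's SDK.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StoredObject:
    data: bytes
    object_id: str           # metadata: identifier assigned at upload
    stored_at: datetime      # metadata: when the object was stored
    last_accessed: datetime  # metadata: when it was last read


@dataclass
class Bucket:
    name: str
    # Flat namespace: one dictionary of key -> object, no nested buckets.
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> StoredObject:
        """Store data under a key; no size has to be declared up front."""
        now = datetime.now(timezone.utc)
        obj = StoredObject(data, uuid.uuid4().hex, now, now)
        self.objects[key] = obj
        return obj

    def get(self, key: str) -> bytes:
        """Return the data and record the access time in the metadata."""
        obj = self.objects[key]
        obj.last_accessed = datetime.now(timezone.utc)
        return obj.data

    def total_bytes(self) -> int:
        """Bytes currently held; grows and shrinks with the contents."""
        return sum(len(o.data) for o in self.objects.values())


if __name__ == "__main__":
    media = Bucket("media")
    # The slash is simply part of the key, not a real directory.
    media.put("videos/intro.mp4", b"example bytes")
    print(list(media.objects), media.total_bytes())
```

A key such as `videos/intro.mp4` may look like a path, but in this model, as in a flat bucket, it is a single string that identifies one object.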
The service provider also ensures that the object storage solution is highly available and resilient. Some cloud providers offer different types of buckets with varying levels of resilience. For example, some buckets are resilient but store data in only one data center, which is suitable for data that needs to reside in a specific geographical location or where high availability is less critical. Other buckets store multiple copies of the data in different data centers or zones within the same region, or even across multiple regions. These options usually cost more but provide the highest level of resilience and availability for your data.
Object storage is suitable for storing a wide range of data types, including text files, audio files, video files, IoT data, virtual machine images, backup files, and data archives. It is not suitable for running operating systems, databases, or applications where the contents of the files change frequently.
```mermaid
graph TD;
    A[User/Developer] -->|Access via API| B[Object Storage Service];
    B --> C{Bucket};
    C -->|Standard Tier| D[Frequent Access];
    C -->|Vault Tier| E[Semi-Frequent Access];
    C -->|Cold Vault Tier| F[Infrequent Access];
    C --> G[Metadata Management];
    G -->|Contains| H[Object ID];
    G --> I[Access Time];
    C -->|Stores| J[Data Objects];
```
Object storage buckets have storage tiers, or classes, based on how frequently the data is accessed. A standard tier bucket is used for objects that are frequently accessed and has the highest per-gigabyte cost. A vault or archive tier is for data accessed only once or twice a month, offered at a lower storage cost. There is also a cold vault tier for data accessed only once or twice a year, costing just a fraction of a US cent per gigabyte per month.
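As a rough illustration, the access-frequency guidance above can be written as a small decision function. The cut-off values are assumptions derived from the "once or twice a month" and "once or twice a year" descriptions; real providers define their tiers differently, and `suggest_tier` is a hypothetical name.

```python
def suggest_tier(accesses_per_year: int) -> str:
    """Map an expected access frequency to one of the three tiers."""
    if accesses_per_year > 24:   # more than about twice a month
        return "standard"
    if accesses_per_year > 2:    # up to about twice a month
        return "vault"
    return "cold_vault"          # once or twice a year, or never


if __name__ == "__main__":
    for n in (365, 12, 1):
        print(n, "accesses/year:", suggest_tier(n))
```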
You can set up automatic archiving rules for your data, meaning that if an object isn’t accessed for a period of time, it will automatically move to a cheaper storage tier. The rule uses some of the object’s metadata to determine when it should be archived.
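An archiving rule of this kind can be sketched as a function that compares an object's last-access time against idle thresholds. The 30-day and 180-day thresholds, the tier names, and the `target_tier` function are assumptions chosen for illustration; actual lifecycle rules are configured through each provider's own tooling.

```python
from datetime import datetime, timedelta, timezone

TIERS = ["standard", "vault", "cold_vault"]  # warmest to coldest

# Hypothetical rule set: (minimum idle time, tier to move to), longest first.
ARCHIVE_RULES = [
    (timedelta(days=180), "cold_vault"),
    (timedelta(days=30), "vault"),
]


def target_tier(last_accessed: datetime, current_tier: str, now: datetime) -> str:
    """Return the tier an object should be in, based on how long it has been idle."""
    idle = now - last_accessed
    for min_idle, tier in ARCHIVE_RULES:
        if idle >= min_idle:
            # Only ever move to a colder (cheaper) tier, never back up.
            return max(current_tier, tier, key=TIERS.index)
    return current_tier


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(target_tier(now - timedelta(days=45), "standard", now))
```

The rule reads only metadata (the last-access timestamp and the current tier), so it can be evaluated without touching the object's contents.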
Object storage does not come with IOPS (input/output operations per second) options and tends to be slower than file or block storage. Downloads typically take seconds or longer to complete. For cold vault buckets, data retrieval can take hours because the storage is kept offline. If your application needs fast access to files, object storage may not be a good option.
Object storage is priced per gigabyte used, but there can also be other costs related to data retrieval. These costs are low, but access charges can be higher for data in vault or cold vault tiers. It is important to ensure that the data is in the correct tier based on its frequency of access.
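The sketch below shows why tier placement matters once retrieval charges are included. All prices are invented for illustration, and `TIER_PRICES`, `monthly_cost`, and `cheapest_tier` are hypothetical names; real price sheets have more components than these two.

```python
# Hypothetical prices per tier: (storage $ per GB-month, retrieval $ per GB).
TIER_PRICES = {
    "standard": (0.022, 0.00),
    "vault": (0.012, 0.01),
    "cold_vault": (0.002, 0.05),
}


def monthly_cost(tier: str, stored_gb: float, retrieved_gb: float) -> float:
    """Storage charge plus retrieval charge for one month."""
    storage_price, retrieval_price = TIER_PRICES[tier]
    return stored_gb * storage_price + retrieved_gb * retrieval_price


def cheapest_tier(stored_gb: float, retrieved_gb: float) -> str:
    """Tier with the lowest combined monthly cost for this access pattern."""
    return min(TIER_PRICES, key=lambda t: monthly_cost(t, stored_gb, retrieved_gb))


if __name__ == "__main__":
    # 1000 GB stored: rarely read vs. read twice over every month.
    print(cheapest_tier(1000, 5))
    print(cheapest_tier(1000, 2000))
```

With these assumed numbers, rarely read data is cheapest in the cold vault tier, while data that is read heavily is cheapest in the standard tier despite its higher storage price. The comparison covers cost only and ignores the slower retrieval times of the colder tiers.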
Object storage does not need to be attached to a compute node for access. Instead, you access it through an application programming interface (API). The most common API for object storage is the S3 API, a de facto standard based on the S3 object storage service offered by AWS. Many providers offer S3-compatible APIs, allowing developers to write code that can access multiple vendors’ object storage. The API is an HTTP-based RESTful web service, allowing applications to manage object storage and buckets, as well as upload or download objects.
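Because the API is RESTful, each object operation maps onto an HTTP method and a URL. The sketch below only builds that method and URL and sends nothing. It assumes path-style addressing (`endpoint/bucket/key`) and a placeholder endpoint, `https://s3.example.com`; the `build_request` helper is hypothetical. Real requests must also carry authentication, which is omitted here.

```python
from urllib.parse import quote

ENDPOINT = "https://s3.example.com"  # hypothetical S3-compatible endpoint

# Object operation -> HTTP method used by S3-style REST APIs.
OBJECT_OPERATIONS = {"upload": "PUT", "download": "GET", "delete": "DELETE"}


def build_request(operation: str, bucket: str, key: str) -> tuple:
    """Return the (HTTP method, URL) pair for an object operation."""
    method = OBJECT_OPERATIONS[operation]
    # Percent-encode the key but keep "/" so path-like keys stay readable.
    url = f"{ENDPOINT}/{quote(bucket, safe='')}/{quote(key)}"
    return method, url


if __name__ == "__main__":
    print(build_request("upload", "backups", "db/2024 q1.dump"))
```

In practice most developers use a provider SDK or an S3-compatible client library rather than building requests by hand, but the underlying exchange has this shape.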
Object storage is not just for new applications but can also meet requirements for existing ones. It is an effective solution for backup and disaster recovery, replacing off-site tape-based solutions and reducing the time to restore data. Many backup packages now include the ability to back up data to the cloud using object storage. Object storage is more efficient than tape backup solutions, which require tapes to be physically loaded into and removed from tape drives and moved off-site for geographical redundancy.
In the next video, we will cover content delivery networks (CDNs), which are often backed by object storage.
Object storage is a cost-effective and scalable solution for storing large amounts of unstructured data. It is ideal for applications that need to store static files or objects, such as text files, audio and video files, IoT data, virtual machine images, backup files, and data archives. Object storage is not suitable for running operating systems, databases, or applications where the contents of the files change frequently. By using object storage, you can take advantage of its effectively infinite capacity, pay only for what you use, and benefit from the high availability and resilience provided by the service provider.