
S3

  • It is object-based storage.
  • Files can be from 0 bytes to 5 TB.
  • A bucket is a container for objects (roughly analogous to a top-level folder).
  • An object has a key, a value (5 TB max size), a version ID, metadata, and sub-resources (such as the ACL).
  • S3 has a flat structure, but we can simulate a directory hierarchy by using key prefixes.
  • It has read-after-write consistency.
  • Eventual consistency for overwrites (see the consistency notes below).
  • It has a simple web service interface
  • Buckets are created in a region, not in an AZ. This means that S3 is a region-based, fully managed service.
  • We can use tags in objects to group various objects and later retrieve them
  • S3 is a RESTful web service.
  • Tags can be used with CloudTrail, CloudWatch, lifecycle management, etc.
  • Lifecycle rules let us move objects from one tier to another (this can be automated as well).
  • Use MFA Delete to make sure someone else does not delete data in a bucket.
  • ACLs are used to grant basic access at the bucket and object level.
  • First-byte latency is the time between requesting an object from the service and when that data starts to arrive. With S3 (for example) that time is measured in milliseconds and in many cases could be considered instant.
  • Cross-region replication replicates data across regions.

  • With Transfer Acceleration, users first upload data to edge locations.
  • We cannot install an operating system on S3.
  • We can create up to 100 buckets per account by default.
  • Buckets have sub-resources, which are resources that cannot exist on their own.

What are we getting charged for?

  • Object tagging
  • Storage
  • Requests
  • Transfer acceleration

Remember

  • Standard-IA charges a retrieval fee every time we access the data. It is usually used for backups, etc.
  • Glacier has a minimum storage duration of 3 months; Glacier Deep Archive has a minimum of 6 months.
  • Intelligent-Tiering automatically moves data to the IA tier if it is not accessed for 30 days, and there is no retrieval fee associated with it (special case).
  • Objects smaller than 128 KB are not moved by Intelligent-Tiering; they remain in the standard tier.
  • S3 uses a universal namespace, so bucket names have to be globally unique.

S3 storage tiers

  • Standard: the most expensive tier. Designed for frequently accessed data; it stores data in a minimum of 3 Availability Zones.
  • Infrequently Accessed (Standard-IA): similar to Amazon S3 Standard but with a lower storage price and a higher retrieval price.
  • One Zone-IA: costs less because data is kept in only one Availability Zone. It is intended for infrequently accessed data that is re-creatable, such as storing secondary backup copies of on-premises data, or for storage that is already replicated in another AWS Region for compliance or disaster recovery purposes.
  • Intelligent-Tiering:
  • It moves data between tiers automatically, based on monitored access patterns.
  • Requires a small monthly monitoring and automation fee per object
  • In the S3 Intelligent-Tiering storage class, Amazon S3 monitors objects’ access patterns. If you haven’t accessed an object for 30 consecutive days, Amazon S3 automatically moves it to the infrequent access tier, S3 Standard-IA. If you access an object in the infrequent access tier, Amazon S3 automatically moves it to the frequent access tier, S3 Standard.
  • S3 Glacier Instant Retrieval: You can retrieve objects stored in the S3 Glacier Instant Retrieval storage class within milliseconds, with the same performance as S3 Standard.
  • Glacier Flexible Retrieval: used for data archival, mainly for compliance reasons. Retrieval time ranges from minutes to hours.
  • Glacier Deep Archive: long-term data archive with a retrieval time of up to 12 hours.
  • S3 Outposts: Amazon S3 Outposts delivers object storage to your on-premises AWS Outposts environment. Amazon S3 Outposts is designed to store data durably and redundantly across multiple devices and servers on your Outposts. It works well for workloads with local data residency requirements that must satisfy demanding performance needs by keeping data close to on-premises applications.

Tip

All tiers except One Zone-IA replicate data across 3 or more AZs.

Lifecycle Management

An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. There are two types of actions:

Transition actions – These actions define when objects transition to another storage class. For example, you might choose to transition objects to the S3 Standard-IA storage class 30 days after creating them, or archive objects to the S3 Glacier Flexible Retrieval storage class one year after creating them.

Expiration actions – These actions define when objects expire. Amazon S3 deletes expired objects on your behalf. Lifecycle expiration costs depend on when you choose to expire objects.
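As a minimal sketch of how both action types could be applied with boto3 (the bucket name, prefix, and day counts are illustrative assumptions, not values from these notes):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket, prefix, and timings, for illustration only.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "logs/"},  # only objects under logs/
                "Status": "Enabled",
                # Transition actions: move to cheaper tiers over time.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
                # Expiration action: delete the objects after two years.
                "Expiration": {"Days": 730},
            }
        ]
    },
)
```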

Structure of S3

An S3 object consists of the following:

  • Key: the object's name.
  • Value: the actual data we are storing inside the object.
  • Version ID
  • Metadata
  • Sub-resources:
  • ACL
  • Torrent
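As a small illustration of these parts (the bucket and key names below are hypothetical), a HEAD request returns the version ID, metadata, and the size of the value:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and key, for illustration only.
resp = s3.head_object(Bucket="my-example-bucket", Key="photos/cat.jpg")

print(resp.get("VersionId"))      # version ID (present when versioning is enabled)
print(resp.get("Metadata"))       # user-defined metadata
print(resp.get("ContentLength"))  # size of the value (the object data) in bytes
```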

Consistency in S3

  • Read-after-write: new objects can be read immediately after they are written.

  • Updates and deletes were historically eventually consistent. (Since December 2020, S3 provides strong read-after-write consistency for all operations.)

In S3-IA, we are charged a retrieval fee.

In S3 Glacier, we can configure the retrieval time from minutes to hours.

Storage prices, from most to least expensive: S3 Standard > S3 Standard-IA > S3 Intelligent-Tiering > S3 One Zone-IA > S3 Glacier > S3 Glacier Deep Archive.

Policies

The two types of policies are:

Resource-based policy

  • Access control list: used for objects
  • Bucket policy (JSON) and bucket ACL (XML)

User-based

  • ACLs are legacy, while bucket policies are newer.
  • The root user does not have an IAM policy.
  • An explicit deny always wins.
  • You pay for storage, data replication, requests, and management (monitoring).
  • Bucket policies work at the bucket level, while ACLs work at the object level.
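As a minimal sketch of a bucket policy (the bucket name and statement are illustrative assumptions), the following denies any request that does not arrive over HTTPS, showing both the JSON shape and the deny-wins behaviour:

```python
import json

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and policy, for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",  # an explicit deny always wins
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-example-bucket",
                "arn:aws:s3:::my-example-bucket/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=json.dumps(policy))
```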

Security

  • Encryption in transit is done using SSL or TLS
  • Encryption in transit is optional.

  • Server-side

    • S3-managed keys, SSE-S3 (uses AES-256)
    • Key Management Service (SSE-KMS)
    • Customer-provided keys (SSE-C)
  • Client-side: use your own keys and encrypt the data before uploading it.

Versioning

  • Once versioning is enabled, it cannot be disabled, only suspended. If we delete a file in an S3 bucket with versioning turned on, S3 places a delete marker. We can restore the file by deleting the delete marker.

Lifecycle rules can be used in conjunction with versioning.
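A minimal sketch of enabling versioning with boto3 (the bucket name is an assumption); passing "Suspended" instead of "Enabled" is how versioning is later suspended, since it cannot be disabled:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket, for illustration only.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},  # or "Suspended"
)
```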

File replication

Cross-region replication requires versioning to be enabled on both the source and destination buckets. Files that already existed in the bucket before replication was configured are not replicated, so we have to copy them over manually; subsequently added or updated files are replicated automatically. If you delete a file (or place a delete marker) in the source bucket, the deletion is not replicated to the destination bucket.
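A sketch of such a replication configuration with boto3, assuming hypothetical bucket names and an IAM role that permits S3 to replicate on your behalf (versioning must already be enabled on both buckets):

```python
import boto3

s3 = boto3.client("s3")

# All names/ARNs below are placeholders, for illustration only.
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/my-replication-role",
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Filter": {},  # empty filter = replicate every object
                "Status": "Enabled",
                # Deletions are not propagated to the destination.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
            }
        ],
    },
)
```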

S3 Transfer Acceleration

It uses CloudFront's edge network to speed up uploads. Users first upload to an edge location, from where the files are transferred to the bucket over Amazon's backbone network.
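A sketch of enabling and using Transfer Acceleration with boto3 (bucket and file names are assumptions); the second client routes uploads through the accelerate (edge-location) endpoint:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Hypothetical bucket, for illustration only.
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Client configured to use the accelerate endpoint instead of the
# regional one, so uploads enter at the nearest edge location.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", "my-example-bucket", "uploads/big-file.bin")
```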

!!! question "Which policies should I use?"

    We should use a user policy or a bucket policy, as they let us grant access at a much more fine-grained level. ACLs are the legacy mechanism.

Access Control List

  • They are XML documents used to grant access to both objects and buckets.
  • Each bucket and object has an ACL attached to it in its sub-resources.
  • The default ACL grants full access to the resource owner.
  • A grantee can be an AWS account or one of the predefined Amazon S3 groups.

{==

When an object is created, only an ACL is created for it, not a user policy or bucket policy.

==}

  • ACLs can be used to grant permissions to predefined groups but not to an IAM user.
  • With ACLs we can NOT define deny rules or conditional access. All we can do is grant basic read/write permissions.

Canned ACL: Amazon S3 supports a set of predefined grants, known as canned ACLs. Each canned ACL has a predefined set of grantees and permissions. They are an easy way to grant access.
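For example (hypothetical bucket and key, and assuming the bucket does not block ACLs), a canned ACL can be applied in one call with boto3:

```python
import boto3

s3 = boto3.client("s3")

# "public-read" is a canned ACL: the owner gets FULL_CONTROL and the
# All Users group gets READ. Bucket/key are placeholders.
s3.put_object_acl(Bucket="my-example-bucket", Key="report.pdf", ACL="public-read")
```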

S3 predefined groups: Amazon S3 has a set of predefined groups. When granting account access to a group, you specify one of the Amazon S3 group URIs instead of a canonical user ID. The predefined groups are:

  1. Authenticated Users group: all AWS accounts (any request signed by an AWS account).
  2. All Users group: anyone, whether authenticated or anonymous.
  3. Log Delivery group

Tip

The canonical user ID is an alphanumeric identifier, such as 79a59df900b949e55d96, that is an obfuscated form of the AWS account ID. You can use this ID to identify an AWS account when granting cross-account access to buckets and objects using Amazon S3.

Server-side encryption

Server-side encryption is the encryption of data at its destination by the application or service that receives it. Amazon S3 encrypts your data at the object level as it writes it to disks in its data centers and decrypts it for you when you access it.

SSE-S3

  • Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3): each object is encrypted with a unique key. As an additional safeguard, S3 encrypts the key itself with a master key that it regularly rotates.
  • Every object is encrypted with a different key.
  • We cannot manage the keys in this case.
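A minimal sketch of requesting SSE-S3 on upload with boto3 (bucket, key, and body are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# S3 creates and manages the AES-256 key itself.
s3.put_object(
    Bucket="my-example-bucket",  # placeholder
    Key="data.bin",
    Body=b"example payload",
    ServerSideEncryption="AES256",  # SSE-S3
)
```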

SSE-KMS

  • Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key Management Service (SSE-KMS): similar to SSE-S3, but with some additional benefits and charges. There are separate permissions for the use of a CMK, which provides added protection against unauthorized access to your objects in Amazon S3.
  • We can create and use data keys and master keys, and rotate keys as well.
  • It gives us a lot of control, as the user can choose which key encrypts each object.
  • We have access to the data keys and master keys.
  • Key usage is controlled by the customer through KMS permissions.
  • We can audit key usage with CloudTrail, which shows when your CMK was used and by whom.
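A minimal sketch of the SSE-KMS variant (bucket, key, and the KMS key alias are placeholders):

```python
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-example-bucket",          # placeholder
    Key="data.bin",
    Body=b"example payload",
    ServerSideEncryption="aws:kms",      # SSE-KMS
    SSEKMSKeyId="alias/my-example-key",  # which CMK to use (placeholder)
)
```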

SSE-C

  • Server-Side Encryption with Customer-Provided Keys (SSE-C): you manage the encryption keys, and Amazon S3 manages the encryption as it writes to disks and the decryption when you access your objects.
  • The key is managed by the user.
  • The user generates the key and uploads it along with the data.
  • You must use HTTPS to upload the objects.
  • If we lose the key, we lose the data.
  • S3 discards the key after using it.
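A minimal sketch of SSE-C with boto3 (bucket and key are placeholders); the same key must be presented again on every read, and requests must go over HTTPS:

```python
import os

import boto3

s3 = boto3.client("s3")

# We generate and keep the 256-bit key ourselves; S3 discards it after use.
key = os.urandom(32)

s3.put_object(
    Bucket="my-example-bucket",  # placeholder
    Key="data.bin",
    Body=b"example payload",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,  # boto3 base64-encodes the key and its MD5 for us
)

# Reading it back requires supplying the same key again.
obj = s3.get_object(
    Bucket="my-example-bucket",
    Key="data.bin",
    SSECustomerAlgorithm="AES256",
    SSECustomerKey=key,
)
print(obj["Body"].read())
```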

S3 and EBS difference

S3 (Simple Storage Service) and EBS (Elastic Block Store) are two storage services provided by Amazon. The main difference between them is what they can be used with. EBS is specifically meant for EC2 (Elastic Compute Cloud) instances and is not accessible unless mounted to one. S3, on the other hand, is not limited to EC2: the files within an S3 bucket can be retrieved over HTTP and even with BitTorrent. Many sites use S3 to hold most of their files because of its accessibility to HTTP clients, web browsers for example.

As already stated above, you need some type of software in order to read or write information with S3. With EBS, a volume can be mounted on an EC2 instance and it would appear just like a hard disk partition. It can be formatted with any file system and files can be written or read by the EC2 instance just like it would to a hard drive.

When it comes to the total amount you can store, S3 has the upper hand. EBS historically had a default limit of 20 volumes, with each volume holding up to 1 TB of data. With S3, the default limit is 100 buckets, with each bucket having unlimited data capacity. S3 users do not need to worry about filling a bucket; the only concern is having enough buckets for your needs.

A limitation of EBS is its inability to be used by multiple instances at once. Once it is mounted by an instance, no other instance can use it. S3 can have multiple images of its contents so it can be used by many at the same time. An interesting side-effect of this capability is something called ‘eventual consistency’. With EBS, data read or write occurs almost instantly. With S3, the changes are not written immediately so if you write something, it may not be the data that a read operation returns.

Summary

  • EBS can only be used with EC2 instances while S3 can be used outside EC2.
  • EBS appears as a mountable volume, while S3 requires software (an HTTP client or SDK) to read and write data.
  • EBS can accommodate a smaller amount of data than S3.
  • EBS can only be used by one EC2 instance at a time while S3 can be used by multiple instances.
  • S3 typically experiences write delays while EBS does not as EBS is attached to an instance.

Limits

  • Until 2018 there was a limit on S3 of roughly 100 PUTs per second per key prefix. To achieve high throughput, care needed to be taken with the structure of the key name to ensure parallel processing. As of July 2018, the limit was raised to 3,500 PUTs (and 5,500 GETs) per second per prefix, and the need for careful key design was basically eliminated.

Notes

  • An S3 policy is a JSON document.
  • IAM is Universal.
  • S3 is not suitable to install OS or a database as it is object-based.
  • By default, the buckets are private and we have to make them public.
  • We can log the requests made to the S3 and then later these logs can be sent to another account as well.
  • S3 supports the BitTorrent protocol to retrieve any publicly available object using torrent files (peer-to-peer).

Some points to remember

  • With client-side encryption, the data is encrypted by the client, and the encrypted data is then uploaded.
  • HTTP and HTTPS are both enabled in S3 by default, but we can disable HTTP by using a bucket policy.
  • HTTPS uses asymmetric encryption (for the key exchange).
  • We can apply lifecycle rules to a whole bucket or a subset.
  • We can have up to 1,000 lifecycle rules per bucket.
  • Lifecycle is defined as XML and is stored in the sub-resources section.
  • Lifecycle configuration on multi-factor authentication (MFA)-enabled buckets is NOT supported. This is because the MFA needs human intervention.
  • For glacier, we are charged for at least 90 days(3 months) and 180 days(6 months) for the deep archive.
  • The difference between Security Groups and network ACLs is that a Security Group acts as a firewall for associated Amazon EC2 instances, controlling both inbound and outbound traffic at the instance level, while network ACLs act as a firewall for associated subnets, controlling both inbound and outbound traffic at the subnet level.
  • A particular folder cannot be tagged separately from other folders; only an entire bucket can be tagged.
  • With the exception of Glacier, retrieving data from the various S3 storage classes should be virtually identical (network issues notwithstanding).
