Backupify Restore Fails
For some reason I lost much of my Google Drive data earlier this year. When I went to restore my data from Backupify, numerous restore attempts failed. Backupify kept me updated for the past 120 days or so and eventually stated that they couldn’t restore my data into Google Drive in the correct hierarchy (i.e. folders), so I agreed to receive my backup as a compressed file.
For the technically inclined, what I asked Backupify to restore was a single user’s Google Drive data, which totaled between 300 and 400 GB. It of course worries me that a company that specializes in backing up data can’t restore a single person’s data in roughly 120 days.
Three days ago the compressed file of my Google Drive data was available for download. The issue was that the ZIP file was about 850 GB in size, which is curious given that my total Google Drive data was just over 300 GB.
AWS for Inexpensive Computing and Networking
On a home or work computer and network it would take ages to download an 850 GB file, and unzipping it would then likely cause a massive headache for the average Windows, Apple or Linux machine. Here are the options I chose for downloading and then extracting files from such a large ZIP file.
Resizing EC2 Instances to reduce costs
Fortunately AWS EC2 computing instances are available in all sorts of sizes. I launched a t2.micro instance running Windows Server 2012 (I could just as easily have used a Linux AMI) to download the file from Backupify, who had stored the massive compressed file in AWS S3.
I used the Firefox browser to download the ZIP file; there is just no point in paying for a powerful computer just to download a file.
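For anyone who prefers to script the launch rather than click through the console, the AWS CLI can start an equivalent instance. This is only a sketch, and the AMI ID, key pair and security group below are placeholders you would replace with your own:

rem Launch a single t2.micro instance (placeholder AMI, key pair and security group)
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro --key-name my-key-pair --security-group-ids sg-0123456789abcdef0 --count 1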
Extracting from huge ZIP files
Once the 850 GB file had downloaded to the EC2 instance I was ready to extract the files. More memory and processing power would be needed, so I upgraded the t2.micro to a t2.large instance, which is a lot more costly to run than the smaller server size.
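The resize itself can also be done from the AWS CLI instead of the console: stop the instance, change its type, then start it again. A rough sketch, with a placeholder instance ID:

rem Stop the instance, change its type to t2.large, then start it again (placeholder instance ID)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type Value=t2.large
aws ec2 start-instances --instance-ids i-0123456789abcdef0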
I’ve used 7-Zip for compressing and extracting files in the past and it can handle very large files. I used the 7-Zip GUI for the ZIP extraction and it ran at about 39 MB/s, which I was more than happy with.
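The GUI worked fine for me, but for reference the same extraction can be run from the command line, assuming 7-Zip is installed in its default location and using placeholder paths for the archive and the output folder:

rem x = extract with full folder paths; -o sets the output directory (no space after -o)
"C:\Program Files\7-Zip\7z.exe" x C:\downloads\backup.zip -oD:\extracted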
The easy and quick way to copy from AWS EC2 to S3
Extracting the files only took a few hours, and then I decided to copy the extracted files and folders to S3 so that I wouldn’t be paying for computing power (i.e. an EC2 instance) when all I needed was to store the files and folders. For the copy from EC2 to S3 I resized my EC2 instance down to the least powerful option, a t1.micro. In hindsight I should have chosen a more powerful instance, as during the copy the Intel Xeon 2 GHz processor was running at 100% with memory at about 75%.
There are various software tools available that assist in copying files from EC2 to S3, but I decided to use the AWS CLI. It is simple to use and takes advantage of multithreading, which helps transfer data more quickly than many other applications. If you are familiar with basic DOS commands then the CLI is simple to understand.
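By default the CLI runs 10 concurrent S3 requests; on a larger instance that number can be raised with a single configuration command, which may speed up the copy (I have not benchmarked this myself):

rem Raise the S3 transfer concurrency from the default of 10
aws configure set default.s3.max_concurrent_requests 20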
Here are the steps needed to use the AWS CLI to copy from EC2 to S3 on Windows Server 2012:
- Download and configure the AWS CLI on your EC2 instance (a full worked example follows these steps).
- In the command prompt, go to the location of the folders and files you want copied.
- Enter the following in the command prompt (changing my-directory and the bucket name to your own):
aws s3 cp my-directory s3://my-s3bucket-name --recursive
- Press Enter on your keyboard.
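Put together, a session looks roughly like this; the keys, directory and bucket name are placeholders for your own values:

rem One-time setup: enter your access key ID, secret access key, default region and output format
aws configure
rem Change to the folder that holds the extracted data
cd C:\extracted
rem Copy the folder and everything under it to your S3 bucket
aws s3 cp my-directory s3://my-s3bucket-name --recursive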
AWS really does make networking, storage and computing tasks simple and inexpensive for those rare occasions when we need to copy and extract from large compressed files.
I have been doing the same over the last three days and I am still not sure if this is the optimal way to go. Too much hassle.