Backupify Restore Fails
For some reason I lost much of my Google Drive data earlier this year. When I went to restore my data from Backupify, numerous restore attempts failed. Backupify kept me updated for the past 120 days or so and eventually stated that they couldn’t restore my data into Google Drive in the correct hierarchy (i.e. folders), so I agreed to receive my backup as a compressed file.
For the technically inclined, what I asked Backupify to restore was a single user’s Google Drive data, which totaled between 300 and 400 GB. It of course worries me that a company that specializes in backing up data can’t restore a single person’s data in roughly 120 days.
Three days ago the compressed file of my Google Drive data was available for download. The issue was that the ZIP file was about 850 GB in size, which is curious given that my total Google Drive data was just over 300 GB.
AWS for Inexpensive Computing and Networking
On a home or work computer and network it would take ages to download an 850 GB file, and unzipping it would then likely cause a massive headache for the average Windows, Apple or Linux machine. Here are the options I chose for downloading and then extracting files from such a large ZIP file.
Resizing EC2 Instances to reduce costs
Fortunately AWS EC2 computing instances are available in all sorts of sizes. I launched a t2.micro instance running Windows Server 2012 (I could just as easily have used a Linux AMI) to download the file from Backupify, who had stored the massive compressed file in AWS S3.
I used the Firefox browser to download the ZIP file; there is just no point in paying for a powerful computer just to download a file.
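For anyone who prefers to script the launch rather than click through the console, the AWS CLI can start an equivalent instance. This is only a sketch, and the AMI ID, key pair and security group below are placeholders you would replace with your own:

rem Launch a single t2.micro instance (placeholder AMI, key pair and security group)
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro --key-name my-key-pair --security-group-ids sg-0123456789abcdef0 --count 1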
Extracting from huge ZIP files
Once the 850 GB file had downloaded to the EC2 instance I was ready to extract the files. More memory and processing power would be needed, so I upgraded the t2.micro to a t2.large instance, which is a lot more costly to run than the smaller server size.
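The resize itself can also be done from the AWS CLI instead of the console: stop the instance, change its type, then start it again. A rough sketch, with a placeholder instance ID:

rem Stop the instance, change its type to t2.large, then start it again (placeholder instance ID)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type Value=t2.large
aws ec2 start-instances --instance-ids i-0123456789abcdef0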
I’ve used 7-Zip for compressing and extracting files in the past and it can handle very large files. I used the 7-Zip GUI for the ZIP extraction and it ran at about 39 MB/s, which I was more than happy with.
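The GUI worked fine for me, but for reference the same extraction can be run from the command line, assuming 7-Zip is installed in its default location and using placeholder paths for the archive and the output folder:

rem x = extract with full folder paths; -o sets the output directory (no space after -o)
"C:\Program Files\7-Zip\7z.exe" x C:\downloads\backup.zip -oD:\extracted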
The easy and quick way to copy from AWS EC2 to S3
Extracting the files only took a few hours, and then I decided to copy the extracted files and folders to S3 so that I wouldn’t be paying for computing power (i.e. an EC2 instance) when all I needed was to store the files and folders. For the copy from EC2 to S3 I resized my EC2 instance down to the least powerful option, a t1.micro. In hindsight I should have chosen a more powerful instance, as during the copy the Intel Xeon 2 GHz processor was running at 100% with memory at about 75%.
There are various software tools available that assist in copying files from EC2 to S3, but I decided to use the AWS CLI. It is simple to use and takes advantage of multithreading, which helps transfer data more quickly than many other applications. If you are familiar with basic DOS commands then the CLI is simple to understand.
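By default the CLI runs 10 concurrent S3 requests; on a larger instance that number can be raised with a single configuration command, which may speed up the copy (I have not benchmarked this myself):

rem Raise the S3 transfer concurrency from the default of 10
aws configure set default.s3.max_concurrent_requests 20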
Here are the steps needed to use the AWS CLI to copy from EC2 to S3 on Windows Server 2012:
- Download and configure the AWS CLI on your EC2 instance (a full worked example follows these steps).
- In the command prompt, go to the location of the folders and files you want copied.
- Enter the following in the command prompt (changing my-directory and the bucket name to your own):
aws s3 cp my-directory s3://my-s3bucket-name --recursive
- Press Enter on your keyboard.
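Put together, a session looks roughly like this; the keys, directory and bucket name are placeholders for your own values:

rem One-time setup: enter your access key ID, secret access key, default region and output format
aws configure
rem Change to the folder that holds the extracted data
cd C:\extracted
rem Copy the folder and everything under it to your S3 bucket
aws s3 cp my-directory s3://my-s3bucket-name --recursive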
AWS really does make networking, storage and computing tasks simple and inexpensive for those rare occasions when we need to copy and extract from large compressed files.
I have been doing the same over the last three days and I am still not sure if this is the optimal way to go. Too much hassle.