How to download an entire S3 bucket?
Amazon S3 (short for Simple Storage Service) is a cost-effective storage service that can hold text, images, web application assets, data-lake content, and much more. AWS provides a GUI (the S3 console) for managing S3 buckets, which includes -
- Viewing S3 bucket content
- Uploading content to an S3 bucket
- Downloading specific items from an S3 bucket
- Many other features for controlling access, integrations, and so on
However, the AWS console does not provide an option to download an entire S3 bucket. You can download individual items from the bucket, but as soon as you select more than one item, the Download option is disabled.
In this article, we are going to discuss different ways to download a complete S3 bucket without using the AWS console. I have put together 4 different approaches for downloading an S3 bucket.
- Using aws s3 sync CLI (command line interface)
- Using MinIO CLI
- Using s3cmd CLI
- Using Cyberduck
- Conclusion
1. Using aws s3 sync CLI (command line interface)
One of the most recommended ways to download S3 bucket content is the aws command line interface, the official utility provided by Amazon.
If you haven't set up the AWS CLI before, I would recommend reading this guide - Install and Setup AWS CLI.
1.1 List the content of S3 bucket before downloading
Let's list the content of the S3 bucket before downloading it.
```shell
aws s3 ls
```
The above command lists all the S3 buckets available in your AWS account. For this guide, I have already created an S3 bucket named jhooq-test-bucket-1.
1.2 Use aws s3 sync to download the files from the S3 bucket
Now, after listing the S3 buckets, let's run the s3 sync command to download the content of the bucket.
```shell
aws s3 sync s3://jhooq-test-bucket-1 /home/vagrant/my-destination-directory
```
In the above aws s3 sync command we need to pass two arguments -
- Name of the S3 bucket which we want to copy - s3://jhooq-test-bucket-1
- Destination directory path - /home/vagrant/my-destination-directory
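Because the bucket name is just an argument, the same command is easy to script. Here is a minimal sketch (the destination path is illustrative, and it assumes the AWS CLI is already configured) that downloads every bucket in the account, one directory per bucket:

```shell
#!/bin/sh
# Download every bucket in the account into its own sub-directory.
# DEST is an illustrative path - change it to suit your machine.
DEST=/home/vagrant/s3-backup

# `aws s3 ls` prints "DATE TIME BUCKET-NAME"; the 3rd field is the name.
for bucket in $(aws s3 ls | awk '{print $3}'); do
    mkdir -p "$DEST/$bucket"
    aws s3 sync "s3://$bucket" "$DEST/$bucket"
done
```

Since aws s3 sync only transfers files that are missing or changed, re-running this script acts as an incremental backup.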
2. Using MinIO CLI
The next option is to use the MinIO client (mc) for downloading the S3 bucket. MinIO is an open-source utility distributed under the GNU AGPLv3 license.
To use this utility you first need to download and install it for your operating system. Here is the link for download and installation. (*Note - I am using Ubuntu, so the following instructions and screenshots are taken from a Linux operating system.)
2.1 Download the MinIO using wget
```shell
wget https://dl.min.io/client/mc/release/linux-amd64/mc
```
Here is the screenshot for your reference -
Once the download finishes you should have a file named mc on your local drive.
2.2 Change the mode of the MinIO mc file and make it executable
To change the mode, run the chmod command on the file mc.
```shell
chmod +x mc
```
```
-rwxrwxr-x 1 vagrant vagrant 21786624 Nov  5 10:06 mc
```
2.3 Copy the file mc to /usr/local/bin/
After changing the file permission and making it executable, we need to copy the file to the location /usr/local/bin/.
```shell
sudo cp mc /usr/local/bin/
```
2.4 Set the alias for aws s3 bucket using Access Key and Secret Key
Now we have MinIO installed on our system, but to work with it you need to create an alias for S3 using your AWS Access Key and Secret Key.
You can find the AWS Access Key and Secret Key by logging into your AWS account and going into User Name -> Security Credentials -> Access Key
After getting the Access Key and Secret Key you can create an alias for AWS S3 with the following command.
```shell
mc alias set s3 https://s3.amazonaws.com <ACCESS_KEY> <SECRET_KEY>
```
```
Added `s3` successfully.
```
2.5 Verify the alias
Let's verify the alias by running the mc alias ls command.
```shell
mc alias ls
```
It should generate the following result -
```
gcs
  URL       : https://storage.googleapis.com
  AccessKey : YOUR-ACCESS-KEY-HERE
  SecretKey : YOUR-SECRET-KEY-HERE
  API       : S3v2
  Path      : dns

local
  URL       : http://localhost:9000
  AccessKey :
  SecretKey :
  API       :
  Path      : auto

play
  URL       : https://play.min.io
  AccessKey : Q3AM3UQ867SPQQA43P2F
  SecretKey : zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG
  API       : S3v4
  Path      : auto

s3
  URL       : https://s3.amazonaws.com
  AccessKey : XXXXXXXXXXXXXXXXXXXX
  SecretKey : j8zDd9jORXXlqNT/YDl99LAAyQkWXXXXXXXXXXXX
  API       : s3v4
  Path      : auto
```
Here is the screenshot of the same -
2.6 Download the entire S3 bucket
After setting up the alias, let's run the copy command to download the entire S3 bucket recursively.
```shell
mc cp --recursive s3/jhooq-test-bucket-1 .
```
```
...cket-1/test.txt: 146.56 KiB / 146.56 KiB┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 406.00 KiB/s 0s
```
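Apart from mc cp, the MinIO client also provides an mc mirror subcommand, which on re-runs only transfers objects that differ from the local copy. A sketch using the alias and bucket names from this article:

```shell
# Mirror the whole bucket into a local directory. Re-running is cheap:
# mirror skips objects that already match the local copy.
mc mirror s3/jhooq-test-bucket-1 ./jhooq-test-bucket-1
```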
2.7 Verify the downloaded content of S3 Bucket
Let's verify the content of the S3 bucket using the tree command.
```shell
tree jhooq-test-bucket-1/
```
```
jhooq-test-bucket-1/
├── AGL Payment Receipt.pdf
├── Screenshot 2021-10-28 at 23.45.36.png
└── test.txt
```
3. Using s3cmd CLI
The next command-line tool we are going to use is the s3cmd CLI. Let's start by installing it.
Here is the direct link for download, or you can use the following wget command.
```shell
wget https://github.com/s3tools/s3cmd/archive/master.zip
```
```
--2021-11-13 15:49:50--  https://github.com/s3tools/s3cmd/archive/master.zip
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/s3tools/s3cmd/zip/master [following]
--2021-11-13 15:49:50--  https://codeload.github.com/s3tools/s3cmd/zip/master
Resolving codeload.github.com (codeload.github.com)... 140.82.121.9
Connecting to codeload.github.com (codeload.github.com)|140.82.121.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘master.zip’

master.zip              [ <=> ]  465.66K  1.07MB/s    in 0.4s

2021-11-13 15:49:51 (1.07 MB/s) - ‘master.zip’ saved [476833]
```
3.1 Install python for installing the s3cmd
The next thing you need is Python, because s3cmd cannot be installed without it. If you do not already have Python (with pip) on your system, install it with the following command.
```shell
sudo apt-get install python-pip
```
3.2 Unzip the master.zip
After downloading s3cmd and installing Python, let's unzip the master.zip containing s3cmd.
```shell
unzip master.zip
```
3.3 Install the s3cmd cli
Now install the s3cmd command-line interface tool by running the following install command.
```shell
cd /home/vagrant/s3cmd/s3cmd-master/s3cmd
sudo pip install s3cmd
```
```
The directory '/home/vagrant/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/vagrant/.cache/pip' or its parent directory is not owned by the current user and caching wheels have been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting s3cmd
  Downloading https://files.pythonhosted.org/packages/1e/88/9630c6e894575f03c1685104a6562a31ecf9e82b5b687d8516445a051fbe/s3cmd-2.2.0-py2.py3-none-any.whl (153kB)
    100% |████████████████████████████████| 163kB 4.9MB/s
Collecting python-dateutil (from s3cmd)
  Downloading https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl (247kB)
    100% |████████████████████████████████| 256kB 3.8MB/s
Collecting python-magic (from s3cmd)
  Downloading https://files.pythonhosted.org/packages/d3/99/c89223c6547df268596899334ee77b3051f606077317023617b1c43162fb/python_magic-0.4.24-py2.py3-none-any.whl
Requirement already satisfied: six>=1.5 in /usr/lib/python2.7/dist-packages (from python-dateutil->s3cmd)
Installing collected packages: python-dateutil, python-magic, s3cmd
Successfully installed python-dateutil-2.8.2 python-magic-0.4.24 s3cmd-2.2.0
```
3.4 Configure the s3cmd cli using AWS access key and secret key
Run the s3cmd configure command to set up the Access Key and Secret Key.
```shell
s3cmd --configure
```
A few important points to keep in mind - after running the s3cmd --configure command, it will ask for the following -
- AWS Access Key
- AWS Secret Key
- Region
Supply all these details so that s3cmd can be configured.
Here is a sample session which I captured while configuring s3cmd -
```
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: <YOUR_ACCESS_KEY>
Secret Key: <YOUR_SECRET_KEY>
Default Region [US]: eu-central-1

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: s3.amazonaws.com

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]:

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: Yes

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: <YOUR_ACCESS_KEY>
  Secret Key: <YOUR_SECRET_KEY>
  Default Region: eu-central-1
  S3 Endpoint: s3.amazonaws.com
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.amazonaws.com
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] Y
Configuration saved to '/home/vagrant/.s3cfg'
```
3.5 Download the entire S3 bucket recursively using s3cmd
Now run the copy command to download the entire S3 bucket recursively.
```shell
s3cmd get --recursive s3://jhooq-test-bucket-1 /home/vagrant/s3-bucket/
```
You need to supply two things here -
- S3 bucket name, i.e. - s3://jhooq-test-bucket-1
- Destination directory - /home/vagrant/s3-bucket/
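If you expect to run the download repeatedly, s3cmd also has a sync subcommand which, like aws s3 sync, only transfers files that are new or changed. A sketch using the same bucket and destination as above:

```shell
# Re-runnable variant of the download: objects already present locally
# and unchanged on S3 are skipped.
s3cmd sync s3://jhooq-test-bucket-1 /home/vagrant/s3-bucket/
```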
Here is the console output of the s3cmd downloading s3 bucket from AWS -
```
download: 's3://jhooq-test-bucket-1/AGL Payment Receipt.pdf' -> '/home/vagrant/s3-bucket/AGL Payment Receipt.pdf'  [1 of 3]
 52562 of 52562   100% in    0s   505.08 KB/s  done
download: 's3://jhooq-test-bucket-1/Screenshot 2021-10-28 at 23.45.36.png' -> '/home/vagrant/s3-bucket/Screenshot 2021-10-28 at 23.45.36.png'  [2 of 3]
 97511 of 97511   100% in    0s   448.68 KB/s  done
download: 's3://jhooq-test-bucket-1/test.txt' -> '/home/vagrant/s3-bucket/test.txt'  [3 of 3]
```
Here is the screenshot after downloading the entire S3 bucket -
4. Using Cyberduck
The next and final tool which I would recommend is Cyberduck for downloading an entire S3 bucket. This is a GUI tool for managing S3 buckets, so if you are not a big fan of command-line tools then Cyberduck is going to be a good fit for you.
4.1 Download and Install the cyberduck
You can install the Cyberduck from here. The Cyberduck installation utility is pretty simple and it should not take much time to install onto your operating system.
4.2 Add S3 bucket to Cyberduck
After installing Cyberduck you need to add S3 to it. Here is a screenshot of adding an S3 bucket to Cyberduck.
4.3 Browse and Download the Content of S3 bucket using Cyberduck
You can easily browse the content of S3 bucket after adding it to Cyberduck.
5. Conclusion
I hope this post helps you manage your S3 bucket, either with command-line tools such as aws cli, MinIO, and s3cmd, or, if you are a big fan of UIs, with Cyberduck.
Being a developer, I would still recommend sticking with the command-line tools because they provide more programmatic and automation capabilities, which you can combine with a shell script or Ansible.
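As one example of that automation, here is a hypothetical nightly backup script (the bucket name and paths are the ones used in this article; adjust them to your setup) that could be dropped into cron:

```shell
#!/bin/sh
# Hypothetical nightly backup: sync the bucket into a date-stamped folder.
BUCKET=jhooq-test-bucket-1
DEST=/home/vagrant/backups/$(date +%Y-%m-%d)

mkdir -p "$DEST"
aws s3 sync "s3://$BUCKET" "$DEST"
```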