How to download an entire S3 bucket?
Amazon S3 (short for Simple Storage Service) is a cost-effective storage service that can hold text, images, web application assets, data-lake content, and much more. AWS provides a GUI (the S3 console) for managing S3 buckets, which includes -
- Viewing S3 bucket content
- Uploading content to an S3 bucket
- Downloading specific items from an S3 bucket
- Many other features for controlling access, integrations, and so on
However, the AWS console does not provide an option to download an entire S3 bucket. You can download individual items from the bucket, but as soon as you select more than one item, the Download option is disabled.
In this article, we are going to discuss different ways to download a complete S3 bucket without using the AWS console. I have put together 4 different approaches for downloading an S3 bucket.
- Using aws s3 sync CLI (command line interface)
- Using MinIO CLI
- Using s3cmd CLI
- Using Cyberduck
- Conclusion
1. Using aws s3 sync CLI (command line interface)
One of the most recommended ways to download S3 bucket content is the aws command line interface, the official utility provided by Amazon.
If you haven't set up the AWS CLI before, I would recommend reading this guide - Install and Setup AWS CLI.
1.1 List the content of S3 bucket before downloading
Let's list the content of the S3 bucket before downloading it.
```shell
aws s3 ls
```
The above command lists all the S3 buckets available in your AWS account. For this guide, I have already created an S3 bucket named jhooq-test-bucket-1.
1.2 Use aws s3 sync to download the files from the S3 bucket
Now, after listing the S3 buckets, let's run the s3 sync command to download the content of the bucket.
```shell
aws s3 sync s3://jhooq-test-bucket-1 /home/vagrant/my-destination-directory
```
In the above aws s3 sync command we need to pass two arguments -
- Name of the S3 bucket which we want to copy - s3://jhooq-test-bucket-1
- Destination directory path - /home/vagrant/my-destination-directory
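Because the bucket name is just an argument, the same command is easy to script. Here is a minimal sketch (the destination path is illustrative, and it assumes the AWS CLI is already configured) that downloads every bucket in the account, one directory per bucket:

```shell
#!/bin/sh
# Download every bucket in the account into its own sub-directory.
# DEST is an illustrative path - change it to suit your machine.
DEST=/home/vagrant/s3-backup

# `aws s3 ls` prints "DATE TIME BUCKET-NAME"; the 3rd field is the name.
for bucket in $(aws s3 ls | awk '{print $3}'); do
    mkdir -p "$DEST/$bucket"
    aws s3 sync "s3://$bucket" "$DEST/$bucket"
done
```

Since aws s3 sync only transfers files that are missing or changed, re-running this script acts as an incremental backup.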
2. Using MinIO CLI
The next option is to use the MinIO client (mc) for downloading the S3 bucket. MinIO is an open-source utility distributed under the GNU AGPLv3 license.
To use this utility you first need to download and install it for your operating system. Here is the link for download and installation. (*Note - I am using Ubuntu, so the following instructions and screenshots are taken from a Linux operating system.)
2.1 Download the MinIO using wget
```shell
wget https://dl.min.io/client/mc/release/linux-amd64/mc
```
Here is the screenshot for your reference -
Once the download finishes you should have a file named mc on your local drive.
2.2 Change the mode of the MinIO mc file and make it executable
To change the mode, run the chmod command on the file mc.
```shell
chmod +x mc
```
```
-rwxrwxr-x 1 vagrant vagrant 21786624 Nov  5 10:06 mc
```
2.3 Copy the file mc to /usr/local/bin/
After changing the file permission and making it executable, we need to copy the file to the location /usr/local/bin/.
```shell
sudo cp mc /usr/local/bin/
```
2.4 Set the alias for aws s3 bucket using Access Key and Secret Key
Now we have MinIO installed on our system, but to work with it you need to create an alias for S3 using your AWS Access Key and Secret Key.
You can find the AWS Access Key and Secret Key by logging into your AWS account and going into User Name -> Security Credentials -> Access Key
After getting the Access Key and Secret Key you can create an alias for AWS S3 with the following command.
```shell
mc alias set s3 https://s3.amazonaws.com <ACCESS_KEY> <SECRET_KEY>
```
```
Added `s3` successfully.
```
2.5 Verify the alias
Let's verify the alias by running the mc alias ls command.
```shell
mc alias ls
```
It should generate the following result -
```
gcs
  URL       : https://storage.googleapis.com
  AccessKey : YOUR-ACCESS-KEY-HERE
  SecretKey : YOUR-SECRET-KEY-HERE
  API       : S3v2
  Path      : dns

local
  URL       : http://localhost:9000
  AccessKey :
  SecretKey :
  API       :
  Path      : auto

play
  URL       : https://play.min.io
  AccessKey : Q3AM3UQ867SPQQA43P2F
  SecretKey : zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG
  API       : S3v4
  Path      : auto

s3
  URL       : https://s3.amazonaws.com
  AccessKey : XXXXXXXXXXXXXXXXXXXX
  SecretKey : j8zDd9jORXXlqNT/YDl99LAAyQkWXXXXXXXXXXXX
  API       : s3v4
  Path      : auto
```
Here is the screenshot of the same -
2.6 Download the entire S3 bucket
After setting up the alias, let's run the copy command to download the entire S3 bucket recursively.
```shell
mc cp --recursive s3/jhooq-test-bucket-1 .
```
```
...cket-1/test.txt: 146.56 KiB / 146.56 KiB┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 406.00 KiB/s 0s
```
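Apart from mc cp, the MinIO client also provides an mc mirror subcommand, which on re-runs only transfers objects that differ from the local copy. A sketch using the alias and bucket names from this article:

```shell
# Mirror the whole bucket into a local directory. Re-running is cheap:
# mirror skips objects that already match the local copy.
mc mirror s3/jhooq-test-bucket-1 ./jhooq-test-bucket-1
```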
2.7 Verify the downloaded content of S3 Bucket
Let's verify the content of the S3 bucket using the tree command.
```shell
tree jhooq-test-bucket-1/
```
```
jhooq-test-bucket-1/
├── AGL Payment Receipt.pdf
├── Screenshot 2021-10-28 at 23.45.36.png
└── test.txt
```
3. Using s3cmd CLI
The next command-line tool we are going to use is the s3cmd CLI. Let's start by installing it.
Here is the direct link for download, or you can use the following wget command.
```shell
wget https://github.com/s3tools/s3cmd/archive/master.zip
```
```
--2021-11-13 15:49:50--  https://github.com/s3tools/s3cmd/archive/master.zip
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/s3tools/s3cmd/zip/master [following]
--2021-11-13 15:49:50--  https://codeload.github.com/s3tools/s3cmd/zip/master
Resolving codeload.github.com (codeload.github.com)... 140.82.121.9
Connecting to codeload.github.com (codeload.github.com)|140.82.121.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘master.zip’

master.zip              [ <=> ]  465.66K  1.07MB/s    in 0.4s

2021-11-13 15:49:51 (1.07 MB/s) - ‘master.zip’ saved [476833]
```
3.1 Install python for installing the s3cmd
The next thing you need is Python, because s3cmd cannot be installed without it. If you do not already have Python (with pip) on your system, install it with the following command.
```shell
sudo apt-get install python-pip
```
3.2 Unzip the master.zip
After downloading s3cmd and installing Python, let's unzip the master.zip containing s3cmd.
```shell
unzip master.zip
```
3.3 Install the s3cmd cli
Now install the s3cmd command-line interface tool by running the following install command.
```shell
cd /home/vagrant/s3cmd/s3cmd-master/s3cmd
sudo pip install s3cmd
```
```
The directory '/home/vagrant/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/vagrant/.cache/pip' or its parent directory is not owned by the current user and caching wheels have been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting s3cmd
  Downloading https://files.pythonhosted.org/packages/1e/88/9630c6e894575f03c1685104a6562a31ecf9e82b5b687d8516445a051fbe/s3cmd-2.2.0-py2.py3-none-any.whl (153kB)
    100% |████████████████████████████████| 163kB 4.9MB/s
Collecting python-dateutil (from s3cmd)
  Downloading https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl (247kB)
    100% |████████████████████████████████| 256kB 3.8MB/s
Collecting python-magic (from s3cmd)
  Downloading https://files.pythonhosted.org/packages/d3/99/c89223c6547df268596899334ee77b3051f606077317023617b1c43162fb/python_magic-0.4.24-py2.py3-none-any.whl
Requirement already satisfied: six>=1.5 in /usr/lib/python2.7/dist-packages (from python-dateutil->s3cmd)
Installing collected packages: python-dateutil, python-magic, s3cmd
Successfully installed python-dateutil-2.8.2 python-magic-0.4.24 s3cmd-2.2.0
```
3.4 Configure the s3cmd cli using AWS access key and secret key
Run the s3cmd configure command to set up the Access Key and Secret Key.
```shell
s3cmd --configure
```
A few important points to keep in mind - after running the s3cmd --configure command, it will ask for the following -
- AWS Access Key
- AWS Secret Key
- Region
Supply all these details so that s3cmd can be configured.
Here is a sample session which I captured while configuring s3cmd -
```
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: <YOUR_ACCESS_KEY>
Secret Key: <YOUR_SECRET_KEY>
Default Region [US]: eu-central-1

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: s3.amazonaws.com

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]:

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: Yes

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: <YOUR_ACCESS_KEY>
  Secret Key: <YOUR_SECRET_KEY>
  Default Region: eu-central-1
  S3 Endpoint: s3.amazonaws.com
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.amazonaws.com
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] Y
Configuration saved to '/home/vagrant/.s3cfg'
```
3.5 Download the entire S3 bucket recursively using s3cmd
Now run the copy command to download the entire S3 bucket recursively.
```shell
s3cmd get --recursive s3://jhooq-test-bucket-1 /home/vagrant/s3-bucket/
```
You need to supply two things here -
- S3 bucket name, i.e. - s3://jhooq-test-bucket-1
- Destination directory - /home/vagrant/s3-bucket/
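If you expect to run the download repeatedly, s3cmd also has a sync subcommand which, like aws s3 sync, only transfers files that are new or changed. A sketch using the same bucket and destination as above:

```shell
# Re-runnable variant of the download: objects already present locally
# and unchanged on S3 are skipped.
s3cmd sync s3://jhooq-test-bucket-1 /home/vagrant/s3-bucket/
```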
Here is the console output of the s3cmd downloading s3 bucket from AWS -
```
download: 's3://jhooq-test-bucket-1/AGL Payment Receipt.pdf' -> '/home/vagrant/s3-bucket/AGL Payment Receipt.pdf'  [1 of 3]
 52562 of 52562   100% in    0s   505.08 KB/s  done
download: 's3://jhooq-test-bucket-1/Screenshot 2021-10-28 at 23.45.36.png' -> '/home/vagrant/s3-bucket/Screenshot 2021-10-28 at 23.45.36.png'  [2 of 3]
 97511 of 97511   100% in    0s   448.68 KB/s  done
download: 's3://jhooq-test-bucket-1/test.txt' -> '/home/vagrant/s3-bucket/test.txt'  [3 of 3]
```
Here is the screenshot after downloading the entire S3 bucket -
4. Using Cyberduck
The next and final tool which I would recommend is Cyberduck for downloading an entire S3 bucket. This is a GUI tool for managing S3 buckets, so if you are not a big fan of command-line tools then Cyberduck is going to be a good fit for you.
4.1 Download and Install the cyberduck
You can install the Cyberduck from here. The Cyberduck installation utility is pretty simple and it should not take much time to install onto your operating system.
4.2 Add S3 bucket to Cyberduck
After installing Cyberduck you need to add S3 to it. Here is a screenshot of adding an S3 bucket to Cyberduck.
4.3 Browse and Download the Content of S3 bucket using Cyberduck
You can easily browse the content of S3 bucket after adding it to Cyberduck.
5. Conclusion
I hope this post helps you manage your S3 bucket, either with command-line tools such as aws cli, MinIO, and s3cmd, or, if you are a big fan of UIs, with Cyberduck.
Being a developer, I would still recommend sticking with the command-line tools because they provide more programmatic and automation capabilities, which you can combine with a shell script or Ansible.
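As one example of that automation, here is a hypothetical nightly backup script (the bucket name and paths are the ones used in this article; adjust them to your setup) that could be dropped into cron:

```shell
#!/bin/sh
# Hypothetical nightly backup: sync the bucket into a date-stamped folder.
BUCKET=jhooq-test-bucket-1
DEST=/home/vagrant/backups/$(date +%Y-%m-%d)

mkdir -p "$DEST"
aws s3 sync "s3://$BUCKET" "$DEST"
```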