What is persistent storage in docker and how to manage it
When working with containers, one major point of concern is the persistent storage of data. Container storage is ephemeral, meaning that the data written by the containerized application is not preserved if the container is removed. Containerized applications work on the assumption that they always start with empty storage.
Container Image layers
By design, container images are immutable and layered. Running container processes use an immutable view of the container image, which allows multiple containers to reuse the same image simultaneously. Images are composed of several layers that add to or override the contents of the layers below them.
A running container gets a new layer over its base container image, and this layer is the container storage. At first, this layer is the only read/write storage available for the container, and it is used to create all the files required by the container, such as logs and temporary files. These files are volatile. The containerized application does not stop working if they are lost. The container storage layer is exclusive to the running container, so if another container is created from the same base image, it gets another read/write layer. This ensures that each container's resources are isolated from other containers launched using the same image.
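The per-container writable layer can be observed directly with the docker diff command, which lists the files a container has added or changed relative to its image. A quick sketch, assuming the alpine image can be pulled (the container names c1 and c2 are arbitrary):

```shell
# Two containers from the same image; only c1 writes a file.
docker run --name c1 alpine touch /only-in-c1
docker run --name c2 alpine true

# Each container has its own writable layer:
docker diff c1   # lists the added file /only-in-c1
docker diff c2   # does not list it: c2's layer is independent

# Clean up
docker rm c1 c2
```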
Most applications require data to be stored persistently, and the default transient nature of container storage is inadequate in such cases. For instance, stateful applications such as databases require permanent storage because they must maintain data across restarts.
Persistent storage options for Docker
Docker offers the following options to store data persistently. We’ll explain each approach with examples.
- Directory mounts (also called Bind mounts)
- Named volumes
- Data containers (deprecated)
- Cloud storage
- Storage plugins
Any Linux distribution with docker installation will work for the examples shown in this tutorial. Internet access will be required for downloading container images. For the cloud storage portion, access to an AWS account is required.
1. Directory mounts
Directory mounts are also called bind mounts. Docker can mount any directory on the host inside a running container. The containerized application will see this host directory as part of the container storage. When the container stops or is deleted, the contents of these host directories are not reclaimed, and they can be mounted to new containers whenever needed.
For instance, a database container can use a host directory to store database files. If this database container gets deleted, we can create a new container and use the same host directory, keeping the database data available to client applications. To the database container, it does not matter where this host directory is stored from the host's point of view; it could be anything from a local hard disk partition to a remote networked file system.
Let’s see this in action. We’re going to create a MySQL container named “db1”, and mount the /data directory on the host inside the container on /var/lib/mysql.
docker run --name db1 -d -v /data:/var/lib/mysql -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=tempdb -e MYSQL_ROOT_PASSWORD=password mysql
Now that the “db1” container is up, let’s populate our database with some dummy data.
docker exec -it db1 /usr/bin/mysql -u user1 tempdb -ppassword -e "CREATE TABLE dummy (id int(10) NOT NULL, name varchar(255) DEFAULT NULL, code varchar(255) DEFAULT NULL, PRIMARY KEY (id));" -e "insert into dummy (id, name, code) values (1,'John','XYZ');"
docker exec -it db1 /usr/bin/mysql -u user1 tempdb -ppassword -e "select * from dummy"
After verifying that the database has been populated with dummy data, we’ll now stop and delete the “db1” container.
docker stop db1; docker rm db1
We’re now going to create a second container, “db2” using the same mysql image, and mount the /data directory on the host inside the container on /var/lib/mysql.
docker run --name db2 -d -v /data:/var/lib/mysql -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=tempdb -e MYSQL_ROOT_PASSWORD=password mysql
Let’s see if our database is still populated with the dummy data.
docker exec -it db2 /usr/bin/mysql -u user1 tempdb -ppassword -e "select * from dummy"
As you can see, the data is still intact. This is because it was saved on the /data directory on the host and not inside the container.
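Recent Docker versions also support the more explicit --mount syntax for bind mounts. The command below is a sketch equivalent to the -v form used above (shown with the db1 name for comparison; run it only after removing the earlier containers):

```shell
# Equivalent to: -v /data:/var/lib/mysql
docker run --name db1 -d \
  --mount type=bind,source=/data,target=/var/lib/mysql \
  -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password \
  -e MYSQL_DATABASE=tempdb -e MYSQL_ROOT_PASSWORD=password \
  mysql
```

Unlike -v, --mount fails with an error if the host source path does not exist, which makes configuration mistakes easier to catch.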
2. Named volumes
Named volumes are the recommended approach for creating persistent storage for containers. They preserve data outside the life cycle of any single container and can be mounted in more than one container at a time, allowing data to be shared and new containers to be connected to existing storage.
Docker named volumes work by creating a directory on the host machine and then mounting that directory into a container. This approach may sound similar to the directory mount method described above, but there are a few major differences. When using named volumes, a new directory is created within Docker's storage directory on the host machine (usually inside /var/lib/docker/volumes/). The content of this storage directory is managed by Docker itself. It is independent of the directory structure of the host machine, so it is extremely unlikely that another process can use this directory for writing any data. This offers an extra layer of security for containerized applications. There is also the added advantage that named volumes can be managed through docker commands.
Example: Let’s create a volume named “datavolume” and view its details.
docker volume create datavolume
docker volume ls
docker volume inspect datavolume
It can be seen that docker creates volumes inside the /var/lib/docker/volumes directory on the host.
Let’s use the same MySQL database example to implement the concept of named volumes. Create a MySQL container “db3”, and mount the “datavolume” volume on the /var/lib/mysql directory inside the container.
docker run --name db3 -d -v datavolume:/var/lib/mysql -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=datavoldb -e MYSQL_ROOT_PASSWORD=password mysql
Populate the database with some data.
docker exec -it db3 /usr/bin/mysql -u user1 datavoldb -ppassword -e "CREATE TABLE dummy (id int(10) NOT NULL, name varchar(255) DEFAULT NULL, PRIMARY KEY (id));" -e "insert into dummy (id, name) values (1,'Docker Volumes');"
docker exec -it db3 /usr/bin/mysql -u user1 datavoldb -ppassword -e "select * from dummy"
Stop and delete the “db3” container.
docker stop db3; docker rm db3
We’ll now launch a new container, “db4”, and attach the “datavolume” volume to it. Once the container has started, verify that the data created earlier is still preserved in the database.
docker run --name db4 -d -v datavolume:/var/lib/mysql -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=datavoldb -e MYSQL_ROOT_PASSWORD=password mysql
docker exec -it db4 /usr/bin/mysql -u user1 datavoldb -ppassword -e "select * from dummy"
As evident from above, data is still present in the database as it was stored on the named volume.
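Because named volumes are managed by Docker, their full life cycle is handled with the docker volume subcommands; the volume and its data persist until explicitly removed:

```shell
# The volume outlives db3 and db4:
docker volume ls

# Remove it once the data is no longer needed
# (fails if a container still uses it):
docker volume rm datavolume

# Or remove all volumes not used by at least one container:
docker volume prune -f
```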
3. Data containers
There’s another approach to saving data persistently, called data containers. This approach has been deprecated in favor of the named volumes explained above, but it was used in the earlier days of Docker, before volumes were available. The method involves a dedicated container for storing data. This container doesn’t need to run; it merely needs to exist. Its sole job is to hold data. The containers running the actual application don’t need to know where the data is located on disk; they only need the name of the data container. The major advantage of this approach was that access to application data was managed by Docker, so other processes were less likely to affect the data.
Let’s see this through an example. We’ll first create a data container and specify a volume option; this is where our application containers will write data. Any image can serve as a data container. This example uses the alpine image as it is very lightweight.
docker create -v /var/www/html --name datacontainer alpine
Create a sample file and copy it to the data container.
echo "Data Container!" > index.html
docker cp index.html datacontainer:/var/www/html
Now, let’s launch an httpd container using the “--volumes-from” option. This option will ensure that the httpd container uses the volumes from the alpine data container we created earlier.
docker run --name http -d --volumes-from datacontainer httpd
docker exec -it http bash
cat /var/www/html/index.html
As can be seen, the http container can see data present on the alpine data container.
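Under the hood, the -v /var/www/html option on the data container created an anonymous Docker volume; its name and host location can be read with docker inspect using its Go template syntax:

```shell
# Show the volume backing the data container's /var/www/html:
docker inspect -f '{{ range .Mounts }}{{ .Name }} -> {{ .Source }}{{ end }}' datacontainer
```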
4. Cloud storage
There are many cloud providers available who offer different file, block and object based storage solutions. Docker can also work with those storage solutions. Using cloud based storage options offers another method of backing up your application data when running containers. For instance, Amazon offers an object storage based solution called S3. We can mount an S3 bucket created in our AWS account on the host system. That directory can then be mounted inside a container similar to the bind mount approach discussed earlier. One advantage of this approach is that the user can implement policies in their AWS account to allow or limit access at bucket level. This adds another layer of flexibility to the docker platform which allows it to share common data between any number of containers.
We’ll use Amazon S3 to implement this. We have an S3 bucket, “aws-s3-bucket-docker”, in our account, which contains a sample “index.html” file. You’ll need to create an S3 bucket in your own AWS account to follow along; refer to the AWS documentation for creating an S3 bucket.
We’re first going to mount this bucket in our host OS. Since AWS S3 is object-based storage, it cannot be mounted locally using traditional file systems. We’ll need to mount it using the FUSE file system, via the s3fs interface, which allows us to mount an Amazon S3 bucket as a local file system. First, install the s3fs package as follows. For CentOS/RHEL based systems, this package can be installed from the EPEL repository.
sudo apt-get install s3fs
You’ll need the access key ID and secret access key for the AWS account; refer to the AWS documentation for creating a user and access key. Store the access key ID and secret access key in a file, and restrict permissions on this file, as it contains sensitive information. The credentials should be added in the following format: ACCESS_KEY_ID:SECRET_ACCESS_KEY
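A minimal sketch of creating the credentials file; the key values here are placeholders to be replaced with your own:

```shell
# s3fs expects one line in the form ACCESS_KEY_ID:SECRET_ACCESS_KEY
echo "AKIAXXXXXXXXXXXXXXXX:your-secret-access-key" > .s3fs-access

# s3fs rejects credential files that are readable by other users:
chmod 600 .s3fs-access
```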
Mount the S3 bucket as follows. Make sure to specify the correct name and path of the file containing the access key ID and secret access key. Note that the file system type is fuse.
s3fs aws-s3-bucket-docker /s3 -o passwd_file=.s3fs-access,nonempty
Once the bucket has been mounted on the host, it can be mounted inside the container just like a normal directory mount.
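For example, the mounted bucket can be served by an Apache container with an ordinary bind mount. This is a sketch; /usr/local/apache2/htdocs is the default document root of the official httpd image, and port 8080 is an arbitrary choice:

```shell
# Bind-mount the s3fs mount point into an httpd container:
docker run --name s3web -d -p 8080:80 -v /s3:/usr/local/apache2/htdocs httpd

# The index.html object in the S3 bucket is now served over HTTP:
curl http://localhost:8080/
```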
5. Storage plugins
Over the years, several enhancements have been made to the Docker API, making it possible to connect Docker with external storage platforms. Storage plugins extend Docker's basic storage capabilities and allow it to integrate with newer storage technologies. Several plugins are available for different cloud and enterprise storage platforms that allow Docker to mount external storage inside containers.
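As an illustration, the vieux/sshfs plugin used in Docker's own documentation creates volumes backed by a directory on a remote host over SSH; the host, path, and password below are placeholders:

```shell
# Install the volume plugin, granting the permissions it requests:
docker plugin install --grant-all-permissions vieux/sshfs

# Create a volume backed by a remote directory (placeholder values):
docker volume create -d vieux/sshfs \
  -o sshcmd=user@remote-host:/remote/path \
  -o password=secret \
  sshvolume

# Mount it like any other named volume:
docker run -it -v sshvolume:/data alpine ls /data
```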
Although by design, container storage is temporary and doesn’t last beyond the container’s life cycle, docker offers several methods which can be used to store application data in a persistent manner.