What is persistent storage in Docker and how to manage it

When working with containers, one major point of concern is the persistent storage of data. Container storage is ephemeral, meaning that the data written by the containerized application is not preserved if the container is removed. Containerized applications work on the assumption that they always start with empty storage.

Container image layers

By design, container images are immutable and layered. The running container processes use an immutable view of the container image. This allows us to launch multiple containers which can reuse the same image simultaneously. The images are composed of several layers that add or override the contents of layers below them.

A running container gets a new layer over its base container image, and this layer is the container storage. At first, this layer is the only read/write storage available for the container, and it is used to create all the files required by the container, such as logs and temporary files. These files are volatile. The containerized application does not stop working if they are lost. The container storage layer is exclusive to the running container, so if another container is created from the same base image, it gets another read/write layer. This ensures that each container's resources are isolated from other containers launched using the same image.
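For example, here is a minimal sketch (using the alpine image and illustrative container names) showing that a removed container’s layer, and any data written to it, is gone for good:

docker run --name demo alpine sh -c 'echo hello > /data.txt'   # write a file into the container's read/write layer
docker rm demo                                                 # deleting the container discards that layer
docker run --rm alpine cat /data.txt                           # a fresh container starts clean; cat reports "No such file"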

Most applications require data to be stored persistently, and the default transient nature of container storage is inadequate in such cases. For instance, stateful applications, such as databases, require permanent storage because they need to maintain data across restarts.

Persistent storage options for Docker

Docker offers the following options to store data persistently. We’ll explain each approach with examples.

  1. Directory mounts (also called bind mounts)
  2. Named volumes
  3. Data containers (deprecated)
  4. Cloud storage
  5. Storage plugins


Prerequisites

Any Linux distribution with Docker installed will work for the examples shown in this tutorial. Internet access is required for downloading container images. For the cloud storage section, you will also need access to an AWS account.

1. Directory mounts

Directory mounts are also called bind mounts. Docker can mount any directory on the host inside a running container. The containerized application will see this host directory as part of the container storage. When the container stops or is deleted, the contents of these host directories are not reclaimed, and they can be mounted to new containers whenever needed.

For instance, a database container can use a host directory to store database files. If this database container gets deleted, we can create a new container and use the same host directory, keeping the database data available to client applications. To the database container, it does not matter how this directory is backed on the host; it could be anything from a local hard disk partition to a remote networked file system.

Example

Let’s see this in action. We’re going to create a MySQL container named “db1”, and mount the /data directory on the host inside the container on /var/lib/mysql.

docker run --name db1 -d -v /data:/var/lib/mysql -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=tempdb -e MYSQL_ROOT_PASSWORD=password mysql


Now that the “db1” container is up, let’s populate our database with some dummy data.

docker exec -it db1 /usr/bin/mysql -u user1 tempdb -ppassword -e "CREATE TABLE dummy (id int(10) NOT NULL, name varchar(255) DEFAULT NULL, code varchar(255) DEFAULT NULL, PRIMARY KEY (id));" -e "insert into dummy (id, name, code) values (1,'John','XYZ');"


docker exec -it db1 /usr/bin/mysql -u user1 tempdb -ppassword -e "select * from dummy"

After verifying that the database has been populated with dummy data, we’ll now stop and delete the “db1” container.

docker stop db1; docker rm db1


We’re now going to create a second container, “db2”, using the same mysql image, and mount the /data directory on the host inside the container on /var/lib/mysql.

docker run --name db2 -d -v /data:/var/lib/mysql -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=tempdb -e MYSQL_ROOT_PASSWORD=password mysql


Let’s see if our database is still populated with the dummy data.

docker exec -it db2 /usr/bin/mysql -u user1 tempdb -ppassword -e "select * from dummy"


As you can see, the data is still intact. This is because it was saved in the /data directory on the host and not inside the container.
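You can also confirm this from the host side; the MySQL data files created by the containers live in /data (sudo may be needed, depending on the permissions MySQL sets on its data directory):

sudo ls /data   # the database files persist here across container deletions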



2. Named volumes

Named volumes are the recommended approach for creating persistent storage for containers. They provide a mechanism for saving data outside the life cycle of a container, and they can be mounted in more than one container at a time, which allows data to be shared and new containers to be attached to existing storage.

Docker named volumes work by creating a directory on the host machine and then mounting that directory into a container. This approach may sound similar to the directory mount method described above, but there are a few major differences. When using named volumes, a new directory is created within Docker's storage directory on the host machine (usually inside /var/lib/docker/volumes/). The content of this storage directory is managed by Docker itself. It is independent of the directory structure of the host machine, so it is extremely unlikely that another process can use this directory for writing any data. This offers an extra layer of security for containerized applications. There is also the added advantage that named volumes can be managed through docker commands.

Example

Let’s create a volume named “datavol” and see its details.

docker volume create datavol
docker volume ls
docker volume inspect datavol


As the inspect output shows, Docker creates volumes inside the /var/lib/docker/volumes directory on the host.
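If you only need the path, docker volume inspect can print the mount point directly with a Go-template filter:

docker volume inspect datavol --format '{{ .Mountpoint }}'   # prints /var/lib/docker/volumes/datavol/_data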

Let’s reuse the MySQL database example to demonstrate named volumes. Create a mysql container “db3”, and mount the “datavol” volume on the /var/lib/mysql directory inside the container.

docker run --name db3 -d -v datavol:/var/lib/mysql -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=datavoldb -e MYSQL_ROOT_PASSWORD=password mysql


Populate the database with some data.

docker exec -it db3 /usr/bin/mysql -u user1 datavoldb -ppassword -e "CREATE TABLE dummy (id int(10) NOT NULL, name varchar(255) DEFAULT NULL, PRIMARY KEY (id));" -e "insert into dummy (id, name) values (1,'Docker Volumes');"


docker exec -it db3 /usr/bin/mysql -u user1 datavoldb -ppassword -e "select * from dummy"

Stop and delete the “db3” container.

docker stop db3; docker rm db3


We’ll now launch a new container, “db4”, and attach the “datavol” volume to it. Once the container has started, verify that the data created earlier is still preserved in the database.

docker run --name db4 -d -v datavol:/var/lib/mysql -e MYSQL_USER=user1 -e MYSQL_PASSWORD=password -e MYSQL_DATABASE=datavoldb -e MYSQL_ROOT_PASSWORD=password mysql
docker exec -it db4 /usr/bin/mysql -u user1 datavoldb -ppassword -e "select * from dummy"


As evident from above, data is still present in the database as it was stored on the named volume.
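When the data is no longer needed, the container and the volume can be cleaned up. The volume’s backing files can also be examined directly on the host (root access is usually required):

sudo ls /var/lib/docker/volumes/datavol/_data   # the MySQL files backing the named volume
docker stop db4; docker rm db4                  # a volume can only be removed once no container references it
docker volume rm datavol                        # deletes the volume and its data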



3. Data containers

There’s another approach for saving data persistently, called data containers. This approach has been deprecated in favor of the Docker named volumes explained above; nevertheless, it was used in the earlier days of Docker, before volumes were available. This method involves using a dedicated container for storing data. This container doesn’t need to run on the host; it merely needs to exist. Its sole job is to save data. The containers running the actual application don’t need to know where the data is located on disk; they only require the name of the data container. The major advantage of this approach was that access to application data was managed by Docker, so other processes were less likely to affect the data.

Example

Let’s see this through an example. We’ll first create a data container and specify a volume option; this is where our application containers will be writing data. We can use any container image as a data container. This example uses the alpine image as it is very lightweight.

docker create -v /var/www/html --name datacontainer alpine


Create a sample file and copy it to the data container.

echo "Data Container!" > index.html

docker cp index.html datacontainer:/var/www/html


Now, let’s launch an httpd container using the “--volumes-from” option. This option will ensure that the httpd container uses the volumes from the alpine data container we created earlier.

docker run --name http -d --volumes-from datacontainer httpd

docker exec -it http bash

cat /var/www/html/index.html


As can be seen, the http container can see the data present in the alpine data container.
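Note that the httpd image actually serves content from /usr/local/apache2/htdocs rather than /var/www/html, so the file above is visible inside the container but not served over HTTP. A variation of the example that does serve it might look like this (the “webdata” and “web” container names are illustrative):

docker create -v /usr/local/apache2/htdocs --name webdata alpine   # data container holding the web root
docker cp index.html webdata:/usr/local/apache2/htdocs             # place the file into its volume
docker run --name web -d -p 8080:80 --volumes-from webdata httpd   # httpd now serves from that volume
curl http://localhost:8080/                                        # returns "Data Container!"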



4. Cloud storage

Many cloud providers offer file, block, and object-based storage solutions, and Docker can work with those as well. Using cloud-based storage offers another method of backing up your application data when running containers. For instance, Amazon offers an object storage solution called S3. We can mount an S3 bucket created in our AWS account on the host system, and that directory can then be mounted inside a container, similar to the bind mount approach discussed earlier. One advantage of this approach is that the user can implement policies in their AWS account to allow or limit access at the bucket level. This adds another layer of flexibility, allowing common data to be shared between any number of containers.

Example

We’ll use Amazon S3 to implement this. This example uses an S3 bucket named “aws-s3-bucket-docker” that contains a sample “index.html” file; you’ll need to create a similar bucket in your own AWS account (see the AWS documentation on creating an S3 bucket).

We’re first going to mount this bucket in our host OS. Since AWS S3 is object-based storage, it cannot be mounted locally using traditional file system tools; it has to be mounted through FUSE. We’ll be using s3fs, which allows an Amazon S3 bucket to be mounted as a local file system. First, install the s3fs package as follows. For CentOS/RHEL based systems, this package is available from the EPEL repository.

sudo apt-get install s3fs


You’ll need the access key ID and secret access key for your AWS account (see the AWS documentation on creating a user and access key). Store the access key ID and secret key in a file. You should restrict permissions on this file, as it contains sensitive information. The keys should be added in the following format:

ACCESS_KEY_ID:SECRET_ACCESS_KEY
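For example, assuming the credentials file is named .s3fs-access (the name used in the mount command below); note that s3fs rejects credential files whose permissions allow access by other users:

echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > .s3fs-access   # substitute your real key pair
chmod 600 .s3fs-access                                  # required: s3fs refuses loosely permissioned files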


Mount the S3 bucket on the /s3 directory as follows (create the mount point first if it doesn’t exist). Make sure to specify the correct name and path of the file which contains the access key ID and secret key. Note that the file system type is fuse.

s3fs aws-s3-bucket-docker /s3 -o passwd_file=.s3fs-access,nonempty


Once the bucket has been mounted on the host, it can be mounted inside the container just like a normal directory mount.
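For instance, here is a sketch that serves the bucket’s index.html through the httpd image (the “s3web” container name is illustrative):

docker run --name s3web -d -p 8081:80 -v /s3:/usr/local/apache2/htdocs httpd   # bind mount the s3fs mount point
curl http://localhost:8081/                                                    # content is read from the S3 bucket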




5. Storage plugins

Over the years, several enhancements have been made to the Docker API which have made it possible to connect it with external storage platforms. These storage plugins extend the basic storage capabilities of Docker and allow it to integrate with the latest storage technologies. There are several plugins available for different cloud and enterprise storage platforms that allow Docker to mount external storage inside containers.
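As an illustration, Docker’s documentation uses the vieux/sshfs plugin, which backs a volume with a directory on a remote host over SSH (the user, host, and path below are placeholders):

docker plugin install vieux/sshfs                                            # install the plugin from Docker Hub
docker volume create -d vieux/sshfs -o sshcmd=user@remotehost:/path sshvol   # volume backed by the remote directory
docker run --rm -v sshvol:/data alpine ls /data                              # used like any other named volume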

6. Conclusion

Although container storage is by design temporary and doesn’t last beyond the container’s life cycle, Docker offers several methods which can be used to store application data persistently.
