This blog has three goals: to show how quickly you can build a simple Plotly Dash app, how easy it is to set up a basic data ingestion job that runs on a schedule, and how simple it is to containerize everything and wire the containers together as a network with Docker Compose.
The repo for this blog can be found at https://github.com/bvmcode/plotly-dash-covid-blog
We are going to collect COVID-19 data from the Johns Hopkins University COVID-19 repo. Since I live in NJ, I am specifically going after NJ data. Below is an overall picture of the data we want:
- csv files from the Johns Hopkins repo (a new csv is added each night for that day)
- New Jersey data only
- Only these fields — Confirmed, Deaths, Recovered, Active
- June 1st 2020 onward
Since the Johns Hopkins data is collected and summarized once per day (a csv added each night), we need a way to collect the data, reduce it to what we need, and then store it. We need an initial start-up run that loads the data from June 1st through today, followed by a nightly schedule that adds only the previous day’s data.
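As a rough sketch of what that data job will look like (part 2 covers the real implementation), the reduce-and-store step with pandas might look like this. The URL pattern is the Johns Hopkins daily-reports path; the column names follow that repo's csv layout, and the function names here are illustrative, not the final code:

```python
# Sketch of the data-reduction step (final version lives in helpers/datapull.py).
import pandas as pd

# Raw daily-report csv path in the Johns Hopkins repo, one file per date.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/{:%m-%d-%Y}.csv"
)

FIELDS = ["Confirmed", "Deaths", "Recovered", "Active"]


def reduce_to_nj(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only New Jersey rows (reported per county) and sum the fields we need."""
    nj = df[df["Province_State"] == "New Jersey"]
    return nj[FIELDS].sum().to_frame().T


def pull_day(date) -> pd.DataFrame:
    """Download one day's csv and reduce it to a single NJ row."""
    df = pd.read_csv(BASE_URL.format(date))
    out = reduce_to_nj(df)
    out.insert(0, "Date", pd.Timestamp(date).date())
    return out
```

The initial run would loop `pull_day` over every date from June 1st onward and append the rows to covid.csv; the nightly run calls it once for yesterday.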
Once we have this data, and we can trust it will be updated automatically each night, we can build our Plotly Dash app.
This blog will be done in 3 parts:
- Part 1 — creating a minimal working project using docker-compose
- Part 2 — development of the backend data job
- Part 3 — development of the Plotly Dash app
Before we move on to developing our backend data-collection process, or our simple Plotly Dash app, let’s set up our project hierarchy. Quite simply, we have an app and a backend. The app has a Dockerfile, an app.py file (the Plotly Dash app itself) and a requirements.txt file.
The backend is a bit more involved. It has a main.py file that governs the schedule and runs the data job, and a directory called files which holds a single covid.csv file. In the helpers directory we have one file, datapull.py, which handles downloading, transforming and saving the data.
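Putting that together, the project layout looks roughly like this:

```
.
├── docker-compose.yml
├── app
│   ├── Dockerfile
│   ├── app.py
│   └── requirements.txt
└── backend
    ├── Dockerfile
    ├── main.py
    ├── requirements.txt
    ├── files
    │   └── covid.csv
    └── helpers
        └── datapull.py
```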
Docker is a powerful development and production tool, and Docker Compose adds the ability to quickly build a network of containers for a project. In my opinion, it’s good to get into the habit of using Docker from the very start of your projects.
Below is the docker-compose file for this project. We have a service called scheduler that is built from the code in the backend directory. The METHOD environment variable will be covered in part 2, but it basically governs whether we are initiating a full data pull or just the nightly refresh. The first volume ties the host clock to that of the container. The second volume gives us some persistence on the host for logs. The third volume allows us to share the data with the app service.
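A sketch of what that docker-compose.yml might look like given the description above; the in-container paths, the METHOD values, and the volume name are assumptions for illustration:

```yaml
version: "3"
services:
  scheduler:
    build: ./backend
    environment:
      - METHOD=initial                     # or the nightly-refresh value (part 2)
    volumes:
      - /etc/localtime:/etc/localtime:ro   # tie container clock to the host
      - ./backend/logs:/backend/logs       # persist logs on the host
      - covid:/backend/files               # share data with the app service
  app:
    build: ./app
    depends_on:
      - scheduler
    ports:
      - "5252:5252"
    volumes:
      - /etc/localtime:/etc/localtime:ro   # host clock again
      - ./app:/app                         # live code changes during development
      - covid:/app/files                   # read the shared data
volumes:
  covid:
```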
The second service is the app service, the Plotly Dash app itself. It depends on the scheduler service (since the scheduler service has the data). Notice that we are serving on port 5252, and that a volume is attached to the app directory so that changes made during development show up quickly in the browser. We attach to the host clock here as well. The covid volume ties to the backend service for data access; it will automatically generate a files directory in ./app on the host.
Below is the Dockerfile belonging to the scheduler service (./backend/Dockerfile). We are using Ubuntu, installing Python and the requirements, and executing a file called main.py.
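A sketch of that Dockerfile, with the base image tag and working directory as assumptions:

```dockerfile
FROM ubuntu:20.04

# Install Python and pip
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /backend

# Install dependencies first so this layer caches across code changes
COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .
CMD ["python3", "main.py"]
```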
The requirements.txt file has the following libraries.
- pandas — allows us to download and manipulate data
- apscheduler (Advanced Python Scheduler)— allows us to schedule our job
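In other words, the backend requirements.txt is simply:

```
pandas
apscheduler
```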
To get our project working at least initially, let’s set up a minimal main.py file that prints “hello world” on a schedule at 3am central time. If you are not familiar with apscheduler, I recommend checking out the docs at https://apscheduler.readthedocs.io/en/latest/.
Below is the Dockerfile belonging to the app service (./app/Dockerfile). This Dockerfile is essentially the same as for the backend service.
The requirements.txt file references needed dash installs as well as numpy and pandas for working with data.
And now let’s create a simple hello world Plotly Dash app. We create an app instance by instantiating the Dash class, and place a div element with the text “hello world” inside the app’s layout. Upon execution of app.py, we run the app via the run_server method with debug set to True, which lets us see error stack traces in the browser. This is of course undesirable in production but useful during development. We also set the host to 0.0.0.0 so the server listens on all network interfaces and is reachable from outside the container, as opposed to localhost only.
Give it a Spin
So now that we have our very minimal project set up, let’s make sure it works. Right now the backend service is really doing nothing. It’s just printing “hello world” at 3am every day. The app service on the other hand will allow us to see “hello world” in the browser. Let’s build and run.
docker-compose up --build
If you wait till 3am you’ll see “hello world” in the console (though I don’t advise staying up for that). You can also visit http://<host>:5252 to see your beautiful dash app say “hello world” in the browser.
In part 2 of this blog we will write the code for the backend service where we will download data, filter it and save it. We will also set up a schedule for this to occur nightly.