Senior Data Analyst at a Fortune 250. Passionate about developing data-driven apps and automated data solutions. My resume site is at www.thepythongeek.com.

With the internet at our fingertips, there is an endless stream of resources that we can use to better our craft. For a beginner, particularly one who is taking on the daunting task of starting a self-teaching curriculum, it can be quite confusing to know where to start. There are a lot of great resources, including video courses on platforms like Udemy, Coursera and others. Online books, PDFs and platforms like O’Reilly are also fantastic. But sometimes, a good old-fashioned physical copy of a book can be the best resource.

Below are four fantastic books to own for the aspiring data…


In data science and machine learning we often need to know the relationship between variables, or in this case, the relationship between numerical variables. Concepts like covariance, correlation and R-squared are useful, but they are also interrelated. Knowing how they are derived, as well as knowing intuitively what the results of their calculations mean for a given problem, can shed some light on the meaning, differences and relatedness of these statistical measures. We will also use Python to dig into a dataset, perform some calculations and plot some results.
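To make the link between these three measures concrete, here is a minimal sketch in plain Python. It is not the code from the post: the function names and the small `years_employed` and `income` lists are made up for illustration. It computes the sample covariance, rescales it into the Pearson correlation, and then checks that, for a straight line fitted with one predictor, R-squared equals the correlation squared.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def r_squared(x, y):
    """R-squared of a least-squares line that predicts y from x."""
    slope = covariance(x, y) / covariance(x, x)
    intercept = mean(y) - slope * mean(x)
    my = mean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only, not the dataset used in the post.
years_employed = [1, 3, 4, 6, 8, 10]
income = [30, 38, 45, 52, 70, 78]

r = correlation(years_employed, income)
print("covariance: ", covariance(years_employed, income))
print("correlation:", r)
print("r squared:  ", r ** 2)
print("R-squared:  ", r_squared(years_employed, income))
```

The last two printed values should agree up to floating-point error. That is the sense in which the measures are interrelated: correlation is a unit-free version of covariance, and in simple linear regression R-squared is its square.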

Review

Let’s review some statistical measures common in descriptive statistics that will help…


Great blog! Question about the first bubble plot. It says "Next, we are going to produce a bubble plot that represents the data points of the ‘Income’, ‘Years Employed’, and ‘Age’ attributes."

I don't see age represented in the plot. It looks like it's DebtIncomeRatio instead. Maybe I am missing something.


So I just discovered adventofcode.com and I am really digging the challenges. I did the first two of this month without much issue but the 3rd day was a bit of a challenge for me. So this blog will break down my solution. I finally got the answer after a couple of attempts.

The Problem

The title of this challenge is the “Toboggan Trajectory”. You are given text made up of characters “.” which are clear spaces, and “#” which are trees. Here is part of the text provided.

..#.#...#.#.#.##.....###.#....#
...........##.#...#.#..........
....#.....#..#.............#...
.#....###..##...#...##...#.#..#
#.......#.........#..#.......#.
...#.##..##...#.#......#.##.#..
#.#..##.....#.....#..##........
...#.####...#.##...#...........
.#...#..#..#....#.#.#.#.##.....
##.#..#.##..#......#..##.#.#..#
.#.##.....#.#...............#.#
..##.#.....#.....##..##.#....#.
#..#..........#...##........#..
#..##.#.#...............#..#...
..#....#...#.......#.......#...
.........#.#.##.#........#…
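The core of this puzzle can be sketched in a few lines of Python. This is a minimal sketch, not the solution from the post. It relies on two details that come from the Advent of Code 2020 Day 3 puzzle statement rather than from the text above: the toboggan moves right 3 and down 1 on each step, and the pattern repeats to the right indefinitely, which the modulo handles. The `count_trees` name and the tiny `sample` grid are made up for illustration.

```python
def count_trees(grid, right=3, down=1):
    """Count '#' cells hit when moving `right` columns and `down` rows per step."""
    width = len(grid[0])
    trees = 0
    col = 0
    for row in range(0, len(grid), down):
        # The pattern repeats horizontally, so wrap the column index.
        if grid[row][col % width] == "#":
            trees += 1
        col += right
    return trees


# Tiny made-up grid for illustration, not the puzzle input.
sample = [
    "....",
    "...#",
    "..#.",
    "#...",
]

print(count_trees(sample))  # prints 2
```

For the real input, `grid` would be the list of lines read from the puzzle text file, with trailing newlines stripped.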

I find that I work with a lot of databases across several flavors such as postgres, MS SQL, Hive and so on. There are probably many ways of organizing your connections, but the approach below has worked for me in Python projects, particularly when purely ingesting data and working with pandas.

Connection Class

Below is a class called SQLConnection. It was set up during a project in which I needed to query a Hive database and two postgres databases. The postgres databases are named according to their role in the project (HR = Human Resources and FIN = Finance).

This class depends…


So what did we do in part 2?

  • We created a backend service
  • The service can initially build our data file
  • The service can provide ongoing updates on a schedule

In this part we are going to build out the Plotly Dash app that consumes the data file created in part 2. This will be a super simple app used to show how quickly you can spin up something usable in Plotly.

Plotly Framework

Plotly Dash for Python has been around for some time now and is popular for a myriad of good reasons. It’s used quite frequently in data analytics and machine…


So what did we do in part 1?

  • Created a basic project structure
  • Created a docker compose file with 2 services (app, backend)
  • Created a simple backend main.py script
  • Created a simple Plotly Dash app
  • Ran our project and were blown away by the “hello world” text

In this part we are going to build out the backend so that we can do the following:

  • Download Johns Hopkins data
  • Filter down on the data based on our requirements
  • Save the data into a csv file
  • Run on a nightly schedule to get updated data
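The first three steps in the list above can be sketched with only the standard library. This is a hedged illustration, not the main.py from the post. The function names are hypothetical, the `Country/Region` column name is an assumption about the layout of the source file, and the sample text is made up. Scheduling is left out, and `download_csv` is defined but never called here, so the sketch runs offline.

```python
import csv
import io
import urllib.request


def download_csv(url):
    """Fetch a CSV file over HTTP and return its contents as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def filter_rows(csv_text, column, wanted):
    """Return the header and only the rows whose `column` value is in `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if row[column] in wanted]
    return reader.fieldnames, rows


def save_csv(path, fieldnames, rows):
    """Write the filtered rows to `path`, including a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Made-up sample text; the column name is an assumption about the real data.
sample_text = (
    "Country/Region,1/22/20,1/23/20\n"
    "US,1,1\n"
    "Italy,0,0\n"
    "US,0,2\n"
)

fields, rows = filter_rows(sample_text, "Country/Region", {"US"})
print(fields)
print(len(rows), "rows kept")
```

In a real job, the text returned by `download_csv` would replace `sample_text`, and `save_csv` would write the result to the data file that the app reads.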

Main.py

Let’s update our main.py file from…


This blog has three purposes. First, I want to show how quickly you can build a simple Dash app. Second, I would like to show how easy it is to set up a basic data ingestion job that we can put on a schedule. Third, I want to show just how easy and awesome containerization is, including setting up a network of containers with Docker Compose.

The repo for this blog can be found at https://github.com/bvmcode/plotly-dash-covid-blog

This Project

We are going to collect COVID-19 data from the Johns Hopkins University COVID-19 repo. Since I…

The Python Geek
