With the internet at our fingertips there is an endless stream of resources that we can utilize to better our craft. To a beginner, particularly one who is taking on the daunting task of starting a self-teaching curriculum, it can be hard to know where to start. There are a lot of great resources, including video courses on platforms like Udemy, Coursera and others. Online books, PDFs and platforms like O’Reilly are also fantastic. But sometimes, a good old-fashioned physical copy of a book can be the best resource.
Below are four fantastic books to own for the aspiring data analyst. These are also very valuable resources for anyone in the business already. And just to note, this blog also provides some insight into the progression of my career. I hope that is at least mildly interesting.
The Language of SQL
Nope, you are right. It’s not a Python book. But SQL is an absolute must for an analyst. Around 2013 or so I started my journey into the world of analytics. At the time I had been developing dashboards and reports in SAP Crystal Reports. No knock against Crystal Reports, but it was less than ideal. I was in the muck of creating many messy formulas to mold the data to the level of aggregation needed with all the user-required calculations and metrics. And these formulas were written in Crystal’s syntax and always ended up being a web of dependencies that were hard to troubleshoot.
Then I discovered SQL. I read a blog at the time that said I could just write these things called SQL scripts, Crystal would pass them on to the server, and I would get back the data at the required grain with all the metrics and calculations already done. No more messy formulas or playing with table joins in Crystal’s GUI. I would just need to be skilled enough to write these scripts. That’s really what started the journey into SQL. I searched around on Amazon and I found the first edition of The Language of SQL by Larry Rockoff.
It’s a short book that could be read in a week or so. It holds your hand through the SQL clauses, joins and aggregation. It also covers more advanced topics like subqueries, functions and stored procedures. It even covers some theoretical topics like normalization.
The beauty of this book is that it’s short and to the point. It covers everything that a beginner needs. If you have a database to practice against, you will come out of reading this book ready to tackle some analyst problems right out of the gate.
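To make the idea of getting data back "at the required grain" a little more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The sales table, its columns and the figures are invented for illustration; they are not taken from the book or from my old reports.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Widget", 100.0),
        ("East", "Gadget", 250.0),
        ("West", "Widget", 75.0),
        ("West", "Widget", 125.0),
    ],
)

# Let the database aggregate to one row per region, instead of
# reshaping detail rows with formulas inside the reporting tool.
query = """
    SELECT region,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount,
           AVG(amount) AS avg_amount
    FROM sales
    GROUP BY region
    ORDER BY region
"""
rows = conn.execute(query).fetchall()
for region, order_count, total_amount, avg_amount in rows:
    print(region, order_count, total_amount, avg_amount)
conn.close()
```

The reporting tool then only has to display the rows it receives, which is roughly the shift I am describing above.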
Introducing Python
I remember the day, several years ago, when I thought it would be nice to automate a task. The task was sending PDF’d Excel dashboards to about 30 different parties, and each PDF would have to show the specific party’s data only. At the time I found VBA to be unreliable and glitchy, and the last thing we would have wanted was to have the process break or hear that we sent out the wrong data.
And so here comes Python. I searched awhile for a resource that would be truly beginner-friendly while also being broad enough so that I could branch out and try other projects. I bought the first edition of Introducing Python by Bill Lubanovic.
This covers the entirety of the basics such as data types, data structures, loops, functions and classes. The author gives practical examples and mixes in the right amount of computer science knowledge and how the Python interpreter actually works. I appreciate the author’s style, which is witty and sometimes humorous. What I really liked about the book is that it went into intermediate topics that other books don’t, like practical examples with the standard library, properly handling dates and times, and even some dipping of the toes into web topics like WSGI. Truly a well-rounded gem.
Automate the Boring Stuff with Python
While I was reading Introducing Python I was also reading several chapters of this very well-known book, Automate the Boring Stuff with Python by Al Sweigart.
When you are new to coding and you realize you can automate the most mundane tasks, it is absolutely one of the most exhilarating feelings. I was able to pick up many tips from the book for tasks such as sending emails, creating PDFs, and interacting with software such as MS Excel. I have to be honest, I did not read this entire book, but it was a fantastic resource for several years as I automated the hell out of my workspace.
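As a rough sketch of what the "one PDF per party" kind of task can look like, the snippet below builds (but does not send) one email per recipient using only the standard library. The party names, addresses, sender and placeholder PDF bytes are all hypothetical, and this is not code from the book or from my original script.

```python
from email.message import EmailMessage

# Hypothetical recipients, invented for this illustration.
parties = {
    "Party A": "a@example.com",
    "Party B": "b@example.com",
}


def build_report_email(party, address, pdf_bytes):
    """Build (but do not send) an email carrying one party's PDF report."""
    msg = EmailMessage()
    msg["From"] = "reports@example.com"  # assumed sender address
    msg["To"] = address
    msg["Subject"] = f"Dashboard for {party}"
    msg.set_content(f"Attached is the latest dashboard for {party}.")
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=f"{party.replace(' ', '_')}_dashboard.pdf",
    )
    return msg


messages = [
    # Placeholder bytes stand in for a real PDF produced elsewhere.
    build_report_email(party, address, b"%PDF-1.4 placeholder")
    for party, address in parties.items()
]

for msg in messages:
    print(msg["To"], msg["Subject"])
```

Actually sending each message would typically go through smtplib.SMTP and its send_message method, with the server details depending on your environment. Keeping the build step separate makes it easy to check that each party gets only its own file before anything leaves the building.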
Python for Data Analysis
As I continued my journey in analytics I was pretty comfortable coding in Python at that point. So I set my eyes towards more advanced uses of Python in the realm of analytics. Using Jupyter and getting exploratory insights from data was where I wanted to head next in my career. It was frustrating at the start. I tried to quickly apply some cursory pandas knowledge I picked up online and it didn’t go well. I quickly realized pandas, numpy and matplotlib are almost like languages in themselves and I needed to be more formally educated on them. That is where this book, Python for Data Analysis by Wes McKinney, comes in.
This book gets you started with Jupyter notebooks and IPython. It covers numpy and pandas in great depth; at the time, I spent at least a couple of months of hard study and practice trying to master them, with greater focus on pandas. For an aspiring analyst, the chapters on numpy, pandas and visualization will give you a great start. And practice is key. Finding datasets on the web, or at your place of employment, to experiment with is crucial in the learning process.
This book goes into great depth to give you all the necessary tools to load, clean and visualize datasets. In addition to these solid chapters on the core of data analysis in Python, it also has a chapter on time series analysis that is a great start in that very complex area of study.
Bonus Book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
So I decided to add this book by Aurélien Géron to the list as a bonus just because of how darn good it is. I will be honest, I have not finished it. But if you are a data scientist or an aspiring data scientist, particularly one who favors Python over R, then this book is a must-have. The four books I mentioned above are great prerequisites for anyone who needs to write code and get data. A review of college math is also a good idea.
This book provides the coding depth to get started in modeling. It also provides solid theoretical and mathematical explanations that could either be good enough for some or a starting point for further theoretical research into a given topic on your own. The chapter on classification, for example, defines performance measures like precision, recall and the ROC curve both mathematically and intuitively, with great visuals and examples. He provides amazing insight into how the models work under the hood and the exploratory insight that leads up to developing a model. Just an amazingly thorough and well-written book.
These are fantastic resources for anyone starting out or anyone in the business already who needs a resource handy. They are a good foundation for your career and allow you to branch out more easily to other things, such as developing data-intensive web apps. I hope you enjoy these resources.
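For a taste of what those performance measures mean, here is a small plain-Python sketch of precision and recall for a binary classifier. It is my own illustration with made-up labels, not the book’s code (the book works with scikit-learn). Precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN are the counts of true positives, false positives and false negatives.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged by mistake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these labels there are three true positives, one false positive and one false negative, so both measures come out to 0.75.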