TLDR: is a blog covering technical data science topics (and more). It will evolve as I do.

The What

This project has been on my mind for the last couple years, and I’ve decided to make it real. Through, I hope to share my exploration into topics such as data science, statistics, machine learning, mathematics, film photography, and whatever else I run into on the road my dopamine hungry brain is careening down.

I believe in being a generalist over a specialist, not because it’s been rigorously proven to be better, but mostly because I couldn’t get myself to focus on one thing!
—Me, the author of, Nakai (AKA some random person on the internet)

The Why

I also hope that will motivate me to focus and finish some of my 100 different half-baked projects and share them with the world. A success rate of 5% would still net me 5 finished projects. 5 divided by 100 is 0.05, or 5%. A first lesson in statistics. Woo hoo! If those projects could also teach you something or inspire you, that would be nice too.

Perhaps a multidisciplinary blog isn’t algorithmically friendly, but since when did we learn how to flip on a light switch from a textbook on quantum electrodynamics? And if you find a light switch that can instantly download an understanding of quantum electrodynamics into my brain, please shoot me an email. The textbook explains why it works, and this video explains how to do it. But if you really want to understand it to the fullest level of detail, you’ll probably need to spend a few lifetimes of studying:

  • human psychology (why flip it in the first place?)
  • human physiology (how does my finger move?)
  • quantum field theory (how do electromagnetic fields work?)
  • electrical engineering (how does technology harness EM fields?)
  • etc…

We don’t need all of that to properly execute a light switch flip, and we don’t need to have 300 years of experience and 7 PhD’s to learn data science.

But, we do need to know just enough why and just enough how before we can do.

The How

So, what would a typical post look like on I don’t know yet! (Perhaps you’re looking at one). I’m willing to let evolve as I learn and change. But as of right now, my first vision is to explain statistical and machine learning techniques as simply as possible, but no simpler. I will focus on the why and the how, the practical execution (in R) and the reasons behind it.

I find too many great R tutorials online teaching me a fancy new machine learning technique that leave me wondering, but why? What assumptions am I making when using linear regression to draw an association between two variables? When is a random forest model a good choice for a classification problem? I hope to address questions like this in the future.

I’m also plan on providing links to resources to take you further into the theory or whatever I think might be relevant or interesting.

The Who

Not the band, the who behind Why should you trust some random person on the internet? Honestly, you probably shouldn’t. I’m not an expert, and I’ll probably never claim to be, but I’m motivated to learn and share what I’ve learned with the hope that I’ve helped you somehow. I’m not afraid to be wrong, but I do have a healthy fear of steering someone the wrong way. I’m going to do my best to present the best, and I will accept no less. Okay, terrible Dr. Seuss impressions aside, here’s some statistics about the author of

I have an AAS in Cybersecurity, and I’m pursuing my BS in Mathematics with a Concentration in Applied Statistics and Analytics from Kennesaw State University in Georgia, USA. I do not own a cat.

More Stats about the Author

stats_about_nakai_df <- data.frame(favorite_numbers = c(0^0,sqrt(2),1/3,3,
                                   times_moved = 1:13)

ggplot(stats_about_nakai_df, aes(x = times_moved, 
                                 y = favorite_numbers)) +
  geom_point() +
  ggtitle("Times Moved over Favorite Numbers")

Hmm, boring! Let’s see what happens if I plot the log of my favorite numbers over the times I’ve moved. Let’s also throw a linear regression line in there for fun.

stats_about_nakai_df <- stats_about_nakai_df %>%
  mutate(log_favorite_numbers = log(favorite_numbers))

ggplot(stats_about_nakai_df, aes(x = times_moved, y = log_favorite_numbers)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("Times Moved over log of Favorite Numbers")

Boom. Every time I move to a new place, the higher my favorite numbers go. Clear causation.*

*This is actually wrong.

A real lesson in linear regression is coming soon!…