TLDR: kai.do is a blog covering technical data science topics (and more). It will evolve as I do.

The What

This project has been on my mind for the last couple of years, and I’ve decided to make it real. Through kai.do, I hope to share my exploration of topics such as data science, statistics, machine learning, mathematics, and whatever else I run into down the road my dopamine-hungry brain is careening along. I also want to use it as a platform to showcase some of my more artistic ventures, like film photography.

I believe in being a generalist over a specialist, not because it’s been rigorously proven to be better, but mostly because I couldn’t get myself to stick to one thing!
—Me, the author of kai.do, Nakai (AKA some random person on the internet)

The Why

I also hope that kai.do will motivate me to narrow down and finish some of my 100 different half-baked projects and share them with the world. A completion rate of 5% would still net me 5 finished projects. 5 divided by 100 is 0.05, or 5%. A first lesson in statistics. Woo hoo! If those projects could also teach you something or inspire you, that would be nice too.

A Word on Multidisciplinarity

Perhaps a multidisciplinary blog isn’t algorithmically friendly, but real-life problems rarely fit neatly into one subject. It’s valuable to broaden our perspective and consider problems from many different angles; however, too much perspective can cause us to lose focus or fixate on unimportant details. Consider the following extreme example:

Imagine you’ve been tasked with finding a box in a pitch-black, cavernous warehouse. What’s the first problem? You really need some light. You see a light switch to the right, but you stop for a moment and contemplate why you even want to find the box in the first place, and what, for that matter, is contemplation anyway? You’d need to brush up on some neuroscience and psychology to explain that one. You push the thought aside, but before you can flip the switch, how can you be sure the light will behave in a way that helps you see? Maybe you need to know how light itself works? You pull out your phone to look up information on light. You stumble upon Richard Feynman’s book “QED: The Strange Theory of Light and Matter” on Amazon and download the Kindle version. (I’m trying to work my way through it, and it’s worth looking into if you’re curious.) After an hour of reading, you know a lot more about how light works, but the room is still dark, and your supervisor is wondering where the box is. This problem had a simple solution. All you had to do was flip the switch.

Some of this info might be useful for engineering a light switch, studying quantum field theories, or optimizing switch placement, but it’s certainly too much detail if we just want to properly execute a light switch flip. (You might even argue that this entire analogy included too much detail for the point I’m trying to convey. Wow, super meta.)

Data science problems might need a bit more perspective than it takes to flip a light switch, but solving them doesn’t require lifetimes of experience in every subject.

We need to know just enough why and just enough how before we can do.

The How

So, what would a typical post look like on kai.do? I don’t know yet! (Perhaps you’re looking at one.) I’m willing to let kai.do evolve as I learn and change. But as of right now, my first vision is to explain statistical and machine learning techniques as simply as possible, but no simpler. I will focus on the why and the how: the reasons behind a technique and its practical execution (in R).

I find too many great R tutorials online that teach me a fancy new machine learning technique but leave me wondering: but why? What assumptions am I making when I use linear regression to draw an association between two variables? When is a random forest model a good choice for a classification problem? I hope to address questions like these in the future.

I also plan on providing links to resources to take you further into the theory or whatever I think might be relevant or interesting.

The Who

Not the band, the who behind kai.do. Why should you trust some random person on the internet? Honestly, you probably shouldn’t. I’m not an expert, and I’ll probably never claim to be, but I’m motivated to learn and to share what I’ve learned, in the hope that I’ll learn something in the process and help someone (maybe even you). I’m not afraid to be wrong, but I do have a healthy fear of steering someone the wrong way. I’m going to do my best to present the best, and I will accept no less. Okay, terrible Dr. Seuss impressions aside, here are some statistics about the author of kai.do:

I have an AAS in Cybersecurity, and I’m pursuing my BS in Mathematics with a Concentration in Applied Statistics and Analytics from Kennesaw State University in Georgia, USA. I do not own a cat.

More Stats about the Author

# ggplot2 is needed for the plots below
library(ggplot2)

# A small data frame pairing my favorite numbers with how many times I've moved
stats_about_nakai_df <- data.frame(favorite_numbers = c(0^0, sqrt(2), 1/3, 3,
                                                        7, 21, 117, 121, 343,
                                                        678, 3711, 9696, 65536),
                                   times_moved = 1:13)

# Scatterplot of favorite numbers against times moved
ggplot(stats_about_nakai_df, aes(x = times_moved,
                                 y = favorite_numbers)) +
  geom_point() +
  ggtitle("Favorite Numbers over Times Moved")

Hmm, boring! Let’s see what happens if I plot the log of my favorite numbers over the times I’ve moved. Let’s also throw a linear regression line in there for fun.

# dplyr provides the pipe (%>%) and mutate()
library(dplyr)

# Add a log-transformed copy of the favorite numbers
stats_about_nakai_df <- stats_about_nakai_df %>%
  mutate(log_favorite_numbers = log(favorite_numbers))

# Same scatterplot on the log scale, with a fitted linear regression line
ggplot(stats_about_nakai_df, aes(x = times_moved, y = log_favorite_numbers)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("log of Favorite Numbers over Times Moved")

Boom. Every time I move to a new place, I prefer larger numbers. Clear causation.*

*This is actually wrong.
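
If you’re curious, the straight line geom_smooth(method = "lm") draws comes from an ordinary least squares fit under the hood. Here’s a minimal sketch of running the same fit directly with lm() so you can poke at the numbers yourself (the object name fit is just mine for illustration); the slope only quantifies an association in my made-up data, nothing causal:

# Fit the same simple linear regression that geom_smooth(method = "lm") draws:
# log(favorite number) modeled as a linear function of times moved
fit <- lm(log_favorite_numbers ~ times_moved, data = stats_about_nakai_df)

# Coefficient estimates, R-squared, and p-values for the toy data
summary(fit)

# The slope describes an association only; it says nothing about
# moving *causing* a taste for bigger numbers
coef(fit)["times_moved"]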

A real lesson in linear regression is coming soon!