Experimental data

Today I want to talk about two-sample hypothesis testing, or A/B testing in data science parlance. This topic has been on my mind a lot lately because I’m planning to leave academia for data science, which means I may also be moving from analyzing survey data to running experiments on unstructured, “big” data. Because academia and business have different goals and different resources available to them, the methods they use also differ, even when they approach a similar problem.

Yes, this is yet another post about using the open source Titanic dataset to predict whether someone would live or die. At this point, there’s not much new that I (or anyone) can add to the accuracy of survival predictions on the Titanic, so I’m going to use this as an opportunity to explore a couple of R packages and teach myself some new machine learning techniques. I will be doing this over two blog posts.

