My Coding Marathon


A journey of my training to be a coder

NYC Restaurant Yelp and Inspection Analysis

Background


Hyperparameter Tuning For Random Forest

In this blog post I will discuss how to do hyperparameter tuning for a classification model, specifically for the Random Forest model. So what exactly is hyperparameter tuning? In Machine Learning, a hyperparameter is a parameter whose value is set before the learning process begins, rather than learned from the data. Different classification methods have different hyperparameters. We are going to focus on Random Forest Classification, an ensemble method for decision trees that both trains each tree on a different sample of the data (bagging) and randomly selects a subset of features to use as predictors (the subspace sampling method) to create a ‘forest’ of decision trees. This method typically gives better predictions than any single decision tree would give. This image provides a visual representation of how Random Forests work.
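To make the two sampling ideas above concrete, here is a minimal sketch in plain Python. It does not train any trees; it only shows which rows and which features each tree would be given. The function names, the toy data, and the feature names are all made up for illustration, and `n_trees` and `max_features` stand in for the kind of hyperparameters that are fixed before training. Note that this sketch draws one feature subset per tree for simplicity, whereas library implementations such as scikit-learn's `RandomForestClassifier` typically re-draw the candidate features at every split.

```python
import random


def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def feature_subset(features, k, rng):
    """Subspace sampling: pick k distinct features at random."""
    return rng.sample(features, k)


def plan_forest(rows, features, n_trees, max_features, seed=0):
    """Return the rows and features each tree in the forest would see."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [
        {
            "rows": bootstrap_sample(rows, rng),
            "features": feature_subset(features, max_features, rng),
        }
        for _ in range(n_trees)
    ]


# Hypothetical toy data: (row id, label) pairs and invented feature names.
rows = [(i, i % 2) for i in range(10)]
features = ["feature_a", "feature_b", "feature_c", "feature_d"]

forest_plan = plan_forest(rows, features, n_trees=5, max_features=2)
for i, tree in enumerate(forest_plan):
    distinct = len(set(tree["rows"]))
    print(f"tree {i}: features={tree['features']}, {distinct} distinct rows")
```

Because each bootstrap sample is drawn with replacement, most trees see some rows more than once and miss others entirely, which is what makes the trees differ from one another.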


Decision Tree Classification Guide

Decision trees are a supervised Machine Learning algorithm used to classify data. They are very popular because they are effective and easy to visualize and interpret. Essentially, a decision tree is a flowchart in which each internal node asks a yes/no question. There are three parts to a tree:

  1. Internal Nodes - each represents a ‘test’ on an attribute (e.g. does a coin flip land heads or tails)
  2. Edges/Branches - the outcomes of the test
  3. Leaf Nodes - predict the outcome by classifying the data (these are terminal nodes, meaning there are no further nodes/branches)
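The three parts listed above can be sketched as a tiny hand-built tree in plain Python. This is only an illustration of the structure, not a trained model: the class names, the attributes (`humidity`, `wind_speed`), the thresholds, and the labels are all invented for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: a terminal node that holds the predicted class."""
    prediction: str


@dataclass
class Node:
    """Internal node: a yes/no test of the form attribute <= threshold."""
    attribute: str
    threshold: float
    if_yes: Union["Node", Leaf]  # branch followed when the test passes
    if_no: Union["Node", Leaf]   # branch followed when the test fails


def classify(tree, sample):
    """Follow branches from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        passed = sample[node.attribute] <= node.threshold
        node = node.if_yes if passed else node.if_no
    return node.prediction


# Hypothetical tree with two internal nodes and three leaves.
tree = Node(
    "humidity", 70,
    if_yes=Leaf("go outside"),
    if_no=Node("wind_speed", 10, if_yes=Leaf("stay in"), if_no=Leaf("go outside")),
)

calm_dry = classify(tree, {"humidity": 50, "wind_speed": 5})
calm_humid = classify(tree, {"humidity": 90, "wind_speed": 5})
windy_humid = classify(tree, {"humidity": 90, "wind_speed": 20})
print(calm_dry, calm_humid, windy_humid, sep=" | ")
```

In practice a library learns the tests and thresholds from training data; writing the tree by hand here just makes the roles of nodes, branches, and leaves visible.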

ARIMA Models For Time Series

Time series datasets have the passage of time as a main dimension of the data. In other words, the data is recorded at set intervals of time. One of the benefits of time series data is that you can use it to forecast future values. In this post, I will walk through how to forecast a time series dataset using an ARIMA model. The data I am working with contains average house prices per county in the US over a 22-year period (from Zillow).


Hypothesis Testing

Performing statistical analyses is an important part of many business decisions. It gives the business supporting evidence for, and confidence in, the decisions that it makes. Hypothesis testing is one helpful way to perform statistical analyses that is both simple to perform and reliable. In this post, I will explain what a hypothesis test is and use an example to show how to perform one. Hopefully, this will provide some guidance for anyone hoping to use hypothesis testing to inform business decisions.