Titanic Dataset

Notebook from the Titanic - Machine Learning from Disaster challenge on Kaggle.

Titanic - Machine Learning from Disaster

  • Use machine learning to create a model that predicts which passengers survived the Titanic shipwreck

    In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

Submission File Format

You should submit a csv file with exactly 418 entries plus a header row. Your submission will show an error if you have extra columns (beyond PassengerId and Survived) or rows.

  • Two columns should be included in submission:
    • PassengerId (in any order)
    • Survived (binary predictions: 1 for survived, 0 for deceased)
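A quick pre-upload check of these format rules can be sketched in Python. `check_submission` is a hypothetical helper, not part of Kaggle's tooling; it assumes the submission is held in a pandas DataFrame with PassengerId as an ordinary column.

```python
import pandas as pd

def check_submission(df: pd.DataFrame, expected_rows: int = 418) -> list:
    """Return a list of problems found; an empty list means the format looks valid."""
    problems = []
    # Exactly these two columns, no extras
    if sorted(df.columns) != ["PassengerId", "Survived"]:
        problems.append(f"unexpected columns: {list(df.columns)}")
    # One row per test passenger (the header row is added when writing the CSV)
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(df)}")
    # Survived must be binary: 1 for survived, 0 for deceased
    if "Survived" in df.columns and not df["Survived"].isin([0, 1]).all():
        problems.append("Survived contains values other than 0 and 1")
    # Each passenger should appear only once
    if "PassengerId" in df.columns and df["PassengerId"].duplicated().any():
        problems.append("duplicate PassengerId values")
    return problems

# Usage with a toy three-row frame (expected_rows lowered to match)
toy = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0, 1, 0]})
print(check_submission(toy, expected_rows=3))  # []
```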

Variable Notes

  • pclass: A proxy for socio-economic status (1st = upper)
  • sibsp: The number of siblings plus spouses aboard
import pandas as pd 
import numpy as np 
import os 
from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.svm import LinearSVC, SVC
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier 
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score, train_test_split, RandomizedSearchCV

def get_path_to_data_folder():
    return os.path.join(os.getcwd(),"data","titanic")

def get_test():
    df = pd.read_csv(os.path.join(get_path_to_data_folder(),"test.csv"))
    df = df.set_index("PassengerId")
    return df

def get_train():
    df = pd.read_csv(os.path.join(get_path_to_data_folder(),"train.csv"))
    df = df.set_index("PassengerId")
    return df

def get_test_survived():
    # gender_submission.csv is Kaggle's example submission (females predicted to
    # survive, males not), not the true labels for test.csv
    df = pd.read_csv(os.path.join(get_path_to_data_folder(),"gender_submission.csv"))
    df = df.set_index("PassengerId")
    return df

def get_submission_path():
    return os.path.join(get_path_to_data_folder(),"submission.csv")

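The helpers above read each CSV and then call `set_index`; `pandas.read_csv` can do both in one step through its `index_col` argument. A minimal sketch, assuming the same `data/titanic` folder layout; `load_split` and `split_path` are hypothetical names, and the usage reads a tiny in-memory CSV rather than the real files:

```python
import io
import os
import pandas as pd

def load_split(source) -> pd.DataFrame:
    """Read a Titanic CSV (path or file-like object) with PassengerId as the index."""
    return pd.read_csv(source, index_col="PassengerId")

def split_path(name: str) -> str:
    # Assumed layout: <cwd>/data/titanic/<name>.csv, matching get_path_to_data_folder()
    return os.path.join(os.getcwd(), "data", "titanic", f"{name}.csv")

# Usage with a small in-memory CSV standing in for train.csv
csv_text = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n"
df = load_split(io.StringIO(csv_text))
print(df.index.name, list(df.columns))  # PassengerId ['Survived', 'Pclass']
```

On the real files this would be called as `train = load_split(split_path("train"))`.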
train = get_train()
X_pred = get_test()
train.describe()    
out[3]

         Survived      Pclass         Age       SibSp       Parch        Fare
count  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean     0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std      0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min      0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%      0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%      0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%      1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max      1.000000    3.000000   80.000000    8.000000    6.000000  512.329200

train.info()
X_pred.info()
out[4]

<class 'pandas.core.frame.DataFrame'>
Index: 891 entries, 1 to 891
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Survived  891 non-null    int64
 1   Pclass    891 non-null    int64
 2   Name      891 non-null    object
 3   Sex       891 non-null    object
 4   Age       714 non-null    float64
 5   SibSp     891 non-null    int64
 6   Parch     891 non-null    int64
 7   Ticket    891 non-null    object
 8   Fare      891 non-null    float64
 9   Cabin     204 non-null    object
 10  Embarked  889 non-null    object
dtypes: float64(2), int64(4), object(5)
memory usage: 83.5+ KB
<class 'pandas.core.frame.DataFrame'>
Index: 418 entries, 892 to 1309
Data columns (total 10 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Pclass    418 non-null    int64
 1   Name      418 non-null    object
 2   Sex       418 non-null    object
 3   Age       332 non-null    float64
 4   SibSp     418 non-null    int64
 5   Parch     418 non-null    int64
 6   Ticket    418 non-null    object
 7   Fare      417 non-null    float64
 8   Cabin     91 non-null     object
 9   Embarked  418 non-null    object
dtypes: float64(2), int64(3), object(5)
memory usage: 35.9+ KB

np.abs(train.count()-train.count().max())
out[5]

Survived      0
Pclass        0
Name          0
Sex           0
Age         177
SibSp         0
Parch         0
Ticket        0
Fare          0
Cabin       687
Embarked      2
dtype: int64
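The expression above counts missing values by subtracting each column's non-null count from the largest count, which equals the row count only because at least one column has no gaps. `DataFrame.isna().sum()` counts missing values directly. A small sketch comparing the two on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps in Age and Embarked, loosely mirroring the Titanic columns
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Embarked": ["S", "C", None, "S"],
})

via_count = np.abs(df.count() - df.count().max())  # approach used above
via_isna = df.isna().sum()                         # direct count of missing values
print(via_count.tolist(), via_isna.tolist())  # [0, 2, 1] [0, 2, 1]

# If every column had a gap, count().max() would fall below len(df) and the
# subtraction would undercount; isna().sum() does not depend on that
all_gappy = df.drop(columns="Pclass")
print(np.abs(all_gappy.count() - all_gappy.count().max()).tolist())  # [1, 0]
print(all_gappy.isna().sum().tolist())  # [2, 1]
```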

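  • In the training set, Age, Cabin, and Embarked have missing (NaN) values; in the test set, Age, Fare, and Cabin do. Cabin is dropped below, so Age, Embarked, and Fare need to be filled
  • The Sex column needs to be encoded as a number (0 for female, 1 for male); the pipeline below does this with an OrdinalEncoder, which sorts the categories alphabetically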
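A small sketch of these steps on made-up values: `OrdinalEncoder` sorts categories, so `female` maps to 0 and `male` to 1, and `SimpleImputer(strategy="most_frequent")` fills Embarked with its most common port. It also shows a side effect relevant to the preprocessing pipeline further down: a `KNNImputer` given only a single column has no other features to measure distance on, so it falls back to that column's mean.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

# Sex: categories are sorted alphabetically, so female -> 0.0 and male -> 1.0
sex = np.array([["male"], ["female"], ["male"]])
enc = OrdinalEncoder()
sex_codes = enc.fit_transform(sex)
print(enc.categories_[0].tolist(), sex_codes.ravel().tolist())  # ['female', 'male'] [1.0, 0.0, 1.0]

# Embarked: fill the missing port with the most frequent one
embarked = np.array([["S"], ["C"], [np.nan], ["S"]], dtype=object)
embarked_filled = SimpleImputer(strategy="most_frequent").fit_transform(embarked)
print(embarked_filled.ravel().tolist())  # ['S', 'C', 'S', 'S']

# Age on its own: with no other columns, the missing row has no defined distance
# to any other row, so KNNImputer uses the column mean (20 + 25 + 60) / 3 instead
age = np.array([[20.0], [25.0], [np.nan], [60.0]])
age_filled = KNNImputer().fit_transform(age)
print(age_filled.ravel().tolist())  # [20.0, 25.0, 35.0, 60.0]
```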
train.describe(include=[np.object_])
out[7]

                           Name   Sex  Ticket    Cabin  Embarked
count                       891   891     891      204       889
unique                      891     2     681      147         3
top     Braund, Mr. Owen Harris  male  347082  B96 B98         S
freq                          1   577       7        4       644

  • Drop the Cabin, Name, and Ticket columns: each has so many unique values that encoding or embedding them does not seem likely to help, and Cabin is also missing for 687 of the 891 training rows
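One way to make the "too many unique values" rule explicit is to compare each column's number of distinct values with the number of rows, alongside its share of missing values. A sketch on made-up rows; `high_cardinality_columns` is a hypothetical helper and the 0.5 thresholds are illustrative assumptions, not values from the notebook.

```python
import pandas as pd

def high_cardinality_columns(df: pd.DataFrame, max_unique_ratio: float = 0.5,
                             max_missing_ratio: float = 0.5) -> list:
    """Return columns that are nearly unique per row or mostly missing."""
    unique_ratio = df.nunique() / len(df)  # distinct non-missing values per row
    missing_ratio = df.isna().mean()       # share of rows with no value
    to_drop = (unique_ratio > max_unique_ratio) | (missing_ratio > max_missing_ratio)
    return sorted(to_drop[to_drop].index)

# Made-up rows shaped like the text columns of train.csv
toy = pd.DataFrame({
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane", "Poe, Miss. Ann", "Loe, Mr. Sam"],
    "Sex": ["male", "female", "female", "male"],
    "Ticket": ["A1", "B2", "C3", "D4"],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", "S", "S"],
})
print(high_cardinality_columns(toy))  # ['Cabin', 'Name', 'Ticket']
```

With the counts in the table above (Name 891 unique, Ticket 681 unique, Cabin missing for 687 of 891 rows), these thresholds would pick the same three columns on the real training data.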
unique_embarked = train["Embarked"].unique()
unique_sex = train["Sex"].unique()
print("Unique Embarked: ",unique_embarked)
print("Unique Sex: ",unique_sex)
out[9]

Unique Embarked: ['S' 'C' 'Q' nan]
Unique Sex: ['male' 'female']

  • Plot bar charts of how many passengers survived and died for each port of embarkation
from collections import Counter
embarked_survived = train[["Embarked","Survived"]]
embarked_survived = embarked_survived.dropna()
# Count passengers per port, separately for survivors (1) and deaths (0)
embarked_counts = Counter(embarked_survived[embarked_survived["Survived"]==1]["Embarked"])
neg_embarked_counts = Counter(embarked_survived[embarked_survived["Survived"]==0]["Embarked"])
embarked_counts_df = pd.DataFrame.from_dict(embarked_counts, orient="index")
neg_embarked_counts_df = pd.DataFrame.from_dict(neg_embarked_counts, orient="index")
fig, axes = plt.subplots(nrows=1, ncols=2)
# Same y-axis limits on both panels so the bars can be compared directly
embarked_counts_df.sort_index().plot(kind="bar",title="Survived",ax=axes[0],ylim=[0,450],legend=None)
neg_embarked_counts_df.sort_index().plot(kind="bar",title="Died",ax=axes[1],ylim=[0,450],legend=None)
# Default padding between the panels
fig.tight_layout()
out[11]

[Figure: bar charts of passenger counts by port of embarkation (C, Q, S); left panel "Survived", right panel "Died"]

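  • Among passengers who embarked at "S", about twice as many died as survived. These are raw counts that do not control for class, sex, or other factors, so this is an association rather than an "all else being equal" effect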
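The bar charts compare raw counts; the survival rate per port can be read off directly with `groupby(...).mean()`, since Survived is coded 0/1, and `pd.crosstab` gives the died/survived counts in one table. A sketch on made-up rows (passing `train` instead of `toy` would give the real figures):

```python
import pandas as pd

# Made-up passengers; the last one has no recorded port
toy = pd.DataFrame({
    "Embarked": ["S", "S", "S", "C", "C", "Q", None],
    "Survived": [0, 0, 1, 1, 0, 1, 1],
})

# groupby drops rows with a missing Embarked by default (dropna=True)
rate = toy.groupby("Embarked")["Survived"].mean()
print(rate.index.tolist(), rate.round(3).tolist())  # ['C', 'Q', 'S'] [0.5, 1.0, 0.333]

# Rows: port; columns: 0 = died, 1 = survived
counts = pd.crosstab(toy["Embarked"], toy["Survived"])
print(counts)
```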
y = train["Survived"]
X = train.drop(columns=["Survived"])
np.random.seed(42)
rand_num = np.random.randint(0,100)
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=rand_num)

drop_encode = ColumnTransformer(transformers=[
    ('drop_cols','drop',['Cabin','Name','Ticket']),
    ('ord',OrdinalEncoder(),['Sex'])  # female -> 0, male -> 1
],remainder="passthrough",sparse_threshold=0)
# Sex is 0, Pclass is 1, Age is 2, SibSp is 3, Parch is 4, Fare is 5, Embarked is 6
# Each imputer sees only its own column, so KNNImputer has no other features to find
# neighbors with and falls back to the column mean for missing values
impute = ColumnTransformer(transformers=[
    ('imp_age',KNNImputer(),[2]),
    ('imp_fare',KNNImputer(),[5]),
    ('imp_embarked',SimpleImputer(strategy="most_frequent"),[-1])
],remainder="passthrough",sparse_threshold=0)
# Age is 0, Fare is 1, Embarked is 2, Sex is 3, Pclass is 4, SibSp is 5, Parch is 6
scale_encode = ColumnTransformer(transformers=[
    ('scale',StandardScaler(),[0,1,4,5,6]),
    ('ohe',OneHotEncoder(),[2])
],remainder="passthrough",sparse_threshold=0)

preparation_pipeline = Pipeline(steps=[
    ("initial",drop_encode),
    ("impute",impute),
    ("scale",scale_encode),
    ("predict",SVC(gamma="auto",C=1,random_state=rand_num))
])

# Other models tried:
# RandomForestClassifier(n_estimators=100,criterion="gini"), best params: {'predict__n_estimators': 10, 'predict__max_features': 'log2', 'predict__criterion': 'entropy'}
# LinearSVC(max_iter=1000,dual=False): score about 75%
# SVC(gamma="auto",C=1,random_state=rand_num): score about 77%
# preparation_pipeline.fit(X_train,y_train)

param_dist_svc = {
    "predict__C": np.linspace(0.9,1,20)
}
param_dist_rfclf = {
    "predict__n_estimators": [10,100,200],
    "predict__criterion": ["gini","entropy","log_loss"],
    "predict__max_features": ["sqrt","log2",None]
}
clf = RandomizedSearchCV(preparation_pipeline,param_dist_svc,verbose=3)

# Note: this tunes C and refits the final model on the 179-row held-out split (X_test),
# so the predictions below come from a model trained on only 20% of the labeled rows
search = clf.fit(X_test,y_test)
print(search.best_params_)

# Nested cross-validation: each of the 5 folds of X_train reruns the whole randomized search
print("Average of Cross Val Score: ",np.average(cross_val_score(search,X_train,y_train)))

predictions = search.predict(X_pred)
passenger_ids = X_pred.index

submission = pd.DataFrame({"PassengerId": passenger_ids,"Survived": predictions})
submission = submission.set_index("PassengerId")
submission.to_csv(get_submission_path())
print("Submissions Submitted")

out[13]

Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END ....................predict__C=0.9;, score=0.833 total time= 0.0s
[CV 2/5] END ....................predict__C=0.9;, score=0.778 total time= 0.0s
[CV 3/5] END ....................predict__C=0.9;, score=0.806 total time= 0.0s
[CV 4/5] END ....................predict__C=0.9;, score=0.722 total time= 0.0s
[CV 5/5] END ....................predict__C=0.9;, score=0.800 total time= 0.0s
[CV 1/5] END ......predict__C=0.968421052631579;, score=0.806 total time= 0.0s
[CV 2/5] END ......predict__C=0.968421052631579;, score=0.778 total time= 0.0s
[CV 3/5] END ......predict__C=0.968421052631579;, score=0.806 total time= 0.0s
[CV 4/5] END ......predict__C=0.968421052631579;, score=0.722 total time= 0.0s
[CV 5/5] END ......predict__C=0.968421052631579;, score=0.829 total time= 0.0s
[CV 1/5] END .....predict__C=0.9421052631578948;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9421052631578948;, score=0.778 total time= 0.0s
[CV 3/5] END .....predict__C=0.9421052631578948;, score=0.806 total time= 0.0s
[CV 4/5] END .....predict__C=0.9421052631578948;, score=0.722 total time= 0.0s
[CV 5/5] END .....predict__C=0.9421052631578948;, score=0.829 total time= 0.0s
[CV 1/5] END .....predict__C=0.9052631578947369;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9052631578947369;, score=0.778 total time= 0.0s
[CV 3/5] END .....predict__C=0.9052631578947369;, score=0.806 total time= 0.0s
[CV 4/5] END .....predict__C=0.9052631578947369;, score=0.722 total time= 0.0s
[CV 5/5] END .....predict__C=0.9052631578947369;, score=0.800 total time= 0.0s
[CV 1/5] END .....predict__C=0.9789473684210527;, score=0.806 total time= 0.0s
[CV 2/5] END .....predict__C=0.9789473684210527;, score=0.778 total time= 0.0s
[CV 3/5] END .....predict__C=0.9789473684210527;, score=0.806 total time= 0.0s
[CV 4/5] END .....predict__C=0.9789473684210527;, score=0.722 total time= 0.0s
[CV 5/5] END .....predict__C=0.9789473684210527;, score=0.829 total time= 0.0s
[CV 1/5] END .....predict__C=0.9263157894736842;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9263157894736842;, score=0.778 total time= 0.0s
[CV 3/5] END .....predict__C=0.9263157894736842;, score=0.806 total time= 0.0s
[CV 4/5] END .....predict__C=0.9263157894736842;, score=0.722 total time= 0.0s
[CV 5/5] END .....predict__C=0.9263157894736842;, score=0.829 total time= 0.0s
[CV 1/5] END ....................predict__C=1.0;, score=0.806 total time= 0.0s
[CV 2/5] END ....................predict__C=1.0;, score=0.778 total time= 0.0s
[CV 3/5] END ....................predict__C=1.0;, score=0.806 total time= 0.0s
[CV 4/5] END ....................predict__C=1.0;, score=0.722 total time= 0.0s
[CV 5/5] END ....................predict__C=1.0;, score=0.829 total time= 0.0s
[CV 1/5] END .....predict__C=0.9578947368421052;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9578947368421052;, score=0.778 total time= 0.0s
[CV 3/5] END .....predict__C=0.9578947368421052;, score=0.806 total time= 0.0s
[CV 4/5] END .....predict__C=0.9578947368421052;, score=0.722 total time= 0.0s
[CV 5/5] END .....predict__C=0.9578947368421052;, score=0.829 total time= 0.0s
[CV 1/5] END .....predict__C=0.9157894736842106;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9157894736842106;, score=0.778 total time= 0.0s
[CV 3/5] END .....predict__C=0.9157894736842106;, score=0.806 total time= 0.0s
[CV 4/5] END .....predict__C=0.9157894736842106;, score=0.722 total time= 0.0s
[CV 5/5] END .....predict__C=0.9157894736842106;, score=0.829 total time= 0.0s
[CV 1/5] END .....predict__C=0.9210526315789473;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9210526315789473;, score=0.778 total time= 0.0s
[CV 3/5] END .....predict__C=0.9210526315789473;, score=0.806 total time= 0.0s
[CV 4/5] END .....predict__C=0.9210526315789473;, score=0.722 total time= 0.0s
[CV 5/5] END .....predict__C=0.9210526315789473;, score=0.829 total time= 0.0s
{'predict__C': 0.9421052631578948}
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END ....................predict__C=1.0;, score=0.798 total time= 0.0s
[CV 2/5] END ....................predict__C=1.0;, score=0.851 total time= 0.0s
[CV 3/5] END ....................predict__C=1.0;, score=0.789 total time= 0.0s
[CV 4/5] END ....................predict__C=1.0;, score=0.851 total time= 0.0s
[CV 5/5] END ....................predict__C=1.0;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9842105263157894;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9842105263157894;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9842105263157894;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9842105263157894;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9842105263157894;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9789473684210527;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9789473684210527;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9789473684210527;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9789473684210527;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9789473684210527;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9263157894736842;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9263157894736842;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9263157894736842;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9263157894736842;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9263157894736842;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9210526315789473;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9210526315789473;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9210526315789473;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9210526315789473;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9210526315789473;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9631578947368421;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9631578947368421;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9631578947368421;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9631578947368421;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9631578947368421;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9736842105263158;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9736842105263158;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9736842105263158;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9736842105263158;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9736842105263158;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9368421052631579;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9368421052631579;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9368421052631579;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9368421052631579;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9368421052631579;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9157894736842106;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9157894736842106;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9157894736842106;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9157894736842106;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9157894736842106;, score=0.823 total time= 0.0s
[CV 1/5] END .....predict__C=0.9315789473684211;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9315789473684211;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9315789473684211;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9315789473684211;, score=0.851 total time= 0.0s
[CV 5/5] END .....predict__C=0.9315789473684211;, score=0.823 total time= 0.0s
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END .....predict__C=0.9526315789473684;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9526315789473684;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9526315789473684;, score=0.816 total time= 0.0s
[CV 4/5] END .....predict__C=0.9526315789473684;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9526315789473684;, score=0.832 total time= 0.0s
[CV 1/5] END .....predict__C=0.9368421052631579;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9368421052631579;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9368421052631579;, score=0.816 total time= 0.0s
[CV 4/5] END .....predict__C=0.9368421052631579;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9368421052631579;, score=0.832 total time= 0.0s
[CV 1/5] END ....................predict__C=0.9;, score=0.825 total time= 0.0s
[CV 2/5] END ....................predict__C=0.9;, score=0.877 total time= 0.0s
[CV 3/5] END ....................predict__C=0.9;, score=0.816 total time= 0.0s
[CV 4/5] END ....................predict__C=0.9;, score=0.833 total time= 0.0s
[CV 5/5] END ....................predict__C=0.9;, score=0.832 total time= 0.0s
[CV 1/5] END .....predict__C=0.9105263157894737;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9105263157894737;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9105263157894737;, score=0.816 total time= 0.0s
[CV 4/5] END .....predict__C=0.9105263157894737;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9105263157894737;, score=0.832 total time= 0.0s
[CV 1/5] END .....predict__C=0.9263157894736842;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9263157894736842;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9263157894736842;, score=0.816 total time= 0.0s
[CV 4/5] END .....predict__C=0.9263157894736842;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9263157894736842;, score=0.832 total time= 0.0s
[CV 1/5] END .....predict__C=0.9947368421052631;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9947368421052631;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9947368421052631;, score=0.833 total time= 0.0s
[CV 4/5] END .....predict__C=0.9947368421052631;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9947368421052631;, score=0.832 total time= 0.0s
[CV 1/5] END .....predict__C=0.9842105263157894;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9842105263157894;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9842105263157894;, score=0.833 total time= 0.0s
[CV 4/5] END .....predict__C=0.9842105263157894;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9842105263157894;, score=0.832 total time= 0.0s
[CV 1/5] END .....predict__C=0.9631578947368421;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9631578947368421;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9631578947368421;, score=0.816 total time= 0.0s
[CV 4/5] END .....predict__C=0.9631578947368421;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9631578947368421;, score=0.832 total time= 0.0s
[CV 1/5] END .....predict__C=0.9210526315789473;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9210526315789473;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9210526315789473;, score=0.816 total time= 0.0s
[CV 4/5] END .....predict__C=0.9210526315789473;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9210526315789473;, score=0.832 total time= 0.0s
[CV 1/5] END .....predict__C=0.9736842105263158;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9736842105263158;, score=0.877 total time= 0.0s
[CV 3/5] END .....predict__C=0.9736842105263158;, score=0.825 total time= 0.0s
[CV 4/5] END .....predict__C=0.9736842105263158;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9736842105263158;, score=0.832 total time= 0.0s
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END ....................predict__C=0.9;, score=0.807 total time= 0.0s
[CV 2/5] END ....................predict__C=0.9;, score=0.833 total time= 0.0s
[CV 3/5] END ....................predict__C=0.9;, score=0.807 total time= 0.0s
[CV 4/5] END ....................predict__C=0.9;, score=0.833 total time= 0.0s
[CV 5/5] END ....................predict__C=0.9;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9947368421052631;, score=0.789 total time= 0.0s
[CV 2/5] END .....predict__C=0.9947368421052631;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9947368421052631;, score=0.816 total time= 0.0s
[CV 4/5] END .....predict__C=0.9947368421052631;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9947368421052631;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9315789473684211;, score=0.807 total time= 0.0s
[CV 2/5] END .....predict__C=0.9315789473684211;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9315789473684211;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9315789473684211;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9315789473684211;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9421052631578948;, score=0.807 total time= 0.0s
[CV 2/5] END .....predict__C=0.9421052631578948;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9421052631578948;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9421052631578948;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9421052631578948;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9894736842105263;, score=0.789 total time= 0.0s
[CV 2/5] END .....predict__C=0.9894736842105263;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9894736842105263;, score=0.816 total time= 0.0s
[CV 4/5] END .....predict__C=0.9894736842105263;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9894736842105263;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9578947368421052;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9578947368421052;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9578947368421052;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9578947368421052;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9578947368421052;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9210526315789473;, score=0.807 total time= 0.0s
[CV 2/5] END .....predict__C=0.9210526315789473;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9210526315789473;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9210526315789473;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9210526315789473;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9473684210526316;, score=0.807 total time= 0.0s
[CV 2/5] END .....predict__C=0.9473684210526316;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9473684210526316;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9473684210526316;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9473684210526316;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9631578947368421;, score=0.789 total time= 0.0s
[CV 2/5] END .....predict__C=0.9631578947368421;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9631578947368421;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9631578947368421;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9631578947368421;, score=0.798 total time= 0.0s
[CV 1/5] END .....predict__C=0.9526315789473684;, score=0.798 total time= 0.0s
[CV 2/5] END .....predict__C=0.9526315789473684;, score=0.833 total time= 0.0s
[CV 3/5] END .....predict__C=0.9526315789473684;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9526315789473684;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9526315789473684;, score=0.798 total time= 0.0s
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END .....predict__C=0.9263157894736842;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9263157894736842;, score=0.842 total time= 0.0s
[CV 3/5] END .....predict__C=0.9263157894736842;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9263157894736842;, score=0.816 total time= 0.0s
[CV 5/5] END .....predict__C=0.9263157894736842;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9105263157894737;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9105263157894737;, score=0.842 total time= 0.0s
[CV 3/5] END .....predict__C=0.9105263157894737;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9105263157894737;, score=0.816 total time= 0.0s
[CV 5/5] END .....predict__C=0.9105263157894737;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9526315789473684;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9526315789473684;, score=0.842 total time= 0.0s
[CV 3/5] END .....predict__C=0.9526315789473684;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9526315789473684;, score=0.816 total time= 0.0s
[CV 5/5] END .....predict__C=0.9526315789473684;, score=0.816 total time= 0.0s
[CV 1/5] END ....................predict__C=1.0;, score=0.825 total time= 0.0s
[CV 2/5] END ....................predict__C=1.0;, score=0.842 total time= 0.0s
[CV 3/5] END ....................predict__C=1.0;, score=0.789 total time= 0.0s
[CV 4/5] END ....................predict__C=1.0;, score=0.807 total time= 0.0s
[CV 5/5] END ....................predict__C=1.0;, score=0.816 total time= 0.0s
[CV 1/5] END ....................predict__C=0.9;, score=0.833 total time= 0.0s
[CV 2/5] END ....................predict__C=0.9;, score=0.842 total time= 0.0s
[CV 3/5] END ....................predict__C=0.9;, score=0.789 total time= 0.0s
[CV 4/5] END ....................predict__C=0.9;, score=0.816 total time= 0.0s
[CV 5/5] END ....................predict__C=0.9;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9631578947368421;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9631578947368421;, score=0.842 total time= 0.0s
[CV 3/5] END .....predict__C=0.9631578947368421;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9631578947368421;, score=0.816 total time= 0.0s
[CV 5/5] END .....predict__C=0.9631578947368421;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9473684210526316;, score=0.842 total time= 0.0s
[CV 2/5] END .....predict__C=0.9473684210526316;, score=0.842 total time= 0.0s
[CV 3/5] END .....predict__C=0.9473684210526316;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9473684210526316;, score=0.816 total time= 0.0s
[CV 5/5] END .....predict__C=0.9473684210526316;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9842105263157894;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9842105263157894;, score=0.842 total time= 0.0s
[CV 3/5] END .....predict__C=0.9842105263157894;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9842105263157894;, score=0.807 total time= 0.0s
[CV 5/5] END .....predict__C=0.9842105263157894;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9052631578947369;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9052631578947369;, score=0.842 total time= 0.0s
[CV 3/5] END .....predict__C=0.9052631578947369;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9052631578947369;, score=0.816 total time= 0.0s
[CV 5/5] END .....predict__C=0.9052631578947369;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9210526315789473;, score=0.833 total time= 0.0s
[CV 2/5] END .....predict__C=0.9210526315789473;, score=0.842 total time= 0.0s
[CV 3/5] END .....predict__C=0.9210526315789473;, score=0.789 total time= 0.0s
[CV 4/5] END .....predict__C=0.9210526315789473;, score=0.816 total time= 0.0s
[CV 5/5] END .....predict__C=0.9210526315789473;, score=0.816 total time= 0.0s
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5] END .....predict__C=0.9157894736842106;, score=0.816 total time= 0.0s
[CV 2/5] END .....predict__C=0.9157894736842106;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9157894736842106;, score=0.798 total time= 0.0s
[CV 4/5] END .....predict__C=0.9157894736842106;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9157894736842106;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9052631578947369;, score=0.816 total time= 0.0s
[CV 2/5] END .....predict__C=0.9052631578947369;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9052631578947369;, score=0.798 total time= 0.0s
[CV 4/5] END .....predict__C=0.9052631578947369;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9052631578947369;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9210526315789473;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9210526315789473;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9210526315789473;, score=0.798 total time= 0.0s
[CV 4/5] END .....predict__C=0.9210526315789473;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9210526315789473;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9263157894736842;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9263157894736842;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9263157894736842;, score=0.798 total time= 0.0s
[CV 4/5] END .....predict__C=0.9263157894736842;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9263157894736842;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9789473684210527;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9789473684210527;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9789473684210527;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9789473684210527;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9789473684210527;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9842105263157894;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9842105263157894;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9842105263157894;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9842105263157894;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9842105263157894;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9473684210526316;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9473684210526316;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9473684210526316;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9473684210526316;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9473684210526316;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9526315789473684;, score=0.825 total time= 0.0s
[CV 2/5] END .....predict__C=0.9526315789473684;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9526315789473684;, score=0.807 total time= 0.0s
[CV 4/5] END .....predict__C=0.9526315789473684;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9526315789473684;, score=0.816 total time= 0.0s
[CV 1/5] END ......predict__C=0.968421052631579;, score=0.825 total time= 0.0s
[CV 2/5] END ......predict__C=0.968421052631579;, score=0.851 total time= 0.0s
[CV 3/5] END ......predict__C=0.968421052631579;, score=0.807 total time= 0.0s
[CV 4/5] END ......predict__C=0.968421052631579;, score=0.833 total time= 0.0s
[CV 5/5] END ......predict__C=0.968421052631579;, score=0.816 total time= 0.0s
[CV 1/5] END .....predict__C=0.9105263157894737;, score=0.816 total time= 0.0s
[CV 2/5] END .....predict__C=0.9105263157894737;, score=0.851 total time= 0.0s
[CV 3/5] END .....predict__C=0.9105263157894737;, score=0.798 total time= 0.0s
[CV 4/5] END .....predict__C=0.9105263157894737;, score=0.833 total time= 0.0s
[CV 5/5] END .....predict__C=0.9105263157894737;, score=0.816 total time= 0.0s
Average of Cross Val Score: 0.8286516300600809
Submissions Submitted
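In the cell above, `RandomizedSearchCV` is fitted on `X_test`, so the model behind the submission was tuned and trained on the 179 held-out rows only. A more conventional order is to tune on the training split, score once on the held-out split, and then refit the chosen settings on all labeled rows before predicting. The log also shows nearly identical fold scores for every C between 0.9 and 1.0, so this sketch searches a much wider range. It is a minimal illustration, assuming a generic scaler + `SVC` pipeline on synthetic data from `make_classification` in place of the Titanic preprocessing:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed Titanic features (891 rows, like train.csv)
X, y = make_classification(n_samples=891, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipeline = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("predict", SVC(gamma="auto")),
])
param_dist = {"predict__C": np.logspace(-2, 2, 20)}  # 0.01 to 100

# 1. Tune only on the training split (5-fold CV inside X_train)
search = RandomizedSearchCV(pipeline, param_dist, n_iter=10, random_state=0)
search.fit(X_train, y_train)

# 2. Score the tuned model once on rows it has never seen
holdout_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(holdout_accuracy, 3))

# 3. Refit the chosen settings on every labeled row before predicting the Kaggle test set
final_model = clone(pipeline).set_params(**search.best_params_).fit(X, y)
```

With the real data, the last step would be followed by `final_model.predict(X_pred)` to build the submission.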

Conclusion - 77%

After looking at higher-scoring submissions, I realized I did very little feature engineering on this dataset. Check out this Jupyter Notebook to see how others scored better and why. A good score is about 4% higher than mine.
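As one example of the kind of feature engineering mentioned above, two features commonly derived from this dataset are a title parsed from Name (the text between the comma and the following period, such as "Mr" or "Mrs") and a family size built from SibSp and Parch (parents plus children aboard). A hedged sketch on made-up names; `add_features` is a hypothetical helper, and whether these features improve the score here is untested:

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Title and FamilySize columns added."""
    out = df.copy()
    # Names look like "Surname, Title. Given names"; take the text between the comma and the next period
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    # The passenger plus siblings/spouses (SibSp) plus parents/children (Parch)
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out

# Made-up passengers
toy = pd.DataFrame({
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane (Jane Smith)", "Poe, Master. Tim"],
    "SibSp": [1, 1, 3],
    "Parch": [0, 2, 1],
})
print(add_features(toy)[["Title", "FamilySize"]])
```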