Chapter 2: How to split your dataset into training and test sets using scikit-learn

The train_test_split function from sklearn.model_selection takes the following parameters (in addition to the arrays you want to split):

  • test_size: The portion of the data set to hold out as the test set. A float between 0.0 and 1.0 is read as a fraction. For example, 0.5 puts 50% of the rows in the test set. An integer is read as an absolute number of test samples. If you set this parameter, you can skip the next one. If you set neither, test_size defaults to 0.25.
  • train_size: You only need this parameter if you don't specify test_size. It works the same way as test_size, but describes the portion of the data set to use as the training set. When it is left out, it is set to the complement of test_size.
  • random_state: An integer that seeds the random number generator used to shuffle the data before the split. Passing the same integer gives the same split every time. You can also pass a numpy.random.RandomState instance, which is then used as the generator. If you pass nothing (None), the global RandomState instance used by np.random is used, so the split will generally differ from run to run.
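As a quick illustration of how these parameters interact, here is a minimal sketch on a hypothetical toy array of 10 samples. The names X, y and the sizes helper are made up for this example and are not part of the original article. The counts in the comments follow scikit-learn's rounding for fractional sizes (the test count is rounded up, the train count rounded down).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each, and 10 labels
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

def sizes(**kwargs):
    """Return (n_train, n_test) for one split of the toy data (hypothetical helper)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, **kwargs)
    return len(X_tr), len(X_te)

print(sizes(test_size=0.5))   # (5, 5): fraction -> 50% of rows in the test set
print(sizes(test_size=2))     # (8, 2): integer -> exactly 2 test rows
print(sizes(train_size=0.8))  # (8, 2): test set gets the remaining 20%
print(sizes())                # (7, 3): default test_size=0.25, rounded up

# Same integer seed -> identical splits on every call
first = train_test_split(X, y, test_size=0.5, random_state=42)
second = train_test_split(X, y, test_size=0.5, random_state=42)
print(all(np.array_equal(a, b) for a, b in zip(first, second)))  # True
```

Passing an integer seed is the simplest way to make an experiment reproducible; leave random_state out only when you actually want a different shuffle on each run.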
[Image: the data set]
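from sklearn.model_selection import train_test_split
# x: feature matrix, y: target labels (both loaded beforehand)
# Hold out 20% of the rows for testing; random_state=0 makes the split reproducible
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2, random_state=0)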
[Image: the data after the split]
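The snippet above assumes x (features) and y (labels) already exist. Below is a self-contained sketch that swaps in a hypothetical randomly generated array for the article's data set, runs the same call, and checks that each label stays paired with its own feature row after the shuffle.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the article's data set: 50 rows, 3 feature columns
rng = np.random.RandomState(1)
x = rng.rand(50, 3)
# Label derived from the first feature, so we can verify rows stay paired
y = (x[:, 0] > 0.5).astype(int)

# Same call as in the article: 20% test set, fixed seed
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2, random_state=0)

print(xTrain.shape, xTest.shape)  # (40, 3) (10, 3)
print(yTrain.shape, yTest.shape)  # (40,) (10,)

# Each label still matches its own feature row after shuffling
print(np.array_equal(yTrain, (xTrain[:, 0] > 0.5).astype(int)))  # True
print(np.array_equal(yTest, (xTest[:, 0] > 0.5).astype(int)))    # True
```

Because train_test_split shuffles x and y with the same permutation, you can split features and labels in one call without them falling out of sync.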


DevOps/Cloud | 2x AWS Certified | Terraform | Gitlab