{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.10.14","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"To apply the data transformations, follow the steps below.\n\n### Create a version of the data named **\"train_FS\"** with the following steps:\n1. Check the **descriptive statistics**, particularly skewness and kurtosis, of the continuous fields to identify non-normal distributions.\n2. Apply **discretization** using the Chi-Merge method to \"VehBCost\" and \"WarrantyCost\", as they are not normally distributed. Label the resulting classes 1 to 5. Finally, discard the original \"VehBCost\" and \"WarrantyCost\" fields from train_FS. You should have 22 fields after this step.\n3. Perform **one-hot encoding** for all nominal fields (\"Auction\", \"Make\", \"Color\", \"Transmission\", \"WheelType\", \"Nationality\", \"Size\", and \"TopThreeAmericanName\"). Finally, discard the original nominal fields from the dataframe. You should have 62 fields after this step.\n4. **Scale** all fields except \"IsBadBuy\" using the min-max method. You should still have 62 fields after this step.\n\n### Create a version of the data named **\"train_FE\"** with the following steps:\n1. Apply **feature transformation** using the Box-Cox method to make the distributions of \"VehBCost\" and \"WarrantyCost\" more Gaussian. Finally, discard the original \"VehBCost\" and \"WarrantyCost\" fields from train_FE. You should have 22 fields after this step.\n2. Perform **one-hot encoding** for all nominal fields (\"Auction\", \"Make\", \"Color\", \"Transmission\", \"WheelType\", \"Nationality\", \"Size\", and \"TopThreeAmericanName\"). Finally, discard the original nominal fields from the dataframe. You should have 62 fields after this step.\n3. **Scale** all fields except \"IsBadBuy\" using the z-score method. You should still have 62 fields after this step.\n","metadata":{}}]}