{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.10.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[{"sourceId":8677062,"sourceType":"datasetVersion","datasetId":5201213}],"dockerImageVersionId":30732,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"**K-Nearest Neighbors** (KNN) is a versatile algorithm that extends *beyond traditional classification and regression* tasks to scenarios like fraud detection and hiring decisions. In such cases, KNN serves as a proximity-based method for **finding similar instances** in a dataset **without specific labels** or target variables. For instance, in **fraud detection**, KNN can identify transactions or behaviors resembling known fraudulent patterns by measuring similarity to historical fraud cases. Similarly, in **hiring decisions**, KNN can assist in identifying candidates with characteristics similar to successful past employees who held a particular position. This approach relies on the assumption that instances with similar features or attributes are likely to exhibit similar behaviors or outcomes, enabling KNN to provide valuable insights for decision-making in diverse problem domains. 
However, applying KNN in such contexts brings challenges: it needs *careful feature selection*, with features on comparable scales because the method is distance-based, and it has *scalability issues* on large datasets, where a brute-force search compares each query against every stored instance.","metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19"}},{"cell_type":"markdown","source":"\nThe dataset contains **various features** that influence **car prices**. Two new vehicles, a car and a truck, have been manufactured, and we want to set suitable prices for them based on the **most similar existing cars**. Using the **K-Nearest Neighbors** (KNN) algorithm, we find the priced cars closest to each new vehicle by computing distances in the standardized feature space. After choosing a value for *k*, the number of nearest neighbors to consider, we can *average the prices of these similar cars to estimate the appropriate prices for the new car and truck*. 
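
The estimation idea described above can be sketched from scratch in plain Python. This is only an illustration under stated assumptions: the toy feature rows, the prices, and the helper names `zscore_params`, `standardize`, and `estimate_price` are hypothetical and do not come from the dataset or from scikit-learn. The sketch standardizes each feature with its mean and population standard deviation, ranks the priced rows by Euclidean distance to the new vehicle, and averages the prices of the *k* closest rows.

```python
import math
import statistics

# Hypothetical toy data: (horsepower, curb weight) features and prices in thousands.
# These values are made up for illustration; they are not rows from PricedCars.csv.
priced_features = [
    (100.0, 2.5),
    (110.0, 2.6),
    (200.0, 3.8),
    (210.0, 4.0),
    (150.0, 3.0),
]
priced_prices = [12.0, 14.0, 30.0, 34.0, 20.0]
new_vehicle = (202.0, 3.85)


def zscore_params(rows):
    # Per-column mean and population standard deviation of the priced rows.
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def standardize(row, means, stds):
    # Apply the scaling learned from the priced rows to a single row.
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def estimate_price(features, prices, query, k):
    means, stds = zscore_params(features)
    scaled_rows = [standardize(row, means, stds) for row in features]
    scaled_query = standardize(query, means, stds)
    # Euclidean distance from the query to every priced row.
    distances = [math.dist(row, scaled_query) for row in scaled_rows]
    # Indices of the k closest rows, nearest first.
    nearest = sorted(range(len(features)), key=distances.__getitem__)[:k]
    # The unweighted mean of the neighbors' prices is the estimate.
    return nearest, statistics.fmean(prices[i] for i in nearest)


neighbors, estimate = estimate_price(priced_features, priced_prices, new_vehicle, k=2)
print(neighbors, estimate)
```

With `k=2` the two high-horsepower rows are selected, so the estimate is the mean of their two prices. In the notebook itself, the same averaging could be applied to the `price` column of the neighbor rows returned by the nearest-neighbor search.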
","metadata":{}},{"cell_type":"markdown","source":"### Read the datasets","metadata":{}},{"cell_type":"code","source":"import pandas as pd\n\n# Loading the dataset of priced cars from the specified CSV file into a DataFrame called PricedCars.\nPricedCars = pd.read_csv('/kaggle/input/cars-dataset/PricedCars.csv')\n\n# Displaying summary information about the PricedCars DataFrame, such as the column names, data types, and memory usage.\nPricedCars.info()\n\n# Loading the dataset of unpriced cars from the specified CSV file into a DataFrame called UnpricedCars.\nUnpricedCars = pd.read_csv('/kaggle/input/cars-dataset/UnpricedCars.csv')\n\n# Displaying summary information about the UnpricedCars DataFrame, such as the column names, data types, and memory usage.\nUnpricedCars.info()","metadata":{"trusted":true},"execution_count":26,"outputs":[{"name":"stdout","text":"<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 2 entries, 0 to 1\nData columns (total 11 columns):\n #   Column    Non-Null Count  Dtype  \n---  ------    --------------  -----  \n 0   manufact  0 non-null      float64\n 1   model     2 non-null      object \n 2   price     0 non-null      float64\n 3   engine_s  2 non-null      float64\n 4   horsepow  2 non-null      int64  \n 5   wheelbas  2 non-null      float64\n 6   width     2 non-null      float64\n 7   length    2 non-null      float64\n 8   curb_wgt  2 non-null      float64\n 9   fuel_cap  2 non-null      float64\n 10  mpg       2 non-null      int64  \ndtypes: float64(8), int64(2), object(1)\nmemory usage: 304.0+ bytes\n","output_type":"stream"}]},{"cell_type":"markdown","source":"### Prepare the Data","metadata":{}},{"cell_type":"code","source":"from sklearn.preprocessing import StandardScaler\n\n# Creating an instance of the StandardScaler class to scale the features.\nz_score = StandardScaler()\n\n# Standardizing the features of the priced cars dataset using the fit_transform method,\nPricedCars_scaled = 
z_score.fit_transform(PricedCars.iloc[:, 3:11])\n\n# Standardizing the features of the unpriced cars dataset with transform, which reuses the mean and\n# standard deviation fitted on the priced cars so that both datasets share the same scale.\nUnpricedCars_scaled = z_score.transform(UnpricedCars.iloc[:, 3:11])","metadata":{"execution":{"iopub.status.busy":"2024-06-21T19:07:10.239093Z","iopub.execute_input":"2024-06-21T19:07:10.239622Z","iopub.status.idle":"2024-06-21T19:07:10.256151Z","shell.execute_reply.started":"2024-06-21T19:07:10.239585Z","shell.execute_reply":"2024-06-21T19:07:10.254675Z"},"trusted":true},"execution_count":27,"outputs":[]},{"cell_type":"markdown","source":"### Find the Nearest Neighbors","metadata":{}},{"cell_type":"code","source":"from sklearn.neighbors import NearestNeighbors\n\n# Creating an instance of the NearestNeighbors class: 5 neighbors per query, Minkowski distance\n# with p=2 (Euclidean), and all CPU cores. The radius parameter is not used by kneighbors.\nmodel = NearestNeighbors(n_neighbors=5, radius=1.0, metric='minkowski', p=2, n_jobs=-1)\n\n# Fitting the model to the standardized features of the priced cars dataset.\nmodel.fit(PricedCars_scaled)\n\n# Finding the k-nearest neighbors for the unpriced cars using the standardized features.\n# The kneighbors method returns the distances and indices of the neighbors, one row per unpriced car.\ndistance, index = model.kneighbors(UnpricedCars_scaled)\n\n# Retrieving the nearest neighbors for the new car from the priced cars dataset using the indices.\nnewcar_neighbors = PricedCars.iloc[index[0]]\n\n# Retrieving the nearest neighbors for the new truck from the priced cars dataset using the indices.\nnewtruck_neighbors = PricedCars.iloc[index[1]]","metadata":{"trusted":true},"execution_count":34,"outputs":[{"execution_count":34,"output_type":"execute_result","data":{"text/plain":"          manufact     model   price  engine_s  horsepow  wheelbas  width  \\\n104         Nissan     Quest  26.399       3.3     170.0     112.2   74.9   \n91         Mercury  Villager  22.510       3.3     170.0     112.2   74.9   \n61           Honda   Odyssey  26.000       3.5     210.0     118.1   75.6   \n13           Buick   LeSabre  
27.885       3.8     205.0     112.2   73.5   \n100  Mercedes-Benz   M-Class  35.300       3.2     215.0     111.0   72.2   \n\n     length  curb_wgt  fuel_cap   mpg  \n104   194.8     3.991      20.0  21.0  \n91    194.7     3.944      20.0  21.0  \n61    201.2     4.288      20.0  23.0  \n13    200.0     3.591      17.5  25.0  \n100   180.6     4.387      19.0  20.0  ","text/html":"<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>manufact</th>\n      <th>model</th>\n      <th>price</th>\n      <th>engine_s</th>\n      <th>horsepow</th>\n      <th>wheelbas</th>\n      <th>width</th>\n      <th>length</th>\n      <th>curb_wgt</th>\n      <th>fuel_cap</th>\n      <th>mpg</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>104</th>\n      <td>Nissan</td>\n      <td>Quest</td>\n      <td>26.399</td>\n      <td>3.3</td>\n      <td>170.0</td>\n      <td>112.2</td>\n      <td>74.9</td>\n      <td>194.8</td>\n      <td>3.991</td>\n      <td>20.0</td>\n      <td>21.0</td>\n    </tr>\n    <tr>\n      <th>91</th>\n      <td>Mercury</td>\n      <td>Villager</td>\n      <td>22.510</td>\n      <td>3.3</td>\n      <td>170.0</td>\n      <td>112.2</td>\n      <td>74.9</td>\n      <td>194.7</td>\n      <td>3.944</td>\n      <td>20.0</td>\n      <td>21.0</td>\n    </tr>\n    <tr>\n      <th>61</th>\n      <td>Honda</td>\n      <td>Odyssey</td>\n      <td>26.000</td>\n      <td>3.5</td>\n      <td>210.0</td>\n      <td>118.1</td>\n      <td>75.6</td>\n      <td>201.2</td>\n      <td>4.288</td>\n      <td>20.0</td>\n      <td>23.0</td>\n    </tr>\n    <tr>\n      <th>13</th>\n      <td>Buick</td>\n      <td>LeSabre</td>\n      <td>27.885</td>\n      <td>3.8</td>\n 
     <td>205.0</td>\n      <td>112.2</td>\n      <td>73.5</td>\n      <td>200.0</td>\n      <td>3.591</td>\n      <td>17.5</td>\n      <td>25.0</td>\n    </tr>\n    <tr>\n      <th>100</th>\n      <td>Mercedes-Benz</td>\n      <td>M-Class</td>\n      <td>35.300</td>\n      <td>3.2</td>\n      <td>215.0</td>\n      <td>111.0</td>\n      <td>72.2</td>\n      <td>180.6</td>\n      <td>4.387</td>\n      <td>19.0</td>\n      <td>20.0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"},"metadata":{}}]}]}