{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.10.13","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"kaggle":{"accelerator":"none","dataSources":[],"dockerImageVersionId":30698,"isInternetEnabled":true,"language":"python","sourceType":"notebook","isGpuEnabled":false}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"markdown","source":"# Final Challenge: Dive into Data Science with Python  \n\nHere's a practical project to evaluate your skills in Python programming, data manipulation with Pandas, data visualization with Matplotlib and Seaborn, and basic data analysis. The project is divided into multiple sections and includes questions of increasing complexity.  \n\n> *The optional advanced sections and bonus challenges can be used to challenge more advanced students.*\n\n\n#### Section 1: Data Loading and Exploration\n\n1. Load the \"Stores.csv\" dataset using Pandas.  \n (Find this dataset on [Dayche's Kaggle profile](https://www.kaggle.com/datasets/rouzbeh/stores-dataset))  \n \n2. Check the data types of all features.  \n\n3. Display the first few rows of the dataset to get an overview.  \n  \n4. Describe the basic statistics of all qualitative and quantitative columns.  \n\nWrite your interpretation of the outputs, noting what they imply for analysis and data preparation.\n","metadata":{}},{"cell_type":"code","source":"# Write your code here. If necessary, add new cells.","metadata":{},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### Section 2: Data Cleaning and Transformation  \n\n5. Convert the \"Revenue\" column to a numeric data type.\n\n6. Create a new column \"RevToArea\" as the ratio of the \"Revenue\" column to the \"AreaStore\" column.\n\n7. Set the \"Store Number\" column as the row labels (index) of the DataFrame.  \n\n8. 
Check for missing values in the dataset and use an **if-else statement** to handle them: fill missing values with the median for numeric features and the mode for categorical features.","metadata":{}},{"cell_type":"code","source":"# Write your code here. If necessary, add new cells.","metadata":{},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### Section 3: Data Visualization  \n\n9. Create a bar chart showing the distribution of store \"Type\" with Matplotlib.\n\n10. Generate a histogram of store \"Revenue\" with Seaborn.","metadata":{}},{"cell_type":"code","source":"# Write your code here. If necessary, add new cells.","metadata":{},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### Section 4: Data Analysis  \n\n12. Compare the \"AreaStore\" values for \"Store Number\" 5 and 117, and print which one is greater. *(Hint: use `.loc`.)*\n\n13. Write a **function** that takes two \"Store Number\" values and the name of a numeric feature, then compares the two values and prints which one is greater.\n\n14. Determine the most common \"Property\" type among the stores.   \n\n15. Identify the store with the highest \"Revenue\" per square meter of \"AreaStore\".  \n\n16. Write a **function** to find and return the top N stores with the highest \"Revenue\" among K stores. Allow the user to specify N and K as arguments (N <= K).\n","metadata":{}},{"cell_type":"code","source":"# Write your code here. If necessary, add new cells.","metadata":{},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"#### Section 5: Bonus Challenge (Advanced, Optional) \n\n17. Use Scikit-Learn to split the dataset into training and test sets.  \n\n18. Use Scikit-Learn to rescale the numerical data for modeling. \n\n19. Use Scikit-Learn to fit a linear regression model that predicts store \"Revenue\" from the **numerical features**.  \n\n20. Use Scikit-Learn to evaluate the model with the **R-squared** metric.","metadata":{}},{"cell_type":"code","source":"# Write your code here. 
If necessary, add new cells.","metadata":{},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"# Good luck!","metadata":{}}]}