Project Objective
Using the given data fields (season, holiday, workingday, weather, temp, atemp, humidity, windspeed, registered, casual, and count, the rental quantity), predict future rental demand.
Data Ingestion
The data is provided by Kaggle.
Data Processing
1. Data observation: use train.info(), train.describe(), test.info(), and test.describe() to check for null values and outliers.
2. Feature engineering
Remove outliers.
Merge the data.
Split the datetime column into separate time features such as 'date', 'hour', 'year', and 'weekday'.
Use distplot to observe the distributions of features such as 'temp', 'atemp', 'humidity', and 'windspeed'. This reveals an issue with the distribution of windspeed.
Use RandomForestRegressor with features such as 'season', 'weather', 'humidity', 'month', 'year', 'temp', and 'atemp' to predict windspeed.
3. Resplit the 'train' and 'test' data.
4. Apply a logarithm (log) to the count values to transform their positively skewed (right-skewed) distribution into an approximately normal one.
5. Run the prediction and export the results.
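The processing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (make_frame is a hypothetical helper standing in for Kaggle's train.csv and test.csv), outlier removal is omitted because no criterion is given, and the windspeed issue is ASSUMED to be a cluster of zero readings, which the steps above do not state. np.log1p is used in place of a plain log so that zero counts stay defined.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def make_frame(start, n, with_count):
    """Hypothetical helper: build a synthetic frame shaped like the Kaggle data."""
    stamps = pd.Timestamp(start) + pd.to_timedelta(np.arange(n), unit="h")
    df = pd.DataFrame({
        "datetime": stamps.strftime("%Y-%m-%d %H:%M:%S"),
        "season": rng.integers(1, 5, size=n),
        "weather": rng.integers(1, 5, size=n),
        "temp": rng.uniform(0, 35, size=n),
        "humidity": rng.uniform(20, 100, size=n),
        "windspeed": rng.uniform(6, 30, size=n),
    })
    df["atemp"] = df["temp"] + 2.0
    if with_count:
        df["count"] = rng.integers(1, 500, size=n)
    return df

train = make_frame("2011-01-01", 60, with_count=True)
test = make_frame("2011-01-20", 30, with_count=False)

# Merge train and test so feature engineering is applied once to both.
n_train = len(train)
data = pd.concat([train, test], ignore_index=True)

# Split the datetime column into separate time features.
dt = pd.to_datetime(data["datetime"])
data["date"] = dt.dt.date
data["hour"] = dt.dt.hour
data["year"] = dt.dt.year
data["month"] = dt.dt.month
data["weekday"] = dt.dt.weekday  # Monday = 0

# ASSUMPTION: the windspeed issue is a pile of zero readings.
# Plant some zeros in the synthetic data to demonstrate the repair.
data.loc[data.index % 5 == 0, "windspeed"] = 0.0

# Train on rows with a usable windspeed, then predict the rest.
wind_cols = ["season", "weather", "humidity", "month", "year", "temp", "atemp"]
known = data["windspeed"] > 0
wind_model = RandomForestRegressor(n_estimators=50, random_state=0)
wind_model.fit(data.loc[known, wind_cols], data.loc[known, "windspeed"])
data.loc[~known, "windspeed"] = wind_model.predict(data.loc[~known, wind_cols])

# Re-split into train and test by the original row count.
train_fe = data.iloc[:n_train].copy()
test_fe = data.iloc[n_train:].drop(columns="count").copy()

# Log-transform the right-skewed target; np.expm1 inverts it later.
train_fe["count_log"] = np.log1p(train_fe["count"])
```

With the real data, train and test would be loaded with pd.read_csv instead of make_frame, and the mask that selects rows to re-predict would depend on what the windspeed distribution plot actually shows.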
Analysis Methods
Because this is a supervised learning task, in which a multi-feature dataset with known outcomes ('count') is used to predict unknown outcomes, random forest regression is adopted as the prediction model.
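A minimal sketch of this approach, using scikit-learn's RandomForestRegressor on a log-transformed target, might look like the following. The features, the synthetic counts, and the hyperparameters (n_estimators=50, random_state=0) are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in features: hour of day and temperature (illustrative only).
n = 200
hour = rng.integers(0, 24, size=n)
temp = rng.uniform(0, 35, size=n)
X = np.column_stack([hour, temp])

# Synthetic rental counts loosely driven by the features, with an evening peak.
count = np.round(5 + 3 * temp + 20 * (np.abs(hour - 17) < 2) + rng.uniform(0, 10, size=n))

# Fit on the log-transformed target so the model sees a less skewed distribution.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, np.log1p(count))

# Predict for two new time points and map back to the original count scale.
X_new = np.array([[8, 20.0], [17, 25.0]])
pred = np.expm1(model.predict(X_new))
```

Training on log1p(count) and inverting with expm1 keeps the predictions on the rental-count scale and guarantees they are non-negative here, since every training target is positive.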
Presentation
Obtain the expected rental quantity for each time point.
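As a rough illustration, the per-time-point output could be assembled as a two-column table and exported as CSV. The timestamps and predicted values below are made up, and the datetime,count column layout is an assumption about the expected submission format.

```python
import pandas as pd

# Hypothetical predictions for three hourly time points (illustrative values).
timestamps = pd.to_datetime([
    "2011-01-20 00:00:00",
    "2011-01-20 01:00:00",
    "2011-01-20 02:00:00",
])
predicted = [11.4, 5.2, 3.8]

# One row per time point: when, and how many rentals are expected.
submission = pd.DataFrame({"datetime": timestamps, "count": predicted})

# With no path given, to_csv returns the CSV text; pass a file path to write to disk.
csv_text = submission.to_csv(index=False)
```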
Comments