Predicting Bike-Sharing Demand

Philip Lai

Project Objective

Using the given data, which includes season, holiday, working day, weather, temp, atemp, humidity, windspeed, registered, casual, and count (the rental quantity), predict future rental demand.


Data Ingestion

The data is provided by Kaggle.


Data Processing

1. Data observation: Use train.info() and test.info() to check for null values, and train.describe() and test.describe() to look for outliers.

2. Feature engineering

  1. Remove outliers.

  2. Merge the data.

  3. Split the datetime column into separate time features such as 'date', 'hour', 'year', and 'weekday'.

  4. Use distplot to inspect the distributions of features such as 'temp', 'atemp', 'humidity', and 'windspeed'. This reveals an issue with the distribution of windspeed.

  5. Use random forest regression with features such as 'season', 'weather', 'humidity', 'month', 'year', 'temp', and 'atemp' to predict windspeed.
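Steps 3 and 5 of the feature engineering list can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the data is synthetic, scikit-learn's RandomForestRegressor is an assumed choice of implementation, and the sketch assumes that the windspeed issue is readings recorded as 0, which the post does not state.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Tiny synthetic stand-in for the merged train + test data (values are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "datetime": pd.date_range("2011-01-01", periods=n, freq=pd.Timedelta(hours=1)),
    "season": rng.integers(1, 5, n),
    "weather": rng.integers(1, 5, n),
    "temp": rng.uniform(0, 40, n),
    "humidity": rng.uniform(0, 100, n),
    "windspeed": rng.uniform(6, 40, n),
})
data["atemp"] = data["temp"] + rng.normal(0, 2, n)
# Assumption: set every fourth reading to 0 to mimic the suspect values.
data.loc[::4, "windspeed"] = 0.0

# Step 3: split the timestamp into separate calendar features.
dt = data["datetime"].dt
data["date"] = dt.date
data["hour"] = dt.hour
data["year"] = dt.year
data["month"] = dt.month
data["weekday"] = dt.weekday

# Step 5: fit on rows with a usable windspeed, then predict the suspect rows.
wind_features = ["season", "weather", "humidity", "month", "year", "temp", "atemp"]
known = data[data["windspeed"] != 0]
unknown = data[data["windspeed"] == 0]

wind_model = RandomForestRegressor(n_estimators=100, random_state=0)
wind_model.fit(known[wind_features], known["windspeed"])
data.loc[unknown.index, "windspeed"] = wind_model.predict(unknown[wind_features])
```

Fitting only on the rows with a plausible reading keeps the suspect values out of the training target, so the model fills them in from the other weather and calendar features.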




3. Split the merged data back into 'train' and 'test'.

4. Apply a logarithm (log) to the count values to transform their distribution from positively skewed (right-skewed) to approximately normal.


5. Run the prediction and export the results.
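The log transform in step 4 and its inverse can be illustrated with a short sketch. The counts below are made up, and the use of np.log1p (rather than a plain log) is an assumption made here so that a count of 0 stays finite; the post only says "logarithm".

```python
import numpy as np

# Made-up hourly rental counts with a long right tail.
count = np.array([0, 1, 3, 8, 20, 55, 150, 400, 977], dtype=float)

# log1p computes log(1 + x), so a count of 0 maps to 0 instead of -inf.
count_log = np.log1p(count)

# expm1 is the inverse of log1p; apply it to model output before exporting.
count_back = np.expm1(count_log)
```

Any model trained on the transformed target predicts on the log scale, so its output has to go through the inverse transform before it is exported as a rental quantity.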


Analysis Methods

Because this is a supervised learning problem, in which a multi-feature dataset with known outcomes ('count') is used to predict unknown outcomes, random forest regression is used for the prediction.
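A minimal sketch of this approach, under stated assumptions: the features and demand values are synthetic, scikit-learn's RandomForestRegressor is an assumed implementation, the hyperparameters are arbitrary, and the target is modeled on the log scale as described in the processing steps.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features and demand (made up purely for illustration).
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "temp": rng.uniform(0, 40, n),
    "humidity": rng.uniform(20, 100, n),
    "workingday": rng.integers(0, 2, n),
})
daytime = X["hour"].between(7, 19)
count = np.round(5 + 10 * X["temp"] + 80 * daytime + rng.uniform(0, 20, n))

# Rows with a known 'count' play the role of train; the rest play the role of test.
X_train, X_test = X.iloc[:250], X.iloc[250:]
y_train_log = np.log1p(count.iloc[:250])

# Fit on the log-transformed target.
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train_log)

# Predict on the log scale, then invert to get rental quantities.
pred_count = np.expm1(model.predict(X_test))
```

A random forest needs no feature scaling and handles mixed categorical-style and continuous inputs without extra encoding work, which suits a dataset like this one.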


Presentation

The result is the predicted rental quantity for each time point.

