Predicting Taxi Ride Duration
There are roughly 200 million taxi rides in NYC each year. Analysis and understanding of taxi supply and demand could increase the efficiency of the city’s taxi system. Predicting taxi ridership could present valuable insights to city planners and taxi dispatchers.
This dataset is collected by the NYC Taxi and Limousine Commission (TLC) and includes trip records from all trips completed in Yellow and Green taxis in NYC from 2009 to present. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
  • There are 1458644 rows with 11 variables in train.
  • There are 625134 rows with 9 variables in test.
The train set has two fields that the test set lacks: trip_duration and dropoff_datetime. The variable trip_duration is the dependent (response) variable we are trying to predict; it is derived as the difference between dropoff_datetime and pickup_datetime. Each row in the datasets represents one taxi trip. All variable headings are populated.
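The relationship between the two extra train fields can be illustrated with a minimal pandas sketch. The sample rows below are made up for illustration and are not taken from the dataset, and expressing trip_duration in whole seconds is an assumption, since the unit is not stated here.

```python
import pandas as pd

# Tiny hand-made stand-ins for the train and test sets (illustrative values only).
train = pd.DataFrame({
    "pickup_datetime": ["2016-03-14 17:24:55", "2016-06-12 00:43:35"],
    "dropoff_datetime": ["2016-03-14 17:32:30", "2016-06-12 00:54:38"],
})
test = pd.DataFrame({"pickup_datetime": ["2016-06-30 23:59:58"]})

# Parse the timestamp strings into datetime values.
for col in ["pickup_datetime", "dropoff_datetime"]:
    train[col] = pd.to_datetime(train[col])

# Derive the response as drop-off minus pick-up.
# Assumption: the duration is expressed in whole seconds.
elapsed = train["dropoff_datetime"] - train["pickup_datetime"]
train["trip_duration"] = elapsed.dt.total_seconds().astype(int)

# Fields present in train but absent from test.
extra_columns = sorted(set(train.columns) - set(test.columns))

print(train["trip_duration"].tolist())
print(extra_columns)
```

Because dropoff_datetime determines the response directly, it is absent from the test set and cannot serve as a predictor.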
For more information, please click the following link.