Birds Migration Patterns
This case study analyzes the migration patterns of three birds.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
birddata = pd.read_csv("C:\\Users\\ashish\\Work\\Case_Studies\\Bird Migration\\bird_tracking.csv", index_col=0)
birddata.head()
birddata.info()
birddata.tail()
The data consists of almost 62,000 data points and 9 features or columns
birddata.bird_name.value_counts()
The dataset contains three birds, named Nico, Sanne, and Eric.
We start with a simple planar plot of longitude against latitude for one particular bird, "Eric". Because the Earth is not flat, the trajectory will be substantially distorted: we have not applied any cartographic projection to it.
This plot is only meant to give a rough look at the bird's flight trajectory.
ind = birddata.bird_name == "Eric"
x, y = birddata.longitude[ind], birddata.latitude[ind]
plt.figure(figsize=(7,7))
plt.plot(x, y, "o", ms=2)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Eric flight trajectory")
# save after labelling so the PDF includes the axis labels and title
plt.savefig("Eric_migration_2D_traj.pdf")
plt.show()
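To get a feel for why the unprojected plot is distorted, the sketch below estimates how long one degree of longitude is at different latitudes. It assumes a spherical Earth with a mean radius of 6371 km, and the latitudes are illustrative values, not taken from the dataset. The helper name `km_per_degree_longitude` is made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def km_per_degree_longitude(latitude_deg):
    """Approximate east-west length of one degree of longitude at a latitude."""
    one_degree_at_equator = math.radians(1.0) * EARTH_RADIUS_KM
    return one_degree_at_equator * math.cos(math.radians(latitude_deg))


# Illustrative latitudes only (not read from bird_tracking.csv)
for lat in (10.0, 30.0, 50.0):
    km = km_per_degree_longitude(lat)
    print(f"{lat:4.0f} deg N: {km:6.1f} km per degree of longitude")
```

One degree of longitude spans roughly 111 km at the equator but only about 71 km at 50° N, while a degree of latitude stays close to 111 km everywhere. Plotting raw degrees on equal axes therefore stretches the northern part of a track in the east-west direction.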
Let's plot the flight trajectories of all three birds.
birds = birddata.bird_name.unique()
plt.figure(figsize=(7,7))
for bird in birds:
    ind = birddata.bird_name == bird
    x, y = birddata.longitude[ind], birddata.latitude[ind]
    plt.plot(x, y, "o", ms=2, label=bird)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Birds flight trajectory")
plt.legend(loc="lower right")
plt.savefig("birds_flight_traj.pdf")
plt.show()
Before proceeding, we check whether our data contains missing values and handle them accordingly. We'll use sklearn to preprocess the data and impute the missing values.
birddata.isnull().sum()
Two columns, direction and speed_2d, contain the same number of missing values. For the direction column, however, the mean is not an appropriate estimate. Therefore we'll first impute speed_2d with its mean and then impute direction with a nearest-neighbours strategy (KNNImputer).
from sklearn.impute import SimpleImputer, KNNImputer
# default args are what we want i.e. missing_values=nan, strategy='mean'
imputer = SimpleImputer()
birddata["speed_2d"] = imputer.fit_transform(birddata[['speed_2d']])
birddata.isnull().sum()
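As an aside on why a plain mean is a poor estimate for direction: if direction is a heading in degrees (an assumption about this column), the values wrap around, so averaging them arithmetically can point the wrong way. The sketch below compares the arithmetic mean with a circular mean computed from unit vectors. The function `circular_mean_deg` and the two headings are hypothetical and only illustrate the issue; the notebook itself uses KNNImputer rather than a circular mean.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction of angles in degrees, computed via unit vectors."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum))


# Hypothetical headings that point almost the same way (20 degrees apart)
headings = [170.0, -170.0]

arithmetic = sum(headings) / len(headings)  # 0.0, the opposite direction
circular = circular_mean_deg(headings)      # close to +/-180 degrees

print("arithmetic mean:", arithmetic)
print("circular mean:  ", circular)
```

The arithmetic mean lands on 0 degrees, exactly opposite to both headings, whereas the circular mean stays near 180 degrees.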
Let's impute the direction column with KNNImputer, using its default arguments (n_neighbors=5).
imputer = KNNImputer()
imputer.fit(birddata.loc[:, 'direction':'speed_2d'])
birddata.loc[:, 'direction':'speed_2d'] = imputer.transform(birddata.loc[:, 'direction':'speed_2d'])
birddata.tail()
Omit the last row, as it was unnecessarily introduced into the dataset.
birddata = birddata.iloc[:-1, :]
birddata.tail()
birddata.isnull().sum()
Let's try plotting a histogram of speed_2d for a particular bird Eric
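# recompute the mask: ind was overwritten by the loop over all birds above
ind = birddata.bird_name == "Eric"
speed = birddata.speed_2d[ind]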
plt.figure(figsize=(7,7))
plt.hist(speed, bins=np.linspace(0,30,20), density=True)
plt.title("Eric 2D speed Histogram")
plt.xlabel("Speed (m/s)")
plt.ylabel("Density")
plt.savefig("Eric_2D_speed_hist.pdf")
plt.show()
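Because the histogram is drawn with density=True, the bar heights are not raw counts. Each height is the bin count divided by the total count and by the bin width, so the bar areas sum to 1. The sketch below reproduces that normalization in plain Python on made-up speeds; `density_histogram` is a hypothetical helper, not a matplotlib function.

```python
def density_histogram(values, edges):
    """Return bar heights normalized so that the total bar area is 1."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            # bins are half-open [lo, hi), except the last one, which is closed
            if edges[k] <= v < edges[k + 1] or (is_last and v == edges[k + 1]):
                counts[k] += 1
                break
    total = sum(counts)
    return [c / (total * (edges[k + 1] - edges[k])) for k, c in enumerate(counts)]


speeds = [0.5, 1.0, 1.5, 2.5, 7.0, 9.0]  # hypothetical speeds in m/s
edges = [0.0, 2.0, 4.0, 10.0]            # deliberately unequal bin widths

heights = density_histogram(speeds, edges)
areas = [h * (edges[k + 1] - edges[k]) for k, h in enumerate(heights)]
print(heights)
print(sum(areas))
```

With these toy values the first bin holds half of the observations over a width of 2, so its height is 0.25, and the areas of all bars add up to 1.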
Notice that our dataset has a column containing dates and times, so let's check the datatype of this column.
type(birddata.date_time[0])
birddata.date_time[0]
The date_time values in our dataset are stored as str. To perform computations on them, such as the time interval between two data points, we need to convert them to datetime objects.
import datetime as dt
# remove the trailing "+00" from each string, as the times are already in UTC
timestamps = birddata.date_time
timestamps = [stamp[:-3] for stamp in timestamps]
timestamps[:3]
timestamps = list(map(lambda str_stamp: dt.datetime.strptime(
    str_stamp, "%Y-%m-%d %H:%M:%S"), timestamps))
birddata["timestamp"] = pd.Series(timestamps, index=birddata.index)
birddata.tail()
birddata.timestamp[0]
birddata.timestamp[4] - birddata.timestamp[3]
Now that the timestamps are in place, we'd like to see how regularly the data was collected over time. Again, we'll limit ourselves to Eric.
times = birddata.timestamp[birddata.bird_name == "Eric"]
# use iloc for the first observation so this does not depend on the index labels
elapsed_time = [time - times.iloc[0] for time in times]
plt.figure(figsize=(7,7))
plt.plot(np.array(elapsed_time) / dt.timedelta(days=1))
plt.xlabel("Observations")
plt.ylabel("Elapsed time (days)")
plt.title("Elapsed time for Eric")
plt.savefig("Eric_elapsed_time.pdf")
plt.show()
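The conversion to days in the plot above relies on the fact that dividing one timedelta by another yields a plain float. A minimal sketch with hypothetical timestamps, in the same "YYYY-MM-DD HH:MM:SS" layout used for parsing earlier:

```python
import datetime as dt

# Hypothetical timestamps (not read from the dataset)
raw = ["2013-08-15 00:00:00", "2013-08-15 12:00:00", "2013-08-16 12:00:00"]
stamps = [dt.datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]

# timedelta / timedelta is a float: here, elapsed time measured in days
elapsed_days = [(t - stamps[0]) / dt.timedelta(days=1) for t in stamps]
print(elapsed_days)  # → [0.0, 0.5, 1.5]
```

Dividing by dt.timedelta(hours=1) instead would give the elapsed time in hours.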
Our next goal is to find out when "Eric" migrates. To do that, we'll plot Eric's daily mean speed. The data was recorded unevenly: on some days there are many observations and on others only a few. We'll start by collecting the indices of the speed_2d observations that fall on the same day, then take the mean of those speeds, and finally plot the daily means to see if there's any pattern.
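data = birddata[birddata.bird_name == "Eric"]
elapsed_days = np.array(elapsed_time) / dt.timedelta(days=1)
daily_mean_speed = []
next_day = 1
inds = []
for i, t in enumerate(elapsed_days):
    if t < next_day:
        # observation still belongs to the current day
        inds.append(i)
    else:
        # the day is complete: average its speeds (inds are positions, hence iloc)
        daily_mean_speed.append(np.mean(data.speed_2d.iloc[inds]))
        next_day += 1
        # start the next day with the current observation so it is not dropped
        inds = [i]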
plt.figure(figsize=(7,7))
plt.plot(daily_mean_speed)
plt.xlabel("Days")
plt.ylabel("Speed (m/s)")
plt.title("Eric Daily Mean Speed")
plt.savefig("Eric_daily_mean_speed.pdf")
plt.show()
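An alternative way to bin observations by day is to key each observation by the whole number of elapsed days and average each bucket. The sketch below does this in plain Python on made-up numbers; `daily_means` is a hypothetical helper and not part of the analysis above. Days without any observation simply do not appear in the result.

```python
import math
from collections import defaultdict
from statistics import mean


def daily_means(elapsed_days, speeds):
    """Average the speeds per whole elapsed day, keyed by day number."""
    buckets = defaultdict(list)
    for t, s in zip(elapsed_days, speeds):
        buckets[math.floor(t)].append(s)
    return {day: mean(vals) for day, vals in sorted(buckets.items())}


# Hypothetical observations: two on day 0, none on day 1, three on day 2
elapsed = [0.1, 0.7, 2.0, 2.4, 2.9]
speeds = [2.0, 4.0, 1.0, 2.0, 3.0]

print(daily_means(elapsed, speeds))  # → {0: 3.0, 2: 2.0}
```

Keying by day number keeps the mapping between a mean and its day explicit, which matters when reading day ranges off a plot.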
From Eric's daily mean 2D speed, it appears that his speed during days 90-100 and 230-240 was considerably higher than on other days, which suggests that Eric migrated during those periods. To corroborate this, we would like to look at where Eric ended up during those days.
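Reading the fast periods off the plot by eye can be complemented with a simple rule. The sketch below flags days whose mean speed exceeds a multiple of the median daily speed. The factor of 2 and the toy values are assumptions chosen for illustration, and `high_speed_days` is a hypothetical helper; it is not how the day ranges above were obtained.

```python
from statistics import median


def high_speed_days(daily_mean_speed, factor=2.0):
    """Return indices of days whose mean speed exceeds factor * median."""
    threshold = factor * median(daily_mean_speed)
    return [day for day, s in enumerate(daily_mean_speed) if s > threshold]


# Hypothetical daily means in m/s, with two bursts of fast flight
toy = [2.1, 1.9, 2.3, 8.5, 9.1, 2.0, 2.2, 7.8, 2.1]

print(high_speed_days(toy))  # → [3, 4, 7]
```

On real data, any NaN entries in the list of daily means (days without observations) would need to be dropped before applying such a rule.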
Earlier we plotted the birds' flight paths, but the result was distorted because no cartographic projection was applied. We'll now use Cartopy to draw the flight paths on a proper map projection.
import cartopy.crs as ccrs
import cartopy.feature as cfeature
proj = ccrs.Mercator()
plt.figure(figsize=(10,10))
ax = plt.axes(projection=proj)
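# extent is (lon_min, lon_max, lat_min, lat_max), given in longitude/latitude
ax.set_extent((-25.0, 20.0, 10.0, 52.0), crs=ccrs.PlateCarree())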
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS, linestyle=':', alpha = 0.95)
for bird in birds:
    ix = birddata["bird_name"] == bird
    x, y = birddata.longitude[ix], birddata.latitude[ix]
    ax.plot(x, y, '.', transform=ccrs.Geodetic(), label=bird)
plt.legend(loc="upper left")
plt.savefig("map.pdf")
plt.show()
We'll now group the data by bird_name to get the average 2D speed and altitude of each bird.
data = birddata.groupby('bird_name')
mean_speeds = data.speed_2d.mean()
# describe() is already indexed by bird_name; only rename the axis for display
data.speed_2d.describe().rename_axis("Bird Name")
mean_altitudes = data.altitude.mean()
data.altitude.describe().rename_axis("Bird Name")
We'll now group our data by each date to get the mean altitude of each day
birddata.date_time = pd.to_datetime(birddata.date_time)
birddata["date"] = birddata.date_time.dt.date
grouped_bydates = birddata.groupby("date")
mean_altitudes_perday = grouped_bydates.altitude.mean()
mean_altitudes_perday
grouped_birdday = birddata.groupby(["bird_name", "date"])
mean_altitudes_perday = grouped_birdday.altitude.mean()
mean_altitudes_perday.head()
eric_daily_speed = grouped_birdday.speed_2d.mean()["Eric"]
sanne_daily_speed = grouped_birdday.speed_2d.mean()["Sanne"]
nico_daily_speed = grouped_birdday.speed_2d.mean()["Nico"]
plt.figure(figsize=(10,10))
eric_daily_speed.plot(label="Eric")
sanne_daily_speed.plot(label="Sanne")
nico_daily_speed.plot(label="Nico")
plt.xlabel("Date")
plt.ylabel("Mean 2D Speed (m/s)")
plt.title("Mean 2D Speeds")
plt.legend(loc="upper left")
plt.savefig("mean_2d_speed.pdf")
plt.show()