Indexing time series
pandas Foundations
Parse dates
In [1]: import pandas as pd
In [3]: sales.head()
Out[3]:
Company Product Units
Date
2015-02-02 08:30:00 Hooli Software 3
2015-02-02 21:00:00 Mediacore Hardware 9
2015-02-03 14:00:00 Initech Software 13
2015-02-04 15:30:00 Streeplex Software 13
2015-02-04 22:00:00 Acme Coporation Hardware 14
In [4]: sales.info()
DatetimeIndex: 19 entries, 2015-02-02 08:30:00 to 2015-02-26 09:00:00
Data columns (total 3 columns):
Company 19 non-null object
Product 19 non-null object
Units 19 non-null int64
dtypes: int64(1), object(2)
memory usage: 608.0+ bytes
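The listing above does not show how `sales` was loaded. A minimal sketch, assuming the data comes from a CSV with a `Date` column; the inline CSV text here is a hypothetical stand-in whose rows mirror the first lines of `sales.head()`:

```python
import io
import pandas as pd

# Hypothetical CSV contents; the rows mirror the first lines of sales.head().
csv_text = """Date,Company,Product,Units
2015-02-02 08:30:00,Hooli,Software,3
2015-02-02 21:00:00,Mediacore,Hardware,9
2015-02-03 14:00:00,Initech,Software,13
"""

# index_col='Date' makes the Date column the index;
# parse_dates=True asks pandas to parse that index as datetimes.
sales = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col='Date')

print(type(sales.index).__name__)

# With a DatetimeIndex, a date string selects every row on that day.
print(sales.loc['2015-02-02', 'Units'].sum())
```

Once the index is a DatetimeIndex, partial date strings such as `'2015-02'` or `'2015-02-02'` can be used with `.loc` to select whole months or days.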
In [10]: evening_2_11
Out[10]:
DatetimeIndex(['2015-02-11 20:00:00', '2015-02-11 21:00:00',
'2015-02-11 22:00:00', '2015-02-11 23:00:00'],
dtype='datetime64[ns]', freq=None)
Reindexing DataFrame
In [11]: sales.reindex(evening_2_11)
Out[11]:
Company Product Units
2015-02-11 20:00:00 Initech Software 7.0
2015-02-11 21:00:00 NaN NaN NaN
2015-02-11 22:00:00 NaN NaN NaN
2015-02-11 23:00:00 Hooli Software 4.0
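The construction of `evening_2_11` is not shown above. The sketch below assumes it was built with `pd.to_datetime` from a list of strings, and uses a two-row stand-in for `sales` taken from the output above. It also shows the optional `method='ffill'` argument of `reindex`, which fills the gaps instead of leaving NaN:

```python
import pandas as pd

# Two rows copied from the output above; the rest of `sales` is omitted.
sales = pd.DataFrame(
    {'Company': ['Initech', 'Hooli'],
     'Product': ['Software', 'Software'],
     'Units': [7, 4]},
    index=pd.to_datetime(['2015-02-11 20:00', '2015-02-11 23:00']),
)

# One possible way to build the hourly index (an assumption, not from the slides).
evening_2_11 = pd.to_datetime(['2015-02-11 20:00', '2015-02-11 21:00',
                               '2015-02-11 22:00', '2015-02-11 23:00'])

# Timestamps that are missing from `sales` become rows of NaN.
reindexed = sales.reindex(evening_2_11)

# Forward fill: each missing timestamp takes the most recent earlier row.
filled = sales.reindex(evening_2_11, method='ffill')

print(reindexed)
print(filled)
```

Note that `Units` becomes float in `reindexed` because NaN cannot be stored in an integer column, which is why the output above shows `7.0` and `4.0`.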
Let's practice!
Resampling time series data
Sales data
In [1]: import pandas as pd
In [3]: sales.head()
Out[3]:
Company Product Units
Date
2015-02-02 08:30:00 Hooli Software 3
2015-02-02 21:00:00 Mediacore Hardware 9
2015-02-03 14:00:00 Initech Software 13
2015-02-04 15:30:00 Streeplex Software 13
2015-02-04 22:00:00 Acme Coporation Hardware 14
Resampling
Statistical methods over different time intervals
mean(), sum(), count(), etc.
Down-sampling
reduce datetime rows to slower frequency
Up-sampling
increase datetime rows to faster frequency
Aggregating means
In [4]: daily_mean = sales.resample('D').mean()
In [5]: daily_mean
Out[5]:
Units
Date
2015-02-02 6.0
2015-02-03 13.0
2015-02-04 13.5
2015-02-05 14.5
2015-02-06 NaN
2015-02-07 1.0
2015-02-08 NaN
2015-02-09 13.0
2015-02-10 NaN
2015-02-11 5.5
2015-02-12 NaN
2015-02-13 NaN
2015-02-14 NaN
Verifying
In [6]: print(daily_mean.loc['2015-2-2'])
Units 6.0
Name: 2015-02-02 00:00:00, dtype: float64
Method chaining
In [9]: sales.resample('D').sum()
Out[9]:
Units
Date
2015-02-02 12.0
2015-02-03 13.0
2015-02-04 27.0
2015-02-05 29.0
2015-02-06 NaN
2015-02-07 1.0
2015-02-08 NaN
2015-02-09 26.0
2015-02-10 NaN
2015-02-11 11.0
2015-02-12 NaN
2015-02-13 NaN
Method chaining
In [10]: sales.resample('D').sum().max()
Out[10]:
Units 29.0
dtype: float64
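The chained call can be checked on a small scale. This sketch rebuilds only the `Units` values for 2015-02-02 through 2015-02-05 that appear in the listings of this deck, so it is a subset of `sales`, not the full data:

```python
import pandas as pd

# Units for 2015-02-02 .. 2015-02-05, copied from the listings in this deck.
units = pd.Series(
    [3, 9, 13, 13, 14, 19, 10],
    index=pd.to_datetime([
        '2015-02-02 08:30:00', '2015-02-02 21:00:00',
        '2015-02-03 14:00:00',
        '2015-02-04 15:30:00', '2015-02-04 22:00:00',
        '2015-02-05 02:00:00', '2015-02-05 22:00:00',
    ]),
    name='Units',
)

# Down-sample to one row per day, aggregating in two different ways.
daily_mean = units.resample('D').mean()
daily_total = units.resample('D').sum()

# Chaining a reduction onto the resampled result gives the largest daily total.
largest_day = units.resample('D').sum().max()

print(daily_mean)
print(daily_total)
print(largest_day)
```

On this subset the largest daily total falls on 2015-02-05 (19 + 10), which agrees with the `29.0` shown above.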
Resampling strings
In [11]: sales.resample('W').count()
Out[11]:
Company Product Units
Date
2015-02-08 8 8 8
2015-02-15 4 4 4
2015-02-22 5 5 5
2015-03-01 2 2 2
Resampling frequencies
Input    Description
min, T   minute
H        hour
D        day
B        business day
W        week
M        month
Q        quarter
A        year
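A small sketch of how these strings are passed to `resample`, using a made-up daily series of ones so that each bin's sum is simply the number of days it covers. Only `'D'`, `'W'` and `'2W'` are used here; recent pandas releases prefer lowercase `'h'` and `'min'` and spell month, quarter and year end as `'ME'`, `'QE'` and `'YE'`, so the table above is best checked against the documentation of the installed version:

```python
import pandas as pd

# Hypothetical data: one observation per day for four weeks,
# starting on Monday 2015-02-02.
idx = pd.date_range('2015-02-02', periods=28, freq='D')
ones = pd.Series(1, index=idx)

# 'W' groups by week (weeks end on Sunday by default).
weekly = ones.resample('W').sum()

# An integer prefix multiplies the frequency: '2W' is two weeks.
biweekly = ones.resample('2W').sum()

print(weekly)
print(biweekly)
```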
Multiplying frequencies
In [12]: sales.loc[:,'Units'].resample('2W').sum()
Out[12]:
Date
2015-02-08 82
2015-02-22 79
2015-03-08 14
Freq: 2W-SUN, Name: Units, dtype: int64
Upsampling
In [13]: two_days = sales.loc['2015-2-4': '2015-2-5', 'Units']
In [14]: two_days
Out[14]:
Date
2015-02-04 15:30:00 13
2015-02-04 22:00:00 14
2015-02-05 02:00:00 19
2015-02-05 22:00:00 10
Name: Units, dtype: int64
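The listing stops before any up-sampling happens. A minimal sketch of one way to continue, assuming a 4-hour target frequency and forward filling (both are illustrative choices, not taken from the text above):

```python
import pandas as pd

# The four rows of two_days shown above.
two_days = pd.Series(
    [13, 14, 19, 10],
    index=pd.to_datetime(['2015-02-04 15:30:00', '2015-02-04 22:00:00',
                          '2015-02-05 02:00:00', '2015-02-05 22:00:00']),
    name='Units',
)

# Up-sample to 4-hour timestamps; ffill() copies the most recent earlier
# observation into each new timestamp. Older pandas releases spelled
# this alias '4H'.
four_hourly = two_days.resample('4h').ffill()

print(four_hourly)
```

The first new timestamp (2015-02-04 12:00) has no earlier observation to copy, so it stays NaN and the result is stored as float.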
Let's practice!
Manipulating time series data
Sales data
In [1]: import pandas as pd
In [3]: sales.head()
Out[3]:
Date Company Product Units
0 2015-02-02 08:30:00 Hooli Software 3
1 2015-02-02 21:00:00 Mediacore Hardware 9
2 2015-02-03 14:00:00 Initech Software 13
3 2015-02-04 15:30:00 Streeplex Software 13
4 2015-02-04 22:00:00 Acme Coporation Hardware 14
String methods
In [4]: sales['Company'].str.upper()
Out[4]:
0 HOOLI
1 MEDIACORE
2 INITECH
3 STREEPLEX
4 ACME COPORATION
5 ACME COPORATION
6 HOOLI
7 ACME COPORATION
8 STREEPLEX
9 MEDIACORE
10 INITECH
11 HOOLI
12 HOOLI
13 MEDIACORE
14 MEDIACORE
15 MEDIACORE
Substring matching
In [5]: sales['Product'].str.contains('ware')
Out[5]:
0 True
1 True
2 True
3 True
4 True
5 True
6 False
7 True
8 False
9 True
10 True
11 True
12 True
13 True
14 False
Boolean arithmetic
In [6]: True + False
Out[6]: 1
Boolean reduction
In [9]: sales['Product'].str.contains('ware').sum()
Out[9]: 14
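Because `True` counts as 1 and `False` as 0, summing a Boolean Series counts the matches and averaging it gives the matching fraction. A short sketch on a made-up `Product` column (the four values are hypothetical):

```python
import pandas as pd

# Hypothetical product labels, for illustration only.
product = pd.Series(['Software', 'Hardware', 'Software', 'Service'])

# Element-wise substring test returns a Boolean Series.
mask = product.str.contains('ware')

# Boolean reduction: count and fraction of rows that match.
n_matches = mask.sum()
share = mask.mean()

print(n_matches, share)
```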
Datetime methods
In [9]: sales['Date'].dt.hour
Out[9]:
0 8
1 21
2 14
3 15
4 22
5 2
6 22
7 23
8 9
9 13
10 20
11 23
12 12
13 11
14 16
Set timezone
In [10]: central = sales['Date'].dt.tz_localize('US/Central')
In [11]: central
Out[11]:
0 2015-02-02 08:30:00-06:00
1 2015-02-02 21:00:00-06:00
2 2015-02-03 14:00:00-06:00
3 2015-02-04 15:30:00-06:00
4 2015-02-04 22:00:00-06:00
5 2015-02-05 02:00:00-06:00
6 2015-02-05 22:00:00-06:00
7 2015-02-07 23:00:00-06:00
8 2015-02-09 09:00:00-06:00
9 2015-02-09 13:00:00-06:00
10 2015-02-11 20:00:00-06:00
11 2015-02-11 23:00:00-06:00
12 2015-02-16 12:00:00-06:00
Convert timezone
In [12]: central.dt.tz_convert('US/Eastern')
Out[12]:
0 2015-02-02 09:30:00-05:00
1 2015-02-02 22:00:00-05:00
2 2015-02-03 15:00:00-05:00
3 2015-02-04 16:30:00-05:00
4 2015-02-04 23:00:00-05:00
5 2015-02-05 03:00:00-05:00
6 2015-02-05 23:00:00-05:00
7 2015-02-08 00:00:00-05:00
8 2015-02-09 10:00:00-05:00
9 2015-02-09 14:00:00-05:00
10 2015-02-11 21:00:00-05:00
11 2015-02-12 00:00:00-05:00
12 2015-02-16 13:00:00-05:00
13 2015-02-19 12:00:00-05:00
14 2015-02-19 17:00:00-05:00
Method chaining
In [13]: sales['Date'].dt.tz_localize('US/Central').dt.tz_convert('US/Eastern')
Out[13]:
0 2015-02-02 09:30:00-05:00
1 2015-02-02 22:00:00-05:00
2 2015-02-03 15:00:00-05:00
3 2015-02-04 16:30:00-05:00
4 2015-02-04 23:00:00-05:00
5 2015-02-05 03:00:00-05:00
6 2015-02-05 23:00:00-05:00
7 2015-02-08 00:00:00-05:00
8 2015-02-09 10:00:00-05:00
9 2015-02-09 14:00:00-05:00
10 2015-02-11 21:00:00-05:00
11 2015-02-12 00:00:00-05:00
12 2015-02-16 13:00:00-05:00
13 2015-02-19 12:00:00-05:00
14 2015-02-19 17:00:00-05:00
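The difference between the two steps: `tz_localize` attaches a time zone to naive timestamps without changing the clock time, while `tz_convert` shifts already-localized timestamps to another zone. A sketch on two of the timestamps from the listing above:

```python
import pandas as pd

# Two naive timestamps copied from the Date column above.
dates = pd.Series(pd.to_datetime(['2015-02-02 08:30:00',
                                  '2015-02-07 23:00:00']))

# Step 1: declare that these clock times are US/Central.
central = dates.dt.tz_localize('US/Central')

# Step 2: express the same instants in US/Eastern (one hour ahead).
eastern = central.dt.tz_convert('US/Eastern')

print(central)
print(eastern)
```

The second timestamp crosses midnight when converted, so its date changes from 2015-02-07 to 2015-02-08, as in row 7 of the output above.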
World Population
In [14]: population = pd.read_csv('world_population.csv',
...: parse_dates=True, index_col='Date')
In [15]: population
Out[15]:
Population
Date
1960-12-31 2.087485e+10
1970-12-31 2.536513e+10
1980-12-31 3.057186e+10
1990-12-31 3.644928e+10
2000-12-31 4.228550e+10
2010-12-31 4.802217e+10
Upsample population
In [16]: population.resample('A').first()
Out[16]:
Population
Date
1960-12-31 2.087485e+10
1961-12-31 NaN
1962-12-31 NaN
1963-12-31 NaN
1964-12-31 NaN
1965-12-31 NaN
1966-12-31 NaN
1967-12-31 NaN
1968-12-31 NaN
1969-12-31 NaN
1970-12-31 2.536513e+10
1971-12-31 NaN
1972-12-31 NaN
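The NaN rows created by up-sampling can then be filled, for example by linear interpolation. A sketch using the 1960 and 1970 values from above; it builds the yearly index by hand and uses `reindex` in place of `resample('A')`, an assumption made so the example does not depend on which year-end alias the installed pandas accepts:

```python
import pandas as pd

# The 1960 and 1970 rows copied from the population listing above.
population = pd.DataFrame(
    {'Population': [2.087485e+10, 2.536513e+10]},
    index=pd.DatetimeIndex(['1960-12-31', '1970-12-31'], name='Date'),
)

# Year-end dates for 1960..1970, built by hand for this sketch.
yearly_index = pd.DatetimeIndex(
    [f'{year}-12-31' for year in range(1960, 1971)], name='Date')

# Up-sample: the nine new years have no data and become NaN.
upsampled = population.reindex(yearly_index)

# Linear interpolation fills the gaps in equal steps between known rows.
filled = upsampled.interpolate(method='linear')

print(filled)
```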
Let's practice!
Time series visualization
Topics
Line types
Plot types
Subplots
In [4]: sp500.head()
Out[4]:
Open High Low Close Volume Adj Close
Date
2010-01-04 1116.560059 1133.869995 1116.560059 1132.989990 3991400000 1132.989990
2010-01-05 1132.660034 1136.630005 1129.660034 1136.520020 2491020000 1136.520020
2010-01-06 1135.709961 1139.189941 1133.949951 1137.140015 4972660000 1137.140015
2010-01-07 1136.270020 1142.459961 1131.319946 1141.689941 5270680000 1141.689941
2010-01-08 1140.520020 1145.390015 1136.219971 1144.979980 4389590000 1144.979980
Pandas plot
In [5]: sp500['Close'].plot()
In [6]: plt.show()
[Figure: Default plot]
In [9]: plt.show()
One week
In [10]: sp500.loc['2012-4-1':'2012-4-7', 'Close'].plot(title='S&P 500')
In [12]: plt.show()
[Figure: One week]
Plot styles
In [13]: sp500.loc['2012-4', 'Close'].plot(style='k.-',
...: title='S&P500')
In [15]: plt.show()
[Figure: One week]
Color    Marker
r: red   s: square
c: cyan  +: plus
Area plot
In [16]: sp500['Close'].plot(kind='area', title='S&P 500')
In [18]: plt.show()
[Figure: Area plot]
Multiple columns
In [19]: sp500.loc['2012', ['Close','Volume']].plot(title='S&P 500')
In [20]: plt.show()
[Figure: Multiple columns]
Subplots
In [21]: sp500.loc['2012', ['Close','Volume']].plot(subplots=True)
In [22]: plt.show()
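The plotting listings assume that `plt` and `sp500` already exist. A self-contained sketch, assuming `plt` is `matplotlib.pyplot` and using only the five rows from `sp500.head()` as stand-in data; the `Agg` backend is chosen here so the code runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Close and Volume for the five rows shown in sp500.head().
sp500 = pd.DataFrame(
    {'Close': [1132.989990, 1136.520020, 1137.140015,
               1141.689941, 1144.979980],
     'Volume': [3991400000, 2491020000, 4972660000,
                5270680000, 4389590000]},
    index=pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06',
                          '2010-01-07', '2010-01-08']),
)

# style='k.-' reads as: black (k), dot markers (.), solid line (-).
ax = sp500['Close'].plot(style='k.-', title='S&P 500')

# subplots=True gives each column its own axes, which helps when the
# columns differ by orders of magnitude, as Close and Volume do.
axes = sp500[['Close', 'Volume']].plot(subplots=True)

plt.close('all')  # in an interactive session, call plt.show() instead
```

Plotted on shared axes, `Close` is flattened against the much larger `Volume` values; separate subplots keep both series readable.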
[Figure: Subplots]
Let's practice!