Commit a3c4250a authored by Jamie Forth

t06 intro and summary and final checks

parent a29a530b
......@@ -24,11 +24,13 @@
\DeclareBibliographyCategory{essential}
\addtocategory{essential}{Carr2010, Brath2015, Lechne2020,
Wickham2014, Hould2016, Heer2010, Kirk2012, Kelleher2011, Myatt2009,
Myatt2014, Raman2015, VanderPlas2016, Mckinney2017, Ware2012}
Myatt2014, Raman2015, Telea2015, VanderPlas2016, Mckinney2017,
Ware2012, Wilke2019}
\DeclareBibliographyCategory{documentation}
\addtocategory{documentation}{AnacondaDoc, GeoPandasDoc,
JupyterLabDoc, pandasDoc, PythonDoc, MatplotlibDoc, SeabornDoc}
JupyterLabDoc, pandasDoc, PythonDoc, MatplotlibDoc, SeabornDoc,
vispyDoc}
\DeclareBibliographyCategory{unavailable}
\addtocategory{unavailable}{Card1999, Few2009, Few2012, Kirk2016,
......
......@@ -1016,6 +1016,36 @@
{https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#descriptive-statistics}
}
@manual{pandasGroupBy,
xdata = {pandasInfo},
title = {{pandas user guide: Group by: split-apply-combine}},
shorttitle = {Group by: split-apply-combine},
url =
{https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html}
}
@manual{pandasMerge,
xdata = {pandasInfo},
title = {{pandas user guide: Merge, join, concatenate and compare}},
shorttitle = {Merge, join, concatenate and compare},
url =
{https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html}
}
@manual{pandasAutocorrelationPlot,
xdata = {pandasInfo},
title = {{pandas user guide: Autocorrelation plot}},
url =
{https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.autocorrelation_plot.html}
}
@manual{pandasLagPlot,
xdata = {pandasInfo},
title = {{pandas user guide: Lag plot}},
url =
{https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.lag_plot.html}
}
@manual{pandasMultiIndex,
xdata = {pandasInfo},
title = {{pandas user guide: MultiIndex / advanced indexing}},
......
......@@ -12,7 +12,7 @@
- percentage increase
- 2d time (year x month): [[http://bokeh.pydata.org/en/latest/docs/gallery/unemployment.html][unemployment]] grid plot (bokeh)
* IN-PROGRESS Time-series data
* FINAL Time-series data
#+include: "../mdp/06.org::*Topic learning outcomes"
#+include: "../mdp/06.org::*Topic description"
......@@ -44,7 +44,22 @@ nottype=incollection, nottype=inproceedings, nottype=manual]
** Introduction
Use topic description text.
Welcome to topic 6: Time-series data. In this topic we will look at
discovering patterns and trends in data that varies across time. In
order to work effectively with time-series data we need to understand
the different ways in which time can be represented, such as
instantaneous points in time or spans of time. Time information is
typically stored in the index of a pandas =DataFrame= or =Series=
object, so we will learn new ways of indexing and slicing our data by
time, as well as how to apply familiar data processing concepts such
as aggregation and grouping to time-series data. We will also cover
the concept of =MultiIndex= in more detail because very often
time-series datasets contain additional index variables representing
particular aspects of the observations or measurements. Line plots are
our primary visualisation form for time-series data, but we will also
look at lag plots and auto-correlation plots as additional tools to
help us understand and communicate information about time-varying
data.
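As a minimal sketch of the ideas introduced above, the following example builds a small daily series and shows time-based slicing, aggregation by time, and the difference between a point in time and a span of time. The data values and the variable names (=series=, =february=, =monthly_mean=, =point=, =span=) are made up for illustration and are not taken from the course notebooks.

```python
import pandas as pd

# Hypothetical daily measurements; the values 0..59 are invented for illustration.
index = pd.date_range("2020-01-01", periods=60, freq="D")
series = pd.Series(range(60), index=index)

# Partial-string indexing: select every observation in February 2020.
february = series.loc["2020-02"]

# Aggregation by time: resample the daily values to monthly means
# ("MS" labels each group by the first day of its month).
monthly_mean = series.resample("MS").mean()

# An instantaneous point in time versus a span of time.
point = pd.Timestamp("2020-01-15")
span = pd.Period("2020-01", freq="M")

# The point falls inside the span covered by January 2020.
point_in_span = span.start_time <= point <= span.end_time

print(february.head())
print(monthly_mean)
print(point_in_span)
```

Because the time information lives in the index, both the slice and the resampling step work without any explicit date comparisons.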
** Lesson 1 – Time-series data
*** FINAL Video 1 – Time-series data
......@@ -1786,7 +1801,10 @@ What data type do you get if you subtract a =Timedelta= object from
- [ ] =Timedelta=
- [ ] =DateOffset=
*** TODO Code activity – Parsing and data processing
*** Code activity – Timestamps
See notebook:
- [[https://www.doc.gold.ac.uk/~jfort010/dv/topic6-timestamps.zip]]
** Lesson 3 – Time-series data processing
*** FINAL Video 4 – Pre-processing and advanced indexing
......@@ -2250,6 +2268,10 @@ TBC
TBC
*** Essential reading – \citetitle{pandasMultiIndex}
\fullcite{pandasMultiIndex}
*** FINAL Video 5 – Aggregation and merging
:PROPERTIES:
:export_file_name: export/06-slides+scripts/dv-06-5-aggregation
......@@ -2971,7 +2993,17 @@ TBC
\fullcite{pandasMultiIndex}
*** TODO Code activity – Time-series data processing
*** Essential reading – Merging and grouping data
\fullcite{pandasMerge}
\fullcite{pandasGroupBy}
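The two operations covered by these readings can be sketched as follows. This is only an illustration under stated assumptions: the tables, column names (=year=, =area=, =population=, =sq_km=) and all numbers are hypothetical and do not come from the London population notebook.

```python
import pandas as pd

# Hypothetical tidy table: one row per (year, area) observation.
population = pd.DataFrame({
    "year": [2001, 2001, 2011, 2011],
    "area": ["Inner London", "Outer London", "Inner London", "Outer London"],
    "population": [100, 200, 120, 230],  # invented values
})

# Hypothetical lookup table with one row per area.
area_size = pd.DataFrame({
    "area": ["Inner London", "Outer London"],
    "sq_km": [10, 40],  # invented values
})

# Split-apply-combine: total population per year.
total_by_year = population.groupby("year")["population"].sum()

# Merge: attach the area size to every observation, then derive a new column.
merged = population.merge(area_size, on="area", how="left")
merged["density"] = merged["population"] / merged["sq_km"]

print(total_by_year)
print(merged)
```

A left merge is used here so that every observation is kept even if an area were missing from the lookup table.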
*** Code activity – Time-series data processing
See notebook:
- [[https://www.doc.gold.ac.uk/~jfort010/dv/topic6-london-population.zip]]
** Lesson 4 – Time-series data analysis
*** FINAL Video 6 – Time-series analysis
:PROPERTIES:
......@@ -3523,7 +3555,7 @@ TBC
TBC
**** Testing for randomness: Autocorrelation plot
**** Testing for randomness: Auto-correlation plot
:PROPERTIES:
:CUSTOM_ID: 558877d0-6109-4015-a41f-395d40dfc687
:FORMAT: slides and audio
......@@ -3585,6 +3617,11 @@ TBC
- select all rows of inner London column
*** Essential reading – Auto-correlation and lag plots
\fullcite{pandasAutocorrelationPlot}
\fullcite{pandasLagPlot}
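To make the idea behind these plots concrete, the sketch below computes the auto-correlation of a series at individual lags with =Series.autocorr=, which is the quantity an auto-correlation plot shows across many lags. The sine-wave series and the period of 12 steps are assumptions chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical series that repeats every 12 steps (e.g. a yearly cycle in monthly data).
t = np.arange(120)
seasonal = pd.Series(np.sin(2 * np.pi * t / 12))

# Correlation between the series and a copy of itself shifted by `lag` steps.
lag_12 = seasonal.autocorr(lag=12)  # one full period: strongly positive
lag_6 = seasonal.autocorr(lag=6)    # half a period: strongly negative

# The same quantity written out explicitly: correlate with a shifted copy.
manual = seasonal.corr(seasonal.shift(12))

print(round(lag_12, 3), round(lag_6, 3))
```

A lag plot shows the same relationship visually by plotting each value against the value a fixed number of steps later; =pandas.plotting.lag_plot= and =pandas.plotting.autocorrelation_plot= draw these plots and require matplotlib.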
*** Quiz – Time-series analysis
{{{quiz-intro}}}
......@@ -3645,9 +3682,14 @@ Which of the following statements are true?
- [X] Both lag plots and autocorrelation plots visualise how
correlated a time-series is with itself.
** TODO Topic summary
** Topic summary
TBC
Data measured over time presents particular challenges for
data processing and visualisation, and we have seen yet again the
importance of generating tidy data at the early stages of data
analysis and the benefits this provides in terms of flexibility and
efficiency of analysis. In the next topic we will turn our attention
to data measured over geographical space.
** Further resources
......