diff --git a/topics/dv-resit-coursework.org b/topics/dv-resit-coursework.org new file mode 100644 index 0000000000000000000000000000000000000000..d0e0e6ff06300e3e1dd3c1a67809eb3934aa15c3 --- /dev/null +++ b/topics/dv-resit-coursework.org @@ -0,0 +1,422 @@ +#+title: Resit coursework +#+subtitle: UoL MSc Data Science: Data Visualisation +#+export_file_name: export/dv-resit-coursework +#+options: H:3 toc:nil author:nil date:t +#+date: October 2021 +#+setupfile: ~/org/latex/latex-setup.org +#+author: Jamie Forth +#+latex_header_extra: \setlist{nosep} +#+latex_header_extra: \usepackage{parskip} + + +This resit coursework is for students resitting DSM050 who, in the +previous session: + +- passed Final Survey Report (C1.1) =[25%]= +- passed Mid-term test (C1.2) =[5%]= +- failed the Exam =[70%]=. + +#+begin_center +*This coursework replaces the Exam* +#+end_center + +This coursework is in two parts. Part 1 is worth 30%, Part 2 is worth +40%. The total weighting of this coursework is 70%. + +* Part 1 – Survey project: Reflective commentary =[30%]= +** Assignment specification + +Produce a 1500–2000 word reflective essay about your previous survey +coursework (Final Survey Report C1.1). + +- Submit your essay as a *PDF* file (1500–2000 words). +- References and appendices do /not/ count towards the word limit. + +Give a brief introduction to your topic, but do not substantially +recount your analysis here. The purpose of this essay is to +self-evaluate the research you carried out. + +You should reflect and comment on: +1. the design of the research project + - e.g. research questions, scope, population, background research +2. the design and implementation of your survey + - e.g. operationalisation, wording and ordering of questions, + potential bias, the mapping of research questions to survey + questions +3. your analysis + - e.g. data pre-processing, exploratory visualisation, explanatory + visualisation. 
+ +Consider the strengths and weaknesses of the approach taken in your +previous coursework. Identify and discuss the most important aspects +of the process: both those that led to successful findings and those +that you would improve upon were you to carry out this research again. + +** Essay structure and mark breakdown +*** Introduction =[10%]= + +- brief summary of survey research topic and key findings +- brief summary of the structure of the reflective essay (what are the + main points about your process that you are going to discuss?) + +*** Commentary / self-evaluation =[80%]= + +- discuss strengths and weaknesses of the previous research project +- prioritise the most important or significant aspects of your process + to keep within the word limit +- marks will be awarded for critical insight backed up by evidence, + references and/or best practices in the field +- if any new important findings become apparent from revising this + work they can be briefly presented, but the main focus of the essay + should be critical reflection not re-analysis of your data + +*** Conclusion =[10%]= + +- specific: highlight the main takeaway messages from this process of + self-reflection regarding your survey project +- general: summarise the most important things you have learnt from + this process that will be of value to future data analysis and + visualisation projects + +** Rubric +*** Introduction =[5 marks]= + +- [0] No introduction +- [1] Brief summary of topic +- [2] Discretionary +- [3] Overview of essay structure / key points to be discussed +- [4] Discretionary +- [5] Clear, concise, professionally written introduction + +*** Commentary / self-evaluation =[40 marks]= +**** Strengths + +- [0] No strengths discussed +- [2] Some attempt to identify strengths, but may lack coherence or + significance +- [4] Discretionary +- [6] Coherent set of strengths identified, providing a sound basis + for critical discussion +- [8] Discretionary +- [10] Important and significant 
strengths identified demonstrating a + thorough and thoughtful consideration of the research process and + prioritisation of key strengths + +**** Weaknesses + +- [0] No weaknesses discussed +- [2] Some attempt to identify weaknesses, but may lack coherence or + significance +- [4] Discretionary +- [6] Coherent set of weaknesses identified, providing a sound basis + for critical discussion +- [8] Discretionary +- [10] Important and significant weaknesses identified demonstrating a + thorough and thoughtful consideration of the research process and + prioritisation of key weaknesses + +**** Critical insight + +- [0] No real critical reflection or deeper discussion around + strengths and weaknesses +- [2] Basic insight into the impact of strengths and weaknesses on the + overall success of the project +- [4] Discretionary +- [6] Reflective insight with balanced and justified analysis of what + went well and what could have been improved +- [8] Discretionary +- [10] Highly insightful self-evaluation grappling with nuanced or + complex issues in data analysis and visualisation + +**** Argumentation and supporting evidence + +- [0] Unclear, irrelevant or highly subjective argumentation +- [2] Some attempt to present reflective commentary in a logical and + coherent way +- [4] Discretionary +- [6] Logically structured and convincing discussion grounded in best + practices or general data science principles +- [8] Discretionary +- [10] Well referenced and evidenced insight resulting in a highly + informative and convincing discussion + +*** Conclusion =[5 marks]= + +- [0] No conclusion +- [1] Brief summary of key strengths/weaknesses of original process and + analysis +- [2] Discretionary +- [3] Brief summary of learning and new insights +- [4] Discretionary +- [5] Synthesis of critical reflection into useful guidelines or + principles for future work + +* Part 2 – Secondary data analysis project =[40%]= +** Assignment specification + +Conduct a data visualisation-led 
investigation into a topic of your +choice using secondary data (e.g. data found online), and produce a +2500–3000 word report in the form of a Jupyter notebook. + +- You must write your report as a Jupyter notebook using inline + markdown (see template provided on the VLE). +- You must submit a *PDF* of your notebook (“print as PDF” from your + browser), and a separate *ZIP* file containing: + - your notebook (=ipynb=) + - all secondary data (=csv=, =xlsx=, =ods= files etc., make sure you + observe all legal and/or ethical restrictions) + - any supplementary scripts. +- The maximum word limit is 3000 words (suggested range 2500–3000 + words). +- Include any supplementary information not essential to the main body + of the report as appendices. References and appendices do /not/ + count towards the word limit. +- No marks will be directly awarded for material submitted in + appendices. +- No marks will be awarded for analysis discussion submitted as + comments in code cells. +- See provided template notebook for how to count the number of words + in your notebook. + +** Project guidelines +*** Steps + +1. Define your topic and research questions. What are you going to + investigate? +2. Gather data (clean, pre-process, merge datasets etc.) +3. Data visualisation. Focus on exploratory data visualisation + initially, and then progress to explanatory visualisation to + present key findings. The visuals presented in the main body of the + final notebook should be of a professional standard and communicate + your findings effectively and efficiently. Use your research + questions to structure your analysis and the presentation of + results. +4. Conclusion and evaluation + +*** Where to find data? + +Data can come from multiple sources in order to answer your research +questions. Make sure you are aware of any legal and/or ethical issues +with the data you use. 
+ +- Google + - [[http://www.powersearchingwithgoogle.com/][power search]] + - [[https://www.google.com/publicdata/directory][public data]] +- https://www.kaggle.com/ (do not use a dataset that already has + extensive published analysis, especially if Python code is available + – check with your tutor if you are unsure) +- [[https://data.world]] +- https://data.gov.uk +- http://data.london.gov.uk +- https://www.ons.gov.uk +- https://data.europa.eu +- http://blog.visual.ly/data-sources +- http://datajournalismhandbook.org/1.0/en/getting_data_0.html + +** Report structure and mark breakdown +*** Research topic =[10%]= + +- summary of the domain of research/field of enquiry + - some references to contextualise the investigation +- research question(s) + - what do you want to find out? + - can be general as the emphasis here is exploratory data + visualisation +- identify and define important concepts w.r.t research question(s) + - reference academic/technical literature where appropriate + +*** Data =[10%]= + +- data source(s), where/how did you find your data? +- data format +- data cleaning and pre-processing +- critically evaluate your data, is the data reliable? +- how was the data originally gathered, could there be bias? 
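The data cleaning and pre-processing point above can be made concrete with a short sketch. This is only an illustration of keeping every cleaning step in code, assuming pandas is available; the inline CSV, its column names, its values and the helper name =load_and_clean= are all made up for the example and do not come from any real dataset or from the course template.

```python
import io

import pandas as pd

# Made-up raw data standing in for a downloaded CSV file (hypothetical values).
RAW_CSV = """Region , Year,Population
north,2019,100
north,2019,100
south,2019,unknown
east,2019,300
"""


def load_and_clean(source):
    """Load a CSV and apply every cleaning step in code (hypothetical helper)."""
    df = pd.read_csv(source, skipinitialspace=True)
    # Normalise column names: strip stray whitespace and lower-case them.
    df.columns = df.columns.str.strip().str.lower()
    # Turn non-numeric entries such as "unknown" into NaN instead of
    # editing the file by hand.
    df["population"] = pd.to_numeric(df["population"], errors="coerce")
    n_missing = int(df["population"].isna().sum())
    # Drop rows with a missing population, then exact duplicate rows.
    df = df.dropna(subset=["population"]).drop_duplicates().reset_index(drop=True)
    return df, n_missing


clean, n_missing = load_and_clean(io.StringIO(RAW_CSV))
print(clean)
print(f"Rows with missing population: {n_missing}")
```

Reporting how many rows were affected by each step (here =n_missing=) gives you something specific to discuss when you critically evaluate the data.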
+ +*** Exploratory and explanatory data visualisation =[60%]= + +- brief description of the variables of interest +- appropriate graphs and/or tables summarising key variables +- *descriptive statistics* +- visualisations of all basic variable types + - categorical (nominal, ordinal) and quantitative (interval and/or + ratio scales) + - aim for *at least one* of each basic kind of graph: pie, bar, box, + histogram and scatterplot + - marks will be awarded for *appropriateness* and *effectiveness* of + the visualisations +- visualise relationships between variables +- full marks will require examples of more specialised or advanced + types of visualisation, e.g., time-series, geospatial, + high-dimensional, networks, clustering, qualitative data etc. +- briefly justify visualisation choices in terms of *data types* and + aspects of *human perception* + +*** Conclusion and evaluation =[10%]= + +- summarise key findings (these don't have to be ground-breaking + discoveries!) +- evaluate your process and visualisations +- what did you learn? +- what could you have improved? 
+ +*** Code =[10%]= + +- all Python code files should be submitted +- all pre-processing and data cleaning should be implemented in code + for transparency and reproducibility (do not manually edit data in a + spreadsheet program or hard-code data values in your notebooks) +- code should be legible, with brief comments +- re-using and adapting code you find in documentation or elsewhere + online is completely fine, but sources must be attributed correctly + (web link and date accessed) +- re-using and adapting code that we have covered in class is + encouraged + +** Rubric +*** Research topic =[10 marks]= +**** Report introduction +- [0] No introduction +- [1] Brief overview of the analysis undertaken +- [2] Clear and concise overview of the report, summarising its + structure and key findings + +**** Background and context +- [0] No background +- [1] Brief discussion of related wider issues +- [2] Clear and concise discussion of related research or news stories + +**** Motivation +- [0] No motivation +- [1] Key motivations briefly discussed +- [2] Clear and concise rationale motivating the study and potential + impact + +**** Research questions +- [0] Not explicitly stated +- [1] Attempt to construct research questions, but some inconsistency + or lack of focus +- [2] Basic set of coherent research questions with realistic scope +- [3] Clear and well-designed set of research questions with potential + to generate novel insight within the scope of the study +- [4] Research questions informed by previous research or in + conjunction with a theoretical framework + +*** Data =[10 marks]= +**** Data source +- [0] Not discussed +- [1] States the source of the dataset +- [2] Discussion of how the data was found +- [3] Discussion of why the data was selected: trustworthiness and validity +- [4] Discussion of how the data was initially collected +- [5] Thorough critical assessment of the data, discussing potential + issues or bias + +**** Data format and pre-processing +- 
[0] Not discussed +- [1] Brief description of data format and available variables +- [2] Brief discussion of data tidiness +- [3] Lists appropriate data types for all variables used +- [4] Discussion of all data pre-processing undertaken, including + cleaning, parsing, handling missing values, transformation and + filtering +- [5] Sophisticated data pre-processing that ensures clean and + accurate data, and providing sound reasoning for all pre-processing + undertaken + +*** Exploratory and explanatory data visualisation =[60 marks]= +**** Tables and summary stats +- [0] No tables or summary stats used +- [1] Basic tables for key variables (e.g. using pandas describe); + use of simple statistics in prose (99% of cats...) +- [2] Discretionary +- [3] Appropriate use of cross-tabulation and sorting; good use of + language to convey quantitative facts (1 in 4 cats...) +- [4] Discretionary +- [5] Highly effective communicative tables showing more advanced data + processing such as grouping, aggregating, filtering or normalisation + +**** Appropriate plots for each variable data type + +- [0] Many highly inappropriate visualisations, e.g. 
using line graphs + for non-sequential data, pie charts with many categories, or + pointless use of 3D +- [2] Some inappropriate visualisations for certain variables +- [4] Discretionary +- [6] Appropriate univariate visualisations of all data types + (nominal, ordinal and numerical) +- [8] Appropriate multivariate visualisations across a range of + different data type combinations +- [10] Appropriate use of advanced visualisation techniques + +**** Presentation quality +- [0] Poor quality and lack of attention to detail +- [2] Inconsistent titles and/or axes labelling, screenshot/low + resolution images, unreadable labels or occluding visual elements +- [4] Discretionary +- [6] Consistent titles/figure captions and labels, attention to + spacing and meaningful use of colour +- [8] Discretionary +- [10] Professional level presentation quality, immaculate plots with + no extraneous or obscuring details + +**** Visual communication +- [0] Many meaningless or pointless plots +- [5] Some effective simple plots, but also some confusing or + misleading visualisations +- [10] Discretionary +- [15] Consistent high quality univariate plots, with some effective + multivariate plots +- [20] Good range of univariate and multivariate plots, each able to + effectively communicate a strong message +- [25] Discretionary +- [30] Consistent highly efficient visual communication requiring + little or no explanation; key elements of visual design related to + human visual perception + +**** Exploratory and explanatory data vis process +- [0] Disorganised approach, no clear method to the analysis and no + coherent story presented +- [1] Some attempt to explore the data methodically and to construct a + basic narrative +- [2] Discretionary +- [3] Clear evidence of direction in exploratory analysis and findings + presented logically and related to stated research questions +- [4] Discretionary +- [5] Exploratory analysis is well planned and executed, leading to + interesting insights 
that are conveyed within a clear and logical + narrative + +*** Conclusion and evaluation =[10 marks]= +**** Conclusion +- [0] No conclusion +- [1] Superficial conclusion listing key findings +- [2] Discretionary +- [3] Discussion of findings in relation to research questions +- [4] Discretionary +- [5] Clear and concise discussion of main findings in relation to + research questions, scope and possible impact + +**** Evaluation +- [0] No evaluation +- [1] Superficial discussion of problems +- [2] Discretionary +- [3] Insightful discussion of problems, solutions and what could have + been improved +- [4] Discretionary +- [5] Insightful and honest reflection on the aims, process and + execution of the study, and pointers to possible future directions + of research + +*** Code =[10 marks]= +**** Code +- [0] Missing code +- [2] Python code for each visualisation +- [4] Discretionary +- [6] Scripts and notebooks are well commented and make idiomatic use + of Python data science libraries (i.e. using the APIs correctly + results in fewer lines of code, which is generally better) +- [8] Discretionary +- [10] Scripts and notebooks are well commented and logically + structured with minimal copy-and-paste code, ensuring that the + process of analysis is transparent and reproducible
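The Part 2 assignment specification points to the template notebook for counting words. As a rough, unofficial cross-check, the sketch below counts words in the markdown cells of a notebook using only the Python standard library. It relies on the fact that an =ipynb= file is JSON with a =cells= list; the function name =count_markdown_words= and the tokenising rule are assumptions made for this example, so the template's own method may give a slightly different number.

```python
import json
import re

# A word is taken to be a run of letters/digits, optionally containing
# apostrophes or hyphens (an assumption; other definitions are possible).
WORD_PATTERN = re.compile(r"\w[\w'’-]*")


def count_markdown_words(notebook_json):
    """Count words in the markdown cells of a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    total = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # code cells do not count towards the word limit
        source = cell.get("source", "")
        # Cell source may be a single string or a list of line strings.
        text = source if isinstance(source, str) else "".join(source)
        total += len(WORD_PATTERN.findall(text))
    return total


# Tiny in-memory notebook used only to demonstrate the function.
demo = {
    "cells": [
        {
            "cell_type": "markdown",
            "source": ["# Research topic\n", "This is a short paragraph."],
        },
        {"cell_type": "code", "source": "print('not counted')"},
    ]
}
demo_count = count_markdown_words(json.dumps(demo))
print(demo_count)
```

To apply this to a real file, read it as text first, for example with =Path("report.ipynb").read_text(encoding="utf-8")= (the filename is hypothetical). This sketch counts every markdown cell, so reference and appendix cells, which are excluded from the limit, would need to be subtracted separately.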