Geographic Data Science

Updated: 2020-10-15

Levi John Wolf (levi.john.wolf[at]bristol.ac.uk)

Office hours: 3-6 PM Mondays (calendly.com/ljwolf) or, of course, by request.

Quick Info:

Lectures are at 2PM on Mondays, local time.

Labs are on Thursdays from 10-12 and Fridays from 4-6.

All meetings are held in the Digi-Haggett, which is always open for students for co-working.

Assignments in the schedule are due at 5PM on the Thursday of the week in which they appear. That is:

Workbooks in the schedule are purely formative.

Purpose

Geographic data science is an important emerging set of practices and skills that have become useful in a wide variety of environmental and social sciences. This module introduces students to the core concepts in the arrangement and analysis of data. Beyond linear modelling, it offers students an “instrumental” knowledge of various high-level methods in data science, and also a “deeper” route to understanding the more fundamental concepts and theory behind many of the estimators used in day-to-day data science.

The purpose of this module is twofold. Its immediate aims are to give students a working introduction to the common concepts and concerns that practicing geographic data scientists face. It will include some practical programming and data cleaning skills, but is mainly oriented towards statistical analysis. This is not a programming course, although it requires some basic programming at the outset to prepare for analysis. Instead, the course is focused on analysis, and successful students will need to be able to conduct an analysis from start to finish.

This course is based on a solid understanding of multivariate regression. If you would like to refresh your memory/understanding of linear regression, please consider the review reading listed below in the reading section of this document.

Mark Structure

The course will be structured in five (approximately) two-week blocks. Each block will have

Marks for the interim assessments are awarded on a “best three out of four” basis, and together they comprise 60% of your overall mark. This means that you receive feedback on each interim assessment, but only the best three of the four are counted. Since the three counted assessments total 60%, each is worth 20% of the overall mark. For each assessment, answer keys will be posted after the due date, and the answers will be walked through in class. The final assessment is worth 40% of the overall mark, meaning it is worth as much as two interim assessments.
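As a rough illustration of the weighting described above, the overall mark can be sketched in a few lines of Python. This is only an arithmetic sketch, not an official marking tool: the function name `overall_mark`, the 0-100 scale, and the example marks are all assumptions made for illustration.

```python
def overall_mark(interim_marks, final_mark):
    """Combine four interim marks and a final mark (assumed 0-100 scale)."""
    if len(interim_marks) != 4:
        raise ValueError("expected exactly four interim marks")
    # Keep the best three interim marks; the lowest one is dropped.
    best_three = sorted(interim_marks, reverse=True)[:3]
    # Each counted interim assessment is worth 20%; the final is worth 40%.
    interim_part = sum(0.20 * mark for mark in best_three)
    final_part = 0.40 * final_mark
    return interim_part + final_part


# Hypothetical marks: the 48 is dropped, so 70, 62 and 55 are counted.
example = overall_mark([55, 70, 62, 48], final_mark=65)
print(round(example, 1))
```

With these made-up marks, the interim assessments contribute 0.2 × (70 + 62 + 55) = 37.4 and the final contributes 0.4 × 65 = 26, for an overall mark of 63.4.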

In addition to the timetabled lectures, there may be pre-recorded videos to help explain or discuss specific components of the reading. All lectures will be delivered live online.

The labs are intended as time for peer teaching and learning, so fostering a sense of community is critical for the module. Each block includes a 72-hour assessment that assesses the main topic or purpose of the content in that block.

All assessments are due at 5PM on Thursday.

Materials

Data for assessments will be uploaded to Blackboard and linked from the schedule at the bottom of this syllabus. The data required for the course is uploaded here, as well as on Blackboard.

Reading

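Readings are listed in the schedule. Please attempt the reading each week before the timetabled lecture. In some weeks, there may also be a short recorded lecture to clarify the reading. Readings for the module will be drawn primarily from three sources: R for Data Science (R4DS), An Introduction to Statistical Learning (ISL), and Data Analysis Using Regression and Multilevel/Hierarchical Models (ARM).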

Often, ISL and ARM contain very different developments of the same material. Broadly speaking, this is because ISL is written from a “machine learning” perspective and ARM is written from a “statistical” perspective. I have marked where “alternative” readings can be used to understand or cover the topic from a different perspective. You do not have to read both sources.

A short diagnostic quiz to check your background knowledge is here. You can take it as many times as you like. Your responses are anonymous, and will not be connected to your grade in any way. A short refresher module may be useful for students who would like to consolidate their knowledge of linear regression. The best coverage I know of is Chapter 3.1-3.4 of ISL, skipping the section on K-nearest neighbor regression in Chapter 3.5. If you want this refresher, try to both read the text and run the code at the end of the chapter.

For reference, other good books to review and consolidate your programming and computation knowledge include:

Schedule

Lectures are held synchronously on Zoom at 2PM Mondays local time.

Two lab practicals are held each week. One is on Thursday from 10-12, and the other is on Friday from 4-6. Answers to the Thursday submission will always be covered in the Friday lab that immediately follows.

| Block | Week starting | Topic | Reading | Assessment |
|---|---|---|---|---|
| 1 | 5 October | The normal form for data | R4DS 12.1-2, Paper | workbook |
| 1 | 12 October | A vocabulary for data shaping | R4DS Ch. 12.3-4, 5 | |
| 1 | 19 October | Theory of Statistical Learning | ISL 2 | Sculpting Data |
| 2 | 26 October | Regression as a Distributional Model | ARM 2.1-2, 3.1-3.4 | workbook |
| 2 | 2 November | Going Beyond The Normal Model | ISL 4.1-3 | What Do You Mean?? |
| 3 | 9 November | Varying Effects with Multilevel Models | ARM 11-12.5 | workbook |
| 3 | 16 November | Review and Consolidation | | Getting on Another Level |
| 4 | 23 November | Data Reduction with PCA | ISL 10.1-2 | workbook |
| 4 | 30 November | Clustering Features & Geography | ISL 10.3 | Unsupervised Learning |
| 5 | 7 December | Review and Consolidation | | |
| 5 | 14 December | No Lecture | | Take Home Final |

NOTE: abbreviations used in the table are covered in the reading section of this document.

For blocks 2 and 3, you can get additional reading by consulting the chapters in the other source. For example, ISL 3.1-4 provides quite a different overview of linear regression than the ARM reading in block 2. Similarly, ARM 5.1-2 and 6.1-2 provide a “classical” statistical perspective on generalized linear models, which you may find easier to consult. Try the recommended reading first, and then swap to the other main book’s treatment if you prefer.