As archaeologists deal with an increasing number of datasets (both new and reused) and want to analyse larger quantities of data, data science can provide some of the necessary skills and tools. It has three main aspects: programming languages (Python and R being the most commonly used), machine learning and statistics. Because data science covers a wide range of data-related activities, the resources in this section mainly aim to give researchers the basics, along with pointers that will help them develop these skills further. The list starts with some training websites and directories containing multiple courses that can be browsed, followed by some individually recommended starter-level resources.
Datacamp
Datacamp provides online learning courses through a website and a mobile app. It covers the most common programming languages, such as Python, SQL and R, as well as scripting, spreadsheets and other technologies. Users can search for courses by topic, and case studies are also available.
Source: Datacamp website & mobile app (Apple & Android)
LEVEL: All levels
Towards Data Science
Website resource covering all aspects of data science, with articles on specific topics; it is described as an ecosystem for end users.
Source: Towards Data Science
LEVEL: All levels
The Programming Historian
Excellent website with several courses on commonly used programming languages, techniques and tools for analysing Humanities data.
Source: The Programming Historian
LEVEL: All levels
SSHOC Training Toolkit
Various (mainly third-party) courses and training resources for social scientists and digital humanists, including some programming courses.
Source: SSHOC
LEVEL: All levels
Programming
Python
Python is a powerful, easy-to-learn programming language favoured by data scientists, and it is easy to install. Its free official documentation lets everyone learn and use the basics through to more complex aspects.
Source: Python.org
LEVEL: Basic-Intermediate
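To give a flavour of the basics that the Python documentation covers (lists, tuples, functions and loops), the short sketch below counts finds by artefact type. The finds register and the `count_by_type` helper are hypothetical illustrations, not taken from any of the listed resources; only the standard library's `collections.Counter` is used.

```python
from collections import Counter

# Hypothetical finds register: (context number, artefact type).
finds = [
    (101, "pottery"),
    (101, "flint"),
    (102, "pottery"),
    (102, "bone"),
    (103, "pottery"),
]


def count_by_type(records):
    """Return a Counter mapping each artefact type to its number of finds."""
    return Counter(artefact for _context, artefact in records)


counts = count_by_type(finds)
# most_common() lists the types from most to least frequent.
for artefact, n in counts.most_common():
    print(f"{artefact}: {n}")
```

Swapping in a real finds list (for example one read from a CSV export) needs no change to the function itself.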
Introduction to Python programming
Free Udemy course delivered in bite-sized chunks by Avinash Jain, who makes each step as easy as possible using the PyCharm IDE.
Source: Udemy
LEVEL: Basic
The Ultimate R Guide For Data Science
Excellent introduction to R with recommended resources by Oleksii Kharkovyna, who provides a step-by-step guide covering the background, installing the necessary software and some courses for learning the basic syntax.
Source: Towards Data Science
LEVEL: Basic
R for Data Science
This book by Garrett Grolemund and Hadley Wickham aims “to help you learn the most important tools in R that will allow you to do data science”: that is, how to get your data into R, get it into the most useful structure, transform it, visualise it and model it.
Source: R for Data Science
LEVEL: Basic
Introduction to Data Science: Data Analysis and Prediction Algorithms with R
Another good introduction to using R, covering programming, visualisation and statistics, which started out as the HarvardX Data Science course notes. Its topics are explored through case studies, and it also covers data wrangling, machine learning and useful tools.
Source: GitHub
LEVEL: Basic
Statistics
The Elements of Statistical Learning
This book by Trevor Hastie, Robert Tibshirani and Jerome Friedman (Springer) covers the statistical methods used in activities such as data mining, which help researchers to interpret their results.
Source: Stanford University
LEVEL: Basic-Intermediate
Statistical Learning
Introduces some of the main tools used in statistical modelling and data science, covering both traditional and newer methods, and shows how to apply them in R.
Source: edX
LEVEL: Basic
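Linear regression fitted by ordinary least squares is one of the first methods that statistical learning texts such as the two above introduce. The minimal sketch below computes the slope and intercept directly from their closed-form formulas, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The `fit_line` helper and the depth and age figures are made up purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: covariance-style sum divided by the sum of squared x deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


# Made-up example: depth below surface (m) against age (years before present).
depths = [0.5, 1.0, 1.5, 2.0]
ages = [120, 190, 310, 380]
intercept, slope = fit_line(depths, ages)
print(f"age ≈ {intercept:.1f} + {slope:.1f} * depth")
```

On Python 3.10 or later, the standard library's `statistics.linear_regression(x, y)` returns the same slope and intercept, which is a handy cross-check.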
Machine Learning
Data Science 101 – Machine Learning Tutorials
Beginner's guide for anyone who wants to study data science and build their own machine learning models.
Source: App
LEVEL: Basic-Intermediate
Machine Learning Levels that a 5yr old can understand
Article giving an introductory (101-level) overview of machine learning models, illustrated with diagrams.
Source: TNW Website
LEVEL: Basic
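As a concrete example of a simple machine learning model, the sketch below implements k-nearest neighbours classification: a new item is given the label that is most common among the k most similar labelled items. The sherd measurements, vessel labels and the `knn_predict` helper are hypothetical; real projects would normally use a dedicated machine learning library and rescale the features first.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    `train` is a list of (features, label) pairs, where features is a tuple
    of numbers; closeness is measured with plain Euclidean distance.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training items")
    # Sort the labelled items by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Count the labels among those neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical sherds: (rim diameter in cm, wall thickness in mm) -> vessel form.
training_sherds = [
    ((12.0, 4.0), "bowl"),
    ((14.0, 4.5), "bowl"),
    ((13.0, 3.8), "bowl"),
    ((24.0, 8.0), "jar"),
    ((26.0, 9.0), "jar"),
    ((25.0, 7.5), "jar"),
]

print(knn_predict(training_sherds, (13.5, 4.2)))
print(knn_predict(training_sherds, (25.0, 8.5)))
```

Because the two measurements are on different scales, the larger one (rim diameter) dominates the distance here; standardising each feature before fitting is the usual remedy.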
Useful Tools
18 Essential Software Every Data Scientist Should Know About
This article summarises a collection of data science tools that cover SQL and similar database applications, visualisation, data scraping, programming languages and Integrated Development Environments (IDEs).
Source: Geekflare
LEVEL: Basic-Intermediate
The 17 Best Free Tools for Data Science
This article focuses more on programming: it covers languages (R, Python and SQL), software packages and libraries, some tools and some free learning resources.
Source: Data Quest
LEVEL: Basic
Top Tools for Data Scientists: Analytics Tools, Data Visualization Tools, Database Tools and More
Comprehensive overview of 50 tools and packages, most of them free, plus some paid options.
Source: NG Data
LEVEL: Basic
Orange
Orange is an interactive tool that aims to make data science fun. It lets users analyse and visualise data without writing code, and it also offers machine learning options for beginners.
Source: Orange
LEVEL: Basic-Intermediate
Jupyter Notebook / JupyterLab
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualisations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modelling, data visualisation, machine learning and much more. JupyterLab is Jupyter's newer web-based interface for working with notebooks, code and data.
Source: Jupyter
LEVEL: Basic-Intermediate