Data Science Skills

As archaeologists work with a growing number of datasets (both new and reused) and want to analyse larger quantities of data, data science can provide some of the necessary skills and tools. There are three main aspects: programming languages (Python and R being the most commonly used), machine learning and statistics. Because data science covers a wide range of data-related activities, the resources in this section aim to give researchers the basics and to point them towards material for developing these skills further. The list starts with training websites and directories that contain multiple courses to browse, followed by individual recommended starter-level resources.

article

DataCamp

DataCamp provides online learning courses and is available as a website and a mobile app. It covers the most common programming skills, such as Python, SQL and R, as well as scripting, spreadsheets and other technologies. Users can search for courses by topic, and case studies are also available.

Source: DataCamp website and mobile app (Apple and Android)

LEVEL: All levels

article

Towards Data Science

A website covering all aspects of data science, with articles on specific topics. It is described as an ecosystem for end users.

Source: Towards Data Science

LEVEL: All levels

article

The Programming Historian

Excellent website with several courses on commonly used programming languages, techniques and tools for analysing Humanities data.

Source: The Programming Historian

LEVEL: All levels

article

SSHOC Training Toolkit

Various (mainly third-party) courses and training sources for Social Scientists and Digital Humanists, including some programming courses.

Source: SSHOC

LEVEL: All levels


Programming


article

Python

Python is an easy-to-learn, powerful programming language that is favoured by data scientists and is easily installed. The documentation enables everyone to learn and use the language for free, from the basics through to more complex aspects.

Source: Python.org

LEVEL: Basic-Intermediate
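
To give a flavour of what the basics look like, here is a minimal sketch that counts finds by material and averages their weights using only Python's standard library. The `finds` records and their values are invented for illustration and do not come from any real dataset.

```python
from collections import Counter
from statistics import mean

# Hypothetical find records: (material, weight in grams).
finds = [
    ("pottery", 12.5),
    ("bone", 4.0),
    ("pottery", 30.0),
    ("flint", 7.5),
]

# Count how many finds there are of each material.
counts = Counter(material for material, _ in finds)

# Average weight across all finds.
mean_weight = mean(weight for _, weight in finds)

print(counts.most_common(1))  # the most frequent material and its count
print(mean_weight)
```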

presentation

Introduction to Python programming

A free Udemy course in bite-sized chunks, taught by Avinash Jain, who makes each step as easy as possible using the PyCharm tool.

Source: Udemy

LEVEL: Basic

article

The Ultimate R Guide For Data Science

Excellent introduction to R with recommended resources by Oleksii Kharkovyna, who provides a step-by-step guide to the background, installing the necessary software and some courses for learning the basic syntax.

Source: Towards Data Science

LEVEL: Basic

article

R for Data Science

This book by Hadley Wickham and Garrett Grolemund aims “to help you learn the most important tools in R that will allow you to do data science”: that is, how to get your data into R, get it into the most useful structure, transform it, visualise it and model it.

Source: R for Data Science

LEVEL: Basic
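
The import, tidy and transform steps of the workflow described above can be sketched in a few lines. The book itself works in R; the sketch below uses Python, the other language covered in this section, and only its standard library. The CSV text, column names and function names are hypothetical, and the visualise and model steps are left out for brevity.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical CSV text standing in for a real data file.
RAW = """site,period,sherd_count
A,Roman,12
A,Medieval,
B,Roman,8
B,Medieval,20
"""


def load_rows(text):
    """Import: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def tidy(rows):
    """Tidy: drop rows with a missing count and convert counts to integers."""
    return [
        {**row, "sherd_count": int(row["sherd_count"])}
        for row in rows
        if row["sherd_count"].strip()
    ]


def mean_by_period(rows):
    """Transform: average sherd count per period."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["period"]].append(row["sherd_count"])
    return {period: mean(values) for period, values in groups.items()}


rows = tidy(load_rows(RAW))
summary = mean_by_period(rows)
print(summary)
```

Keeping each step in its own function mirrors the separation of stages in the workflow and makes it easy to swap the hypothetical text for a real file later.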

article

Introduction to Data Science. Data Analysis and Prediction Algorithms with R

Another good introduction to using R, covering programming, visualisation and statistics, which started out as the HarvardX Data Science course notes. Different aspects are explored through case studies. Data wrangling, machine learning and useful tools are also covered.

Source: GitHub

LEVEL: Basic


Statistics


PDF

The Elements of Statistical Learning

This book by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (Springer) covers the statistical methods that are used for activities such as data mining and which help researchers to interpret their results.

Source: Stanford University

LEVEL: Basic-Intermediate

article

Statistical Learning

Introduces some of the main tools used in statistical modelling and data science, covering both traditional and newer methods and how to use them in R.

Source: edX

LEVEL: Basic


Machine Learning


article

Data Science 101 – Machine Learning Tutorials

Beginner guide for anyone who wants to study data science and make their own machine learning models.

Source: App

LEVEL: Basic-Intermediate

article

Machine Learning Levels that a 5yr old can understand

An article providing a 101-level overview of machine learning models, with diagrams.

Source: TNW Website

LEVEL: Basic


Useful Tools


article

18 Essential Software Every Data Scientist Should Know About

This article summarises a collection of data science tools that cover SQL and similar database applications, visualisation, data scraping, programming languages and Integrated Development Environments (IDEs).

Source: Geekflare

LEVEL: Basic-Intermediate

article

The 17 Best Free Tools for Data Science

An article focussed more on programming. It covers languages (R, Python and SQL), software packages and libraries, some tools and some free learning resources.

Source: Data Quest

LEVEL: Basic

article

Top Tools for Data Scientists: Analytics Tools, Data Visualization Tools, Database Tools and More

A comprehensive overview of 50 tools and packages, most of them free and some paid for.

Source: NG Data

LEVEL: Basic

article

Orange

Orange is a tool that makes data science fun and interactive. Orange allows users to analyse and visualise data without the need to code. It also offers machine learning options for beginners.

Source: Orange

LEVEL: Basic-Intermediate

article

Jupyter Notebook/ JupyterLab

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualisations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modelling, data visualisation, machine learning and much more. JupyterLab is the project's newer web-based interface for working with notebooks, code and data.

Source: Jupyter

LEVEL: Basic-Intermediate