Class Notes

Notes:
1. Please visit this page frequently, as it will be updated throughout the term.
2. Items under Required Material are mandatory reading and will be tested in the exams.
3. Items under Additional Material should help you understand the Required Material.
4. Items in the third column (Online Resources and Reference Books) are provided for reference, to help you explore the topics further.
5. Links to O'Reilly's Safari Online Bookshelf work only from within pitt.edu, or from outside pitt.edu through the University's VPN service (https://sremote.pitt.edu). Please remember to sign out once you have finished reading; the University has a very limited number of concurrent user licenses.

15: RDF/SPARQL (Dec 5, 7)
Required Reading
(24) Intro to RDF / SPARQL

(24) SPARQL

(25) SPARQL II

Additional Material
Online Resources
SPARQL Examples from Learning SPARQL, 1st Edition

DBpedia

ARQ - a SPARQL processor

Twinkle - simple SPARQL GUI

TopBraid Composer - free SPARQL tool

Reference Books
Learning SPARQL, 2nd Edition, O'Reilly, 2013

14: GraphDB / Cypher (Nov 21, 30)
Required Reading
(22) Intro to Graph Databases
(22) Sample Cypher Queries
(22) Handout (Solutions)

(23) Handout (Solutions)

Additional Material
Online Resources
Neo4j online console

Getting started with Neo4j and Cypher
Introduction to Cypher
Cypher Reference Card

Sample Graph Datasets and Queries
Movie Database Example Dataset

The Neo4j Developer Manual v3.0 (includes complete reference to Cypher language)

Reference Books

13: XML/XPath (Nov 9, 16)
Required Reading
(19) XML/XPath
(21) XQuery
Additional Material
Online Resources
XPath tutorial

XPath Tester

XQuery tutorial

OxygenXML Editor

Reference Books
XQuery, by Priscilla Walmsley, O'Reilly, 2007

12: SQL over Real Data (Nov 7, 14)
Required Reading
Additional Material
(18) Overall Instructions
(18) real-estate.schema.sql
(18) real-estate.load.py
(18) real-estate.queries.sql

(20) 20.schema.census-transport.sql
(20) 20.schema.pgh-police-blotter-archive.sql
(20) 20.schema.pgh-police-blotter.sql

(20) 20.load.py
(20) 20.queries.txt
(20) 20.queries.inclass.txt
(20) 20.queries.census-data.sql
(20) 20.queries.pgh-police-blotter.sql

Online Resources
(18) 2016-OCT Property Assessments Parcel Data

(20) data.census-transport.csv
(20) data.pgh-police-blotter.csv
(20) data.pgh-police-blotter-archive.csv

sqlite3
SQLite quick guide
Python SQLite API reference
SQLite in Python tutorial

Reference Books

11: SQL (Oct 24, 26, 31, and Nov 2)
Required Reading
(14) Intro to SQL
(15) SQL II
(16) SQL III
(17) SQL IV
Additional Material
(14) Intro to SQL -- Handout
Online Resources
Interactive SQL Tutorial

SQL Tutorial

Reference Books

10: Classification (Oct 12)
Required Reading
(13) Classification
Additional Material
(13) Classification Handout

(13) Classification Jupyter Notebook in PDF, Python, and ipynb formats (also viewable with the Jupyter nbviewer)

Online Resources
A friendly introduction to linear regression (using Python)
Reference Books
Data Mining Concepts and Techniques (3rd Edition), 2012

09: Network Analysis (Oct 10)
Required Reading
(12) Network Analysis
Additional Material
(12) Network Analysis Handout
Online Resources
Reference Books

08: Data Summarization and Visualization (Oct 5)
Required Reading
(11) Data Summarization and Visualization
Additional Material
(11) Data Summarization and Visualization Handout
Online Resources
Reference Books
Data Mining Concepts and Techniques (3rd Edition), 2012 (Chapter 4)

07: Data Warehousing (Oct 3)
Required Reading
(10) Data Warehousing
Additional Material
(10) Data Warehousing Handout
Online Resources
Reference Books

06: Recommender Systems (Sep 26, 28)
Required Reading
(08) Recommender Systems
(09) Recommender Systems - II
Additional Material
(08) Recommender Systems Class Handout
(09) Recommender Systems - II Class Handout
Online Resources
A Programmer's Guide to Data Mining (Chapter 2, 3)
Reference Books

05: Data Mining (Sep 19, 21)
Required Reading
(06) Data Mining / Association Rule Mining

(07) Data Mining Intro / Clustering Methods

Additional Material
(06) Association Rule Mining Handout
(06) Graph from in-class activity and corresponding PageRank computation (Jupyter Notebook / Python)

(07) Data Clustering Handout

Online Resources
How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did, Forbes, Feb 16, 2012

The parable of the beer and diapers, The Register, August 15, 2006

Reference Books
Data Mining Concepts and Techniques (3rd Edition), 2012

Mining of Massive Datasets (Sec 6.1, 6.2, 7.1.1, 7.1.2, 7.2.1, 7.3.1, 7.3.2)

04: Web Information Retrieval (Sep 14)
Required Reading
(05) Web Information Retrieval
Additional Material
Online Resources
The Google Pagerank Algorithm and How It Works (click Cancel on login prompt)

PageRank Calculator

PageRank explained

Reference Books

03: Intro to Python (Sep 7)
Required Reading
(03) Python Code Examples
Additional Material
(03) Python Code Examples (cont)
(05) PANDAS Code Examples


Online Resources
Python for Beginners
How to think like a computer scientist
The Hitchhiker’s Guide To Python
Google Python Course
Codecademy's Python Course
Reference Books
Think Python, 2nd Ed
Learning Python, 5th Ed
Introducing Python
Head First Python

02: Information Retrieval (Aug 31, Sep 12)
Required Reading
(02) Information Retrieval
(04) Information Retrieval II
Additional Material
(02) Information Retrieval Class Handout
(04) Information Retrieval II Class Handout and Solutions
Online Resources
Online Log Base 2 Calculator
Reference Books
Modern Information Retrieval (2nd Edition), 2011

01: Introduction to Data Science (Aug 29)
Required Reading
(01) Introduction to Data Science
Additional Material
Online Resources
Big Data and Its Technical Challenges in Communications of the ACM (July 2014)

10-minute introduction to GitHub
GitHub intro for students

Reference Books