Class Notes

Notes:
1. Please visit this page frequently, as it will be updated constantly during the term.
2. Items under Required Material are considered mandatory reading and will be tested in the exams.
3. Items under Additional Material should be useful in helping you understand the Required Material.
4. Items in the third column (Online Resources and Reference Books) are provided for reference and to help you explore the different topics further.
5. Links to O'Reilly's Safari Online Bookshelf are only available within pitt.edu or from outside pitt.edu using the University's VPN service (https://sremote.pitt.edu). Please remember to sign out once you have finished reading -- the University has a very limited number of concurrent user licenses.

15: RDF / SPARQL - MATERIAL FOR FINAL ENDS HERE (including this)
Required Reading
(19) Introduction to RDF / SPARQL
Additional Material
Online Resources
SPARQL Examples from Learning SPARQL, 1st Edition

DBpedia

ARQ - a SPARQL processor

Twinkle - simple SPARQL GUI

TopBraid Composer - free SPARQL tool

Reference Books
Learning SPARQL, 2nd Edition, O'Reilly, 2013

14: Graph Databases
Required Reading
(19) Graph Databases
Additional Material
(19) Graph Databases Handout

(20) Cypher Handout

(23) Graph Databases Handout

Online Resources
Get started with Neo4j

Neo4j online console

Getting started with Neo4j and Cypher
Introduction to Cypher
Cypher Reference Card

Sample Graph Datasets and Queries
Movie Database Example Dataset

The Neo4j Developer Manual v3.1 (includes complete reference to Cypher language)

Reference Books

13: Good/Bad SQL
Required Reading
(20) Class Quiz
Additional Material
(20) Whiteboard photo #1

(20) Whiteboard photo #2

Online Resources
Reference Books

12: SQL
Required Reading
(14) SQL 1
(15) SQL 2
Additional Material
(14) SQL Handout

(16) SQL with Python: IPython notebook, py

(16) SQL with Python - Take 2: IPython notebook, py
Related files: courses.csv, grades.csv, majors.csv, students.csv

(18) Queries with real data: real-estate-queries (using real-estate.db)

Online Resources
sqlite3
SQLite quick guide
Python SQLite API reference
SQLite in Python tutorial

Western Pennsylvania Regional Data Center
Property Assessment Data
Property Assessment Data Dictionary

Reference Books

11: Classification
Required Reading
(13) Classification
Additional Material
(13) Classification Handout
Online Resources
Reference Books

10: Network Analysis -- MATERIAL FOR FINAL STARTS HERE
Required Reading
(12) Network Analysis
Additional Material
(12) Network Analysis Handout
Online Resources
Reference Books

09: Recommender Systems (Feb 9, 16)
Required Reading
(10) Recommender Systems
(11) Recommender Systems - II
Additional Material
(10) Recommender Systems Class Handout
(11) Recommender Systems - II Class Handout
Online Resources
A Programmer's Guide to Data Mining (Chapter 2, 3)
Reference Books

08: Recommender Systems (Feb 9, 16)
Required Reading
(10) Recommender Systems
(11) Recommender Systems - II
Additional Material
(10) Recommender Systems Class Handout
(11) Recommender Systems - II Class Handout
Online Resources
A Programmer's Guide to Data Mining (Chapter 2, 3)
Reference Books

07: Data Summarization and Visualization (Feb 2, 7)
Required Reading
(09) Data Summarization and Visualization
Additional Material
(09) Data Summarization and Visualization Handout
Online Resources
Reference Books
Data Mining: Concepts and Techniques (3rd Edition), 2012 (Chapter 4)

06: Data Warehousing (Jan 31)
Required Reading
(08) Data Warehousing
Additional Material
(08) Data Warehousing Handout
Online Resources
Reference Books

05: Data Mining (Jan 24/26, 2017)
Required Reading
(06) Association Rule Mining
(07) Clustering
Additional Material
(06) Association Rule Mining Class Handout
(07) Data Clustering Class Handout
Online Resources
How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did, Forbes, Feb 16, 2012

The parable of the beer and diapers, The Register, August 15, 2006

Reference Books
Data Mining: Concepts and Techniques (3rd Edition), 2012

Mining of Massive Datasets (Sec 6.1, 6.2, 7.1.1, 7.1.2, 7.2.1, 7.3.1, 7.3.2)

04: Web Information Retrieval (Jan 19, 2017)
Required Reading
(05) Web Information Retrieval
Additional Material
Online Resources
The Google Pagerank Algorithm and How It Works (click Cancel on login prompt)

PageRank Calculator

PageRank explained

Reference Books

03: Information Retrieval (Jan 12/17, 2017)
Required Reading
(03) Information Retrieval
(04) Information Retrieval II
Additional Material
(03) Information Retrieval Class Handout
(04) Information Retrieval II Class Handout and Solutions
Online Resources
Online Log Base 2 Calculator
Reference Books

02: Intro to Python (Jan 10, 2017)
Required Reading
(02) Python Code Examples: Jupyter Notebook, pdf, txt
Additional Material
Online Resources
Python for Beginners
How to Think Like a Computer Scientist
The Hitchhiker’s Guide To Python
Google Python Course
Codecademy's Python Course
Reference Books
Think Python, 2nd Ed
Learning Python, 5th Ed
Introducing Python
Head First Python

01: Introduction to Data Science (Jan 5, 2017)
Required Reading
(01) Introduction to Data Science
Additional Material
Online Resources
Big Data and Its Technical Challenges in Communications of the ACM (July 2014)

10-minute introduction to GitHub
GitHub intro for students

Reference Books