Free Preview Lectures

Click on a lecture name to watch a free preview. If you like what you see, the full course curriculum is listed further down this page.

Earn Your Data Mining Certificate

If you complete this class, you'll be issued a digital completion certificate. Our certificates are shareable, unique, blockchain-verified, and independently verifiable.

Prerequisites & Suitability

Check the requirements below before enrolling in this course.

  • First Timers

    Sorry, this course is not appropriate for newbies. Please take our introductory Python course (Python Is Easy) first, and then come back to this class after you have some experience.

  • Junior Engineers

    You'll get the most out of this course if you're already comfortable with Python. Either Python 2 or Python 3 is fine: the instructor uses both, and the tooling shown works with either.

  • Senior Engineers

    This course is a good fit as long as you don't already have experience with data mining. If you're a data scientist, much of it may be review for you. Check the syllabus below to make sure we're covering topics that interest you.

Course Curriculum

38 Lectures, 7 Homeworks, 3 Large Projects

  • 1
    Course Overview
    • Introduction
  • 2
    Data Wrangling
    • Section Overview
    • Cleaning Data - Part A
    • Cleaning Data - Part B
    • Cleaning Data - Part C
    • What are Statistics? - Part A
    • What are Statistics? - Part B
    • Practical Examples of Data Mining
    • Sample Datasets
    • Section Review
    • Homework #1: Set Up Your Workstation
  • 3
    Data Mining Fundamentals
    • Cluster Analysis - Part A
    • Cluster Analysis - Part B
    • Classification and Regression - Part A
    • Classification and Regression - Part B
    • Support Vector Machine - Part A
    • Support Vector Machine - Part B
    • Association, Correlation and Covariance - Part A
    • Association, Correlation and Covariance - Part B
    • Dimensionality Reduction
    • Homework #2: Correlation
  • 4
    Frameworks Explained
    • Apache Spark - Part A
    • Apache Spark - Part B
    • Apache Spark - Part C
    • Homework #3: Apache Spark
    • Map vs FlatMap - Part A
    • Map vs FlatMap - Part B
    • Spark-ML
    • Transformers, Estimators, and Pipelines
    • Homework #4: Map and FlatMap
    • Project #1: Spark-ML
  • 5
    Mining and Storing Data
    • Text Mining - Part A
    • Text Mining - Part B
    • Network Mining
    • Python Matrix Libraries
    • Homework #5: Matrices
    • Mining a SQL Database - Part A
    • Mining a SQL Database - Part B
    • Homework #6: SQL Databases
    • Project #2: Thinking About Fake News
  • 6
    Natural Language Processing
    • Key Concepts & Text Cleaning
    • Count Vectorizer, TF-IDF
    • Examples with Spam Data
    • Tweaking the Spam Data Model
    • Pipelining with Spam Data
    • Summary Challenge
    • Homework #7: StackOverflow Dataset
    • Project #3 (Final Exam): Fake News Detection
  • 7
    Completion Certificate
    • How to Get Your Certificate