MINING MASSIVE DATA SETS
12.09.2023
This course begins by introducing modern distributed file systems and MapReduce, with a focus on what distinguishes effective MapReduce algorithms for handling large datasets. The remainder of the course delves into algorithms for extracting valuable models and insights from these vast datasets. Topics include Google’s PageRank algorithm for assessing web page importance and its various extensions, locality-sensitive hashing for identifying similar items in massive datasets, and efficient dimensionality reduction techniques for large, sparse matrices. The course also explores a range of other large-scale algorithms, as detailed in the syllabus.
The course lasts 7 weeks. A prior course in database systems is recommended, as is a basic course on algorithms and data structures.
Course Syllabus
Week 1:
MapReduce
Link Analysis — PageRank
Week 2:
Locality-Sensitive Hashing — Basics + Applications
Distance Measures
Nearest Neighbors
Frequent Itemsets
Week 3:
Data Stream Mining
Analysis of Large Graphs
Week 4:
Recommender Systems
Dimensionality Reduction
Week 5:
Clustering
Computational Advertising
Week 6:
Support-Vector Machines
Decision Trees
MapReduce Algorithms
Week 7:
More About Link Analysis — Topic-specific PageRank, Link Spam.
More About Locality-Sensitive Hashing
You can find additional information HERE
Details
Website
Target audience: Digital skills for ICT professionals
Digital technology: Big Data
Level: Middle
Format of the training: Online
Training fee: Free training
Duration of the training
Type of training
Language of the training: English
Country providing the training: Other
Classification: Single opportunity