MINING MASSIVE DATA SETS

12.09.2023

This course begins by introducing modern distributed file systems and MapReduce, with a focus on what distinguishes effective MapReduce algorithms for handling large datasets. The remainder of the course delves into algorithms for extracting valuable models and insights from these vast datasets. Topics include Google’s PageRank algorithm for assessing web page importance and its various extensions, locality-sensitive hashing for identifying similar items in massive datasets, and efficient dimensionality reduction techniques for large, sparse matrices. The course also explores a range of other large-scale algorithms, as detailed in the syllabus.

The course lasts 7 weeks. A course in database systems and a basic course on algorithms and data structures are recommended as preparation.

Course Syllabus

Week 1:
MapReduce
Link Analysis — PageRank

Week 2:
Locality-Sensitive Hashing — Basics + Applications
Distance Measures
Nearest Neighbors
Frequent Itemsets

Week 3:
Data Stream Mining
Analysis of Large Graphs

Week 4:
Recommender Systems
Dimensionality Reduction

Week 5:
Clustering
Computational Advertising

Week 6:
Support-Vector Machines
Decision Trees
MapReduce Algorithms

Week 7:
More About Link Analysis — Topic-specific PageRank, Link Spam
More About Locality-Sensitive Hashing


Details

Target audience

Digital skills for ICT professionals

Digital technology

Big Data

Level

Middle

Format of the training

Online

Training fee

Free training

Duration of the training

7 weeks

Type of training

Language of the training

English

Country providing the training

Other

Classification

Single opportunity
