Wednesday, February 8, 2017

Recommendation for a Good DLI Dataset That Can Be Used in a Computer Science Data Mining Course

Question
I have a Computer Science professor who is looking for a DLI dataset that his students can use in their data warehousing/data mining courses. He wants a large dataset that comes with all the fixings (data dictionaries, documentation, analytics, etc.). The students are at a third-year and fourth-year programmer level. The purpose of the courses is to teach data warehousing and data mining techniques and skills.

Anyone have a favorite DLI dataset they can recommend?

Answer
My recommendation would be either one of the General Social Surveys (GSS) or one of the Canadian Community Health Surveys (CCHS). Both are relatively large datasets and come with all the associated documentation.