The fastest way to get help with homework assignments is to post your questions on Piazza. That way, not only our TAs and instructor can help, but your peers can too.
If you prefer to address your question only to our TAs and the instructor, you can use the private post feature (i.e., select the "Individual Students(s) / Instructors(s)" radio button).
We welcome everyone to share their experiences in tackling issues and to help each other out, but please do not post your answers, as that may affect the learning experience of your fellow classmates.
For special cases, such as failed submissions due to system errors, missing grades, failed file uploads, emergencies that prevent you from submitting, or personal issues, you can contact the staff using a private Piazza post.
TAs will hold office hours starting week 2, except on Georgia Tech holidays (e.g., Thanksgiving, MLK Day, recess breaks). Each office hour session is 1 hour long and is run by at least one TA. See GT’s academic calendar for the full list of holidays (https://registrar.gatech.edu/calendar). We will spread the office hours across weekdays.
Please note that you are always welcome to ask questions on Piazza. Office hours supplement Piazza, and do not replace it.
Mahdi Roozbahani | Tuesday, 5:45pm to 6:45pm | After class (access to my office in Coda is difficult)
Vatsal Srivastava | Thursday, 1:00pm to 2:00pm | Klaus building lobby, first floor (next to room 1325)
Sharmila Baskaran | Tuesday, 11:30am to 12:30pm | Klaus building lobby, first floor (next to room 1325)
Wk | Dates | Topics | Tue | Thu | Homework (HW) | Project
---|---|---|---|---|---|---
1 | Jan 7, 9 | Course Introduction; Analytics Building Blocks; Data Science Buzzwords; Data Collection | intro | building blocks, buzzwords, data collection | |
2 | Jan 14, 16 | SQLite; Data Cleaning; Code Back-up & Version Control; Class Project Overview | SQLite, git | cleaning, project overview | HW1 out Fri, Jan 17 |
3 | Jan 21, 23 | Example projects: (1) Compare Cuisine, (2) TRENDER: Interactive Visualization Exploring News Trend of the World; Class Project Overview; Data Integration | Example project presentations, project overview | project overview, data integration, vis 101 | |
4 | Jan 28, 30 | Data Integration; Visualization 101; Data Visualization for Web (D3) | D3 | cont'd | HW1 due |
5 | Feb 4, 6 | Data Visualization for Web (D3); Data Analytics, Concepts and Tasks; Introduction to Clustering: k-means, hierarchical clustering, DBSCAN, vis | analytics tasks, clustering | | | Form project teams by Fri, Feb 7
6 | Feb 11, 13 | Fixing Common Visualization Issues (publication-quality figures); Scalable Computing: Hadoop; Scalable Computing: Pig; Scalable Computing: Hive | fix vis, hadoop, pig | hive, spark | |
7 | Feb 18, 20 | Scalable Computing: Spark; Scalable Computing: HBase; Classification: concepts, cross-validation, k-NN; Overview of project proposal and presentation | hbase | classification | HW2 due Fri, Feb 21; HW3 out Fri, Feb 21 |
8 | Feb 25, 27 | Project proposal presentation; Advice for Getting Models to Work | Show time! | Show time! | | Proposal document due Mon, Feb 24; proposal presentation slides due Mon, Feb 24
9 | March 3, 5 | AWS Tech Day; Decision trees | AWS Tech Day | Decision trees | |
10 | March 10, 12 | Decision trees; Ensemble Method; Guest Lecture: Dr. Tianlong Xu from HomeDepot (cancelled because of COVID-19) | Ensemble Method: bagging, random forests | Guest Lecture from HomeDepot Data Analysis department | HW3 due Fri, March 13; HW4 out Fri, March 13 |
11 | March 17, 19 | No Class - Spring Break | X | X | |
12 | March 24, 26 | Transition to online class | Class Video Lecture | Class Video Lecture | |
13 | March 31, April 2 | Visualization for Classification: ROC, AUC, confusion matrix; Graph analytics: basics & power laws; Graph analytics: centrality, personalized PageRank, and interactive applications | Evaluating Machine Learning Methods; graph basics, laws, centrality, pagerank, mmap (Class Video Lecture) | Graph basics, laws, centrality, pagerank, mmap; HBase (Class Video Lecture) | | Progress Report due Fri, April 3
14 | April 7, 9 | Graph analytics: scaling up with virtual memory; HBase; Publication-quality figures; Text Analytics: concepts, algorithms (LSI=SVD) | Graph Analytics MMAP, HBase (Class Video Lecture) | text algorithms (Class Video Lecture) | HW4 due Fri, April 10 |
15 | April 14, 16 | Text Analytics: concepts, algorithms (LSI=SVD); Time series: basics and linear forecasting | text algorithms (Class Video Lecture) | Time series: basics and linear forecasting (Class Video Lecture) | |
16 | April 21, 23 | Time series: non-linear forecasting, visualization; [Potentially] Review/lessons learned | Non-linear forecasting | No class (post your optional project video presentation) | | Final report due
The amount of time students spend on this class varies greatly, depending on their backgrounds and what they already know. Some former students told us they spent about 40-60 hours on each homework assignment (we have 4 big assignments, and no exams), and some reported much less. For example, for the homework assignment about D3 visualization programming, students who are completely new to JavaScript, CSS, and HTML will likely spend significantly more time than their peers who have already tried them. Some former students who do not have a computer science background found that the homework assignments were challenging and took significant time and effort, but were rewarding, fun, and "do-able."
Students have at least 2 weeks to complete each homework assignment. Some students waited until the last week, and could not finish. It is critical to plan ahead and prepare for the significant time needed.
Some programming assignments involve high-level languages or scripting (e.g., Python, Java, SQL). Some assignments involve web programming and D3 (e.g., JavaScript, CSS, HTML). For example, an assignment on Hadoop and Spark may require you to learn some basic Java and Scala quickly, which should not be too challenging if you already know another high-level language like Python or C++. It is unlikely that you already know all the tools and skills needed for the programming tasks, so you are expected to learn many of them on the fly.
Basic knowledge of linear algebra, probability, and statistics is also expected.
All content and course materials can be accessed online. There is no textbook for this course.
All Georgia Tech students have FREE access to https://www.safaribooksonline.com, where you can find a huge number of highly rated and classic books (e.g., the "animal" books) from O'Reilly and Pearson covering a wide variety of computer science topics, including some of those listed below. Just log in with your official GT email address, e.g., jdoe3@gatech.edu.
The Office of Disability Services offers accommodations for students with disabilities. Please contact the office should you need help.
None, but you should have taken courses similar to those listed in the next section, at Georgia Tech or at another school.
If you are an Analytics (OMS or campus) degree student, you should first take CSE 6040 and do very well in it; if necessary, please also first take CS 1301.
We thank Intel for its support in curriculum development for the memory mapping module (scaling up algorithms with virtual memory).
We thank Amazon Educate for providing free cloud credit for Amazon Web Services. We are excited to be an AWS partner university and part of AWS Educate's private beta.
We thank Microsoft Azure for a special grant providing free cloud credit.
We thank the Tableau for Teaching program for its data visualization software.
Many thanks to my colleagues for sharing their course materials: