About Us

We are Data Science at Georgia Tech, Georgia Tech's largest community of student data scientists.

Our vision is to foster passion in data science and create a positive impact on the community. This passion inspired Hacklytics, our data-science themed hackathon.

We encourage both experienced students and beginners to come together and build something new and impactful, while having fun!


Please see the MLH code of conduct.


Hacklytics is Georgia Tech’s 24-hour datathon, brought to you by Data Science at Georgia Tech! Our theme this year is "Change the World," with a focus on sustainability and growth. We hope to see fun, impactful, and insightful projects that seek to bring change to society.

A datathon is a type of hackathon that focuses on data science. While this may make it seem restrictive, we approach it with the same idea as a normal hackathon, applied to data science: use whatever data, language, APIs, and ML/DL algorithms you want. You can create visualizations, build a model, come up with insights, or do anything else you find impactful!

Hacklytics will be held February 22-23, 2020, in the Klaus Advanced Computing Building (266 Ferst Dr NW, Atlanta, GA 30332).

You can fill out the form using the following link: tiny.cc/hacklytics-apply.
Applications close February 1st!

We will evaluate your responses to our questions and interest in participating and making the most of the datathon. (Don't worry - we won't be judging your writing skills!)

We will have mentor and volunteer forms up shortly. Keep an eye out for those!

Teams of up to 4 people are permitted. We will have a Slack channel for people who need to find a team, and we will allow team-building time before the hacking begins.

Nothing at all! Hacklytics is free of cost for you to attend and we will be providing food and hacking space.

Minors and non-students are not allowed to attend Hacklytics.

We have events and workshops as well as networking sessions planned out for you throughout the day, so you have something to do while that model finishes training!

Not at all! We welcome students with varying skill levels, and we have workshops and sessions planned for you to learn something. So whether you are a beginner or an experienced data scientist, we will have something for you!

Any hacking gear you need (laptop, hardware, chargers, batteries, etc.), comfortable clothes, toiletries (toothpaste, toothbrush, deodorant, etc.), a photo ID for registration, a government ID to rent hardware, and most importantly, yourself!

Unfortunately, we will not be able to provide travel reimbursements to students this year.

Contact us at hacklyticsgt2020@gmail.com!

All hackers must adhere to the MLH code of conduct.

Data Sets

GDP per capita (current US$)

Topic: General Economics
Website: https://data.worldbank.org/indicator/ny.gdp.pcap.cd
Description: Contains 9 different datasets that can be downloaded as CSV, Excel, or XML files. Most of them are different measures of GDP per capita for at least 200 countries. The site also contains a dataset on inflation and another on oil rents.
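
If you go the CSV route, these files usually come in a "wide" layout, with one row per country and one column per year. Here is a minimal sketch of reshaping that into (country, year, value) rows using only the Python standard library. It assumes the download starts with 4 preamble lines before the header row; check your own file, since that may differ. The sample text, the country names, and the function name wide_to_long are made up for illustration and are not real GDP figures.

```python
import csv

# Hypothetical excerpt shaped like a World Bank indicator CSV download.
# Country names and numbers are invented for illustration only.
SAMPLE = '''"Data Source","World Development Indicators",

"Last Updated Date","2020-01-01",

"Country Name","Country Code","Indicator Name","Indicator Code","2017","2018",
"Examplia","EXA","GDP per capita (current US$)","NY.GDP.PCAP.CD","1000.5","1100.0",
"Samplestan","SMP","GDP per capita (current US$)","NY.GDP.PCAP.CD","","2500.25",
'''


def wide_to_long(text, preamble_lines=4):
    """Reshape one-row-per-country data into (country, year, value) tuples.

    preamble_lines is an assumption about the file layout; adjust it if
    your download has a different number of lines before the header.
    """
    lines = text.splitlines()[preamble_lines:]
    reader = csv.reader(lines)
    header = next(reader)
    # Year columns are the ones whose header is purely numeric, e.g. "2018".
    year_columns = [(i, name) for i, name in enumerate(header) if name.isdigit()]
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        for i, year in year_columns:
            # Empty cells mean "no data for that year", so leave them out.
            if i < len(record) and record[i] != "":
                rows.append((record[0], int(year), float(record[i])))
    return rows


long_rows = wide_to_long(SAMPLE)
for country, year, value in long_rows:
    print(country, year, value)
```

The long layout tends to be easier to plot or feed into a model, because every observation sits on its own row and missing years simply do not appear.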

Harvard Opportunity Insights

Topic: Socioeconomic factors
Website: https://opportunityinsights.org/data/
Description: This site probably contains any information you might want about socioeconomic equality. There are around 50 datasets, and a lot of them have to do with economic mobility and how it relates to race, gender, income, and other factors. In addition to economic mobility, datasets also cover geographical mobility and even the role colleges play in economic success. All datasets can be downloaded as Excel spreadsheets, and each one has an associated README that describes every field in the dataset.

AHAR Reports

Topic: Homelessness
Website: https://www.hudexchange.info/homelessness-assistance/ahar/#2019-reports
Description: This site has around 30 datasets if you click around to find them, all of them about homelessness. There are a few on here I think are really useful. The 2018 CoC Point in Time estimates contain homelessness counts for every county in the United States from 2007 to 2018. The 2019 section also contains data on veteran homelessness, including change over the years. All datasets can be downloaded as Excel spreadsheets.

Environmental Data Explorer

Topic: Environment
Website: http://geodata.grid.unep.ch/
Description: This is a database of environmental datasets. You can download all these datasets as either CSVs or Excel spreadsheets. The database is also searchable, making it easy to home in on topics you might be interested in. Unfortunately, it seems like the newest data on there is still a few years old, so it might not be the best source for looking at new trends, but for looking at long-term trends from the past 50 years or so, this would be a good source.

Google Dataset Search

Topic: Everything
Website: https://toolbox.google.com/datasetsearch
Description: Use Google Dataset Search to find a dataset on anything you want.

Federal Reserve Economic Data

Topic: General Economics
Website: https://fred.stlouisfed.org/
Description: 627,000 searchable datasets covering everything from unemployment to inflation to population statistics. You can download these datasets as CSV or Excel files, and you can also download graphs created from the data as an image, PowerPoint, or PDF. This is probably one of the best sources you can get for American economic data.

Humanitarian Data Exchange

Topic: Socioeconomic factors
Website: https://data.humdata.org/
Description: This database focuses a lot on developing countries and the problems they face. You can find information like access to electricity and access to education. Datasets are usually available as CSV and Excel files. Unfortunately, you need to request access for some of the listings in this database. A lot of the data also has accompanying reports in the form of PDFs and even some Power BI reports that you can use to make more sense of the data.

NOAA Climate Datasets

Topic: Environment
Website: https://www.climate.gov/maps-data/datasets
Description: There are around 50 different datasets and datamaps on this site, all related to weather patterns and changes over time. Not all the data can be downloaded as CSVs or Excel spreadsheets as there are some maps. In addition, actually acquiring the dataset seems somewhat involved for a lot of the datasets listed. There are how-to tabs that describe the process of getting a particular dataset, but it’s a bit more than simply clicking a download link.


Data.gov

Topic: Everything
Website: https://catalog.data.gov/dataset
Description: This is a searchable database from the US Government about everything. There are over 250,000 datasets in this database. Unfortunately, not all of them are downloadable as CSV or Excel files, but there are enough datasets to be useful here. You can search by tags as well. As a side note, I actually found this database while looking for pollution datasets, so this might be pretty good if you are looking for environment related data.

Big Bad NLP Database

Topic: General NLP / Machine Learning
Website: https://quantumstat.com/dataset/dataset.html
Description: 200 datasets that all have descriptions of what they are about. If you want to do an NLP project, this might be a good place to start. Most datasets are available for download in either JSON or CSV format.

Lyft Level 5

Topic: Autonomous Vehicles
Website: https://level5.lyft.com/dataset/
Description: This is pretty different from the other datasets, as it contains annotated images taken from Lyft L5’s self-driving car fleet. It includes over 55,000 human-labeled 3D annotated frames, data from cameras and lidars, and drivable surface maps.

Taskmaster-1 dataset

Topic: NLP / Human-understanding AIs
Website: https://storage.googleapis.com/dialog-data-corpus/TASKMASTER-1-2019/landing_page.html
Description: The dataset consists of 13,215 task-based dialogs, including 5,507 spoken and 7,708 written dialogs created with two distinct procedures. Each conversation falls into one of six domains: ordering pizza, creating auto repair appointments, setting up ride service, ordering movie tickets, ordering coffee drinks and making restaurant reservations. For more information, there is a readme on the website. You’ll need to download the dataset using command line tools.

The Upshot

Topic: Everything
Website: https://github.com/TheUpshot
Description: These are open source data sets used by New York Times' The Upshot.


FiveThirtyEight

Topic: Politics and Sports
Website: https://data.fivethirtyeight.com/
Description: These are open source data and code by FiveThirtyEight.

UCI Machine Learning Repository

Topic: Everything
Website: https://archive.ics.uci.edu/ml/index.php
Description: These 488 data sets are provided and maintained by UC Irvine and are intended for machine learning.

General Social Survey

Topic: US Social Issues
Website: https://gss.norc.org/
Description: This data set includes survey data going back to 1972. The survey covers the social opinions of Americans.

Worldbank Open Data

Topic: Everything
Website: https://data.worldbank.org/
Description: Search for data sets provided by The World Bank.

Seattle Open Data

Topic: Seattle City Data
Website: https://data.seattle.gov/
Description: The city of Seattle has open sourced its data on education, finance, and various other topics.

Capital One Hackathon API

Topic: Financial
Website: http://api.reimaginebanking.com/
Description: This is Capital One's hackathon API, Nessie. It contains data such as ATM and bank branch locations.


2/22 - Saturday

12:00 PM  Participant check-in                                Klaus Atrium
 1:00 PM  Opening ceremony                                    CoC 016
 1:45 PM  Team formation                                      Klaus Atrium
          Sponsor expo
          Work time begins
 2:00 PM  Workshop: Basics of Python                          Klaus 1456
          Late check-in closes
 3:00 PM  Workshop: Ideation and Prototyping                  Klaus 1456
 4:00 PM  Workshop: Making Use of Data                        Klaus 1456
          Workshop: Dim Reduction and PCA                     Klaus 2456
 5:00 PM  Workshop: Unsupervised Learning                     Klaus 1456
 5:30 PM  MLH: US Air Force CTF                               Klaus 2456
 6:00 PM  Dinner!                                             Klaus 1116 E
 6:20 PM  The New York Times talk, featuring Andrew Marchese  Klaus 1456
 8:00 PM  Workshop: Predictive Analysis                       Klaus 1456
 9:00 PM  Workshop: Deep Learning                             Klaus 1456

2/23 - Sunday

 1:00 AM  Midnight snack                                      Klaus 1116 W
 8:00 AM  Breakfast!                                          Klaus 1116 E
 9:00 AM  Andrew Marchese Q&A                                 Klaus Atrium
10:00 AM  Workshop: Tableau                                   Klaus 1456
12:00 PM  Lunch!                                              Klaus 1116 E
          Devpost submission deadline
 1:45 PM  Work time ends
 1:45 PM  T-Shirt Distribution                                Klaus 1116 W
 2:00 PM  Demos!                                              Klaus Atrium
 4:15 PM  Closing ceremony & awards                           CoC 016

Major League Hacking 2020 Hackathon Season