Jan 2018 - May 2018

Data Engineer Intern

Owned the end-to-end development of an error-tracking feature.

Capstone Deliverable

Develop a feature that would allow the Data & Insights team to track errors on live scoreboards in real time.

Over the course of four months, I had the opportunity to collaborate on, design, develop, and test a feature that tracked uploads of XML files containing scoreboard data from universities. The goal was to find a way not only to detect corrupted files but also to log each error so that we could perform analysis and pattern recognition.

The Problem

Every time a corrupted file was uploaded, it crashed the current scoreboard page, and cached data was not visible to the end user.

The Solution

I worked alongside senior engineers to develop a feature that ran through 5+ years' worth of National Championship data, extracted each university's primary key identifier from the XML files, and stored it in a SQL database. We separated the records by university, storing each university's identifier along with its variations. We then implemented a data pipeline that integrated the variations database with the stream of data being uploaded. As part of that pipeline, I created a script that read the incoming data and triggered a warning whenever a corrupted file was uploaded. The script then checked the variations database for a match and replaced the corrupted identifier with the correct primary key identifier.
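
The lookup-and-replace step described above can be sketched roughly as follows. This is an illustration only, not the original script: the table name `identifier_variations`, the `<team id="...">` XML layout, the function names, and the sample identifier `STATE-U` are all assumptions made for the example, and SQLite stands in for whichever SQL database was actually used.

```python
import logging
import sqlite3
import xml.etree.ElementTree as ET

logger = logging.getLogger("scoreboard_uploads")


def build_variation_db(rows):
    """Create an in-memory table mapping identifier variations to canonical IDs.

    Hypothetical schema: each row is (variation, canonical_id). Canonical IDs
    are stored as variations of themselves so that valid uploads also match.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE identifier_variations ("
        "variation TEXT PRIMARY KEY, canonical_id TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO identifier_variations VALUES (?, ?)", rows)
    conn.commit()
    return conn


def normalize_upload(xml_text, conn):
    """Return (root, warnings); root is None when the XML cannot be parsed."""
    warnings = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Malformed file: record the error instead of letting it crash the page.
        warnings.append(f"malformed XML: {exc}")
        logger.warning(warnings[-1])
        return None, warnings

    # Assumed layout: each university appears as <team id="..."/>.
    for team in root.iter("team"):
        raw_id = team.get("id", "")
        row = conn.execute(
            "SELECT canonical_id FROM identifier_variations WHERE variation = ?",
            (raw_id,),
        ).fetchone()
        if row is None:
            # No known variation matches; log it for later analysis.
            warnings.append(f"unknown identifier: {raw_id!r}")
            logger.warning(warnings[-1])
        elif row[0] != raw_id:
            # Known variation: swap in the canonical identifier and log it.
            warnings.append(f"replaced {raw_id!r} with {row[0]!r}")
            logger.warning(warnings[-1])
            team.set("id", row[0])
    return root, warnings


if __name__ == "__main__":
    conn = build_variation_db([("STATE-U", "STATE-U"), ("state_u", "STATE-U")])
    root, warnings = normalize_upload(
        '<scoreboard><team id="state_u"/></scoreboard>', conn
    )
    print(root.find("team").get("id"), warnings)
```

Returning the warnings alongside the parsed tree keeps the error log available for the kind of pattern analysis mentioned earlier, while a `None` root signals that the page should keep serving its cached data.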

Key Learnings

Overall, this was a great learning experience. I learned how to develop a pipeline, how to contribute during design and analysis meetings, and how to deliver technical presentations to my team. It was also great to be part of such a large corporation and see how my team's work contributed to the grander vision.

Looking Forward

With this experience, I'm looking forward to applying what I learned in future jobs and personal projects!
