Go-to references for data science tools and concepts

These are the sources I use to implement best practices in data science.

Data storage format options

What’s an efficient way to store derived data, features, and parameters when creating models?

Best practices and format comparisons

Feather

Pandas and MongoDB

HDF5

  • The Python HDF5 ecosystem
  • HDF5 take 2 - h5py & PyTables, SciPy 2017 tutorial by Tom Kooij (https://www.youtube.com/watch?v=ofLFhQ9yxCw)
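
As a rough illustration of the HDF5 option, here is a minimal sketch that stores a feature matrix together with its model parameters in one file. It assumes h5py and NumPy are installed. The function names, the dataset name "features", the file name, and the parameter values are all hypothetical, chosen only for this example.

```python
import os
import tempfile

import h5py
import numpy as np


def save_features(path, features, params):
    """Write a feature matrix to an HDF5 file, with parameters as attributes."""
    with h5py.File(path, "w") as f:
        # "features" is a hypothetical dataset name used for this sketch.
        dset = f.create_dataset("features", data=features, compression="gzip")
        # Small scalar or string parameters fit well in dataset attributes.
        for key, value in params.items():
            dset.attrs[key] = value


def load_features(path):
    """Read back the feature matrix and its attributes."""
    with h5py.File(path, "r") as f:
        dset = f["features"]
        features = dset[...]  # copy the data into memory before the file closes
        params = dict(dset.attrs.items())
    return features, params


# Illustrative data only; the values are made up.
path = os.path.join(tempfile.mkdtemp(), "derived.h5")
features = np.arange(12, dtype="float64").reshape(4, 3)
save_features(path, features, {"learning_rate": 0.1, "model": "ridge"})
loaded, params = load_features(path)
```

Keeping the parameters as attributes on the dataset means the derived data and the settings that produced it travel in the same file. Larger or nested parameter sets may be better served by a separate group or another format.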

Organizing files in HDFS (https://www.linkedin.com/learning/hadoop-for-data-science-tips-tricks-techniques/organize-files-in-hdfs)

AWS

Workflow pipeline

Coding best practices

Debugging, testing, refactoring of code

Virtual environments, requirements, containers

Code documentation

Written on April 15, 2018