study material & useful links

overview

a list of study material i believe to be “sources of truth” on their respective topics; it is not enough for an analytics professional to ‘generate insights’; the professional must also be able to (1) describe and communicate insights and recommendations and (2) discuss intelligently how the model(s) should be designed and supported in production; this requires expertise in (or at least familiarity with) the following subjects:

general scientific methods (how to set up, run and qualify experiments, and ensure their reproducibility)
maths, statistics, probability theory, etc.
scripting & package knowledge (Python, R and respective packages)
visualization (packages, reporting tools, etc.)
computer science / programming basics (version control, testing frameworks, design standards, etc.)
data engineering (how are you ingesting, transforming and updating your data feed?)
cloud architecture (where / how are your models going to be supported?)

given the pace at which technology changes, a portion of one’s time should be spent, to borrow a reinforcement learning reference, “exploring” rather than “exploiting” the currently mastered skill set; certain tools may solve a problem more efficiently with minimal time investment (e.g. shiny versus mastering tableau or power bi)

current topics of interest

ml - the goldilocks’ fit (avoiding under / overfitting)
ml - improving model testing and cross validation procedures in production
ml - selecting & optimizing parameters & hyperparameters
ml - objectives and loss-functions
ml - proper use of label versus one-hot encoding
ml - missing data management (done the right way)
py - improving management and use of scopes, classes, instances, methods
py - data classes
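
as a small illustration of the label versus one-hot encoding topic above, here is a minimal sketch in plain python; the helper names `label_encode` and `one_hot_encode` and the sample colors are made up for this example and do not come from any particular package. label encoding maps each category to an integer, which implies an ordering a model may pick up on; one-hot encoding gives each category its own 0/1 column and implies no ordering.

```python
def label_encode(values):
    """Map each distinct category to an integer, using sorted category order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Map each value to a 0/1 vector with a single 1 at its category's position."""
    codes, categories = label_encode(values)
    width = len(categories)
    vectors = [[1 if i == code else 0 for i in range(width)] for code in codes]
    return vectors, categories


# hypothetical sample data
colors = ["red", "green", "blue", "green"]

labels, categories = label_encode(colors)
one_hot, _ = one_hot_encode(colors)

print(categories)  # column / code order
print(labels)      # one integer per row
print(one_hot)     # one 0/1 vector per row
```

with label encoding, "red" ends up numerically larger than "blue" only because of alphabetical order; whether that matters depends on the model, which is the reason the choice deserves some thought.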

resource links

awesome production machine learning - curated list of open source libs to deploy, monitor, version and scale your machine learning
data scientist roadmap - really well done repo by MrMimic that outlines fundamentals in various areas
statistics fundamentals - modules ranging from point estimation to bayesian inference
python best practices, green tea press - how to think like a (python) computer scientist
hitchhiker’s guide to python, o’reilly - environment setup standards and generally how to write great python code
hundred page machine learning book - Andriy is nice enough to offer this book on a ‘read first, buy later’ principle; i ended up purchasing the paperback copy… because why wouldn’t you want to support people doing good for the community?
hundred page machine learning book - data & code samples - the code to go with the hundred page ml book’s examples
ml prod deployment methods - overview of some deployment methods with varying environments & scenarios
ml evaluation methods p1 - overview of basic evaluation metrics and methods for common model application scenarios
ml evaluation methods p2 - overview of basic evaluation metrics and methods for common model application scenarios
kaggle ensembling guide - methods of improving the accuracy of various ml tasks by joining models
application scenarios - i have linked to this in previous posts; however, it is useful to understand where to apply the tools in your toolbox, and firmai provides the best list i have found to date
managing missing data…the right way - many packages recommend you ‘fill in’ missing values with the mean, median, mode, etc., when in fact this is rarely the right path; a good response to this problem is summarized in this stack exchange answer
google’s guide to ml ops
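
to make the missing-data point above a little more concrete, here is a minimal sketch of one well-known drawback of mean imputation: it leaves the mean untouched but shrinks the variance. the numbers are made up for illustration, and this is not a summary of the linked stack exchange answer, just a quick check using python’s standard `statistics` module.

```python
import statistics

# hypothetical observed values; assume four more rows are missing
observed = [2.0, 4.0, 6.0, 8.0]
n_missing = 4

# "fill in" every missing row with the mean of the observed rows
mean = statistics.mean(observed)
imputed = observed + [mean] * n_missing

# sample variance before and after imputation
var_before = statistics.variance(observed)
var_after = statistics.variance(imputed)

print(f"mean before: {mean}, mean after: {statistics.mean(imputed)}")
print(f"variance before: {var_before:.3f}, variance after: {var_after:.3f}")
```

the filled-in rows all sit exactly on the mean, so they add no spread; anything downstream that relies on the variance (confidence intervals, correlations, scaled features) then sees data that looks more certain than it really is.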

research papers

good reads (and how to find more of them)
