overview

a list of study material i consider “sources of truth” on their respective topics. it is not enough for an analytics professional to ‘generate insights’; the professional must also be able to (1) describe and communicate insights and recommendations and (2) discuss intelligently how the model(s) should be designed and supported in production. this requires expertise in (or at least familiarity with) the following subjects:

  • general scientific methods (how to set up, run and qualify experiments, and ensure their reproducibility)
  • maths, statistics, probability theory, etc.
  • scripting & package knowledge (Python, R and respective packages)
  • visualization (packages, reporting tools, etc.)
  • computer science / programming basics (version control, testing frameworks, design standards, etc.)
  • data engineering (how are you ingesting, transforming and updating your data feed?)
  • cloud architecture (where / how are your models going to be supported?)

given the pace at which technology changes, a portion of one’s time should be spent, to borrow a reinforcement learning term, “exploring” rather than only “exploiting” currently mastered skill sets; certain tools may solve a problem more efficiently with minimal time investment (e.g. shiny versus mastering tableau or power bi)

current topics of interest

  • ml - the goldilocks’ fit (avoiding under / overfitting)
  • ml - improving model testing and cross validation procedures in production
  • ml - selecting & optimizing parameters & hyperparameters
  • ml - objectives and loss-functions
  • ml - proper use of label versus one-hot encoding
  • ml - missing data management (done the right way)
  • py - improving management and use of scopes, classes, instances, methods
  • py - data classes
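
a minimal sketch touching two of the topics above (label versus one-hot encoding, and data classes), using only the python standard library; `CategoryEncoder` and the example categories are hypothetical names introduced for illustration, not part of any library, and a real project would more likely lean on an existing encoder:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CategoryEncoder:
    """hypothetical helper (an assumption for illustration, not a library api)."""

    categories: Tuple[str, ...]

    def label(self, value: str) -> int:
        # label encoding: one integer per category, which implies an order
        return self.categories.index(value)

    def one_hot(self, value: str) -> List[int]:
        # one-hot encoding: one binary column per category, no implied order
        return [1 if category == value else 0 for category in self.categories]


# ordinal feature: the integer order small < medium < large is meaningful
sizes = CategoryEncoder(categories=("small", "medium", "large"))
size_code = sizes.label("large")

# nominal feature: one-hot avoids inventing an order between colors
colors = CategoryEncoder(categories=("red", "green", "blue"))
color_vector = colors.one_hot("green")
```

the data class generates `__init__`, `__repr__` and `__eq__` automatically, and `frozen=True` keeps the category list from being reassigned after fitting; the rule of thumb shown (label encoding for ordinal features, one-hot for nominal ones) is a common starting point rather than a fixed rule.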

research papers

good reads (and how to find more of them)