Senior PowerPoint Engineer

@ryxcommar

Risk manager @AlamedaResearch. | Prev: QA engineer at Knight Capital Group | he/him | https://t.co/Z2YNAksZLO

27-09-2016 00:00:08

48,3K Tweets

42,8K Followers

2,6K Following


I think the biggest blind spot I've seen from (non-finance) data scientists is an inability to think about uncertainty and risk. I don't blame them; it's hard, but it's probably one of the stronger areas of growth for many out there.


At least, this is true outside the context of tuning decision thresholds. Every data scientist should be able to understand that, e.g., a bank fraud detection algo that sends a text on a suspicious transaction might want to err on the side of caution.
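
To make the threshold point concrete, here is a minimal sketch of a cost-based alert rule. The cost figures and the function names (`alert_threshold`, `should_text_customer`) are made up for illustration and are not from the thread. The rule sends a text whenever the expected cost of a missed fraud exceeds the expected cost of a false alarm.

```python
def alert_threshold(cost_false_alarm: float, cost_missed_fraud: float) -> float:
    """Fraud probability above which alerting has lower expected cost.

    Alert when p * cost_missed_fraud > (1 - p) * cost_false_alarm,
    which rearranges to p > cost_false_alarm / (cost_false_alarm + cost_missed_fraud).
    """
    return cost_false_alarm / (cost_false_alarm + cost_missed_fraud)


def should_text_customer(
    p_fraud: float,
    cost_false_alarm: float = 1.0,    # hypothetical: an annoying but harmless text
    cost_missed_fraud: float = 99.0,  # hypothetical: an undetected fraudulent charge
) -> bool:
    """Return True when the model's fraud probability clears the cost-based threshold."""
    return p_fraud > alert_threshold(cost_false_alarm, cost_missed_fraud)


if __name__ == "__main__":
    # With these assumed costs, the threshold sits far below 0.5.
    print(alert_threshold(1.0, 99.0))
    print(should_text_customer(0.05))
```

When a miss is assumed to be much more expensive than a false alarm, the threshold drops well below 50%, which is the "err on the side of caution" behavior described above.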


But sometimes your algo is outputting a number, not a boolean. That number might be used to make a decision. What happens when it's off by X?


Maybe that's not too hard, but what if there's variance in the wrongness (heteroskedasticity)? Or correlations with other things? Correlations with the heteroskedastic wrongness of other things?

Can you exploit this to make a cleverly risk-reducing decision? (Sometimes yes!)
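
One small piece of this can be sketched with the standard formula for the variance of a sum of two correlated errors. It covers only the correlation part, not heteroskedasticity, and the numbers and the name `combined_stdev` are illustrative assumptions rather than anything from the thread.

```python
import math


def combined_stdev(sd_a: float, sd_b: float, corr: float) -> float:
    """Standard deviation of the sum of two errors with correlation `corr`.

    Uses Var(A + B) = sd_a^2 + sd_b^2 + 2 * corr * sd_a * sd_b.
    """
    if not -1.0 <= corr <= 1.0:
        raise ValueError("corr must be between -1 and 1")
    variance = sd_a**2 + sd_b**2 + 2.0 * corr * sd_a * sd_b
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


if __name__ == "__main__":
    # Hypothetical error sizes for two predictions that feed one decision.
    for corr in (1.0, 0.0, -1.0):
        print(corr, combined_stdev(3.0, 4.0, corr))
```

If two predictions feed the same decision and their errors are negatively correlated, the combined error is smaller than either correlation-free or positively correlated cases would give, which is one way the "cleverly risk-reducing decision" can arise.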


Imagine you work at Uber. Avg. trip speeds in an area are 20mph with a stdev of 1mph. Do you do things differently than if it was the same average but with a stdev of 3mph? Possibly! Remote origin to remote destination trips would be harder to reduce downtime for, and might have to cost more.
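
A rough sketch of why the spread matters, reading the 1 and 3 as standard deviations in mph and assuming, purely for illustration, that speeds are normally distributed and the trip is 10 miles. The function name `pessimistic_trip_minutes` and the 5% tail are hypothetical choices, not part of the thread.

```python
from statistics import NormalDist


def pessimistic_trip_minutes(
    distance_miles: float,
    mean_mph: float,
    stdev_mph: float,
    tail: float = 0.05,
) -> float:
    """Trip time in minutes at the `tail` quantile of speed (a slow-day estimate).

    Assumes speed ~ Normal(mean_mph, stdev_mph), which is an illustrative
    simplification, not a claim about real ride data.
    """
    slow_speed = NormalDist(mean_mph, stdev_mph).inv_cdf(tail)
    if slow_speed <= 0:
        raise ValueError("tail speed is non-positive; the normal assumption breaks down")
    return 60.0 * distance_miles / slow_speed


if __name__ == "__main__":
    typical = 60.0 * 10 / 20  # 30 minutes at the average speed
    print(typical)
    print(pessimistic_trip_minutes(10, 20, 1))  # same average, tight spread
    print(pessimistic_trip_minutes(10, 20, 3))  # same average, wide spread
```

Both areas have the same typical trip time, but the slow-day estimate is noticeably longer in the high-variance area, so planning for driver downtime or pricing off the average alone would treat two different situations as identical.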


Now what if the stdev of ride speeds in Newton MA is negatively correlated with the stdev of ride speeds in Wellesley MA, but positively correlated with the average of Wellesley MA ride speeds? That's kind of hard to think about, but you could probably do something with it.


The epistemology of machine learning doesn't usually allow for this type of thinking. Residuals are seen as a failure to predict rather than as a fact of nature. Residuals exist only to be reduced, not to be embraced and utilized.
