Saturday, April 09, 2011

Discrimination and Data

A Discriminatory Conundrum

The total American workforce has remained relatively constant over the last ten years, at roughly 131M employees. The number of EEOC claims over the same period has increased by roughly 25%, to almost 100K a year. With the workforce flat, that means the rate of claims per employee has risen by about the same 25%.

I would think these claims increase when there is a bad employment marketplace. I don't have any data on that; it just seems logical. When times are good, all sorts of people get good jobs. When times get bad, jobs are hard to find, promotions are hard to find, people get demoted, and bosses have to cut budgets and people. People are also very stressed out, and it isn't surprising to me that lots of bad outcomes come from that, including people feeling slighted, people getting into escalating cycles of bad words and actions, and more lawsuits.

In addition to the macro-effects of the economy, data has quite a bit of variation. Lawsuits over class differences often present data that shows a difference in results and claim this shows discrimination. But if we looked at the same data by the first letter of the grade school the person attended, or eye color, or the state the person's mother was born in, we would also see lots of variation, including differences of the size people treat as worthy of payment to the class discriminated against (though in those cases no one would actually believe discrimination was the cause). xkcd took a comic look at data analysis recently.
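To illustrate how much difference chance alone can produce, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the number of groups, the group size, the 30% promotion chance and the function name `simulate_group_rates` are all hypothetical, and no real employment data is used. Every simulated person has exactly the same chance of promotion, yet the arbitrary groups still end up with different rates.

```python
import random


def simulate_group_rates(n_groups=20, group_size=50, p=0.3, seed=0):
    """Promotion rate per arbitrary group when everyone has the same chance p.

    The groups stand in for meaningless labels (eye color, first letter of
    a school name, and so on). All parameters are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(n_groups):
        # Each person is promoted with the same probability p.
        promoted = sum(rng.random() < p for _ in range(group_size))
        rates.append(promoted / group_size)
    return rates


rates = simulate_group_rates()
print(f"lowest group rate:  {min(rates):.2f}")
print(f"highest group rate: {max(rates):.2f}")
```

Any gap between the lowest and highest group here comes purely from random variation, since the groups were treated identically by construction.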

I am skeptical that short-term variation has much meaning with this data. I would want to see it charted over time and then analyzed for any indication that it isn't just the random variation of a stable system.
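One common way to do that kind of analysis is an individuals (XmR) control chart. The sketch below is only an illustration of the idea: the yearly counts are made-up numbers, not real EEOC figures, and the function names are my own. It uses only the simplest rule (a point outside the natural process limits); a fuller analysis would also look for runs and trends.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    center = mean(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is the standard XmR constant (3 / 1.128).
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


def special_cause_points(values):
    """Indexes of points that fall outside the natural process limits."""
    _, lower, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical yearly claim counts in thousands (NOT real EEOC data).
claims = [80, 82, 79, 81, 83, 80, 82, 81, 84, 99]
print(xmr_limits(claims))
print(special_cause_points(claims))
```

With these made-up numbers only the last year falls outside the limits. Points inside the limits would be treated as the routine variation of a stable system rather than as signals to explain one by one.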

If the data doesn't indicate a special cause (for the variation in the data), that tells you special cause problem solving is not the best way to improve. In that case you still want to improve, but you improve using common cause problem solving techniques. This particular dataset would undoubtedly (at least in my opinion) benefit from stratifying the data to identify segments where it is more of an issue. My guess is that there would be wide variation between segments, but the data analysis would show whether this is true or not.
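As a rough sketch of what stratifying could look like, the code below groups records by segment and compares claims per 1,000 employees. The segments, the counts and the function name `rates_by_segment` are all hypothetical, chosen only to show the mechanics; real data might be stratified by industry, region, company size or something else entirely.

```python
def rates_by_segment(records):
    """Claims per 1,000 employees for each segment.

    records: iterable of (segment, claims, employees) tuples.
    """
    totals = {}
    for segment, claims, employees in records:
        c, e = totals.get(segment, (0, 0))
        totals[segment] = (c + claims, e + employees)
    return {s: 1000 * c / e for s, (c, e) in totals.items()}


# Hypothetical records (NOT real data): (segment, claims, employees).
records = [
    ("manufacturing", 120, 100_000),
    ("retail", 90, 150_000),
    ("manufacturing", 80, 100_000),
    ("finance", 30, 60_000),
]

by_segment = rates_by_segment(records)
# Rank segments from highest to lowest claim rate.
ranked = sorted(by_segment.items(), key=lambda kv: kv[1], reverse=True)
for segment, rate in ranked:
    print(f"{segment}: {rate:.2f} claims per 1,000 employees")
```

Each segment's rate could then be charted over time on its own, to see whether the segment is stable or shows a special cause.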