28 Sept 2000
Sexing your Meep
or
A little foray into Bayesian inference
So, earlier this week, noting the change to fall-like weather, I donned my
usual fall uniform: leather hat, leather jacket, leather vest, pirate
shirt, blue jeans, leather shoes. With hair tucked up under the hat, I
got my usual "Was that a woman or a man?" looks.
So, as a public service, I thought I'd talk about how to determine the
sex of a random person.
So you've got a Meep walking down the road -- how can you determine Meep's
gender? Flip Meep over and check between the legs. Oops, that's how you
sex chickens. So if you're not going to be able to persuade me to remove
my clothing or give you a blood or tissue sample (I'm =pretty= sure I'm XX),
you're going to have to go with observable cues. That's where inference
comes in.
So, while I'm trying to look up demographic journals to give me actual
statistics, let me explain conditional probability to you.
We've all run into conditional probability, mainly because we don't live
in a fog of ignorance all the time ('we don't?'). How often have we asked
"What's the chance of my plane leaving on time given that there are major
thunderstorms over Chicago?", "What's the chance of me going to the gym
tomorrow morning, given that it's 3 am right now and I'm trying to finish
Harry Potter?", "The other guy has three of a kind showing, but I'm so
sure that my flush will beat his hand! However, what's the probability he
will beat me, also given that he keeps raising my bets?" (For more info on
that last question, see me. I'm thinking of writing a book: Meep's
Complete Poker Probability Bible.)
So there's this thing called "conditional probability": P(A | B) = "the
probability of A given B". For example, what's P(in a family of 2
children, both are boys | at least one is a boy)? And what's P(both boys
| oldest one is a boy)? Are those probabilities the same? (Answer at
bottom of file)
How does one calculate conditional probabilities? There are two ways. The
first is to do it directly, by counting. This is easy if every outcome is
equally likely. So, if I tell you that someone has a hand of all red cards, what's
the probability of them holding a heart flush? I can just count up the
number of heart flushes and divide by the number of hands made of hearts
and diamonds. Every hand is equally likely. No sweat.
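Here is that count as a quick sketch. It assumes a 5-card hand from a standard 52-card deck (the hand size is my assumption), and it treats "heart flush" as "any five hearts", straight flushes included.

```python
from math import comb

# Assumption: a hand is 5 cards from a standard 52-card deck.
HAND_SIZE = 5

all_heart_hands = comb(13, HAND_SIZE)  # hands where every card is a heart
all_red_hands = comb(26, HAND_SIZE)    # hands made only of hearts and diamonds

# Every all-red hand is equally likely, so just divide the counts.
p_hearts_given_red = all_heart_hands / all_red_hands
print(all_heart_hands, all_red_hands, round(p_hearts_given_red, 4))
# prints: 1287 65780 0.0196
```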
But what if you can't calculate it directly? Here's a nice little
formula:
P(A | B) = P(A & B) / P(B)
Remember that. Tattoo it somewhere you can read it (remember: if you
tattoo it on your belly, put it upside-down; if on your ass, tattoo it
backwards).
So to get the conditional probability of A given B, calculate the
probability of both A & B happening, and divide by the probability of B
occurring.
This formula can be seen in another form as well (just minor algebra
manipulation):
P(A & B) = P(A | B) P(B)
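To see the formula agree with plain counting, here is a small sketch that reuses the red-cards question from earlier. The 5-card hand size is again my assumption, and exact fractions keep rounding out of the picture.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)  # all 5-card hands (assumed hand size)

# B = "all five cards are red", A = "all five cards are hearts".
# Hearts are red, so A & B is just A.
p_b = Fraction(comb(26, 5), total_hands)
p_a_and_b = Fraction(comb(13, 5), total_hands)

p_a_given_b = p_a_and_b / p_b  # P(A | B) = P(A & B) / P(B)
print(p_a_given_b)             # prints: 9/460

# The rearranged form holds too: P(A & B) = P(A | B) P(B)
assert p_a_given_b * p_b == p_a_and_b
```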
So let's see what we can do with this info.
It seems that the appropriate statistics are unavailable online, so I'm
just going to pull them out of my ass. Which is appropriate, for my first
inference involves the ass.
Now, there Meep goes, just a walkin' down the street...
(singin doo wah ditty ditty dum ditty doo...)
What's the first thing you notice? Today, that is, when I'm wearing a
short vest and shirt tucked in. Yes! Meep's got an ass!
Now, one thing I've noticed throughout my life is that, if you're a woman,
chances are good that you have an ass. And if you're a man, chances are
less that you have an ass.
so let M = person is a man
W = person is a woman
A = person has an ass
fake stats:
P(A | W) = 70%
P(A | M) = 50%
we want to know: P(W | A) - probability a person is a woman, given that
they've got an ass.
Now let's see what we need:
P(W | A) = P(W & A) / P(A) = P(A | W) P(W) / P(A)
We already have P(A | W). What's P(W)? This is where =priors= come
in. You're trying to determine if someone's a man or a woman. You have
some prior probability in mind that they could be a particular
gender. Let's say this is at rush hour in Manhattan, Meep walking down
the street. Chances are about 50/50 that a person is one gender or
another. Now if you had been talking about walking around in the middle
of the day in Afghanistan, the priors would've been way different. As in,
the prior probability of being a woman would be 0, as any woman wandering
around would be immediately executed by the Taliban.
But back to Meep's back property. So let's assume P(W) = P(M) = 50%.
What about P(A)? Here we go with the old divide and conquer strategy.
If I keep an old-fashioned view of the world, the event S that a person is
a human (as in Homo sapiens) = W union M. Also old-fashioned, I assume M &
W are disjoint (no overlap). So keep your she-males to yourself; at this
point, things are complicated enough, so take transgender issues
elsewhere. Now this is cute. Watch the probability fly:
P(A) = P(A & S) = P(A & (W union M)) = P(A & W) + P(A & M)
neat! I split the event "having an ass" into two sub-events: being a
woman with an ass, and being a man with an ass.
So now I've got:
P(A) = P(A & W) + P(A & M)
     = P(A | W) P(W) + P(A | M) P(M)
Cool! Now I can actually calculate stuff!
P(A & W) = 70% * 50% = 35%
P(A & M) = 50% * 50% = 25%
P(W | A) = 35% / (35% + 25%) = 58%
So far we can guess that Meep is female with 58% probability!
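The whole calculation fits in a few lines. This is only a sketch using the made-up stats above; the function name `posterior` and its arguments are mine, and it leans on the same old-fashioned assumption that everyone is either W or M.

```python
def posterior(prior_w, p_cue_given_w, p_cue_given_m):
    """Return P(W | cue), assuming W and M are disjoint and cover everyone."""
    prior_m = 1.0 - prior_w
    joint_w = p_cue_given_w * prior_w     # P(cue & W)
    joint_m = p_cue_given_m * prior_m     # P(cue & M)
    return joint_w / (joint_w + joint_m)  # the denominator is P(cue)

# Fake stats from the text: P(A | W) = 70%, P(A | M) = 50%, prior 50/50.
p_w_given_a = posterior(0.5, 0.70, 0.50)
print(f"{p_w_given_a:.0%}")  # prints: 58%
```

Swap in a different first argument to see how much the prior matters: the same cue says a lot less about a street where women are rare to begin with.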
Next, we notice Meep has prominent hips. Indeed, the jeans fit quite
nicely over this hip/ass package. Can we use this information in any way?
Well, again, most women have prominent hips. But even fewer men have
prominent hips.
W & M mean the same thing, but now H = has hips.
P(H | W) = 70%
P(H | M) = 40%
Cool! Let's just chug through the info as before:
P(H & W) = P(H | W) P(W) = 70% * 50% = 35%
P(H & M) = P(H | M) P(M) = 40% * 50% = 20%
P(W | H) = 35% / (35% + 20%) = 64%
Wow! We've got a better lock on! Meep now stands a 64% chance of being
female! But, truthfully, we'd like to combine our two pieces of
information. Indeed, what is P(W | H & A)?
Let's see what info I'd need:
P(W | H & A) = P(W & H & A) / P(H & A)
Actually, we can go several ways from here. But what we really need is to
correlate hips & ass. These are =not= independent events, people -- women
with asses tend to have hips, and men with hips tend to have asses. Let's
make up numbers for that and see what we can calculate:
P(H | W & A) = 90%
P(A | M & H) = 70%
let's see where this gets us:
P(W & H & A) = P(H | W & A) P(W & A) = 90% * 35% = 31.5%
P(M & H & A) = P(A | M & H) P(M & H) = 70% * 20% = 14%
so
P(W | H & A) = 31.5% / (31.5% + 14%) = 69%
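Here is the same chain of multiplications as a sketch, using only the fake numbers above (the variable names are mine). Note the two joints are built in different orders, exactly as in the text: women via P(H | W & A), men via P(A | M & H).

```python
p_w = p_m = 0.5           # priors
p_a_given_w = 0.70        # P(A | W)
p_h_given_m = 0.40        # P(H | M)
p_h_given_w_and_a = 0.90  # P(H | W & A)
p_a_given_m_and_h = 0.70  # P(A | M & H)

# Chain rule: P(W & H & A) = P(H | W & A) P(A | W) P(W)
p_w_h_a = p_h_given_w_and_a * p_a_given_w * p_w
# Chain rule: P(M & H & A) = P(A | M & H) P(H | M) P(M)
p_m_h_a = p_a_given_m_and_h * p_h_given_m * p_m

p_w_given_h_a = p_w_h_a / (p_w_h_a + p_m_h_a)
print(f"{p_w_given_h_a:.0%}")  # prints: 69%
```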
Now Meep has a 69% chance of being female. Truthfully, I don't think
we're going to get much better than this. You might think that combining
the results would actually help the situation better than that, but the
truth is that since hips and ass usually go together for both men and
women, combining the info doesn't take you much farther. However, would
you like to see what happens when hips and ass don't correlate very well
amongst women, and correlate extremely well amongst men?
P(H | W & A) = 40%
P(A | M & H) = 100%
P(W & H & A) = P(H | W & A) P(W & A) = 40% * 35% = 14%
P(M & H & A) = P(A | M & H) P(M & H) = 100% * 20% = 20%
So:
P(W | H & A) = 14% / (14% + 20%) = 41%
I could be going from info that was convincing me someone was female, to
info convincing me someone was male! That's why you've got to be careful
of correlations: if I told you most people who had asses were female, and
most people who had hips were female, but most people who had asses =and=
hips were male, you'd think I was crazy. However, this is something that
happens in real life all the time, due to all sorts of correlations. A
close cousin of this reversal is something called Simpson's paradox.
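If you don't believe the reversal, here is a sketch that lines up all three answers for the second set of fake numbers (40% and 100%). The helper `share_women` is my own name for "divide the women's joint by the total".

```python
p_w_a, p_m_a = 0.35, 0.25  # P(W & A), P(M & A)
p_w_h, p_m_h = 0.35, 0.20  # P(W & H), P(M & H)
p_w_h_a = 0.40 * p_w_a     # P(H | W & A) = 40%
p_m_h_a = 1.00 * p_m_h     # P(A | M & H) = 100%

def share_women(joint_w, joint_m):
    """P(W | cue) from the two joint probabilities P(W & cue), P(M & cue)."""
    return joint_w / (joint_w + joint_m)

cases = [("ass", p_w_a, p_m_a), ("hips", p_w_h, p_m_h), ("both", p_w_h_a, p_m_h_a)]
for label, w, m in cases:
    print(f"P(W | {label}) = {share_women(w, m):.0%}")
# prints:
# P(W | ass) = 58%
# P(W | hips) = 64%
# P(W | both) = 41%
```

Each cue alone points toward "woman", and the pair together points toward "man", all from one consistent set of numbers.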
I'll give you an example from an old Stats text. In the era of burgeoning
women's rights, someone at Berkeley thought they'd look into the graduate
programs at Berkeley and their admissions rate for women vs. men. Ah-ha!
A larger percentage of men were accepted over women! Sexual
discrimination!
However, though this was hot stuff, it wasn't enough to flesh out a
research paper, so they decided to see if they could see which departments
were the main culprits. The departments, after all, were the level where
the actual acceptance/rejection thing was going on.
In =most= of the departments, taken one at a time, women had a higher
acceptance rate than men.
What was going on? More women were applying to programs that were more
competitive. So, for example, the education department had a lot of
applicants, mostly women, and had a low acceptance rate. On the other
hand, physics had an applicant pool that was mostly men, but they had
fewer applicants as a whole, and had a higher acceptance rate. Women
=overall= had a higher rejection rate because women flocked to the subject
areas where the rejection rate was higher. Likewise, men "played it
safe" and mainly went for subject areas with less competition.
Interesting, ne?
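A toy version makes the mechanism easy to see. Every count below is invented for illustration (these are not the Berkeley figures); the two departments just mimic the pattern described above.

```python
# (applicants, admitted) per department and sex -- invented numbers
data = {
    "education": {"women": (800, 160), "men": (200, 30)},
    "physics":   {"women": (100, 70),  "men": (400, 240)},
}

def rate(applied, admitted):
    return admitted / applied

def overall(sex):
    """Acceptance rate for one sex, pooled across all departments."""
    applied = sum(groups[sex][0] for groups in data.values())
    admitted = sum(groups[sex][1] for groups in data.values())
    return admitted / applied

for dept, groups in data.items():
    w, m = rate(*groups["women"]), rate(*groups["men"])
    print(f"{dept}: women {w:.0%}, men {m:.0%}")
print(f"overall: women {overall('women'):.0%}, men {overall('men'):.0%}")
# prints:
# education: women 20%, men 15%
# physics: women 70%, men 60%
# overall: women 26%, men 45%
```

Women win in each department and lose in the total, purely because most of them applied where almost nobody gets in.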
By the way, the probability of a family having two boys given that it has
at least one boy is 1/3. The probability of a family having two boys
given that the elder child is a boy is 1/2.
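You can check both answers by brute force. This sketch lists the four possible families and assumes they are all equally likely (boys and girls 50/50, births independent); the helper `conditional` is my own.

```python
from fractions import Fraction
from itertools import product

# Each family is (older, younger); all four outcomes assumed equally likely.
families = list(product("BG", repeat=2))

def conditional(event, given):
    """P(event | given) by counting equally likely outcomes."""
    matching = [f for f in families if given(f)]
    hits = sum(1 for f in matching if event(f))
    return Fraction(hits, len(matching))

def both_boys(family):
    return family == ("B", "B")

print(conditional(both_boys, lambda f: "B" in f))     # prints: 1/3
print(conditional(both_boys, lambda f: f[0] == "B"))  # prints: 1/2
```

"At least one boy" leaves three families standing (BB, BG, GB), while "the elder is a boy" leaves only two (BB, BG), which is the whole difference.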
And my gender? Well, I once convinced someone online that I was a man
named Mary. That should be good enough for you.