29 January 02

Argh, I'm so tired.  I don't know why.  In any case, I've decided to break down
and do my best to explain Lagrange multipliers in a very generic sense:

King of the Hill
----------------

Suppose you're hiking along a path, and you know there's a hill on the path, and
you want to find the top of the hill.  How would you find it?  

Easy, you might think.  Just hike around the path, and find the point that's
higher than all the rest.  But how do you know you're at the highest point?

In real life, you would probably look around and notice that everything is below
where you're currently standing.  Well, let's suppose you =can't= see the
surrounding area.  Perhaps there's a forest surrounding you and you can't see all
of the path. 

Then you could bring along a handy-dandy altimeter (or perhaps GPS could do this
for you) and keep measuring the height until you find the point that's higher than
all the other points on the path.  Okay, but that sounds like a lot of work, and
you'd have to keep track of all the heights.  What if you were blind (and didn't
have a talking or Braille altimeter)?  Or what if using the altimeter cost a lot
of time (or money) for each use?

First of all, you probably wouldn't know if you were at =the= highest point if
there were several hills, but I'll tell you there's only one hill and you need to
find the top.  But even being blind, and not being able to see the path, you could
tell if you had reached the top simply because as you approach the top, you will
be climbing up the hill, and as you walk over and beyond the top you will be
descending the hill.  

Now, one way to represent whether you're climbing or descending would be by the
angle your sight is making with a theoretical "flat ground".  You could measure
this angle by using a spirit level (one of those metal bars that has glass tubes
filled with liquid and a bubble in the liquid.  People use these all the time in
constructing stuff, or hanging pictures, to make sure stuff isn't crooked.)  Or
you could use a plumb line - simply a heavy enough weight hanging on a string -
and see what angle you make with "straight down".  We measure the angle from the
plumb line counterclockwise to where you are (to get a positive angle).  If one
has a negative angle, the angle has been measured from the plumb line clockwise to
you.  

So if one is climbing up a hill, the plumb line will be swung out at an angle
=behind= you, giving you an angle between 0 and +90 degrees.  Likewise,
if you are hiking down the hill, the plumb line will be in front of you, giving
you an angle between 0 and -90 degrees.  (You won't be getting +/- 90 degrees
unless you're Spiderman going up or down a sheer cliff.  We're only talking
gentle, rolling hills here.)  

What does this have to do with calculus?  Well, one of the two big concepts of
calculus, the derivative (and differentiation, the act of finding it), is related
to the angle you are making with the plumb line.  In fact, the derivative of the,
well, hill I suppose, is equal to the tangent (as in the trigonometric function)
of the angle you are making with the plumb line.  Because the tangent of angles
between 0 and +90 degrees is positive, whenever you are ascending the hill, you've
got a positive derivative.  The tangent of angles between 0 and -90 degrees is
negative, so when one is descending a hill, the derivative is negative.

What about at the top of the hill?  If you stand still at the top of the hill, the
angle you make with the plumb line will be 0, because you won't be tilted either
way, but standing straight up.  The tangent of 0 degrees is 0, so the derivative
at the top of the hill is 0!  
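
If you'd like to see that with numbers, here's a little Python sketch.  The hill
profile (the `height` function) is something I made up for illustration, and the
slope is estimated numerically instead of by doing any actual calculus.

```python
import math

def height(x):
    # Hypothetical hill profile (an assumption, not a real trail):
    # a gentle bump whose top sits at x = 3.
    return 100.0 - 0.05 * (x - 3.0) ** 2

def slope(f, x, h=1e-5):
    # Central-difference estimate of the derivative of f at x.
    return (f(x + h) - f(x - h)) / (2 * h)

def tilt_degrees(f, x):
    # The derivative is the tangent of the tilt angle, so undo it with atan.
    return math.degrees(math.atan(slope(f, x)))

# One spot on the way up, the top itself, and one spot on the way down.
for x in (2.0, 3.0, 4.0):
    print(x, slope(height, x), tilt_degrees(height, x))
```

On the way up (x = 2) the slope and the tilt come out positive, on the way down
(x = 4) they come out negative, and at the top (x = 3) they are zero, give or take
a bit of rounding.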

Let me get a little side-tracked here.  I just said that at the top of the hill
(maximum point) the derivative is zero.  This does NOT mean that where the
derivative is zero, you're at a maximum point.  You could be walking across flat
ground -- obviously you aren't tilted away from the plumb line in such a case; you
could be at the bottommost point of a valley.  In fact, if we were talking about
general functions, and not nice rolling hillsides, we might find the maximum point
does not give one a derivative at all.  These kinds of "ifs, ands, and
buts" populate math classes, and if you want all the details, consult a calculus
text (though not any text with 30 authors) or a class.  My purpose is not to teach
you how to take derivatives or consider all the possibilities at maxima and
minima.

So back on topic.  One way to find the top of the hill is to note everywhere the
plumb line makes an angle of 0 degrees to you, and =then= measure the height at
each of these points.  If everything is nice and smooth, you're guaranteed to find
the top.
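
Here's roughly what that procedure looks like as a Python sketch.  The trail
profile, the stretch of trail from -0.5 to 3, and the function names are all
invented for the example; the trail has one valley and one hill so that there is
more than one flat spot to compare.

```python
def height(x):
    # Hypothetical trail profile: a valley at x = 0 and a hilltop at x = 2.
    return 3 * x**2 - x**3

def slope(x, h=1e-6):
    # Central-difference estimate of the derivative (the "tilt") at x.
    return (height(x + h) - height(x - h)) / (2 * h)

def flat_points(a, b, steps=1000):
    """Walk from a to b in small steps; wherever the tilt switches sign,
    bisect that step to pin down the spot where the tilt is zero."""
    points = []
    xs = [a + (b - a) * i / steps for i in range(steps + 1)]
    for lo, hi in zip(xs, xs[1:]):
        if slope(lo) * slope(hi) > 0:
            continue  # same sign at both ends: no flat spot in this step
        for _ in range(60):
            mid = (lo + hi) / 2
            if slope(lo) * slope(mid) <= 0:
                hi = mid
            else:
                lo = mid
        points.append((lo + hi) / 2)
    return points

spots = flat_points(-0.5, 3.0)
top = max(spots, key=height)  # only now do we "use the altimeter"
print(spots, top, height(top))
```

The walk turns up two flat spots, the valley floor and the hilltop, and measuring
the height at just those two picks out the hilltop.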

Let's think of another way of looking at this.  You can think of each point of the
path being colored with a hue that gives you complete information as to the height
of that point.  (Perhaps this is impossible in real life as quantum physics will
not allow a continuum of wavelengths but THIS ISN'T REAL!  It's just a
thought-experiment!  There's no Schroedinger's cat you ninny!)  As you walk over
the top of the hill, every color you see as you ascend you will see (in reverse
order) as you descend.  Except for one -- the color at the very top of the hill.

Another way to think about it is there being horizontal lines, one for each
height, going through the hill.  Every height on the hill has a line that pierces
the hill twice - once on the ascending side and once on the descending
side.  Except for one.  There will be a line, at the height of the maximum point,
which will touch the hill at that single maximum point.  This line doesn't go
"through" the hill.  Except for the intersection point, the line is above the
hill.   

A line that touches a curve at a single point and stays on one side of that curve
is said to be a =tangent line= (obviously, this isn't the technical definition,
but it's good enough for our purposes).  If you remember your algebra, the slope
of a horizontal line is 0.  Indeed, the derivative of a curve is the slope of the
line tangent to the curve.  Another way to look at these horizontal lines is as
=level curves=, because each horizontal line marks off a vertical level.  That
will come in handy in a second.

In any case, this doesn't have much to do with Lagrange multipliers... or does it?

Now we're going to go hiking in a park, again with a hill, but because the path
we're going to hike on is rather long, we want to know where the highest point on
the path is =before we even hike it= so we can take a rest.  All we're given is a
very special topographical map that lets us zoom in as close as we want to go.

If you've ever seen a topographical map (if you've done orienteering you surely
have), you'll notice curves on the map (sometimes closed into circular or oblong
shapes, sometimes just a long wiggly line that traverses the length of the
map) and each curve has a particular altitude on it.  Usually the lines are at
regular height differences - every 100 feet or perhaps every 15 feet, depending on
what the terrain looks like.

In the world of mathematics, these are called "level curves".  If one thinks of
the surface of the landscape as a function, drawing the places where the landscape
is at a particular height will give you a single level curve.  This gives us an
idea of how to find the highest point -- just like in the case where we're walking
along our path.

To make things easier, let's say our path is closed -- we're starting in the
parking lot, we walk a big loop, and it takes us back to the parking lot.  So for
any point below the highest point, we know that if we go =up= through it, we'll
have to come back =down= through it.  That means that for heights less than the
maximum height, we'll pass through the level curve for that height at least
twice.  At the maximum point, however, we will touch the level curve at a single
point, and not pass through it.  At the highest point, our path will be tangent to
a level curve!
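
You can check the "twice below the top, only a touch at the top" claim by
counting.  In this Python sketch the landscape and the loop are both made up (a
single smooth hill, and a circular path of radius 5 that does not go over the
hill's summit), so treat it as an illustration only.

```python
import math

def terrain(x, y):
    # Hypothetical landscape: one smooth hill with its summit at (6, 8).
    return 200.0 - (x - 6.0) ** 2 - (y - 8.0) ** 2

def path(t):
    # Hypothetical closed loop: a circle of radius 5 around the origin.
    return 5.0 * math.cos(t), 5.0 * math.sin(t)

SAMPLES = 3600
HEIGHTS = [terrain(*path(2 * math.pi * i / SAMPLES)) for i in range(SAMPLES)]

def crossings(level):
    """Count how many times the loop passes through the level curve."""
    count = 0
    for i in range(SAMPLES):
        before = HEIGHTS[i - 1] - level  # i - 1 wraps around, closing the loop
        after = HEIGHTS[i] - level
        if before * after < 0:  # sign flip: we went through the curve
            count += 1
    return count

highest = max(HEIGHTS)  # highest sampled point on the path
for level in (0.0, 100.0, 150.0, highest):
    print(level, crossings(level))
```

Levels below the highest point on the path get crossed twice, once going up and
once coming down.  At the level of the highest point the count drops to zero:
the path touches that level curve but never gets to the other side of it.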

And that's where Lagrange multipliers come in.  For there is something called
=the gradient=, which for a point on a surface is a vector with a length related
to partial derivatives (who cares what that is right now) but whose direction
points towards the greatest increase, and, more importantly, is perpendicular to
the level curve that point is on.  Likewise, if I describe the path I travel as an
equation (like x^2 + y^2 = 25 would give me a path that's a circle of radius 5
centered at the origin), and take partial derivatives and stick them appropriately
on a vector, I get a vector that is perpendicular to the path.

So, if the path and a level curve are tangent, that means a vector perpendicular
to one of them has to be perpendicular to the other.  Lagrange multipliers
acknowledge this -- the multipliers themselves come from the fact that we don't
care how the lengths of the perpendiculars relate to each other, simply that you
can get one by multiplying the other by some nonzero number.
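
For those who like symbols, here is a sketch of that statement and one worked
case.  I'm writing f for the height of the landscape, g = c for the equation of
the path, and lambda for the multiplier; the height function f(x, y) = 3x + 4y (a
tilted plane) is my own made-up example, while the circle is the one mentioned
above.

```latex
\begin{align*}
\nabla f(x, y) &= \lambda \, \nabla g(x, y), \qquad g(x, y) = c \\[4pt]
% Worked case: f(x, y) = 3x + 4y on the circle x^2 + y^2 = 25
\nabla f = (3, 4), &\qquad \nabla g = (2x, 2y) \\
3 = 2\lambda x, \quad 4 = 2\lambda y
  &\;\Longrightarrow\; x = \frac{3}{2\lambda}, \quad y = \frac{2}{\lambda} \\
\frac{9}{4\lambda^2} + \frac{4}{\lambda^2} = 25
  &\;\Longrightarrow\; \lambda = \pm\tfrac{1}{2} \\
\lambda = \tfrac{1}{2}: \; (x, y) = (3, 4), \; f = 25
  &\qquad
\lambda = -\tfrac{1}{2}: \; (x, y) = (-3, -4), \; f = -25
\end{align*}
```

So the method hands back two candidate points, and comparing the heights says
that (3, 4) is the high point of the path and (-3, -4) is the low point.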

Of course, I was just looking at a 2D surface and a single 1D path.  Lagrange
multipliers can be used in much higher-dimensional realms, but my visualization of
4D situations is kinda sketchy (as it is, I have difficulty picturing 3-space in
my mind ... and think, the topographical map is a way to "flatten" the
surface.  That's why I thought of it that way).  Again, you must also remember
that one can get level curves tangent to a path at minima and at flat, or
constant, areas of the path, so be careful.
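
To see the "be careful" part in action, here's a Python sketch that hunts around
the circle x^2 + y^2 = 25 for every spot where the two perpendiculars line up.
The height function f(x, y) = 3x + 4y is a made-up tilted plane, and the function
names are mine.

```python
import math

def f(x, y):
    # Hypothetical height function: a tilted plane.
    return 3.0 * x + 4.0 * y

def grad_f(x, y):
    # Partial derivatives of f, stuck on a vector.
    return (3.0, 4.0)

def grad_g(x, y):
    # g(x, y) = x^2 + y^2, so the path is the curve g = 25.
    return (2.0 * x, 2.0 * y)

def misalignment(t):
    # Cross product of the two gradients at angle t around the circle.
    # It is zero exactly when one gradient is a multiple of the other.
    x, y = 5.0 * math.cos(t), 5.0 * math.sin(t)
    (fx, fy), (gx, gy) = grad_f(x, y), grad_g(x, y)
    return fx * gy - fy * gx

def candidates(steps=360):
    """Scan the loop for sign changes in the misalignment and bisect each one."""
    found = []
    for i in range(steps):
        lo = 2 * math.pi * i / steps
        hi = 2 * math.pi * (i + 1) / steps
        if misalignment(lo) * misalignment(hi) > 0:
            continue
        for _ in range(60):
            mid = (lo + hi) / 2
            if misalignment(lo) * misalignment(mid) <= 0:
                hi = mid
            else:
                lo = mid
        t = (lo + hi) / 2
        found.append((5.0 * math.cos(t), 5.0 * math.sin(t)))
    return found

for x, y in candidates():
    (fx, fy), (gx, gy) = grad_f(x, y), grad_g(x, y)
    multiplier = (fx * gx + fy * gy) / (gx * gx + gy * gy)
    print((x, y), f(x, y), multiplier)
```

The search turns up two points, not one: the highest point on the path, near
(3, 4), and the lowest, near (-3, -4).  The tangency condition alone can't tell
them apart, so you still have to compare the heights.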

Lagrangian multipliers are used in situations where one is trying to optimize with
a constraint - this happens all the time when one is doing something like
estimating parameters for a probability distribution, when one is given a
statistical sampling of the population.  There are various constraints, such as
the probabilities all having to add up to 1, and sometimes there are constraints to make sure
the mean or standard deviation is what one wants it to be.  And then, people might
want the parameters, subject to those constraints, which give the highest
probability that the particular statistical data were seen.  This is called the
maximum likelihood method.     
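
For the curious, here is a sketch of about the simplest case I can think of (my
choice of example, not the only one).  Say there are k possible outcomes, outcome
i was seen n_i times out of N observations, and the unknown parameters are the
probabilities p_i.  The only constraint is that the p_i add up to 1, and I'm
maximizing the log of the likelihood, which peaks in the same place as the
likelihood itself.

```latex
\begin{align*}
\text{maximize } \ell(p) &= \sum_{i=1}^{k} n_i \log p_i
  \quad \text{subject to} \quad g(p) = \sum_{i=1}^{k} p_i = 1 \\
\frac{\partial \ell}{\partial p_i} = \lambda \, \frac{\partial g}{\partial p_i}
  &\;\Longrightarrow\; \frac{n_i}{p_i} = \lambda
  \;\Longrightarrow\; p_i = \frac{n_i}{\lambda} \\
\sum_{i=1}^{k} p_i = 1
  &\;\Longrightarrow\; \lambda = \sum_{i=1}^{k} n_i = N
  \;\Longrightarrow\; \hat{p}_i = \frac{n_i}{N}
\end{align*}
```

In other words, under these assumptions the best guess for each probability is
just the fraction of the time you saw that outcome, and the multiplier turns out
to be the sample size.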

In any case, you notice I don't tell you how to do any of these things - how to
take derivatives, how to get gradients, etc.  Because, truth be told, most people
don't need to know the nitty-gritty of these things.  I don't need to know (well,
remember) the Krebs cycle to realize it's the metabolic process that creates ATP,
the molecule that our cells use for energy to drive their own particular needs.  Sure
there are a lot of details I'm missing, and there's more going on than just the
production of ATP, but it's enough for scientific literacy, as it were.

And now you know that Lagrange multipliers are the mathematical way to find
extreme points (highest or lowest) of a function, when you've constrained the
variables to a particular path (or surface).