So, now that you understand the chain rule, you can rederive some of the tricks you’ve already learned.

The book gives a very cute proof of the fact that .

But screw that, we’re on to bigger game. Let’s prove the chain rule.

The proof they give does some stuff that (at least for my brain) overcomplicates things. So I rewrote it, and I’ll work through the steps with you.

It’s probably worth your time to look this over and just try to understand it. But I’m going to supplement it with some plain-spoken explanations.

Part 1:

1) Remember, y is the same as f(x). So we’re just saying that the change in y is the same as the change in f(x) as x changes.

2) Definition of the derivative, as a limit.

3) Define epsilon: the gap between the difference quotient (change in y over change in x) and the actual derivative.

4) Take the limit of epsilon as delta x goes to zero.

5) Distribute the limit on the right side.

6) Come to the obvious conclusion that it equals zero.

7) Step back a sec and notice we CAN say something about the change in y. It may not seem useful yet, but we’ll come back to it.

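8) Make epsilon continuous by defining epsilon = 0 when delta x = 0. This is a legal move since we’re talking about taking a derivative: f is differentiable, so epsilon really does go to 0 as delta x goes to 0 (step 6), and filling in that one point just plugs the hole. With that convention, the equation from step 7 holds even when delta x = 0, since both sides are 0. If it’s not differentiable, the chain rule doesn’t matter.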
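Here’s one way to write steps 2 through 8 out as equations. It’s a reconstruction in the spirit of Stewart’s version, taken at a point x = a, so the notation may not match the rewritten proof line for line:

```latex
\begin{aligned}
&\text{(2)}\quad f'(a) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x},
  \qquad \Delta y = f(a + \Delta x) - f(a) \\
&\text{(3)}\quad \varepsilon = \frac{\Delta y}{\Delta x} - f'(a) \\
&\text{(4-6)}\quad \lim_{\Delta x \to 0} \varepsilon
  = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} - f'(a) = f'(a) - f'(a) = 0 \\
&\text{(7)}\quad \Delta y = \bigl(f'(a) + \varepsilon\bigr)\,\Delta x \\
&\text{(8)}\quad \varepsilon := 0 \text{ when } \Delta x = 0,
  \text{ so } \varepsilon \text{ is continuous at } 0 \text{ and (7) also holds at } \Delta x = 0
\end{aligned}
```

Line (7) together with "epsilon goes to 0 as delta x goes to 0" is what gets used later as RULE A.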

Got it? So far, we’ve just established a rule I called RULE A for simplicity.
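If you want to watch RULE A happen with actual numbers, here’s a quick Python sketch. The function f(x) = x³ and the point a = 2 are my own picks for illustration, not from the book; the `epsilon` helper just computes step 3’s error term and uses step 8’s convention that epsilon = 0 when delta x = 0.

```python
def epsilon(f, fprime_a, a, dx):
    """Step 3's error term: difference quotient minus f'(a).

    Uses step 8's convention: epsilon is defined to be 0 when dx == 0.
    """
    if dx == 0:
        return 0.0
    dy = f(a + dx) - f(a)          # step 1: change in y = change in f(x)
    return dy / dx - fprime_a


def f(x):
    # Example function (an assumption for illustration, not from the post).
    return x ** 3


A = 2.0
FPRIME_A = 3 * A ** 2              # f'(x) = 3x^2, so f'(2) = 12

for dx in [0.1, 0.01, 0.001, 0.0001]:
    eps = epsilon(f, FPRIME_A, A, dx)
    dy = f(A + dx) - f(A)
    # RULE A says dy == (f'(a) + eps) * dx; the residual is just rounding error.
    residual = dy - (FPRIME_A + eps) * dx
    # For this f, eps is exactly 6*dx + dx**2 in real arithmetic.
    print(f"dx={dx:<8} eps={eps:.6f} residual={residual:.1e}")
```

Expanding (2 + delta x)³ shows why epsilon comes out as 6·delta x + (delta x)² here: it shrinks right along with delta x, which is exactly step 6.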

Part 2:

Before I launch in, for your information, the basic difference between my version and the book’s is that I don’t substitute functions. That is, I left everything in terms of f(x) and g(x) rather than saying g(x) when x=a is called b, or anything like that. The way they do it makes certain steps prettier, but the way I do it makes what’s going on clearer. I prefer it this way. You may not. If you don’t, SUCK IT.

1, 2, 3, 4) Define some stuff.

5) Just RULE A for a function we’ve decided to call g(x). a is some particular value of x, and the epsilon gets a sub-1 just to disambiguate from a different epsilon in a sec.

6) Same deal. Use RULE A for f(x). It looks uglier because f is a function of g(x), but if you look over it, everything is very clear.

7) In step 5, you defined change in g(x). In step 6, you defined change in f(x). The latter contained the term for change in g(x). So, substitute that shit from step 5 into the equation in step 6.

Then simplify by dividing both sides by the change in x, so you state everything as the change in f(g(x)) over the change in x. More familiarly, this is the change in y over the change in x.

8) Realize that, OH SHIT, that shit on the left is *made* to be taken to the limit as delta x goes to 0.

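When we do so, the left-hand shit turns into the derivative of f(g(x)), aka dy/dx, at x = a. The shit on the right stays the same, except the epsilons disappear. Epsilon_1 goes to 0 as delta x goes to 0 (as we showed in part 1). Epsilon_2 goes to 0 as the change in g(x) goes to 0, and the change in g(x) does go to 0 as delta x goes to 0, because by step 5 it equals (g’(a) + epsilon_1) times delta x.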
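Written out as equations, steps 5 through 8 look roughly like this. It’s a sketch that keeps everything in terms of f and g, as above; Δg is my shorthand for the change in g(x), and Δy here is f(g(a + Δx)) − f(g(a)), since g(a) + Δg = g(a + Δx):

```latex
\begin{aligned}
&\text{(5)}\quad \Delta g = g(a + \Delta x) - g(a) = \bigl(g'(a) + \varepsilon_1\bigr)\,\Delta x,
  \qquad \varepsilon_1 \to 0 \text{ as } \Delta x \to 0 \\
&\text{(6)}\quad \Delta y = f\bigl(g(a) + \Delta g\bigr) - f\bigl(g(a)\bigr)
  = \bigl(f'(g(a)) + \varepsilon_2\bigr)\,\Delta g,
  \qquad \varepsilon_2 \to 0 \text{ as } \Delta g \to 0 \\
&\text{(7)}\quad \frac{\Delta y}{\Delta x}
  = \bigl(f'(g(a)) + \varepsilon_2\bigr)\bigl(g'(a) + \varepsilon_1\bigr) \\
&\text{(8)}\quad \left.\frac{dy}{dx}\right|_{x=a}
  = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}
  = f'(g(a))\, g'(a)
\end{aligned}
```

The limit in (8) quietly relies on Part 1’s step 8: Δg can be exactly 0 for some nonzero Δx, and defining epsilon_2 = 0 there is what keeps (6) true without dividing by zero.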

And holy balls, the result sure looks like the chain rule: the derivative of f(g(x)) at x = a equals the derivative of f at g(a) times the derivative of g at a, i.e., f’(g(a)) · g’(a).
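A numeric sanity check, again just a sketch: f(u) = sin(u), g(x) = x², and a = 1.5 are arbitrary choices of mine, and `eps_1`, `eps_2`, and `chain_rule_check` are hypothetical helper names. For each delta x, it compares the difference quotient with step 7’s product and with f’(g(a)) · g’(a).

```python
import math


def eps_1(g, gprime_a, a, dx):
    """Step 5's epsilon_1: error in g's difference quotient at x = a."""
    if dx == 0:
        return 0.0
    return (g(a + dx) - g(a)) / dx - gprime_a


def eps_2(f, fprime_ga, ga, dg):
    """Step 6's epsilon_2: error in f's difference quotient at g(a).

    Uses the Part 1, step 8 convention (epsilon = 0 when the change is 0),
    which is what keeps this from dividing by zero when dg == 0.
    """
    if dg == 0:
        return 0.0
    return (f(ga + dg) - f(ga)) / dg - fprime_ga


def chain_rule_check(f, fprime, g, gprime, a, dx):
    """Return (delta y / delta x, step 7's product) for one nonzero dx."""
    ga = g(a)
    dg = g(a + dx) - ga                 # change in g(x)
    dy = f(ga + dg) - f(ga)             # change in f(g(x)); ga + dg is g(a + dx) up to rounding
    e1 = eps_1(g, gprime(a), a, dx)
    e2 = eps_2(f, fprime(ga), ga, dg)
    product = (fprime(ga) + e2) * (gprime(a) + e1)
    return dy / dx, product


# Example functions (assumptions for illustration, not from the post).
def f(u):
    return math.sin(u)


def fprime(u):
    return math.cos(u)


def g(x):
    return x ** 2


def gprime(x):
    return 2 * x


a = 1.5
exact = fprime(g(a)) * gprime(a)        # chain rule: f'(g(a)) * g'(a)

for dx in [0.1, 0.01, 0.001]:
    q, p = chain_rule_check(f, fprime, g, gprime, a, dx)
    print(f"dx={dx:<6} quotient={q:.6f} product={p:.6f} chain rule={exact:.6f}")


# A constant inner function: the change in g(x) is exactly 0 for every dx.
def const(x):
    return 5.0


def const_prime(x):
    return 0.0


print(chain_rule_check(f, fprime, const, const_prime, a, 0.01))   # (0.0, 0.0)
```

The quotient and the product agree for every delta x (that’s step 7, which is exact), and both creep toward the chain-rule value as delta x shrinks (that’s step 8). The constant-g case is the one where the change in g(x) is 0; without the epsilon = 0 convention, `eps_2` would divide by zero there.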

Mind = blown.

**Next stop: Implicit Differentiation.**

I don’t think “g(a) is differentiable” makes any sense: g(a) is a number, not a function. You mean something more like “g(x) is differentiable at x = a”.

Something about this is unsettling, and I just realized what it is: setting epsilon equal to 0. This effectively means that you’re dividing by zero, which is sort of, well, cheating. I was a bit surprised to see that Stewart’s Calculus does this with epsilons, but there it is in my book as well.

You don’t actually need continuity there; you just need epsilon to tend towards zero in the limit as your x gets arbitrarily close to a. It’s subtle, but that’s proofs in calculus for you. I find it helpful to think of epsilon as your error bound: this is the level of tolerance you’re shooting for. You can’t get it perfect (epsilon=0) but you can get as close to that as you’d like (any epsilon>0).

I’m almost 18 years old, and this is the math we do during our last year of school. The Greek educational system is insane!