9 Comments
Chris

I have a counterpoint to Wiener's claim that I wrote up a while back. Basically, there are things that an entirely additive loss function cannot express.

https://weary-travelers.gitlab.io/posts/ideas/non-additive-losses/idea.html
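As a minimal sketch of the distinction (not taken from the linked post; the data below are made up for illustration): an additive loss averages per-example terms, while quantities like the worst-case error or AUC depend on all predictions jointly and cannot be written as such an average.

```python
import numpy as np

# Illustrative sketch: an additive loss has the form
# (1/n) * sum_i loss(y_i, yhat_i) and decomposes over examples.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)      # binary labels
scores = rng.normal(size=100) + y     # scores, higher for positives
preds = scores > 0.5

# Additive: average 0-1 loss, a per-example sum.
avg_loss = np.mean(preds != y)

# Non-additive: worst-case loss depends on the whole sample jointly;
# no reweighted per-example average reproduces it.
max_loss = np.max(preds != y)

# Non-additive: AUC is a function of pairs of examples (rankings),
# so it cannot be written as a sum of per-example terms either.
auc = np.mean(scores[y == 1][:, None] > scores[y == 0][None, :])

print(avg_loss, max_loss, auc)
```

The point being that training pipelines built around averaging per-example terms handle the first quantity naturally, but not the latter two.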

Cagatay Candan

Average loss is linked to probability through the law of large numbers.

Consider the many applications where this holds, say in communications: suppose your WiFi router operates at 50 Mbit/s; that is, there are 50 million bit transmissions taking place every second. This is the regime of the law of large numbers and the relative-frequency interpretation of probability. It does not make sense to exert special effort for the 100th or 200th bit in this application. (There is also a feedback mechanism, called ARQ, that requests retransmission of mis-delivered bits. Things work, as you know!) Since all bits are of identical value, which is indeed the case after source coding, you only care about improving the average number of transmission errors per second, which is the probability-of-error metric in communications. Hence, in this regime of repeated trials and experiments, targeting the average behaviour makes sense; the same holds in the insurance business, in the hub optimization of FedEx packages, and so on.
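As a minimal numerical sketch of this regime (the per-bit error probability below is an assumed illustrative value, not a real WiFi figure):

```python
import numpy as np

# Sketch of the law-of-large-numbers regime described above.
rng = np.random.default_rng(0)
n_bits = 50_000_000    # ~one second of transmissions at 50 Mbit/s
p_error = 1e-3         # assumed per-bit error probability

# Total errors in one second of i.i.d. bit transmissions.
n_errors = rng.binomial(n_bits, p_error)

# Empirical average 0-1 loss: essentially equal to p_error.
print(n_errors / n_bits)
```

With n this large, the standard deviation of the empirical rate is about sqrt(p(1-p)/n), roughly 4.5e-6 here, so the average loss is a near-perfect stand-in for the error probability.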

The meaning and connotation of the probability of an airplane accident are not the same for the insurance company, the airline, or the passengers.

I have also read the blog at the link with great interest. The word "habituation" there suits current example-supervised AI training very well, IMO. When I was teaching, I tried to avoid working many examples in class; instead I focused on the main content and directed students to the examples for self-discovery after the theory. Many instructors taught topics through examples, and books differ significantly on this too, from the Schaum's outlines to the yellow perils of mathematics. The blog reminded me of that. After so many years, I cannot say that learning through examples does not work.

Jess Grogan

Thank you for sharing; I found this useful to think about. That said, I don't think it's a good enough counterpoint to Wiener's claim. My reasoning: just because there's a specific mode of human reasoning that an average loss function doesn't exactly replicate doesn't mean that optimizing an average isn't the most effective way for machines to learn the problems we need them to solve in order to help society.

Alex Tolley

You might want to use cheap AI to correct spelling. "Evaluting" stood out like a sore thumb.

Actually, I thought teh slide deck seemed quite understandable, at least at first glance.

Ben Recht

It would be helpful if you told me where the typos occurred. That error doesn't occur in this post.

also: teh.

Alex Tolley

Slide 17.

"teh" is a long-time muscle coordination problem. I usually have to do a "Find and Replace" to correct that error in texts.

Hostile Replicator

If we’re being pedentic, the final paragraph contains a “you’re” that should be a “your”.

Anyway, excited to follow along with these notes and especially the reports of the discussions that arise in your classes!

Ben Recht

Fixed. I'm always happy for people to call out typos! Just tell me where they are.

also: pedentic -> pedantic. :)

Hostile Replicator

Aha! You fell into my trap of pedantically correcting my deliberate typo of pedantic!

(It’s the little things in life)
