2 Comments
Nick

I've been studying approximation algorithms lately and was surprised to find that many of the tricks used to relax discrete symbols for continuous optimization are very similar to the representations used in ML, for example unit vectors or points on a simplex. Perhaps the theory of data representation is simply a study of convex relaxations of integer programs.
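
For concreteness, here is a minimal sketch (my own illustration, not from the post) of the parallel: an integer program picks exactly one of K symbols, i.e. a one-hot vertex of the simplex, while the relaxed/ML view works with soft points on the same simplex via a softmax, with temperature controlling how close the relaxation is to a discrete choice.

```python
# Minimal sketch: relaxing a discrete choice over K symbols onto the
# probability simplex, the same object ML uses for soft one-hot vectors.
import numpy as np

def softmax(scores, temperature=1.0):
    """Map real-valued scores to a point on the probability simplex."""
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = [2.0, 0.5, -1.0]             # preferences over 3 discrete symbols

# Integer-program view: pick exactly one symbol -> a one-hot simplex vertex.
hard_choice = np.eye(3)[np.argmax(scores)]

# Relaxed view: allow any convex combination of the vertices.
soft_choice = softmax(scores)

# Lowering the temperature drives the relaxation back toward the vertex.
sharper = softmax(scores, temperature=0.1)

print(hard_choice)   # [1. 0. 0.]
print(soft_choice)   # approx. [0.79 0.18 0.04]
print(sharper)       # very close to [1. 0. 0.]
```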

Cagatay Candan

word2vec: The word-embedding part (the pre-training stage of LLMs) is still very surprising to me. (As soon as I digest this, I hope to move on to the main novelty everyone is talking about: transformers!)

I asked ChatGPT whether it does anything special for agglutinative languages such as Turkish. (In Turkish, book is "kitap"; bookstore is "kitapci"; bookshelf is "kitaplik"; bookshelf store is "kitap-lik-ci". The 2-3 letter suffixes at the very end of the root word "kitap" change the word altogether.) ChatGPT told me that they do nothing special for any language and use sub-word tokenization for all languages. How convenient!
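
As a toy illustration of why sub-word tokenization copes with agglutination (my own sketch with a made-up vocabulary; real LLM tokenizers learn BPE or Unigram vocabularies from data rather than using hand-picked pieces), a greedy longest-match segmenter naturally splits these Turkish forms into the root plus reusable suffixes, with no language-specific rules:

```python
# Toy sketch: greedy longest-match subword segmentation over a made-up
# vocabulary, showing how the root "kitap" and suffixes like "lik"/"ci"
# become reusable pieces without any Turkish-specific handling.
TOY_VOCAB = {"kitap", "lik", "ci", "k", "i", "t", "a", "p", "l", "c"}

def segment(word, vocab=TOY_VOCAB):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):    # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])           # unknown-character fallback
            i += 1
    return pieces

for w in ["kitap", "kitapci", "kitaplik", "kitaplikci"]:
    print(w, "->", segment(w))
# kitap      -> ['kitap']
# kitapci    -> ['kitap', 'ci']
# kitaplik   -> ['kitap', 'lik']
# kitaplikci -> ['kitap', 'lik', 'ci']
```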

This language-independent approach nevertheless yields remarkably good translations between almost all languages. I am still in awe of this. How are they doing it?

This is my chat with it on the topic: https://chatgpt.com/share/68d55f4b-9a04-8002-bdb3-6d6a0b02a6b6
