ChatGPT, other language models and AI

The fact that RL can be used for a task like chess does not mean it can be harnessed for building an LLM. Probably thousands of AI researchers have spent the last two years pondering how RL could be harnessed for an LLM. It is completely obvious that RL is one of the philosopher’s stones on the path towards AI. Making it work is not a trivial implementation detail but a scientific breakthrough.

I’ll give an example in case physics is more familiar to you than computer science:
Fusion, too, has been well understood for years. The fact that someone succeeds in producing energy with it is “just a trivial implementation detail”. The project manager budgets a couple of weeks for it in Excel, and that way the fusion reactor stays on schedule. :sweat_smile:

2 Likes

I agree that AGI is nowhere near.
An LLM codes by generating the next word: it understands the context well because words are represented in an embedding space of over a thousand dimensions.
Humans learn to code by learning language constructs, such as a for-loop.
The same difference applies to other things as well; that is, a language model is based on probability.

The core of a Transformer is its embedding space. In principle, any data can be brought in as tokens and converted back.
The embedding space itself doesn’t operate on words; rather, it describes the relationships between things in the surrounding world as vectors.
This method allows for human-like behavior, even if the model’s learning and thinking are not human-like.
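As a toy illustration of “relationships as vectors”: relatedness between concepts can be measured by the angle between their embedding vectors. The three-dimensional vectors below are invented for this sketch; real models use a thousand or more dimensions and learn the values from data.

```python
import math

# Invented toy "embedding space" -- not from any real model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """How 'related' two concepts are: cosine of the angle between their vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Related concepts point in similar directions...
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # close to 1
# ...unrelated ones do not.
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # clearly lower
```

The same geometry is what lets a Transformer treat any tokenized data, not just words, in one shared space.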

Transformers or their variants are used extensively for processing all kinds of data. Multimodal reasoning as a whole is based on the fact that multiple types of data can be processed in the same space.
image

I don’t know what this refers to, but Transformers are used in robots and autonomous cars.
Similarly, as far as I know, DeepSeek uses a Transformer-based approach, meaning they have made their own modifications to the basic architecture. “Transformer-based” refers to Self-Attention-style processing in the embedding space.

I mentioned JEPA and Large Concept Models because Meta, led by LeCun, aims for human-like AI.

JEPA is a self-supervised learning model in which an x-encoder is trained to predict the representation produced by a y-encoder. The predictor operates in an abstract representation space, which aims to form knowledge about what the concrete y-encoder represents (cat, dog, etc.).
image

I-JEPA trains a context encoder against a target encoder. Because the predictor works in an abstract, high-dimensional space where knowledge about the image’s subject takes shape, the image is not predicted at the pixel level; instead, the object being generated is understood.
image
V-JEPA is similar for video.
Both are built using multiple ViT (Vision Transformer) models.
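A very rough conceptual sketch of the objective described above, with invented toy “encoders” as stand-ins (this is not Meta’s actual architecture or code): the point is that the loss is computed between predicted and target representations, never at the pixel level.

```python
# Toy JEPA-style objective sketch. All functions and numbers are invented
# placeholders; real JEPA encoders are large ViT networks.

def x_encoder(context):
    """Encodes the visible context patches into a latent vector."""
    return [v * 0.5 for v in context]

def y_encoder(target):
    """Encodes the masked target patches into a latent vector."""
    return [v * 0.5 for v in target]

def predictor(latent_context):
    """Predicts the *representation* of the target, not its pixels."""
    return [v + 0.1 for v in latent_context]

def jepa_loss(context, target):
    """Mean squared error in representation space."""
    pred = predictor(x_encoder(context))
    tgt = y_encoder(target)
    return sum((p - t) ** 2 for p, t in zip(pred, tgt)) / len(tgt)

print(jepa_loss([1, 2], [1, 2]))
```

Because the comparison happens in latent space, the model is free to form abstract knowledge about the subject instead of reconstructing every pixel.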

LeCun’s Large Concept Models (LCMs) pursue a similar idea.
Because humans don’t read a book word by word but rather read a longer passage and internalize its content (a concept), LCM aims to work the same way.
That is, concepts are identified from long text. Since a concept is a whole, descriptive text can ultimately be produced from it from different starting points, or an image can be generated from a concept, and so on.
image
LCM is also implemented using a Transformer.
image

So, when processing text, images, etc., LeCun aims to identify larger concepts and objects in the background, upon which the generation of text, images, etc., is based.

Regarding AGI.
Hype around AGI flares up from time to time, usually because something behaves like a human.
Two years ago, AGI hype arose when ChatGPT responded like a human.
For some, S-Group’s food delivery robots instill belief in the arrival of AGI; right now, it’s DeepSeek’s excellent LLM-optimization ideas.
Yet no significant new change enabling AGI has occurred in the AI models themselves.

11 Likes

https://medium.com/thoughts-on-machine-learning/the-laymans-introduction-to-deepseek-r1-training-80d8ff7a887d
Here’s one perspective on how DeepSeek R1’s training was done - let’s go through the role of RL (Reinforcement Learning) in the process.

2 Likes

In the past, photography and moving images were quickly harnessed for the realm of eroticism, and general interest increased. The same happened with color photography, the internet, home videos, mobile video, and 3D technology, and some people were horrified. AI has already entered the same domain, even though regulation is being attempted. There are immense forces in the realm of basic human instincts. At its deepest, this is also the question in Russia’s invasion of Ukraine. Regulating AI is like putting out a fire by spitting. After instincts related to reproduction, other needs arise, and on average, the need to act for the good of the community comes much later. Bees and ants have a different hierarchy of values. Things should be influenced precisely from this other direction, if and when there is a need at all.

Artificial intelligence continues to amaze almost daily. It forces many school subjects into a new mold. At the same time, people’s resilience, or the ability to cope without technology, weakens. Are we already too dependent on it when a small disruption can leave a large apartment building without HVAC services in winter? In a crisis, it is not possible for everyone to survive, just like with those insects. One can significantly improve their chances by preparing so that not everything relies on artificial intelligence.

Used wisely, artificial intelligence is a great thing. As it becomes commonplace, the hype subsides, and society adapts, just as it does with the aforementioned technologies. We just have to live with it. Thumbs up for artificial intelligence. Thumbs down for leaders and people, big and small players, who operate based on instincts.

1 Like

Give ChatGPT a task: “Write a 12-letter long word that means nothing.” Even with 10 attempts, I couldn’t get a word exactly 12 letters long.

It’s due to how a transformer works. There is no internal counter that keeps track of the characters or words already produced. The model emits content until the softmax produces an EOS (end-of-sequence) token, which stops generation.
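A minimal sketch of that mechanism (the vocabulary and probabilities below are invented for illustration): generation is a loop that stops only when the sampled token happens to be EOS; there is no letter or word counter anywhere.

```python
import random

# Invented toy vocabulary; real models have tens of thousands of tokens.
VOCAB = ["bla", "fizz", "gor", "<EOS>"]

def next_token_distribution(generated):
    """Toy stand-in for the softmax output: the longer the output so far,
    the more probability mass shifts toward <EOS>."""
    p_eos = min(0.9, 0.2 * len(generated))
    rest = (1.0 - p_eos) / 3
    return [rest, rest, rest, p_eos]

def generate(seed=0):
    random.seed(seed)
    out = []
    while True:
        probs = next_token_distribution(out)
        token = random.choices(VOCAB, weights=probs)[0]
        if token == "<EOS>":
            return "".join(out)  # generation stops only here
        out.append(token)

print(generate(1))
```

Run it with different seeds and the output length varies: nothing in the loop can guarantee a word of exactly 12 letters.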

1 Like

This bit of AI stupidity seems to be an impossible task for language models. I wouldn’t have believed it. The same result came with ChatGPT-4, Grok 2, and DeepSeek, whose reasoning was fun to follow; it produced 13 and 11 letters when asked for 12… maybe the experts can explain, or should I ask the models?
If there’s no counter for letters, why do 11 and 13 letters come out, rather than 5 or 10?

1 Like

However, when I asked for a 12-line poem, ChatGPT gave exactly 12 lines, even numbered. And it omitted the numbers when I asked.

This too is due to how a transformer works. If you ask for a twelve-line poem, the model starts, at roughly the right point, to assign higher probability to the EOS token based on the context, until the EOS token finally pops out at an appropriate spot and generation ends.

These models are still practically pure autoregression, where the next token is chosen from the set of possible output tokens (vocabulary) based on probability.

Probably also when asked for a 12-letter word, the model guesses lengths roughly around that number, rather than, say, 1-character or 150-character words. So, lacking a counter, it couldn’t produce the EOS token from the probability distribution at exactly the right moment, but the prompt context got it quite close.

This illustrates well how, to some extent, the model is “just” a so-called stochastic parrot: Stochastic parrot - Wikipedia
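One way to see why the length lands near 12 but rarely exactly on it, as a toy sketch (the token pieces and stopping probability below are invented): the word is assembled from multi-letter pieces, and stopping is probabilistic rather than counted.

```python
import random

# Invented multi-letter token pieces -- stand-ins for subword tokens.
PIECES = ["fli", "ber", "goo", "sna", "rk"]

def invent_word(seed):
    """Build a 'word' from pieces; stop probabilistically once it is
    'roughly long enough' -- no exact letter counting anywhere."""
    random.seed(seed)
    word = ""
    while len(word) < 9 or random.random() > 1 / 3:
        word += random.choice(PIECES)
        if len(word) > 20:
            break
    return word

lengths = [len(invent_word(s)) for s in range(100)]
hits = sum(1 for n in lengths if n == 12)
print(f"exactly 12 letters in {hits} of 100 tries")
```

The lengths cluster around the target but hit it exactly only by luck, which matches the 11- and 13-letter answers reported above.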

2 Likes

Another challenge for ChatGPT: “Write a 12-line poem and translate it so that the words are reversed, beginning to end.” The poem itself came out nicely, but in the “translated” version the words were reversed the way a preschooler might do it. I wonder whether its other problem-solving is at the same level. The AI should be able to say when too much is being asked of it.

edit excerpt:
In the shade of the forest I walked,
the path led me forward,
the wind rustled in the trees,
and the silence was beautiful.

ansseimS natsäm etsi,
niukop iew tni munnep,
ilut ihsumi ssuit,
ajaisihil olleu kaunis.

1 Like

When posting gibberish to a forum, it helps to ask the model to invent a 12-letter word and then number the letters: “Invent a meaningless word with 12 letters. Finally, number the letters.” The free version of Copilot cannot do it, but here’s DeepSeek’s example of its thought process; it counts the letters even before printing the answer:

IMG_2768

This is still due to how a transformer works. Let’s assume here that 1 token = 1 word (in reality tokenization operates on parts of words, but this is easier to explain). Roughly speaking, the algorithm always outputs the next word that receives the highest probability, so the output word must be among the possible output words (the vocabulary) from which the selection is made.

Since the vocabulary of the training data is roughly the lexicon of the internet, the model is unlikely to be able to guess, for example, the reverse of a word like “siimeksessä”, because such a word is not found in its vocabulary.

(Note: the above are very rough generalizations – in reality, transformers incorporate methods to solve out-of-vocabulary situations, etc.)
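To illustrate how subword methods cope with out-of-vocabulary words, here is a minimal greedy longest-match tokenizer. The toy vocabulary below is invented for the sketch; real BPE-style vocabularies are learned from data and are far larger.

```python
# Invented toy subword vocabulary, chosen so the example word splits cleanly.
VOCAB = {"sii", "mek", "ses", "sä", "s", "i", "m", "e", "k", "ä"}

def tokenize(word):
    """Greedy longest-match subword tokenization: at each position, take the
    longest vocabulary entry that matches, so unseen words still get split
    into known pieces instead of failing outright."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i]!r}")
    return tokens

print(tokenize("siimeksessä"))  # ['sii', 'mek', 'ses', 'sä']
```

The model thus “sees” a rare word as a handful of pieces, which is also why operations on its individual letters, like reversing it, are so hard.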

To understand the operating logic, I recommend, for example, this roughly 30-minute video, after which you will certainly understand well the kind of constraints involved when working with transformers: https://www.youtube.com/watch?v=zxQyTK8quyY

5 Likes

Perhaps it’s easier for a non-technical person to visualize the matter this way. In this example run on my own machine, the AI has (incorrectly) given the answer “Flibergoof”, which has 10 letters. The number of letters contained in a single token varies and depends on the sampling settings, specifically on what probabilities the different token options receive when the model is run. In this particular run, the probability distribution is such that getting an answer of exactly 12 letters is quite unlikely:

image

I made the task slightly harder by giving unclear instructions: “12 words” instead of “12 letters”. As the answer shows, the model still ‘understood’ what the user was after, but it still couldn’t ‘manage’ to produce a correctly formatted answer.

1 Like

Now, AI researchers at the University of California, Berkeley, claim to have replicated the R1 model for under $30.

Dubbed TinyZero, the new AI model was trained using the number puzzle game found in the game show Countdown, which requires players to reach a predetermined answer from a random set of numbers.

2 Likes

Quite interesting, but I wouldn’t call it a deep dive.

I would recommend listening to the latest Lex Fridman. It’s five hours long and even that is just a popular-level scratch of the surface on the topic, but probably the minimum amount one should understand about the subject.

4 Likes

Facts about investment taxation should not be asked of ChatGPT, at least not gpt-4o-mini:

Explain the presumptive acquisition cost in Finnish taxation.

The presumptive acquisition cost is a method used in Finnish taxation to calculate capital gains when selling shares or other investments. According to it, the capital gain is calculated by deducting a presumed acquisition cost from the selling price, which is usually 40% of the selling price. This means that the taxable gain does not need to be calculated based on the actual acquisition cost, but this simple presumption can be used. The presumptive acquisition cost can be used if the selling price is over 1,000 euros and the shares have been owned for more than a year.

What about the 10-year rule?

The 10-year rule refers to a rule related to the taxation of share sales in Finnish taxation, according to which, if shares have been owned for more than 10 years, no tax needs to be paid on their sale. This means that the capital gain is tax-exempt if the ownership period exceeds ten years. The rule encourages long-term investing and asset management.

5 Likes

It actually works out that if the shares have been owned for less than 10 years, a 20% deemed acquisition cost can be used, and 40% if owned for over 10 years. Fully tax-free capital gains essentially never exist in Finland.
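As a worked example of the rule as described above (an illustrative sketch only, not tax advice; the helper function and numbers are my own):

```python
def taxable_gain(sale_price, actual_cost, years_owned):
    """Taxable capital gain using the deemed acquisition cost
    ('hankintameno-olettama') as described above: 20% of the sale price
    if owned under 10 years, 40% if owned 10 years or more. The seller
    uses whichever is larger, the actual cost or the deemed cost."""
    deemed = sale_price * (0.40 if years_owned >= 10 else 0.20)
    return sale_price - max(actual_cost, deemed)

# Shares bought for 1,000 e, sold for 10,000 e after 12 years:
# the 40% deemed cost (4,000 e) beats the actual 1,000 e cost,
# so only 6,000 e of the 9,000 e gain is taxable.
print(taxable_gain(10_000, 1_000, 12))  # 6000.0
```

So the deemed acquisition cost reduces the taxable gain but never makes it zero, consistent with the point that fully tax-free gains don’t really exist.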

1 Like

Sales totaling under 1,000 euros per year are tax-free. That’s presumably what ChatGPT was trying to guess at there.

1 Like

A really annoying feature of these AI apps is that screen rotation doesn’t work. 90% of the user experience is lost when you have to squint at a tiny keyboard and type with your index finger the way my hundred-year-old parents do. Why isn’t such a simple thing fixed? Is it just a problem with my phone, or is the fault solely with the user?

In practice, I have to write the question in a note and copy/paste it into the AI. I do the same on forums, though, as even there, the rotated landscape display isn’t good.

Another brilliant tutorial video from Andrej Karpathy explaining how AI models are developed and trained:

3 Likes