In sampling, restricting the model to the top-k candidates, say k = 20, usually yields an excellent probability distribution; even 100 tokens would already be a lot for any single position in a response. The rarer tokens beyond that cutoff are so unlikely, and usually so undesirable, that they are never shown to you.
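For concreteness, here is a minimal NumPy sketch of top-k sampling. Real decoders combine this with temperature and nucleus (top-p) filtering; the vocabulary size below is a toy stand-in:

```python
import numpy as np

def top_k_sample(logits, k=20, rng=None):
    # Keep only the k highest-scoring tokens; everything rarer gets
    # exactly zero probability and can never be sampled.
    rng = rng or np.random.default_rng()
    top_idx = np.argsort(logits)[-k:]              # indices of the k best tokens
    top_logits = logits[top_idx]
    probs = np.exp(top_logits - top_logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(top_idx[rng.choice(k, p=probs)])

# Toy vocabulary of 1,000 "tokens": only 20 of them can ever appear.
logits = np.random.default_rng(0).normal(size=1000)
print(top_k_sample(logits, k=20))
```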
What is happening here is that your brain is doing the creative work: by shaping the input, you steer the model toward tokens it would not otherwise surface, and at the same time you exclude a large share of its other potential responses. In that light I don’t see any possibility of creativity in the language model independent of a human, because it merely executes what you command it to do through the input, like a slave or a robot, without genuine intellectual autonomy.
I wouldn’t necessarily make the same claim, because I have a very biological view of how the human brain functions. Still, the human brain is so many orders of magnitude more complex and versatile than current large language models that I don’t think the comparison is appropriate. One of LeCun’s older papers gives an example of what the architecture of an autonomous AI might look like; a rough sketch of its modules follows below.
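As a loose illustration, and assuming the paper in question is LeCun’s 2022 position piece “A Path Towards Autonomous Machine Intelligence”, the proposal is modular: perception feeds a world model, a cost module scores predicted outcomes, and an actor plans against them. A minimal Python sketch under that assumption, with every class and method name invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the modules described in LeCun's 2022
# position paper: perception, a world model, a cost module, and an
# actor. Names and interfaces here are invented for illustration.

class Perception:
    def encode(self, observation):
        return observation  # would map raw input to a state estimate

class WorldModel:
    def predict(self, state, action):
        return state  # would predict the next state given an action

class Cost:
    def evaluate(self, state):
        return 0.0  # would score how (un)desirable a state is

class Actor:
    def propose(self, state):
        return ["noop"]  # would propose candidate actions

@dataclass
class Agent:
    perception: Perception
    world_model: WorldModel
    cost: Cost
    actor: Actor

    def act(self, observation):
        # Perceive, then pick the action whose *predicted* outcome the
        # cost module scores best: planning inside the world model
        # rather than reacting token by token.
        state = self.perception.encode(observation)
        candidates = self.actor.propose(state)
        return min(
            candidates,
            key=lambda a: self.cost.evaluate(self.world_model.predict(state, a)),
        )
```

The point of the sketch is the loop: the agent evaluates imagined futures before acting, which is exactly what a bare next-token predictor does not do.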
I could accept the claim of AI creativity if the system had some understanding of, and ability to react to, the surrounding world: at a minimum an internal world model and perception, plus the autonomy to respond to stimuli with significant diversity. Current large language models are too slavishly dependent on their training data, and their underlying architecture is too simple, for them to be elevated beyond the role of an everyday tool alongside humans.
If you aren’t already following Sebastian Raschka, his Substack is definitely worth reading.
His posts usually get quite technical, but one of his predictions for 2026 reads as if I had written it myself:
A lot of LLM benchmark and performance progress will come from improved tooling and inference-time scaling rather than from training or the core model itself. It will look like LLMs are getting much better, but this will mainly be because the surrounding applications are improving. At the same time, developers will focus more on lowering latency and making reasoning models expend fewer reasoning tokens where they are unnecessary. Don’t get me wrong, 2026 will push the state of the art further, but a larger share of that progress will come from the inference side rather than purely from training.
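To make “inference-time scaling” concrete, here is a toy self-consistency sketch: sample several answers and majority-vote, spending extra compute at inference without touching the weights. The `generate` callable is hypothetical, standing in for one sampled LLM response per call:

```python
import collections
import random

def self_consistency(generate, prompt, n=8):
    # Inference-time scaling in miniature: trade n model calls for a
    # (hopefully) more reliable answer, with no training involved.
    votes = collections.Counter(generate(prompt) for _ in range(n))
    answer, _count = votes.most_common(1)[0]
    return answer

# Toy usage with a fake sampler standing in for an LLM call.
fake = lambda prompt: random.choice(["42", "42", "41"])
print(self_consistency(fake, "What is 6 * 7?"))
```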