
Video review (Deep Dive into LLMs like ChatGPT)
I had a chance to go over the video Deep Dive into LLMs like ChatGPT by Andrej Karpathy over the weekend. First of all, this is an awesome video for anyone who wants to understand how LLMs work (just like the title says). The video doesn't cover the nitty-gritty mathematical details of each component in an LLM, so it is super friendly even if you are not a math/technical person. Also, the examples and wording in the video are very intuitive, which makes it easier to understand.
The video first describes how an LLM is built. The process is divided into three steps: pre-training, post-training, and reinforcement learning.
In the pre-training step, the company essentially collects all the available data on the internet and feeds it into the model. In this step, data quantity matters most, and the training process is very expensive: huge GPU clusters run for days and cost a lot, so pre-training doesn't happen often, probably about once a year. The web pages are first converted into plain text and stored in .txt files, then a tokenizer turns the text into a sequence of tokens (integer IDs). The model is trained with supervised learning: the tokens are fed into a giant neural network whose output is a prediction of the next token, the weights are updated, and so on. That is a brief summary of what happens in this step.
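To make the "predict the next token" loop concrete, here is a minimal sketch in PyTorch. It is not the real pipeline: it uses a character-level stand-in for the tokenizer and a tiny GRU instead of a Transformer, and corpus.txt is just a placeholder for the scraped text. The point is only the shape of the training loop: tokenize the text, predict the token one position ahead, and update the weights.

```python
# A minimal sketch of the pre-training loop described above (PyTorch).
# Everything is a stand-in: a character-level "tokenizer" instead of BPE,
# a tiny GRU instead of a Transformer, and corpus.txt as a placeholder
# for the scraped web text.
import torch
import torch.nn as nn

text = open("corpus.txt").read()                 # plain text from the web (placeholder file)
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}     # "tokenizer": character -> integer ID
ids = torch.tensor([stoi[ch] for ch in text])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)                      # logits for the next token at each position

model = TinyLM(len(vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

block = 32
for step in range(1000):
    i = torch.randint(0, len(ids) - block - 1, (1,)).item()
    x = ids[i : i + block].unsqueeze(0)           # current tokens
    y = ids[i + 1 : i + block + 1].unsqueeze(0)   # the same tokens shifted by one position
    loss = loss_fn(model(x).view(-1, len(vocab)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Real pre-training uses the same objective, just with a learned subword tokenizer, a Transformer with billions of parameters, and a web-scale corpus instead of one text file.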
At the end of pre-training, you have a model called the base model, but this is not enough: the base model just continues a document given some input. To turn the model into an assistant, we use post-training, where we turn the LLM into a chat assistant. In the basic format, we design/create a dataset of conversation pairs, e.g.:
[ {'human': 'What is the weather like today?', 'assistant': 'It is sunny and warm.'}, {'human': 'And what about tomorrow?', 'assistant': 'Tomorrow it will be cloudy.'}, {'human': 'Will it rain?', 'assistant': 'There is a 30% chance of rain.'} ]
The model is then trained with datasets like the one above. The post-training step typically requires less data than pre-training, but the data needs to be high quality, which involves significantly more labor to verify its correctness. This is also an opportunity to provide the model with cognitive knowledge and enable it to respond with "I don't know" to unknown queries. In the video, the author points out that earlier models often fabricated answers. This is partly because the training data was unbalanced, lacking "I don't know" responses, so the model always answered confidently, even when incorrect. This tendency, while amusing, highlights the importance of balanced training data. Cognitive knowledge involves training the model on information about itself, such as its creators and capabilities. Safety features are also included in this step, training the model on what it should and should not answer.
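To make the post-training data format a bit more concrete, here is a minimal sketch assuming a made-up chat template. The role markers below are invented for illustration (every lab defines its own format); the key point is that each conversation is flattened into one token sequence and trained on with the same next-token objective as pre-training.

```python
# Minimal sketch: flattening conversation pairs into a single training string.
# The <|...|> role markers are made up for illustration; real chat templates
# differ between labs and models.
conversations = [
    [{"role": "human", "content": "What is the weather like today?"},
     {"role": "assistant", "content": "It is sunny and warm."}],
    [{"role": "human", "content": "Will it rain?"},
     {"role": "assistant", "content": "There is a 30% chance of rain."}],
]

def render(conversation):
    """Join all turns into one string with role markers the model can learn."""
    return "".join(
        f"<|{turn['role']}|>{turn['content']}<|end|>" for turn in conversation
    )

for conv in conversations:
    text = render(conv)
    # In real post-training this string is tokenized and trained on with the
    # same next-token objective as pre-training, typically masking the loss so
    # only the assistant's tokens are predicted.
    print(text)
```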
At this point, some LLMs are actually ready to use and behave like the ChatGPT models. The third step is reinforcement learning, where we give the model the opportunity to think: to generate the logic behind an answer and arrive at the correct answer through trial and error. The features we see in more modern models today, such as DeepSeek's DeepThink and OpenAI's reasoning models, are achieved with reinforcement learning. The idea of reinforcement learning in the video is to give the model a prompt (question) and the answer, then ask the model to figure out the needed steps and the correct path to the correct answer. In my understanding, reinforcement learning aligns closely with how humans actually learn: you go into an environment, you have different actions, and when you perform different actions in different states, you get different rewards. In the LLM setting, the agent is the LLM itself, the action is the LLM generating words, the state can be artificial, and the reward is the evaluation of the answer. That is the core idea.
Now, there are tons of technical details I've skipped, since it's probably still a top-tier research problem. One good reference is the RLHF paper published by OpenAI. We can imagine that evaluating LLM outputs can be very difficult. For some prompts, like 'write a poem' or 'write a joke,' humans themselves have to evaluate the output, and evaluating the output millions of times is not feasible. RLHF proposes a technique where the model generates different outcomes, and then humans rank or score these outcomes. These rankings or scores are then used to train a reward model, which approximates human preferences and guides the LLM during reinforcement learning. However, since we don't have control over how the model learns, sometimes the model generates things that don't make sense to us, such as 'The joke is the the the the the the the the.....', which... is funny, but is a bad joke. The model sort of tricks the reward model, because it does whatever maximizes the reward.
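To make the reward-model part of RLHF a bit more concrete, here is a minimal sketch of the pairwise ranking idea: human rankings become (chosen, rejected) pairs, and a small scorer is trained so the chosen output gets a higher score. Everything here is a toy stand-in; the "features" are hashed character counts rather than real LLM hidden states, and the preference data is made up.

```python
# Toy sketch of training a reward model from human preference pairs.
# The featurizer is a stand-in for an LLM encoder; the data is invented.
import torch
import torch.nn as nn
import torch.nn.functional as F

def featurize(text, dim=128):
    """Stand-in for LLM hidden states: hashed character counts."""
    v = torch.zeros(dim)
    for ch in text:
        v[hash(ch) % dim] += 1.0
    return v

reward_head = nn.Linear(128, 1)
opt = torch.optim.AdamW(reward_head.parameters(), lr=1e-3)

# (prompt, preferred completion, rejected completion) triples from human rankings
preferences = [
    ("write a joke",
     "Why did the model cross the road? To maximize reward.",
     "The joke is the the the the the"),
]

for epoch in range(100):
    for prompt, chosen, rejected in preferences:
        r_chosen = reward_head(featurize(prompt + chosen))
        r_rejected = reward_head(featurize(prompt + rejected))
        # Pairwise ranking loss: push the chosen score above the rejected one.
        loss = -F.logsigmoid(r_chosen - r_rejected).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```

During reinforcement learning, this reward model's score stands in for the human judge, which is exactly where the "the the the" failure mode can sneak in if the reward model turns out to be easy to exploit.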
At the end of the video, the author discusses the potential of reinforcement learning: supervised learning can get a model to the highest standard of human knowledge in a shorter time, but reinforcement learning is the way to go beyond the boundary of the highest human knowledge, at least for now. This is followed by some potential future directions for LLMs, such as tokenizing audio, pictures, and video to develop large picture models, large audio models, etc. Finally, the future is exciting.
That has been my review of the video. I really enjoyed it.