3Blue1Brown
  • 175
  • 501 233 738
Attention in transformers, visually explained | Chapter 6, Deep Learning
Demystifying attention, the key mechanism inside transformers and LLMs.
Instead of sponsored ad reads, these lessons are funded directly by viewers: 3b1b.co/support
Special thanks to these supporters: www.3blue1brown.com/lessons/attention#thanks
An equally valuable form of support is to simply share the videos.
Demystifying self-attention, multiple heads, and cross-attention.
The first pass for the translated subtitles here is machine-generated, and therefore notably imperfect. To contribute edits or fixes, visit translate.3blue1brown.com/
And yes, at 22:00 (and elsewhere), "breaks" is a typo.
------------------
Here are a few other relevant resources
Build a GPT from scratch, by Andrej Karpathy
ua-cam.com/video/kCc8FmEb1nY/v-deo.html
If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
ua-cam.com/video/1il-s4mgNdI/v-deo.htmlsi=XaVxj6bsdy3VkgEX
If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources.
transformer-circuits.pub/2021/framework/index.html
Site with exercises related to ML programming and GPTs
www.gptandchill.ai/codingproblems
History of language models by Brit Cruise, @ArtOfTheProblem
ua-cam.com/video/OFS90-FX6pg/v-deo.html
An early paper on how directions in embedding spaces have meaning:
arxiv.org/pdf/1301.3781.pdf
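As a rough illustration of the remark above about treating the value and output matrices as one combined low-rank map from the embedding space to itself, here is a minimal Python sketch. The dimensions, the names `W_value` and `W_output`, and the random entries are all assumptions chosen for illustration; none of them come from the video or from any real model.

```python
import math
import random

random.seed(0)

# Hypothetical sizes for illustration only; real models are far larger.
d_model = 6   # embedding dimension
d_head = 2    # smaller per-head dimension


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of Gaussian random entries."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, v):
    """Multiply matrix a (n x k) by vector v (length k)."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


# Assumed names: W_value maps embeddings down to the small head space,
# W_output maps that small vector back up to the embedding space.
W_value = rand_matrix(d_head, d_model)
W_output = rand_matrix(d_model, d_head)

# Their product is a single d_model x d_model map whose rank is at most d_head.
W_combined = matmul(W_output, W_value)

x = [random.gauss(0.0, 1.0) for _ in range(d_model)]
two_step = matvec(W_output, matvec(W_value, x))  # down, then up
one_step = matvec(W_combined, x)                 # combined map

# The two routes agree up to floating-point rounding.
same = all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-9)
           for p, q in zip(two_step, one_step))

factored_params = 2 * d_model * d_head  # entries stored in the two factors
full_params = d_model * d_model         # entries in an unconstrained square map

print(same, factored_params, full_params)
```

With these toy sizes the two factors hold 24 numbers, while an unconstrained 6 x 6 map would hold 36. The gap widens quickly as the embedding dimension grows relative to the head dimension.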
------------------
Timestamps:
0:00 - Recap on embeddings
1:39 - Motivating examples
4:29 - The attention pattern
11:08 - Masking
12:42 - Context size
13:10 - Values
15:44 - Counting parameters
18:21 - Cross-attention
19:19 - Multiple heads
22:16 - The output matrix
23:19 - Going deeper
24:54 - Ending
------------------
These animations are largely made using a custom Python library, manim. See the FAQ comments here:
3b1b.co/faq#manim
github.com/3b1b/manim
github.com/ManimCommunity/manim/
All code for specific videos is visible here:
github.com/3b1b/videos/
The music is by Vincent Rubinetti.
www.vincentrubinetti.com
vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u
------------------
3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on UA-cam or otherwise following on whichever platform below you check most regularly.
Mailing list: 3blue1brown.substack.com
Twitter: 3blue1brown
Instagram: 3blue1brown
Reddit: www.reddit.com/r/3blue1brown
Facebook: 3blue1brown
Patreon: patreon.com/3blue1brown
Website: www.3blue1brown.com
Views: 891,826

Videos

But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning
2.2M views, 1 month ago
Unpacking how large language models work under the hood Early view of the next chapter for patrons: 3b1b.co/early-attention Special thanks to these supporters: 3b1b.co/lessons/gpt#thanks To contribute edits to the subtitles, visit translate.3blue1brown.com/ Other recommended resources on the topic. Richard Turner's introduction is one of the best starting places: arxiv.org/pdf/2304.10557.pdf Co...
4 questions about the refractive index | Optics puzzles 4
667K views, 5 months ago
But why would light "slow down"? | Optics puzzles 3
1.2M views, 5 months ago
25 Math explainers you may enjoy | SoME3 results
544K views, 7 months ago
Explaining the barber pole effect from origins of light | Optics puzzles 2
689K views, 8 months ago
Polarized light in sugar water | Optics puzzles 1
1M views, 8 months ago
A pretty reason why Gaussian + Gaussian = Gaussian
756K views, 9 months ago
This pattern breaks, but for a good reason | Moser's circle problem
1.9M views, 10 months ago
How They Fool Ya (live) | Math parody of Hallelujah
948K views, 10 months ago
Convolutions | Why X+Y in probability is a beautiful mess
629K views, 10 months ago
Why π is in the normal distribution (beyond integral tricks)
1.5M views, 1 year ago
But what is the Central Limit Theorem?
3.3M views, 1 year ago
But what is a convolution?
2.5M views, 1 year ago
Researchers thought this was a bug (Borwein integrals)
3.3M views, 1 year ago
What makes a great math explanation? | SoME2 results
736K views, 1 year ago
How to lie using visual proofs
3.1M views, 1 year ago
Olympiad level counting (Generating functions)
1.9M views, 1 year ago
Oh, wait, actually the best Wordle opener is not “crane”…
6M views, 2 years ago
Solving Wordle using information theory
10M views, 2 years ago
A tale of two problem solvers (Average cube shadows)
2.7M views, 2 years ago
2021 Summer of Math Exposition results
777K views, 2 years ago
Beyond the Mandelbrot set, an intro to holomorphic dynamics
1.4M views, 2 years ago
From Newton’s method to Newton’s fractal (which Newton knew nothing about)
2.8M views, 2 years ago
The Summer of Math Exposition
722K views, 2 years ago
A quick trick for computing eigenvalues | Chapter 15, Essence of linear algebra
981K views, 3 years ago
How (and why) to raise e to the power of a matrix | DE6
2.7M views, 3 years ago
The medical test paradox, and redesigning Bayes' rule
1.2M views, 3 years ago
Hamming codes part 2: The one-line implementation
835K views, 3 years ago
But what are Hamming codes? The origin of error correction
2.3M views, 3 years ago

COMMENTS

  • @_Interstellar_Space_, 19 hours ago

    😮👍

  • @shingtaitaitepoon4088, 19 hours ago

    62

  • @JS33137, 20 hours ago

    Wow

  • @firedragonninjabeastyt7724, 20 hours ago

    Nice

  • @ayushmankashyap2740, 20 hours ago

    A small triangle in the middle is left 😅😅

  • @DaBgold, 20 hours ago

    So with 10000000000000000000000000000000000000000kg the number of collisions are gonna be 314159265358797323486?

  • @madamesomeone8756, 21 hours ago

    Me when listening to my math teacher, understanding the start of the lesson but when they twist it, my mind gets twisted too

  • @lavidaloca266h, 21 hours ago

    Tryna find space …

  • @kirill_good_job, 21 hours ago

    where's link to the code ?

    • @kirill_good_job, 19 hours ago

      output of this code: 1.0471975511965976 what does it mean ?

    • @kirill_good_job, 19 hours ago

      1.0471975511965976: This is approximately equal to 60 degrees when converted from radians to degrees. So, the output 1.0471975511965976 means that at the specified point in time, the pendulum has swung to an angle of approximately 60 degrees from its starting position. theta(10000)

  • @blaubeersauce353, 22 hours ago

    3:30 I think this areal equivalence is also used in cartography by Mercator.

  • @kabelloseskabel7029, 22 hours ago

    I am hoping for a video about LSTM/RNNs.

  • @chenyujin9602, 22 hours ago

    hey thank you a lot for the great content. I have a naive question though: since θ is defined as the angle, the change along the perimeter is 2πr*dθ/360 degree; r=1 so it's 2π*dθ/360. Basically dθ is where I got lost at. Much appreciated if anyone can help me out

  • @user-rj7rm8uw1n, 23 hours ago

    Coil, relay.

  • @JasperLee-jq9ef, 23 hours ago

    Why pie?

  • @alfonso.k, 23 hours ago

    So the quadratic formula and the formula for eigenvalue using m and p are almost the same, except that the first b in quadratic formula is negated but the m outside sqrt is not.

  • @barathwinmaster8637, 1 day ago

    You are wrong: the equation works when the common ratio is less than 1. If the common ratio is greater than 1, then the sum to infinity is infinite.

  • @smarty4822, 1 day ago

    This is why positive predictive value is sometimes the more important figure.

  • @andrewwotherspoona5722, 1 day ago

    You have to have provided the simplest explanation of calculus at the most fundamental level. Truly brilliant!

  • @YourStoryTeam, 1 day ago

    Please consider offering a VR experience, I'd pay!

  • @sekiro_the_one-armed_wolf, 1 day ago

    Shouldn’t it end with the smaller cube having stopped?

  • @user-pg4vs5tq7n, 1 day ago

    I'm just bored and wanted to learn something before class starts again, thanks for the great video and a very meticulous explanation providing a deeper comprehension of the subject. So, again, thank you so much for making this video.

  • @rohan.saxena, 1 day ago

    Great series!

  • @1TieDye1, 1 day ago

    That is fucking bizarre and so cool

  • @tomasseeber, 1 day ago

    See: 1D: a line spinning over a dot • Can cast a "shadow intensity" on the dot (according to its verticality) from zero to its length • So its average is (0+length)/2 2D: a square spinning over a line • Can cast a shadow on the line from its side length to its diagonal (cast at max rotation) • So its average is (side length+diagonal)/2 3D: a cube spinning over a surface • Can cast a shadow on the surface from the area of its sides to the area of the hexagon cast at max rotation • So its average is (area of sides+area of hexagon cast)/2 Which for a cube with side 1 is: (3√3)/16.

  • @lalalanding234, 1 day ago

    THE ANIMATIONS ARE SO ON POINT!! Also this kinda looks like waves of the ocean!!

  • @claudioestevez61, 1 day ago

    Nobody ever stops to think what point in time is this question asked. If this is asked in the year 3000 there are no farmers, only librarians. People's way of thinking is so narrow.

  • @Player-pj9kt, 1 day ago

    U can see the EM waves propagating from the change when it moves

  • @P0rC3L41n, 1 day ago

    the way i screamed pi when he zoomed in on 3141 ☠️

  • @dustyandhuckarebabies, 1 day ago

    The pi fact at the end 🤯🤯🤯🤯🤯🤯🤯

  • @righteousoutlaw5116, 1 day ago

    The wave propagation outward looks like the explanation for the big bang how matter is expanding out, the start of the atomic matter we know is the beginning of a wave transmission.

  • @kylev.8248, 1 day ago

    Thank you for making math suck less

  • @durandoo1134, 1 day ago

    IT MAKES SENSE

  • @trishasaoirse1511, 1 day ago

    The answer is 1G. There's half an hour of my life i'll never get back.

  • @naoimleschad, 1 day ago

    Can someone explain 13:47? Why would you subtract the value at the lower bound?

  • @Lil_Deej1, 1 day ago

    Well, if you are allowed to converse prior to the flip, on the 3 square board, just make the one with the key different from the rest

  • @user-ps4th8tc5d, 1 day ago

    4048 is a power of 2

  • @toshaxar, 1 day ago

    The Best explanation, thanks!

  • @keziha5315, 1 day ago

    I would flip the other T next to it and hope for the best. It would make the T with the coin under it the only one with H's all around it. It's a long shot but I feel like it's the only thing I could do.

  • @superjulian0245, 1 day ago

    27:07 physicists when they do math: "the theorem applies for n going to infinity so let's assume it also applies for n = 100 since 100 is big"

  • @user-wo7yp8vi4n, 1 day ago

    What happens to the fractal shape on complex numbers z where P'(z)=0?

  • @hans7408, 1 day ago

    Giger counter

  • @josephcoulter7994, 1 day ago

    Can you calculate pi using graphs and integrals?

  • @Denccmbr_, 1 day ago

    Why did this video blow up so much

  • @samuelbartik5265, 1 day ago

    Thank you!

  • @xvrqt, 1 day ago

    Watching the animation at 5:30 and realizing the determinant is just the outer product (without the orientation information)

  • @quirty8966, 1 day ago

    uhm what the sigma is going on over there

  • @porterrobertson517, 1 day ago

    Flip it on the coins edge. Heads or tails isn’t technically a 50/50 chance

  • @holoperfection, 1 day ago

    Imagine infinite kg

  • @TheAnnoyedAssasain, 1 day ago

    what game is this?

  • @GamingInhere, 1 day ago

    Turks right now