Attention in transformers, visually explained | Chapter 6, Deep Learning

Demystifying attention, the key mechanism inside transformers and LLMs.
Instead of sponsored ad reads, these lessons are funded directly by viewers: 3b1b.co/support
Special thanks to these supporters: www.3blue1brown.com/lessons/attention#thanks
An equally valuable form of support is to simply share the videos.
Demystifying self-attention, multiple heads, and cross-attention.
The first pass for the translated subtitles here is machine-generated, and therefore notably imperfect. To contribute edits or fixes, visit translate.3blue1brown.com/
And yes, at 22:00 (and elsewhere), "breaks" is a typo.
------------------
Here are a few other relevant resources
Build a GPT from scratch, by Andrej Karpathy
ua-cam.com/video/kCc8FmEb1nY/v-deo.html
If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
ua-cam.com/video/1il-s4mgNdI/v-deo.htmlsi=XaVxj6bsdy3VkgEX
If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources.
transformer-circuits.pub/2021/framework/index.html
Site with exercises related to ML programming and GPTs
www.gptandchill.ai/codingproblems
History of language models by Brit Cruise, @ArtOfTheProblem
ua-cam.com/video/OFS90-FX6pg/v-deo.html
An early paper on how directions in embedding spaces have meaning:
arxiv.org/pdf/1301.3781.pdf
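
The "combined low-rank map" idea mentioned above for the value and output matrices can be sketched numerically. This is only an illustration, not code from the video or from the Transformer Circuits posts: the names `W_V`, `W_O`, `W_OV` and the sizes `d_model = 64`, `d_head = 8` are assumptions chosen to keep the example small, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 64  # embedding dimension (illustrative, far smaller than a real model)
d_head = 8    # per-head value dimension (illustrative)

# Value matrix: maps an embedding down into the small per-head space.
W_V = rng.standard_normal((d_head, d_model))
# Output matrix: maps the per-head result back up into the embedding space.
W_O = rng.standard_normal((d_model, d_head))

# Composing the two gives a single map from the embedding space to itself.
W_OV = W_O @ W_V


def combined_rank() -> int:
    """Numerical rank of the combined embedding-to-embedding map."""
    return int(np.linalg.matrix_rank(W_OV))


# Applying the two matrices in sequence matches applying the combined map.
x = rng.standard_normal(d_model)
two_step = W_O @ (W_V @ x)
one_step = W_OV @ x

print(W_OV.shape, combined_rank(), np.allclose(two_step, one_step))
```

Although `W_OV` is a square `d_model` by `d_model` matrix, its rank cannot exceed `d_head`, which is the sense in which the map is low-rank.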
------------------
Timestamps:
0:00 - Recap on embeddings
1:39 - Motivating examples
4:29 - The attention pattern
11:08 - Masking
12:42 - Context size
13:10 - Values
15:44 - Counting parameters
18:21 - Cross-attention
19:19 - Multiple heads
22:16 - The output matrix
23:19 - Going deeper
24:54 - Ending
------------------
These animations are largely made using a custom Python library, manim. See the FAQ comments here:
3b1b.co/faq#manim
github.com/3b1b/manim
github.com/ManimCommunity/manim/
All code for specific videos is visible here:
github.com/3b1b/videos/
The music is by Vincent Rubinetti.
www.vincentrubinetti.com
vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u
------------------
3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly.
Mailing list: 3blue1brown.substack.com
Twitter: 3blue1brown
Instagram: 3blue1brown
Reddit: www.reddit.com/r/3blue1brown
Facebook: 3blue1brown
Patreon: patreon.com/3blue1brown
Website: www.3blue1brown.com

Videos

But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning
27:14
2.2M views, 1 month ago
Unpacking how large language models work under the hood. Early view of the next chapter for patrons: 3b1b.co/early-attention Special thanks to these supporters: 3b1b.co/lessons/gpt#thanks To contribute edits to the subtitles, visit translate.3blue1brown.com/ Other recommended resources on the topic. Richard Turner's introduction is one of the best starting places: arxiv.org/pdf/2304.10557.pdf Co...

4 questions about the refractive index | Optics puzzles 4
13:25
667K views, 5 months ago

But why would light "slow down"? | Optics puzzles 3
29:24
1.2M views, 5 months ago

25 Math explainers you may enjoy | SoME3 results
22:12
544K views, 7 months ago

Explaining the barber pole effect from origins of light | Optics puzzles 2
21:33
689K views, 8 months ago

Polarized light in sugar water | Optics puzzles 1
9:57
1M views, 8 months ago

A pretty reason why Gaussian + Gaussian = Gaussian
13:16
756K views, 9 months ago

This pattern breaks, but for a good reason | Moser's circle problem
16:13
1.9M views, 10 months ago

How They Fool Ya (live) | Math parody of Hallelujah
4:00
948K views, 10 months ago

Convolutions | Why X+Y in probability is a beautiful mess
27:25
629K views, 10 months ago

Why π is in the normal distribution (beyond integral tricks)
24:46
1.5M views, 1 year ago

But what is the Central Limit Theorem?
31:15
3.3M views, 1 year ago

But what is a convolution?
23:01
2.5M views, 1 year ago

Researchers thought this was a bug (Borwein integrals)
17:26
3.3M views, 1 year ago

What makes a great math explanation? | SoME2 results
17:01
736K views, 1 year ago

How to lie using visual proofs
18:49
3.1M views, 1 year ago

Olympiad level counting (Generating functions)
34:36
1.9M views, 1 year ago

Oh, wait, actually the best Wordle opener is not “crane”…
10:53
6M views, 2 years ago

Solving Wordle using information theory
30:38
10M views, 2 years ago

A tale of two problem solvers (Average cube shadows)
40:06
2.7M views, 2 years ago

2021 Summer of Math Exposition results
12:40
777K views, 2 years ago

Beyond the Mandelbrot set, an intro to holomorphic dynamics
27:36
1.4M views, 2 years ago

From Newton’s method to Newton’s fractal (which Newton knew nothing about)
26:06
2.8M views, 2 years ago

The Summer of Math Exposition
24:21
722K views, 2 years ago

A quick trick for computing eigenvalues | Chapter 15, Essence of linear algebra
13:13
981K views, 3 years ago

How (and why) to raise e to the power of a matrix | DE6
27:07
2.7M views, 3 years ago

The medical test paradox, and redesigning Bayes' rule
21:14
1.2M views, 3 years ago

Hamming codes part 2: The one-line implementation
16:50
835K views, 3 years ago

But what are Hamming codes? The origin of error correction
20:05
2.3M views, 3 years ago

Comments

@_Interstellar_Space_ 19 hours ago
😮👍
@shingtaitaitepoon4088 19 hours ago
62
@JS33137 20 hours ago
Wow
@firedragonninjabeastyt7724 20 hours ago
Nice
@ayushmankashyap2740 20 hours ago
A small triangle in the middle is left 😅😅
@DaBgold 20 hours ago
So with 10000000000000000000000000000000000000000kg the number of collisions are gonna be 314159265358797323486?
@madamesomeone8756 21 hours ago
Me when listening to my math teacher, understanding the start of the lesson but when they twist it, my mind gets twisted too
@lavidaloca266h 21 hours ago
Tryna find space …
@lavidaloca266h 21 hours ago
Pushups
@kirill_good_job 21 hours ago
where's link to the code ?
@kirill_good_job 19 hours ago
output of this code: 1.0471975511965976 what does it mean ?
@kirill_good_job 19 hours ago
1.0471975511965976: This is approximately equal to 60 degrees when converted from radians to degrees. So, the output 1.0471975511965976 means that at the specified point in time, the pendulum has swung to an angle of approximately 60 degrees from its starting position. theta(10000)
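
The radians-to-degrees conversion described in the comment above can be checked with a few lines of standard-library Python. This is a minimal sketch: the variable names are illustrative, and it does not reproduce the pendulum code the commenter ran, which is not shown in the thread.

```python
import math

# Value quoted in the comment, interpreted as an angle in radians.
angle_rad = 1.0471975511965976

# Convert to degrees; the result is very close to 60.
angle_deg = math.degrees(angle_rad)
print(angle_deg)

# The quoted value also matches pi/3, which is exactly 60 degrees.
print(math.isclose(angle_rad, math.pi / 3))
```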
@blaubeersauce353 22 hours ago
3:30 I think this areal equivalence is also used in cartography by Mercator.
@kabelloseskabel7029 22 hours ago
I am hoping for a video about LSTM/RNNs.
@chenyujin9602 22 hours ago
hey thank you a lot for the great content. I have a naive question though: since θ is defined as the angle, the change along the perimeter is 2πr*dθ/360 degree; r=1 so it's 2π*dθ/360. Basically dθ is where I got lost at. Much appreciated if anyone can help me out
@user-rj7rm8uw1n 23 hours ago
Coil, relay.
@JasperLee-jq9ef 23 hours ago
Why pie?
@alfonso.k 23 hours ago
So the quadratic formula and the formula for eigenvalue using m and p are almost the same, except that the first b in quadratic formula is negated but the m outside sqrt is not.
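
The sign difference noted in the comment above can be traced with a short derivation. It assumes the setup from the eigenvalue video, where m is the mean of the two eigenvalues of a 2×2 matrix and p is their product, so the characteristic polynomial has linear coefficient −2m; the symbols a, b, c are the usual quadratic-formula coefficients.

```latex
\begin{aligned}
\lambda^2 - 2m\,\lambda + p &= 0
  && \text{(characteristic polynomial, with } a = 1,\ b = -2m,\ c = p\text{)} \\
\lambda &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
  = \frac{2m \pm \sqrt{4m^2 - 4p}}{2} \\
        &= m \pm \sqrt{m^2 - p}
\end{aligned}
```

Since b = −2m, the −b in the quadratic formula becomes +2m, which is why m appears without a minus sign.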
@barathwinmaster8637 1 day ago
You are wrong; the equation works when the common ratio is less than 1. If the common ratio is greater than 1, then the sum to infinity is infinity.
@smarty4822 1 day ago
This is u positive predictive value is sometimes the more important figure.
@andrewwotherspoona5722 1 day ago
You have to have provided the simplest explanation of calculus at the most fundamental level. Truly brilliant!
@YourStoryTeam 1 day ago
Please consider offering a VR experience, I'd pay!
@sekiro_the_one-armed_wolf 1 day ago
Shouldn’t it end with the smaller cube having stopped?
@user-pg4vs5tq7n 1 day ago
I'm just bored and wanted to learn something before class starts again, thanks for the great video and a very meticulous explanation providing a deeper comprehension of the subject. So, again, thank you so much for making this video.
@rohan.saxena 1 day ago
Great series!
@1TieDye1 1 day ago
That is fucking bizarre and so cool
@tomasseeber 1 day ago
See:
1D: a line spinning over a dot
• Can cast a "shadow intensity" on the dot (according to its verticality) from zero to its length
• So its average is (0+length)/2
2D: a square spinning over a line
• Can cast a shadow on the line from its side length to its diagonal (cast at max rotation)
• So its average is (side length+diagonal)/2
3D: a cube spinning over a surface
• Can cast a shadow on the surface from the area of its sides to the area of the hexagon cast at max rotation
• So its average is (area of sides+area of hexagon cast)/2
Which for a cube with side 1 is: (3√3)/16.
@lalalanding234 1 day ago
THE ANIMATIONS ARE SO ON POINT!! Also this kinda looks like waves of the ocean!!
@claudioestevez61 1 day ago
Nobody ever stops to think what point in time is this question asked. If this is asked in the year 3000 there are no farmers, only librarians. People's way of thinking is so narrow.
@Player-pj9kt 1 day ago
U can see the EM waves propagating from the change when it moves
@P0rC3L41n 1 day ago
the way i screamed pi when he zoomed in on 3141 ☠️
@dustyandhuckarebabies 1 day ago
The pi fact at the end 🤯🤯🤯🤯🤯🤯🤯
@righteousoutlaw5116 1 day ago
The wave propagation outward looks like the explanation for the big bang how matter is expanding out, the start of the atomic matter we know is the beginning of a wave transmission.
@kylev.8248 1 day ago
Thank you for making math suck less
@durandoo1134 1 day ago
IT MAKES SENSE
@trishasaoirse1511 1 day ago
The answer is 1G. There's half an hour of my life i'll never get back.
@naoimleschad 1 day ago
Can someone explain 13:47? Why would you subtract the value at the lower bound?
@Lil_Deej1 1 day ago
Well, if you are allowed to converse prior to the flip, on the 3 square board, just make the one with the key different from the rest
@user-ps4th8tc5d 1 day ago
4048 is a power of 2
@toshaxar 1 day ago
The Best explanation, thanks!
@keziha5315 1 day ago
I would flip the other T next to it and hope for the best. It would make the T with the coin under it the only one with H's all around it. It's a long shot but I feel like it's the only thing I could do.
@superjulian0245 1 day ago
27:07 physicists when they do math: "the theorem applies for n going to infinity so let's assume it also applies for n = 100 since 100 is big"
@user-wo7yp8vi4n 1 day ago
What happens to the fractal shape on complex numbers z where P'(z)=0?
@hans7408 1 day ago
Giger counter
@josephcoulter7994 1 day ago
Can you calculate pi using graphs and integrals?
@Denccmbr_ 1 day ago
Why did this video blow up so much
@samuelbartik5265 1 day ago
Thank you!
@xvrqt 1 day ago
Watching the animation at 5:30 and realizing the determinant is just the outer product (without the orientation information)
@quirty8966 1 day ago
uhm what the sigma is going on over there
@porterrobertson517 1 day ago
Flip it on the coin's edge. Heads or tails isn’t technically a 50/50 chance
@holoperfection 1 day ago
Imagine infinite kg
@TheAnnoyedAssasain 1 day ago
what game is this?
@GamingInhere 1 day ago
Turks here right now
