3Blue1Brown
United States
Joined Mar 3, 2015
My name is Grant Sanderson. Videos here cover a variety of topics in math, or adjacent fields like physics and CS, all with an emphasis on visualizing the core ideas. The goal is to use animation to help elucidate and motivate otherwise tricky topics, and for difficult problems to be made simple with changes in perspective.
For more information, other projects, FAQs, and inquiries see the website: www.3blue1brown.com
Attention in transformers, visually explained | Chapter 6, Deep Learning
Demystifying attention, the key mechanism inside transformers and LLMs.
Instead of sponsored ad reads, these lessons are funded directly by viewers: 3b1b.co/support
Special thanks to these supporters: www.3blue1brown.com/lessons/attention#thanks
An equally valuable form of support is to simply share the videos.
Demystifying self-attention, multiple heads, and cross-attention.
The first pass for the translated subtitles here is machine-generated, and therefore notably imperfect. To contribute edits or fixes, visit translate.3blue1brown.com/
And yes, at 22:00 (and elsewhere), "breaks" is a typo.
------------------
Here are a few other relevant resources:
Build a GPT from scratch, by Andrej Karpathy
ua-cam.com/video/kCc8FmEb1nY/v-deo.html
If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
ua-cam.com/video/1il-s4mgNdI/v-deo.html?si=XaVxj6bsdy3VkgEX
If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources.
transformer-circuits.pub/2021/framework/index.html
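The low-rank observation above can be made concrete with a small numerical sketch. This is a minimal NumPy illustration with made-up, scaled-down dimensions (`d_embed`, `d_head` are hypothetical here), not code from the video or from Anthropic's posts:

```python
import numpy as np

# Hypothetical, scaled-down dimensions for illustration only.
d_embed = 64   # embedding dimension
d_head = 8     # per-head "value" dimension (much smaller)

rng = np.random.default_rng(0)
W_value = rng.standard_normal((d_head, d_embed))    # embedding space -> head space
W_output = rng.standard_normal((d_embed, d_head))   # head space -> embedding space

# Composed, the two matrices form a single map from the
# embedding space back to itself...
W_vo = W_output @ W_value
assert W_vo.shape == (d_embed, d_embed)

# ...whose rank is at most d_head, i.e. a low-rank map,
# even though it is stored as a full d_embed x d_embed matrix.
assert np.linalg.matrix_rank(W_vo) <= d_head
```

This is why the "value" and "output" matrices are often easier to think about as one factored low-rank transformation rather than two separate steps.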
Site with exercises related to ML programming and GPTs
www.gptandchill.ai/codingproblems
History of language models by Brit Cruise, @ArtOfTheProblem
ua-cam.com/video/OFS90-FX6pg/v-deo.html
An early paper on how directions in embedding spaces have meaning:
arxiv.org/pdf/1301.3781.pdf
------------------
Timestamps:
0:00 - Recap on embeddings
1:39 - Motivating examples
4:29 - The attention pattern
11:08 - Masking
12:42 - Context size
13:10 - Values
15:44 - Counting parameters
18:21 - Cross-attention
19:19 - Multiple heads
22:16 - The output matrix
23:19 - Going deeper
24:54 - Ending
------------------
These animations are largely made using a custom Python library, manim. See the FAQ comments here:
3b1b.co/faq#manim
github.com/3b1b/manim
github.com/ManimCommunity/manim/
All code for specific videos is visible here:
github.com/3b1b/videos/
The music is by Vincent Rubinetti.
www.vincentrubinetti.com
vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u
------------------
3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on UA-cam or otherwise following on whichever platform below you check most regularly.
Mailing list: 3blue1brown.substack.com
Twitter: 3blue1brown
Instagram: 3blue1brown
Reddit: www.reddit.com/r/3blue1brown
Facebook: 3blue1brown
Patreon: patreon.com/3blue1brown
Website: www.3blue1brown.com
Views: 891,826
Videos
But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning
Views: 2.2M · 1 month ago
Unpacking how large language models work under the hood.
Early view of the next chapter for patrons: 3b1b.co/early-attention
Special thanks to these supporters: 3b1b.co/lessons/gpt#thanks
To contribute edits to the subtitles, visit translate.3blue1brown.com/
Other recommended resources on the topic. Richard Turner's introduction is one of the best starting places: arxiv.org/pdf/2304.10557.pdf Co...

4 questions about the refractive index | Optics puzzles 4
Views: 667K · 5 months ago

But why would light "slow down"? | Optics puzzles 3
Views: 1.2M · 5 months ago

25 Math explainers you may enjoy | SoME3 results
Views: 544K · 7 months ago

Explaining the barber pole effect from origins of light | Optics puzzles 2
Views: 689K · 8 months ago

Polarized light in sugar water | Optics puzzles 1
Views: 1M · 8 months ago

A pretty reason why Gaussian + Gaussian = Gaussian
Views: 756K · 9 months ago

This pattern breaks, but for a good reason | Moser's circle problem
Views: 1.9M · 10 months ago

How They Fool Ya (live) | Math parody of Hallelujah
Views: 948K · 10 months ago

Convolutions | Why X+Y in probability is a beautiful mess
Views: 629K · 10 months ago

Why π is in the normal distribution (beyond integral tricks)
Views: 1.5M · 1 year ago

But what is the Central Limit Theorem?
Views: 3.3M · 1 year ago

Researchers thought this was a bug (Borwein integrals)
Views: 3.3M · 1 year ago

What makes a great math explanation? | SoME2 results
Views: 736K · 1 year ago

Olympiad level counting (Generating functions)
Views: 1.9M · 1 year ago

Oh, wait, actually the best Wordle opener is not “crane”…
Views: 6M · 2 years ago

Solving Wordle using information theory
Views: 10M · 2 years ago

A tale of two problem solvers (Average cube shadows)
Views: 2.7M · 2 years ago

2021 Summer of Math Exposition results
Views: 777K · 2 years ago

Beyond the Mandelbrot set, an intro to holomorphic dynamics
Views: 1.4M · 2 years ago

From Newton’s method to Newton’s fractal (which Newton knew nothing about)
Views: 2.8M · 2 years ago

A quick trick for computing eigenvalues | Chapter 15, Essence of linear algebra
Views: 981K · 3 years ago

How (and why) to raise e to the power of a matrix | DE6
Views: 2.7M · 3 years ago

The medical test paradox, and redesigning Bayes' rule
Views: 1.2M · 3 years ago

Hamming codes part 2: The one-line implementation
Views: 835K · 3 years ago

But what are Hamming codes? The origin of error correction
Views: 2.3M · 3 years ago