Knowledge Share
All notes
-
Machine Learning Self Attention At its core, self-attention is a sequence-to-sequence operation. It takes a sequence of vectors and produces a new sequence of vectors of the same…
-
Machine Learning Vision Transformer (ViT) The Vision Transformer (ViT) represents a massive paradigm shift in computer vision. Introduced by Google in 2020 ("An Image is Worth 16x16 Words"), it…
-
Machine Learning Diffusion Transformer (DiT) To understand the math behind the Diffusion Transformer (DiT), we have to separate it into two distinct parts: the mathematical framework (the…
-
Machine Learning Classifier-Free Guidance (CFG) Classifier-Free Guidance (CFG) is arguably the most critical technique for achieving high-fidelity, strongly aligned generations in modern diffusion…
-
Machine Learning Backpropagation Backpropagation (short for "backward propagation of errors") is the mathematical engine that allows neural networks to learn. At its core, it is an…
-
Machine Learning AutoDiff Automatic differentiation (AutoDiff) is the algorithmic foundation that makes modern machine learning frameworks like PyTorch and JAX possible. While…
-
Machine Learning KL Divergence At its core, Kullback-Leibler (KL) Divergence is a statistical measure of how much one probability distribution differs from a second, reference…
-
Machine Learning ELBO (Evidence Lower Bound) In Bayesian inference and generative modeling, the Evidence Lower Bound (ELBO) is a crucial quantity used to approximate the marginal likelihood (the…
-
Machine Learning Diffusion Model Diffusion models, specifically Denoising Diffusion Probabilistic Models (DDPMs), are generative models that learn to create data by reversing a gradual…
-
Machine Learning Diffusion from Stochastic Differential Equations (SDEs) Perspective From a mathematical perspective, diffusion models are fundamentally about defining a trajectory between a complex, intractable data distribution and a…
-
Machine Learning VAE vs. Diffusion from ELBO Perspective It is fascinating that two entirely different generative paradigms—Variational Autoencoders (VAEs) and Diffusion Models—are mathematically rooted in…
-
Machine Learning DDIM (Denoising Diffusion Implicit Models) The fundamental difference between DDPM (Denoising Diffusion Probabilistic Models) and DDIM (Denoising Diffusion Implicit Models) lies entirely in the…
-
Machine Learning DDPMs, DDIMs, and Score-Based Methods The connection between DDPMs, DDIMs, and Score-Based Generative Models is one of the most elegant unifying theories in modern machine learning…
-
Machine Learning Flow Matching Flow matching is a highly effective mathematical framework for generative modeling. It serves as an alternative to Diffusion Models and provides a more…
-
Machine Learning Mean Flow Notes on mean-flow generative modeling and its connection to flow matching.
-
Machine Learning Improved Mean Flow (iMF) Improved mean flow (iMF) — faster sampling and training for flow-based generative models.
-
Computer Vision Optical Flow Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene, caused by the relative motion between an observer (an…
-
Computer Vision Lucas-Kanade Method Example To see exactly how the Lucas-Kanade (LK) method solves for optical flow, we need to walk through the least-squares approximation.
-
Computer Vision RAFT (Recurrent All-Pairs Field Transforms) RAFT (Recurrent All-Pairs Field Transforms) represents a major paradigm shift from traditional optical flow methods like Lucas-Kanade.
-
Computer Vision Camera Intrinsic Matrix Intrinsics K, focal length, principal point, and pixel skew.
-
Computer Vision Camera Extrinsic Matrix Estimating the camera extrinsic matrix—which defines the rigid transformation from the world coordinate system to the camera's local 3D coordinate…
-
Computer Vision COLMAP COLMAP is an end-to-end pipeline for Structure-from-Motion (SfM) and Multi-View Stereo (MVS). It takes a collection of 2D images and mathematically…
-
Computer Vision Bundle Adjustment Bundle adjustment is the cornerstone of 3D reconstruction, Structure from Motion (SfM), and visual SLAM. At its core, it is a large-scale, non-linear…
-
Optimization Proximal Algorithms Proximal algorithms are a class of optimization methods designed to handle objective functions that are non-smooth, constrained, or split into multiple…
-
Optimization Neural Proximal Operators This is the exact conceptual leap that birthed the Plug-and-Play (PnP) and Regularization by Denoising (RED) frameworks, revolutionizing how we solve…
-
Optimization Analytical Proximal Operators As a quick refresher, the proximal operator of a scaled convex function \(\lambda f(x)\) evaluated at a point \(v\) is defined as:
-
Optimization HQS (Half-Quadratic Splitting) It is designed to minimize an objective function that consists of two competing terms: a data fidelity term (how well the solution matches the…
-
Optimization ADMM (Alternating Direction Method of Multipliers) The Alternating Direction Method of Multipliers (ADMM) is a powerful algorithm that solves convex optimization problems by breaking them into smaller,…
-
Optimization Lagrangian Method The standard Lagrangian is a mathematical trick to turn a *constrained* problem into an *unconstrained* one. It does this by taking the hard rules…
-
Computer Graphics 3D Gaussian Splatting 3D Gaussian Splatting (3DGS) is a breakthrough technique in computer graphics and computer vision for novel view synthesis. It emerged as a faster,…
-
Computer Graphics NeRF Neural Radiance Fields (NeRF) represent a breakthrough approach to synthesizing novel views of complex 3D scenes from a sparse set of 2D images.…
-
Geometry Generation Geometry Generation 3D shape synthesis, neural implicit surfaces, and mesh generation methods.
-
Computational Imaging PnP with Diffusion Plug-and-play priors with diffusion models for computational imaging inverse problems.
-
Computational Imaging Generative Methods for Deconv At its core, deconvolution is fundamentally ill-posed. Information is lost when an image is blurred, meaning multiple different sharp images could…