ComfyUI 101: The Complete Beginner's Guide
Master node-based AI image generation from scratch. This comprehensive guide covers everything from installation to advanced workflows.
1. Introduction to ComfyUI
ComfyUI is a powerful, node-based graphical user interface for Stable Diffusion and other AI image generation models. Unlike traditional UIs with fixed interfaces, ComfyUI lets you build custom pipelines by connecting visual nodes—giving you unprecedented control over every aspect of the generation process.
Why ComfyUI?
- Visual Programming: Build workflows by connecting nodes instead of writing code
- Complete Control: Access every parameter of the generation pipeline
- Reproducibility: Save and share workflows as JSON files
- Efficiency: Only re-runs changed parts of your workflow
- Extensibility: Huge ecosystem of custom nodes
- Low VRAM Usage: Optimized to work on consumer GPUs
How Node-Based Workflows Work
In ComfyUI, each node performs a specific task—loading models, encoding text, sampling images, etc. Nodes have typed inputs and outputs (represented by colored dots). You connect outputs to inputs to create a data flow pipeline.
Data types include: MODEL, CLIP, VAE, CONDITIONING, LATENT, and IMAGE. Only matching types can be connected.
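Under the hood, a workflow is just a graph of these typed connections. When a workflow is exported in ComfyUI's API (JSON) format, each node lists its inputs, and a connection is written as a [source_node_id, output_slot] pair. The fragment below is a minimal sketch of that format; the node IDs and checkpoint filename are placeholders.
# Sketch of two connected nodes in ComfyUI's API (JSON) workflow format.
# Node IDs and the checkpoint filename are placeholders.
workflow_fragment = {
    "1": {  # Load Checkpoint: outputs MODEL (slot 0), CLIP (slot 1), VAE (slot 2)
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "example_model.safetensors"},
    },
    "2": {  # CLIP Text Encode: its CLIP input is wired to node 1's output slot 1
        "class_type": "CLIPTextEncode",
        "inputs": {"clip": ["1", 1], "text": "a beautiful sunset over mountains"},
    },
}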
2. Installation Guide
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA 6GB VRAM | NVIDIA 8-12GB+ VRAM |
| RAM | 8 GB | 16-32 GB |
| Storage | 15 GB | 50+ GB (SSD) |
| Python | 3.8+ | 3.10+ |
Windows Installation (Desktop - Recommended)
- Download the Windows Desktop installer from comfy.org
- Run the installer and select NVIDIA GPU (or CPU if no NVIDIA)
- Choose installation location (SSD recommended, ensure 15GB+ free)
- Wait for installation to complete (downloads Python environment automatically)
- Launch from desktop shortcut
Windows Manual Installation
# Clone the repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# Create virtual environment
python -m venv venv
venv\Scripts\activate
# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Install requirements
pip install -r requirements.txt
# Run ComfyUI
python main.py
macOS Installation (Apple Silicon)
# Install via Homebrew (easiest)
brew install comfyui
# OR manual installation:
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Linux Installation
# Clone repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# Create conda environment (recommended)
conda create -n comfyui python=3.10
conda activate comfyui
# Install PyTorch (NVIDIA)
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
# OR for AMD (ROCm):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
# Install requirements
pip install -r requirements.txt
# Run
python main.py
Downloading Models
After installation, you need to download Stable Diffusion checkpoints. Place them in the correct folders:
ComfyUI/
├── models/
│ ├── checkpoints/ # Main SD models (.safetensors, .ckpt)
│ ├── vae/ # VAE models
│ ├── loras/ # LoRA models
│ ├── controlnet/ # ControlNet models
│ ├── upscale_models/ # Upscaler models (ESRGAN, etc.)
│ └── clip/ # CLIP models
Download models from sources like CivitAI, Hugging Face, or the official Stability AI repository.
3. Understanding the Interface
The Canvas
The main workspace is a large canvas where you place and connect nodes. You can:
- Pan: Click and drag on empty space, or use middle mouse button
- Zoom: Scroll wheel or pinch gesture
- Select: Click on nodes, or drag a selection box
- Multi-select: Hold Shift while clicking
Adding Nodes
- Right-click: Opens the node menu
- Double-click: Opens quick search
- Drag from output: Dragging from an output and releasing on empty space shows compatible nodes
Connecting Nodes
Click and drag from an output slot (right side) to an input slot (left side). Connections are color-coded by data type:
- Purple: MODEL (the diffusion model/UNet)
- Yellow: CLIP (text encoder)
- Red: VAE
- Orange: CONDITIONING (encoded prompts)
- Pink: LATENT (latent space images)
- Blue: IMAGE (pixel images)
Queue & Execution
Click Queue Prompt (or press Ctrl+Enter) to execute the workflow. ComfyUI caches results intelligently; unchanged nodes won't re-run.
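Workflows can also be queued programmatically: ComfyUI exposes an HTTP API (by default on http://127.0.0.1:8188) that accepts workflows exported in API format. The snippet below is a minimal sketch, assuming a locally running server and a workflow dict like the fragments shown elsewhere in this guide.
# Queue an API-format workflow against a locally running ComfyUI server.
# Assumes ComfyUI is listening on the default 127.0.0.1:8188.
import json
import urllib.request

def queue_prompt(workflow: dict) -> dict:
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # response includes the prompt_id of the queued job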
4. Core Nodes Explained
Understanding these fundamental nodes is essential for building any workflow:
Load Checkpoint
The CheckpointLoaderSimple node loads your Stable Diffusion model and extracts three components:
| Node | Purpose | Inputs | Outputs |
|---|---|---|---|
| CheckpointLoaderSimple | Loads SD checkpoint file | ckpt_name (dropdown) | MODEL, CLIP, VAE |
Select your model from the dropdown. The node outputs the MODEL (UNet for denoising), CLIP (text encoder), and VAE (latent/pixel converter).
CLIP Text Encode
Converts your text prompt into embeddings the model can understand:
| Node | Purpose | Inputs | Outputs |
|---|---|---|---|
| CLIPTextEncode | Encodes text prompt | clip, text | CONDITIONING |
You typically use two of these—one for positive prompts and one for negative prompts.
Empty Latent Image
Creates a blank latent space canvas for text-to-image generation:
| Node | Purpose | Inputs | Outputs |
|---|---|---|---|
| EmptyLatentImage | Creates blank latent | width, height, batch_size | LATENT |
KSampler
The heart of image generation—iteratively denoises the latent image:
| Node | Purpose | Inputs | Outputs |
|---|---|---|---|
| KSampler | Denoises latent to generate image | model, positive, negative, latent_image, seed, steps, cfg, sampler_name, scheduler, denoise | LATENT |
Key parameters:
- seed: Random seed for reproducibility (set control_after_generate to "randomize" to get a new seed each run)
- steps: Number of denoising iterations (20-30 typical)
- cfg: Classifier-free guidance scale (6-8 typical, higher = more prompt adherence)
- denoise: How much noise is applied and then removed (1.0 regenerates from scratch; lower values preserve more of the input latent for img2img)
VAE Decode
Converts the latent image back to pixel space:
| Node | Purpose | Inputs | Outputs |
|---|---|---|---|
| VAEDecode | Latent → Pixels | samples (LATENT), vae | IMAGE |
Save Image
Saves the final image to disk:
| Node | Purpose | Inputs | Outputs |
|---|---|---|---|
| SaveImage | Saves image to output folder | images, filename_prefix | None |
5. Your First Workflow: Text-to-Image
Let's build a complete text-to-image workflow from scratch. This is the foundation for all other workflows.
Step 1: Load the Checkpoint
- Right-click on the canvas → Add Node → loaders → Load Checkpoint
- Select your model from the ckpt_name dropdown
Step 2: Create Prompt Encoders
- Add two CLIP Text Encode nodes
- Connect the CLIP output from Load Checkpoint to both
- Label one "Positive" and one "Negative" (right-click → Title)
- Enter your positive prompt: "a beautiful sunset over mountains, dramatic lighting, 8k, detailed"
- Enter your negative prompt: "blurry, low quality, watermark, text, ugly, deformed"
Step 3: Create Empty Latent
- Add an Empty Latent Image node
- Set width and height (512x512 for SD 1.5, 1024x1024 for SDXL)
- batch_size: 1 (or more for multiple images)
Step 4: Add the Sampler
- Add a KSampler node
- Connect: MODEL → model, Positive CONDITIONING → positive, Negative CONDITIONING → negative, LATENT → latent_image
- Set: steps: 20, cfg: 7, sampler_name: euler, scheduler: normal, denoise: 1.0
Step 5: Decode and Save
- Add a VAE Decode node
- Connect: KSampler LATENT output → samples, Load Checkpoint VAE → vae
- Add a Save Image node
- Connect: VAE Decode IMAGE → images
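For reference, the same six-node graph expressed in ComfyUI's API (JSON) export format looks roughly like the sketch below. The node IDs, checkpoint filename, and seed are placeholder values; connections are [node_id, output_slot] pairs.
# Sketch of the complete text-to-image graph in ComfyUI's API format.
# Checkpoint filename and seed are placeholders.
text_to_image = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "example_sd15_model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"clip": ["1", 1],
                     "text": "a beautiful sunset over mountains, dramatic lighting, 8k, detailed"}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"clip": ["1", 1],
                     "text": "blurry, low quality, watermark, text, ugly, deformed"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 123456, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}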
Complete Workflow Diagram
Basic Text-to-Image Workflow
When you queue this workflow, the finished image is saved to the output folder.
6. Samplers & Schedulers Deep Dive
The sampler and scheduler determine how the model denoises the image. Different combinations produce different styles and qualities.
Popular Samplers
| Sampler | Speed | Quality | Best For |
|---|---|---|---|
| euler | Fast | Good | Quick iterations, testing |
| euler_ancestral | Fast | Creative | More variation, artistic |
| dpmpp_2m | Medium | Excellent | General purpose, recommended |
| dpmpp_2m_sde | Medium | Excellent | Fine details, photorealism |
| dpmpp_3m_sde | Slower | Best | Maximum quality |
Schedulers
- normal: Standard linear noise schedule
- karras: Improved schedule, often better results with fewer steps
- exponential: Alternative schedule for some models
- sgm_uniform: For certain fine-tuned models
A good starting point is dpmpp_2m + karras at 20-25 steps. This combination works well for most models.
CFG Scale Guide
- 1-4: Very creative, may ignore prompt
- 5-7: Balanced creativity and prompt following
- 7-10: Strong prompt adherence (recommended range)
- 10+: Very strict, can cause artifacts and oversaturation
7. Using LoRA Models
LoRA (Low-Rank Adaptation) models are small add-ons that modify your base model to achieve specific styles, characters, or concepts without replacing the entire checkpoint.
Setup
- Download LoRA files (.safetensors) from CivitAI or other sources
- Place them in ComfyUI/models/loras/
- Restart ComfyUI to detect new models
Using Load LoRA Node
| Node | Purpose | Inputs | Outputs |
|---|---|---|---|
| LoraLoader | Applies LoRA to model | model, clip, lora_name, strength_model, strength_clip | MODEL, CLIP |
Insert the LoRA loader between your checkpoint loader and the rest of the workflow:
LoRA Integration Workflow
Key Parameters
- strength_model: How much the LoRA affects the diffusion model (0.5-1.0 typical)
- strength_clip: How much the LoRA affects text encoding (often matches strength_model)
Stacking Multiple LoRAs
Chain multiple Load LoRA nodes to combine effects. Each LoRA processes the MODEL and CLIP sequentially:
Stacking Multiple LoRAs
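In API-format terms, stacking means each Load LoRA node takes the previous loader's MODEL and CLIP outputs as its inputs, and the last loader in the chain feeds the KSampler and text encoders. A sketch with two hypothetical LoRA files, assuming node "1" is the checkpoint loader from the earlier examples:
# Sketch: two chained LoraLoader nodes (LoRA filenames are placeholders).
lora_chain = {
    "10": {"class_type": "LoraLoader",
           "inputs": {"model": ["1", 0], "clip": ["1", 1],
                      "lora_name": "style_lora.safetensors",
                      "strength_model": 0.8, "strength_clip": 0.8}},
    "11": {"class_type": "LoraLoader",  # second LoRA takes the first LoRA's outputs
           "inputs": {"model": ["10", 0], "clip": ["10", 1],
                      "lora_name": "character_lora.safetensors",
                      "strength_model": 0.6, "strength_clip": 0.6}},
    # The KSampler's model input and both CLIPTextEncode clip inputs should now
    # reference node "11" instead of the checkpoint loader.
}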
8. ControlNet Integration
ControlNet lets you guide image generation using reference images—edges, depth maps, poses, and more. This gives you precise control over composition and structure.
Setup
- Download ControlNet models (e.g., control_v11p_sd15_canny.pth)
- Place in ComfyUI/models/controlnet/
- Install the ControlNet Auxiliary Preprocessors custom node pack (via ComfyUI Manager)
ControlNet Types
| Type | Best For | Preprocessor |
|---|---|---|
| Canny | Edge detection, line art | CannyEdgePreprocessor |
| Depth | Spatial layout, perspective | DepthAnythingPreprocessor |
| OpenPose | Human poses, body position | OpenPosePreprocessor |
| Scribble | Rough sketches | ScribblePreprocessor |
| Lineart | Clean line drawings | LineartPreprocessor |
Basic ControlNet Workflow
ControlNet Workflow
Key nodes:
- Load ControlNet Model: Loads the ControlNet checkpoint
- Apply ControlNet: Applies the control image to your conditioning
- Preprocessor: Extracts control signal from your input image
ControlNet Parameters
- strength: How strongly the control image influences generation (0.0-2.0, start at 1.0)
- start_percent: When to start applying control (0.0 = beginning)
- end_percent: When to stop applying control (1.0 = end)
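Putting these pieces together, the apply step rewires both conditioning lines before they reach the KSampler. A sketch in API format, assuming a Canny ControlNet file, that nodes "2" and "3" are the positive/negative CLIPTextEncode nodes, and that node "20" (hypothetical) is the preprocessed edge image:
# Sketch: load a ControlNet and apply it to the positive/negative conditioning.
controlnet_fragment = {
    "30": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "control_v11p_sd15_canny.pth"}},
    "31": {"class_type": "ControlNetApplyAdvanced",
           "inputs": {"positive": ["2", 0], "negative": ["3", 0],
                      "control_net": ["30", 0], "image": ["20", 0],
                      "strength": 1.0, "start_percent": 0.0, "end_percent": 1.0}},
    # The KSampler should now take positive from ["31", 0] and negative from ["31", 1].
}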
9. Image-to-Image & Inpainting
Image-to-Image
Instead of starting from noise, img2img starts from an existing image. The denoise parameter controls how much of the original is preserved:
- 0.1-0.3: Subtle changes, keeps most of original
- 0.4-0.6: Moderate transformation
- 0.7-1.0: Heavy changes, little of original remains
Image-to-Image Workflow
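The key change from text-to-image is replacing Empty Latent Image with a loaded image that is encoded into latent space, then lowering denoise on the KSampler. A sketch of that rewiring, with a placeholder input filename:
# Sketch: img2img replaces EmptyLatentImage with LoadImage -> VAEEncode.
img2img_fragment = {
    "40": {"class_type": "LoadImage",
           "inputs": {"image": "input_photo.png"}},   # placeholder file in ComfyUI/input/
    "41": {"class_type": "VAEEncode",
           "inputs": {"pixels": ["40", 0], "vae": ["1", 2]}},
    # Feed ["41", 0] into the KSampler's latent_image and set denoise to ~0.5
    # instead of 1.0 so part of the original image is preserved.
}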
Inpainting
Inpainting regenerates only masked portions of an image while preserving the rest.
- Use an inpainting-trained model (e.g., sd-v1-5-inpainting.safetensors)
- Load your image and create/load a mask (white = regenerate, black = keep)
- Use the VAE Encode (for Inpainting) node
- Set grow_mask_by to expand mask edges for smoother blending
Inpainting Workflow
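A sketch of the encoding step in API format, assuming the mask comes from the Load Image node's MASK output (white areas are regenerated); the filename is a placeholder:
# Sketch: VAE Encode (for Inpainting) takes pixels, vae, and a mask.
inpaint_fragment = {
    "50": {"class_type": "LoadImage",
           "inputs": {"image": "photo_with_mask.png"}},  # placeholder; mask drawn in the mask editor
    "51": {"class_type": "VAEEncodeForInpaint",
           "inputs": {"pixels": ["50", 0], "vae": ["1", 2],
                      "mask": ["50", 1], "grow_mask_by": 6}},
    # Feed ["51", 0] into the KSampler's latent_image as usual.
}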
10. Upscaling Techniques
Simple Upscaling with ESRGAN
- Download upscale models (RealESRGAN_x4plus, etc.) to models/upscale_models/
- Add a Load Upscale Model node
- Add an Upscale Image (using Model) node
- Connect your image to the upscaler
ESRGAN Upscaling Workflow
Hires Fix (Two-Pass Upscaling)
For even better quality, upscale the latent and run through the sampler again:
- Generate at lower resolution (e.g., 512x512)
- Use Upscale Latent to increase the latent size
- Run through another KSampler with low denoise (0.3-0.5)
- Decode and optionally apply ESRGAN for final upscale
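A sketch of the second pass in API format, assuming node "5" is the first KSampler from the basic workflow; the upscale method, target size, and seed are example values:
# Sketch: hires fix second pass, upscale the latent, then re-sample at low denoise.
hires_fix_fragment = {
    "60": {"class_type": "LatentUpscale",   # "Upscale Latent" in the node menu
           "inputs": {"samples": ["5", 0], "upscale_method": "nearest-exact",
                      "width": 1024, "height": 1024, "crop": "disabled"}},
    "61": {"class_type": "KSampler",
           "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                      "latent_image": ["60", 0], "seed": 123456, "steps": 20, "cfg": 7.0,
                      "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 0.4}},
    # Decode ["61", 0] with VAEDecode and save, or run it through an ESRGAN upscaler.
}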
Recommended Upscale Models
- RealESRGAN_x4plus: Great all-purpose upscaler
- RealESRGAN_x4plus_anime_6B: Optimized for anime/illustration
- 4x-UltraSharp: Excellent for photos
- NMKD Superscale: Good balance of speed and quality
11. SDXL Workflows
SDXL (Stable Diffusion XL) produces higher quality images but has different requirements.
SDXL Basics
- Optimized for 1024x1024 (or equivalent ~1 megapixel)
- Uses two CLIP text encoders (text_g and text_l)
- Requires 8GB+ VRAM (12GB+ recommended)
- Optional refiner model for enhanced details
SDXL Text Encoding
SDXL has a special CLIP encoder node that handles both text encoders with additional parameters for resolution conditioning:
SDXL Text Encoding
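A sketch of that encoder node in API format, assuming node "1" loads an SDXL checkpoint. The width/height fields carry the resolution conditioning described above, and the same prompt is commonly passed to both text_g and text_l:
# Sketch: SDXL prompt encoding with resolution conditioning (values are typical defaults).
sdxl_encode_fragment = {
    "70": {"class_type": "CLIPTextEncodeSDXL",
           "inputs": {"clip": ["1", 1],
                      "width": 1024, "height": 1024,              # original size conditioning
                      "crop_w": 0, "crop_h": 0,                   # no crop conditioning
                      "target_width": 1024, "target_height": 1024,
                      "text_g": "a beautiful sunset over mountains, dramatic lighting",
                      "text_l": "a beautiful sunset over mountains, dramatic lighting"}},
    # Use ["70", 0] as the KSampler's positive conditioning (add a second one for the negative).
}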
Using the Refiner
The SDXL refiner adds final polish. Use it in a two-pass workflow:
SDXL Base + Refiner Workflow
SDXL Resolution Guide
| Aspect Ratio | Resolution | Use Case |
|---|---|---|
| 1:1 | 1024 x 1024 | Square, profile pictures |
| 3:4 | 896 x 1152 | Portrait |
| 4:3 | 1152 x 896 | Landscape |
| 16:9 | 1344 x 768 | Widescreen |
| 9:16 | 768 x 1344 | Mobile/Stories |
12. Custom Nodes & Extensions
ComfyUI Manager
The easiest way to install custom nodes. Install it first:
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
# Restart ComfyUI
Once installed, access Manager from the menu to search and install nodes with one click.
Essential Custom Node Packs
- ComfyUI-Impact-Pack: Face detailing, segmentation, upscaling enhancements
- ComfyUI-ControlNet-Aux: All ControlNet preprocessors (Canny, Depth, Pose, etc.)
- WAS Node Suite: Extensive utility nodes for image processing
- ComfyUI-Inpaint-Nodes: Advanced inpainting with crop/stitch
- Efficiency Nodes: Streamlined workflows with combined nodes
- rgthree-comfy: Quality-of-life improvements and utilities
13. Troubleshooting Common Issues
CUDA Out of Memory
Your GPU doesn't have enough VRAM.
Solutions: Lower resolution, reduce batch size, use --lowvram flag, close other GPU applications.
Black or Noisy Images
VAE mismatch or corrupted model.
Solutions: Use the correct VAE for your model, re-download the checkpoint, try a different VAE.
Model Not in Dropdown
ComfyUI hasn't detected your model.
Solutions: Check file is in correct folder, restart ComfyUI, verify file isn't corrupted.
Node Shows Red/Error State
Missing connection or invalid input.
Solutions: Check all required inputs are connected, ensure data types match (colors should align).
LoRA Has No Effect
Missing trigger word or incompatible model.
Solutions: Add the LoRA's trigger word to your prompt, verify LoRA is trained for your base model type (SD1.5 vs SDXL).
14. Resources & Next Steps
Official Resources
- ComfyUI GitHub repository: https://github.com/comfyanonymous/ComfyUI
- comfy.org - Official site and Desktop downloads
Model Sources
- CivitAI - Largest model repository
- Hugging Face - Official and community models
- OpenModelDB - Upscale models
Workflow Collections
- cubiq/ComfyUI_Workflows - Curated workflow collection
- ComfyWorkflows.com - Searchable workflow database
What to Learn Next
- Experiment with different models and LoRAs
- Master ControlNet for precise composition control
- Learn advanced inpainting for seamless edits
- Explore video generation workflows (AnimateDiff, etc.)
- Build your own reusable workflow templates
- Train your own LoRAs for custom styles