
ComfyUI 101: The Complete Beginner's Guide

Master node-based AI image generation from scratch. This comprehensive guide covers everything from installation to advanced workflows.

January 17, 2026 · 45 min read

1. Introduction to ComfyUI

ComfyUI is a powerful, node-based graphical user interface for Stable Diffusion and other AI image generation models. Unlike traditional tools with a fixed interface, ComfyUI lets you build custom pipelines by connecting visual nodes, giving you fine-grained control over every step of the generation process.

Why ComfyUI?

  • Visual Programming: Build workflows by connecting nodes instead of writing code
  • Complete Control: Access every parameter of the generation pipeline
  • Reproducibility: Save and share workflows as JSON files
  • Efficiency: Only re-runs changed parts of your workflow
  • Extensibility: Huge ecosystem of custom nodes
  • Low VRAM Usage: Optimized to work on consumer GPUs

How Node-Based Workflows Work

In ComfyUI, each node performs a specific task—loading models, encoding text, sampling images, etc. Nodes have typed inputs and outputs (represented by colored dots). You connect outputs to inputs to create a data flow pipeline.

Data types include: MODEL, CLIP, VAE, CONDITIONING, LATENT, and IMAGE. Only matching types can be connected.

2. Installation Guide

System Requirements

Component   Minimum           Recommended
GPU         NVIDIA 6GB VRAM   NVIDIA 8-12GB+ VRAM
RAM         8 GB              16-32 GB
Storage     15 GB             50+ GB (SSD)
Python      3.8+              3.10+

Windows Installation (Desktop - Recommended)

  1. Download the Windows Desktop installer from comfy.org
  2. Run the installer and select NVIDIA GPU (or CPU if no NVIDIA)
  3. Choose installation location (SSD recommended, ensure 15GB+ free)
  4. Wait for installation to complete (downloads Python environment automatically)
  5. Launch from desktop shortcut

Windows Manual Installation

Clone and setup
# Clone the repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create virtual environment
python -m venv venv
venv\Scripts\activate

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install requirements
pip install -r requirements.txt

# Run ComfyUI
python main.py

macOS Installation (Apple Silicon)

macOS setup
# Install via Homebrew (easiest)
brew install comfyui

# OR manual installation:
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Note
macOS uses Metal Performance Shaders (MPS) for GPU acceleration on Apple Silicon chips. Performance is good but not as fast as NVIDIA CUDA.

Linux Installation

Linux setup
# Clone repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create conda environment (recommended)
conda create -n comfyui python=3.10
conda activate comfyui

# Install PyTorch (NVIDIA)
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# OR for AMD (ROCm):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6

# Install requirements
pip install -r requirements.txt

# Run
python main.py

Downloading Models

After installation, you need to download Stable Diffusion checkpoints. Place them in the correct folders:

Model folder structure
ComfyUI/
├── models/
│   ├── checkpoints/     # Main SD models (.safetensors, .ckpt)
│   ├── vae/             # VAE models
│   ├── loras/           # LoRA models
│   ├── controlnet/      # ControlNet models
│   ├── upscale_models/  # Upscaler models (ESRGAN, etc.)
│   └── clip/            # CLIP models

Download models from sources like CivitAI, Hugging Face, or the official Stability AI repository.
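
For example, you can pull a checkpoint straight into the checkpoints folder from the command line. The snippet below is a sketch that uses the SDXL base checkpoint from Stability AI's Hugging Face repository as an example; substitute the model you actually want, and note that some downloads require a logged-in account or API token.

Example checkpoint download
# Run from the ComfyUI folder: download a checkpoint into models/checkpoints/
# (example file: the SDXL base checkpoint; swap in the URL of the model you want)
wget -P models/checkpoints/ \
  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# Restart ComfyUI (or hit Refresh) so the file appears in the checkpoint dropdown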

3. Understanding the Interface

The Canvas

The main workspace is a large canvas where you place and connect nodes. You can:

  • Pan: Click and drag on empty space, or use middle mouse button
  • Zoom: Scroll wheel or pinch gesture
  • Select: Click on nodes, or drag a selection box
  • Multi-select: Hold Shift while clicking

Adding Nodes

  • Right-click: Opens the node menu
  • Double-click: Opens quick search
  • Drag from output: Dragging from an output and releasing on empty space shows compatible nodes

Connecting Nodes

Click and drag from an output slot (right side) to an input slot (left side). Connections are color-coded by data type:

  • Purple: MODEL (the diffusion model/UNet)
  • Yellow: CLIP (text encoder)
  • Red: VAE
  • Orange: CONDITIONING (encoded prompts)
  • Pink: LATENT (latent space images)
  • Blue: IMAGE (pixel images)

Queue & Execution

Click Queue Prompt (or press Ctrl+Enter) to execute the workflow. ComfyUI caches results intelligently: unchanged nodes won't re-run.
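
You can also queue workflows without touching the canvas: ComfyUI serves a small HTTP API on the same address as the web UI (127.0.0.1:8188 by default). Below is a minimal sketch, assuming you have exported your workflow in API format to workflow_api.json and have jq installed:

Queue a workflow via the API
# Wrap the exported graph under a "prompt" key and POST it to the /prompt endpoint
jq '{prompt: .}' workflow_api.json | \
  curl -s -X POST http://127.0.0.1:8188/prompt \
       -H "Content-Type: application/json" -d @-

The response includes a prompt_id you can use to track the job in the queue.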

4. Core Nodes Explained

Understanding these fundamental nodes is essential for building any workflow:

Load Checkpoint

The CheckpointLoaderSimple node loads your Stable Diffusion model and extracts three components:

Node: CheckpointLoaderSimple
Purpose: Loads an SD checkpoint file
Inputs: ckpt_name (dropdown)
Outputs: MODEL, CLIP, VAE

Select your model from the dropdown. The node outputs the MODEL (UNet for denoising), CLIP (text encoder), and VAE (latent/pixel converter).

CLIP Text Encode

Converts your text prompt into embeddings the model can understand:

Node: CLIPTextEncode
Purpose: Encodes a text prompt
Inputs: clip, text
Outputs: CONDITIONING

You typically use two of these—one for positive prompts and one for negative prompts.

Empty Latent Image

Creates a blank latent space canvas for text-to-image generation:

Node: EmptyLatentImage
Purpose: Creates a blank latent
Inputs: width, height, batch_size
Outputs: LATENT

Pro Tip
For SD 1.5, use 512x512 or 768x768. For SDXL, use 1024x1024 or equivalent megapixel resolutions like 896x1152.

KSampler

The heart of image generation—iteratively denoises the latent image:

Node: KSampler
Purpose: Denoises the latent to generate an image
Inputs: model, positive, negative, latent_image, seed, steps, cfg, sampler_name, scheduler, denoise
Outputs: LATENT

Key parameters:

  • seed: Random seed for reproducibility (use the control_after_generate option to randomize or keep it fixed between runs)
  • steps: Number of denoising iterations (20-30 typical)
  • cfg: Classifier-free guidance scale (6-8 typical, higher = more prompt adherence)
  • denoise: Amount of noise to add/remove (1.0 for full generation, lower for img2img)

VAE Decode

Converts the latent image back to pixel space:

Node: VAEDecode
Purpose: Converts a latent image to pixels
Inputs: samples (LATENT), vae
Outputs: IMAGE

Save Image

Saves the final image to disk:

Node: SaveImage
Purpose: Saves the image to the output folder
Inputs: images, filename_prefix
Outputs: None

5. Your First Workflow: Text-to-Image

Let's build a complete text-to-image workflow from scratch. This is the foundation for all other workflows.

Step 1: Load the Checkpoint

  1. Right-click on the canvas → Add Node → loaders → Load Checkpoint
  2. Select your model from the ckpt_name dropdown

Step 2: Create Prompt Encoders

  1. Add two CLIP Text Encode nodes
  2. Connect the CLIP output from Load Checkpoint to both
  3. Label one "Positive" and one "Negative" (right-click → Title)
  4. Enter your positive prompt: "a beautiful sunset over mountains, dramatic lighting, 8k, detailed"
  5. Enter your negative prompt: "blurry, low quality, watermark, text, ugly, deformed"

Step 3: Create Empty Latent

  1. Add Empty Latent Image node
  2. Set width and height (512x512 for SD 1.5, 1024x1024 for SDXL)
  3. batch_size: 1 (or more for multiple images)

Step 4: Add the Sampler

  1. Add KSampler node
  2. Connect: MODEL → model, Positive CONDITIONING → positive, Negative CONDITIONING → negative, LATENT → latent_image
  3. Set: steps: 20, cfg: 7, sampler_name: euler, scheduler: normal, denoise: 1.0

Step 5: Decode and Save

  1. Add VAE Decode node
  2. Connect: KSampler LATENT output → samples, Load Checkpoint VAE → vae
  3. Add Save Image node
  4. Connect: VAE Decode IMAGE → images

Complete Workflow Diagram

Load Checkpoint (ckpt_name)
  ├─ MODEL ───────────────────────────────────────────────────→ KSampler (model)
  ├─ CLIP ──→ CLIP Text Encode (positive prompt) ── CONDITIONING ──→ KSampler (positive)
  ├─ CLIP ──→ CLIP Text Encode (negative prompt) ── CONDITIONING ──→ KSampler (negative)
  └─ VAE ─────────────────────────────────────────────────────→ VAE Decode (vae)
Empty Latent Image (width 512, height 512, batch_size 1) ── LATENT ──→ KSampler (latent_image)
KSampler (seed 42, steps 20, cfg 7, sampler euler, denoise 1.0) ── LATENT ──→ VAE Decode ── IMAGE ──→ Save Image (filename_prefix)

Basic Text-to-Image Workflow

Pro Tip
Click "Queue Prompt" or press Enter to generate! Your image will be saved to the output folder.

6. Samplers & Schedulers Deep Dive

The sampler and scheduler determine how the model denoises the image. Different combinations produce different styles and qualities.

Popular Samplers

Sampler           Speed    Quality    Best For
euler             Fast     Good       Quick iterations, testing
euler_ancestral   Fast     Creative   More variation, artistic
dpmpp_2m          Medium   Excellent  General purpose, recommended
dpmpp_2m_sde      Medium   Excellent  Fine details, photorealism
dpmpp_3m_sde      Slower   Best       Maximum quality

Schedulers

  • normal: Standard linear noise schedule
  • karras: Improved schedule, often better results with fewer steps
  • exponential: Alternative schedule for some models
  • sgm_uniform: For certain fine-tuned models
Pro Tip
Start with dpmpp_2m + karras at 20-25 steps. This combination works well for most models.

CFG Scale Guide

  • 1-4: Very creative, may ignore prompt
  • 5-7: Balanced creativity and prompt following
  • 7-10: Strong prompt adherence (recommended range)
  • 10+: Very strict, can cause artifacts and oversaturation

7. Using LoRA Models

LoRA (Low-Rank Adaptation) models are small add-ons that modify your base model to achieve specific styles, characters, or concepts without replacing the entire checkpoint.

Setup

  1. Download LoRA files (.safetensors) from CivitAI or other sources
  2. Place them in ComfyUI/models/loras/
  3. Restart ComfyUI to detect new models

Using Load LoRA Node

Node: LoraLoader
Purpose: Applies a LoRA to the model and CLIP
Inputs: model, clip, lora_name, strength_model, strength_clip
Outputs: MODEL, CLIP

Insert the LoRA loader between your checkpoint loader and the rest of the workflow:

Load Checkpoint
  ├─ MODEL ──→ Load LoRA (model)
  └─ CLIP ───→ Load LoRA (clip)
Load LoRA (lora_name, strength_model 0.8, strength_clip 0.8)
  ├─ MODEL ──→ KSampler (model) ── LATENT
  └─ CLIP ───→ CLIP Text Encode ── CONDITIONING ──→ KSampler (positive)

LoRA Integration Workflow

Key Parameters

  • strength_model: How much the LoRA affects the diffusion model (0.5-1.0 typical)
  • strength_clip: How much the LoRA affects text encoding (often matches strength_model)
Warning
Using strength values above 1.0 can cause distortions. Start low (0.5-0.7) and increase gradually.

Stacking Multiple LoRAs

Chain multiple Load LoRA nodes to combine effects. Each LoRA processes the MODEL and CLIP sequentially:

Checkpoint ── MODEL + CLIP ──→ LoRA 1 (Style)
LoRA 1 ── MODEL + CLIP ──→ LoRA 2 (Character)
LoRA 2 ── MODEL + CLIP ──→ LoRA 3 (Pose)
LoRA 3 ── MODEL ──→ KSampler (model)

Stacking Multiple LoRAs

Pro Tip
Many LoRAs require specific trigger words in your prompt. Check the model page for required keywords!

8. ControlNet Integration

ControlNet lets you guide image generation using reference images—edges, depth maps, poses, and more. This gives you precise control over composition and structure.

Setup

  1. Download ControlNet models (e.g., control_v11p_sd15_canny.pth)
  2. Place in ComfyUI/models/controlnet/
  3. Install the ControlNet Auxiliary Preprocessors custom node pack (via ComfyUI Manager)
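
From the command line, the same setup looks roughly like the sketch below. The Canny model URL points at the widely used lllyasviel/ControlNet-v1-1 repository on Hugging Face and the preprocessor pack at the comfyui_controlnet_aux repository; treat both as example sources and verify them (or simply use ComfyUI Manager) before downloading.

ControlNet setup (sketch)
# 1) Download a ControlNet model into ComfyUI/models/controlnet/
wget -P models/controlnet/ \
  https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth

# 2) Install the auxiliary preprocessors pack manually (ComfyUI Manager works too)
cd custom_nodes
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
pip install -r comfyui_controlnet_aux/requirements.txt
cd ..

# 3) Restart ComfyUI so the new model and nodes are picked up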

ControlNet Types

Type       Best For                     Preprocessor
Canny      Edge detection, line art     CannyEdgePreprocessor
Depth      Spatial layout, perspective  DepthAnythingPreprocessor
OpenPose   Human poses, body position   OpenPosePreprocessor
Scribble   Rough sketches               ScribblePreprocessor
Lineart    Clean line drawings          LineartPreprocessor

Basic ControlNet Workflow

Load Image ── IMAGE ──→ Canny Preprocessor (low_threshold 100, high_threshold 200) ── IMAGE ──→ Apply ControlNet (image)
Load ControlNet (control_net_name) ── CONTROL_NET ──→ Apply ControlNet (control_net)
Load Checkpoint ── CLIP ──→ CLIP Text Encode ── CONDITIONING ──→ Apply ControlNet (conditioning)
Apply ControlNet (strength 1.0) ── CONDITIONING ──→ KSampler (positive)
Load Checkpoint ── MODEL ──→ KSampler ── LATENT ──→ VAE Decode (vae from Load Checkpoint) ── IMAGE

ControlNet Workflow

Key nodes:

  • Load ControlNet Model: Loads the ControlNet checkpoint
  • Apply ControlNet: Applies the control image to your conditioning
  • Preprocessor: Extracts control signal from your input image

ControlNet Parameters

  • strength: How strongly the control image influences generation (0.0-2.0, start at 1.0)
  • start_percent: When to start applying control (0.0 = beginning)
  • end_percent: When to stop applying control (1.0 = end)
Pro Tip
Use start_percent=0 and end_percent=0.5-0.8 to allow creative freedom in final details while maintaining structure.

9. Image-to-Image & Inpainting

Image-to-Image

Instead of starting from noise, img2img starts from an existing image. The denoise parameter controls how much of the original is preserved:

  • 0.1-0.3: Subtle changes, keeps most of original
  • 0.4-0.6: Moderate transformation
  • 0.7-1.0: Heavy changes, little of original remains

Load Image ── IMAGE ──→ VAE Encode (pixels, vae) ── LATENT ──→ KSampler (latent_image)
KSampler (model, positive, negative, denoise 0.5) ── LATENT ──→ VAE Decode ── IMAGE ──→ Save Image

Image-to-Image Workflow

Inpainting

Inpainting regenerates only masked portions of an image while preserving the rest.

  1. Use an inpainting-trained model (e.g., sd-v1-5-inpainting.safetensors)
  2. Load your image and create/load a mask (white = regenerate, black = keep)
  3. Use VAE Encode (for Inpainting) node
  4. Set grow_mask_by to expand mask edges for smoother blending

Load Image ── IMAGE ──→ VAE Encode (for Inpainting) (pixels)
Load Mask ── MASK ──→ VAE Encode (for Inpainting) (mask)
VAE Encode (for Inpainting) (vae, grow_mask_by 8) ── LATENT ──→ KSampler ── LATENT ──→ VAE Decode ── IMAGE

Inpainting Workflow

10. Upscaling Techniques

Simple Upscaling with ESRGAN

  1. Download upscale models (RealESRGAN_x4plus, etc.) to models/upscale_models/
  2. Add Load Upscale Model node
  3. Add Upscale Image (using Model) node
  4. Connect your image to the upscaler
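
The download in step 1 can also be done from the command line; RealESRGAN_x4plus is commonly fetched from the Real-ESRGAN GitHub releases (the URL below is the usual location, but verify it before use):

Upscale model download (sketch)
# Download RealESRGAN_x4plus into ComfyUI/models/upscale_models/
wget -P models/upscale_models/ \
  https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth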

Load Upscale Model (model_name) ── UPSCALE_MODEL ──→ Upscale Image (using Model) (upscale_model)
VAE Decode ── IMAGE ──→ Upscale Image (using Model) (image)
Upscale Image (using Model) ── IMAGE ──→ Save Image

ESRGAN Upscaling Workflow

Hires Fix (Two-Pass Upscaling)

For even better quality, upscale the latent and run through the sampler again:

  1. Generate at lower resolution (e.g., 512x512)
  2. Use Upscale Latent to increase latent size
  3. Run through another KSampler with low denoise (0.3-0.5)
  4. Decode and optionally apply ESRGAN for final upscale

Recommended Upscale Models

  • RealESRGAN_x4plus: Great all-purpose upscaler
  • RealESRGAN_x4plus_anime_6B: Optimized for anime/illustration
  • 4x-UltraSharp: Excellent for photos
  • NMKD Superscale: Good balance of speed and quality

11. SDXL Workflows

SDXL (Stable Diffusion XL) produces higher quality images but has different requirements.

SDXL Basics

  • Optimized for 1024x1024 (or equivalent ~1 megapixel)
  • Uses two text encoders, with separate prompt fields (text_g and text_l)
  • Requires 8GB+ VRAM (12GB+ recommended)
  • Optional refiner model for enhanced details

SDXL Text Encoding

SDXL has a special CLIP encoder node that handles both text encoders with additional parameters for resolution conditioning:

Load Checkpoint (SDXL)
  ├─ MODEL, VAE (to the rest of the workflow)
  └─ CLIP ──→ CLIPTextEncodeSDXL (text_g, text_l, width 1024, height 1024, target_width 1024, target_height 1024) ── CONDITIONING

SDXL Text Encoding

Using the Refiner

The SDXL refiner adds final polish. Use it in a two-pass workflow:

Load Checkpoint (SDXL)
  ├─ MODEL ──→ KSampler (Base) and KSampler (Refiner)
  ├─ CLIP ───→ CLIPTextEncodeSDXL (text_g, text_l, width 1024, height 1024) ── CONDITIONING ──→ KSampler (Base) (positive)
  └─ VAE ────→ VAE Decode
KSampler (Base) (steps 25, cfg 7) ── LATENT ──→ KSampler (Refiner) (latent_image, steps 10, denoise 0.3) ── LATENT ──→ VAE Decode ── IMAGE

SDXL Base + Refiner Workflow

Pro Tip
For the refiner, use denoise 0.2-0.4 and fewer steps (10-15). The base model does the heavy lifting.

SDXL Resolution Guide

Aspect Ratio   Resolution    Use Case
1:1            1024 x 1024   Square, profile pictures
3:4            896 x 1152    Portrait
4:3            1152 x 896    Landscape
16:9           1344 x 768    Widescreen
9:16           768 x 1344    Mobile/Stories

12. Custom Nodes & Extensions

ComfyUI Manager

The easiest way to install custom nodes. Install it first:

cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
# Restart ComfyUI

Once installed, access Manager from the menu to search and install nodes with one click.

Essential Custom Node Packs

  • ComfyUI-Impact-Pack: Face detailing, segmentation, upscaling enhancements
  • ComfyUI-ControlNet-Aux: All ControlNet preprocessors (Canny, Depth, Pose, etc.)
  • WAS Node Suite: Extensive utility nodes for image processing
  • ComfyUI-Inpaint-Nodes: Advanced inpainting with crop/stitch
  • Efficiency Nodes: Streamlined workflows with combined nodes
  • rgthree-comfy: Quality-of-life improvements and utilities
Warning
Only install custom nodes from trusted sources. They can execute arbitrary code. Review the source before installing unknown packs.
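
If you prefer to vet and install a pack by hand, clone it into custom_nodes and install its requirements. Here is a sketch using the Impact Pack repository as an example (double-check the repository URL on the pack's page):

Manual custom node install (example)
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack.git

# Many packs ship a requirements file - install it into ComfyUI's Python environment
pip install -r ComfyUI-Impact-Pack/requirements.txt

# Restart ComfyUI to load the new nodes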

13. Troubleshooting Common Issues

CUDA Out of Memory

Your GPU doesn't have enough VRAM.

Solutions: Lower resolution, reduce batch size, use --lowvram flag, close other GPU applications.
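
These flags are passed when launching ComfyUI. --lowvram is the one mentioned above; --novram and --cpu trade more speed for even less GPU memory (flag names per the standard ComfyUI launcher; run python main.py --help to confirm what your version supports):

Low-VRAM launch options
# Offload model parts to system RAM to fit in less VRAM (slower)
python main.py --lowvram

# More aggressive offloading, or run entirely on the CPU as a last resort
python main.py --novram
python main.py --cpu

# List every available flag
python main.py --help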

Black or Noisy Images

VAE mismatch or corrupted model.

Solutions: Use the correct VAE for your model, re-download the checkpoint, try a different VAE.

Model Not in Dropdown

ComfyUI hasn't detected your model.

Solutions: Check file is in correct folder, restart ComfyUI, verify file isn't corrupted.

Node Shows Red/Error State

Missing connection or invalid input.

Solutions: Check all required inputs are connected, ensure data types match (colors should align).

LoRA Has No Effect

Missing trigger word or incompatible model.

Solutions: Add the LoRA's trigger word to your prompt, verify LoRA is trained for your base model type (SD1.5 vs SDXL).

14. Resources & Next Steps

Model Sources

  • CivitAI: community checkpoints, LoRAs, and embeddings
  • Hugging Face: official Stability AI releases and research models

Workflow Collections

  • The example workflows linked from the official ComfyUI repository
  • Workflows shared alongside models on CivitAI

What to Learn Next

  1. Experiment with different models and LoRAs
  2. Master ControlNet for precise composition control
  3. Learn advanced inpainting for seamless edits
  4. Explore video generation workflows (AnimateDiff, etc.)
  5. Build your own reusable workflow templates
  6. Train your own LoRAs for custom styles