
ComfyUI 101: The Complete Beginner's Guide

Master node-based AI image generation from scratch. This comprehensive guide covers everything from installation to advanced workflows.

January 17, 2026 · 45 min read

1. Introduction to ComfyUI

ComfyUI is a powerful, node-based graphical user interface for Stable Diffusion and other AI image generation models. Unlike traditional tools with a fixed interface, ComfyUI lets you build custom pipelines by connecting visual nodes, giving you fine-grained control over every step of the generation process.

Why ComfyUI?

  • Visual Programming: Build workflows by connecting nodes instead of writing code
  • Complete Control: Access every parameter of the generation pipeline
  • Reproducibility: Save and share workflows as JSON files
  • Efficiency: Only re-runs changed parts of your workflow
  • Extensibility: Huge ecosystem of custom nodes
  • Low VRAM Usage: Optimized to work on consumer GPUs

How Node-Based Workflows Work

In ComfyUI, each node performs a specific task—loading models, encoding text, sampling images, etc. Nodes have typed inputs and outputs (represented by colored dots). You connect outputs to inputs to create a data flow pipeline.

Data types include: MODEL, CLIP, VAE, CONDITIONING, LATENT, and IMAGE. Only matching types can be connected.

2. Installation Guide

System Requirements

Component   Minimum           Recommended
GPU         NVIDIA 6GB VRAM   NVIDIA 8-12GB+ VRAM
RAM         8 GB              16-32 GB
Storage     15 GB             50+ GB (SSD)
Python      3.8+              3.10+

Windows Installation (Desktop - Recommended)

  1. Download the Windows Desktop installer from comfy.org
  2. Run the installer and select NVIDIA GPU (or CPU if no NVIDIA)
  3. Choose installation location (SSD recommended, ensure 15GB+ free)
  4. Wait for installation to complete (downloads Python environment automatically)
  5. Launch from desktop shortcut

Windows Manual Installation

Clone and setup
# Clone the repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create virtual environment
python -m venv venv
venv\Scripts\activate

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install requirements
pip install -r requirements.txt

# Run ComfyUI
python main.py

macOS Installation (Apple Silicon)

macOS setup
# Install via Homebrew (easiest)
brew install comfyui

# OR manual installation:
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
Note
macOS uses Metal Performance Shaders (MPS) for GPU acceleration on Apple Silicon chips. Performance is good but not as fast as NVIDIA CUDA.

Linux Installation

Linux setup
# Clone repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create conda environment (recommended)
conda create -n comfyui python=3.10
conda activate comfyui

# Install PyTorch (NVIDIA)
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# OR for AMD (ROCm):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6

# Install requirements
pip install -r requirements.txt

# Run
python main.py

Downloading Models

After installation, you need to download Stable Diffusion checkpoints. Place them in the correct folders:

Model folder structure
ComfyUI/
├── models/
│   ├── checkpoints/     # Main SD models (.safetensors, .ckpt)
│   ├── vae/             # VAE models
│   ├── loras/           # LoRA models
│   ├── controlnet/      # ControlNet models
│   ├── upscale_models/  # Upscaler models (ESRGAN, etc.)
│   └── clip/            # CLIP models

Download models from sources like CivitAI, Hugging Face, or the official Stability AI repository.
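
For example, you can pull a checkpoint straight into the checkpoints folder from the command line. The snippet below is a sketch that uses the SDXL base checkpoint from Stability AI's Hugging Face repository as an example; substitute the model you actually want, and note that some downloads require a logged-in account or API token.

Example checkpoint download
# Run from the ComfyUI folder: download a checkpoint into models/checkpoints/
# (example file: the SDXL base checkpoint; swap in the URL of the model you want)
wget -P models/checkpoints/ \
  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# Restart ComfyUI (or hit Refresh) so the file appears in the checkpoint dropdown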

3. Understanding the Interface

The Canvas

The main workspace is a large canvas where you place and connect nodes. You can:

  • Pan: Click and drag on empty space, or use middle mouse button
  • Zoom: Scroll wheel or pinch gesture
  • Select: Click on nodes, or drag a selection box
  • Multi-select: Hold Shift while clicking

Adding Nodes

  • Right-click: Opens the node menu
  • Double-click: Opens quick search
  • Drag from output: Dragging from an output and releasing on empty space shows compatible nodes

Connecting Nodes

Click and drag from an output slot (right side) to an input slot (left side). Connections are color-coded by data type:

  • Purple: MODEL (the diffusion model/UNet)
  • Yellow: CLIP (text encoder)
  • Red: VAE
  • Orange: CONDITIONING (encoded prompts)
  • Pink: LATENT (latent space images)
  • Blue: IMAGE (pixel images)

Queue & Execution

Click Queue Prompt (or press Ctrl+Enter) to execute the workflow. ComfyUI caches results intelligently: unchanged nodes won't re-run.
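
You can also queue workflows without touching the canvas: ComfyUI serves a small HTTP API on the same address as the web UI (127.0.0.1:8188 by default). Below is a minimal sketch, assuming you have exported your workflow in API format to workflow_api.json and have jq installed:

Queue a workflow via the API
# Wrap the exported graph under a "prompt" key and POST it to the /prompt endpoint
jq '{prompt: .}' workflow_api.json | \
  curl -s -X POST http://127.0.0.1:8188/prompt \
       -H "Content-Type: application/json" -d @-

The response includes a prompt_id you can use to track the job in the queue.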

4. Core Nodes Explained

Understanding these fundamental nodes is essential for building any workflow:

Load Checkpoint

The CheckpointLoaderSimple node loads your Stable Diffusion model and extracts three components:

Node: CheckpointLoaderSimple
Purpose: Loads an SD checkpoint file
Inputs: ckpt_name (dropdown)
Outputs: MODEL, CLIP, VAE

Select your model from the dropdown. The node outputs the MODEL (UNet for denoising), CLIP (text encoder), and VAE (latent/pixel converter).

CLIP Text Encode

Converts your text prompt into embeddings the model can understand:

Node: CLIPTextEncode
Purpose: Encodes a text prompt
Inputs: clip, text
Outputs: CONDITIONING

You typically use two of these—one for positive prompts and one for negative prompts.

Empty Latent Image

Creates a blank latent space canvas for text-to-image generation:

Node: EmptyLatentImage
Purpose: Creates a blank latent
Inputs: width, height, batch_size
Outputs: LATENT

Pro Tip
For SD 1.5, use 512x512 or 768x768. For SDXL, use 1024x1024 or equivalent megapixel resolutions like 896x1152.

KSampler

The heart of image generation—iteratively denoises the latent image:

Node: KSampler
Purpose: Denoises the latent to generate an image
Inputs: model, positive, negative, latent_image, seed, steps, cfg, sampler_name, scheduler, denoise
Outputs: LATENT

Key parameters:

  • seed: Random seed for reproducibility (use the control_after_generate option to randomize or keep it fixed between runs)
  • steps: Number of denoising iterations (20-30 typical)
  • cfg: Classifier-free guidance scale (6-8 typical, higher = more prompt adherence)
  • denoise: Amount of noise to add/remove (1.0 for full generation, lower for img2img)

VAE Decode

Converts the latent image back to pixel space:

Node: VAEDecode
Purpose: Converts a latent image to pixels
Inputs: samples (LATENT), vae
Outputs: IMAGE

Save Image

Saves the final image to disk:

Node: SaveImage
Purpose: Saves the image to the output folder
Inputs: images, filename_prefix
Outputs: None

5. Your First Workflow: Text-to-Image

Let's build a complete text-to-image workflow from scratch. This is the foundation for all other workflows.

Step 1: Load the Checkpoint

  1. Right-click on the canvas → Add Node → loaders → Load Checkpoint
  2. Select your model from the ckpt_name dropdown

Step 2: Create Prompt Encoders

  1. Add two CLIP Text Encode nodes
  2. Connect the CLIP output from Load Checkpoint to both
  3. Label one "Positive" and one "Negative" (right-click → Title)
  4. Enter your positive prompt: "a beautiful sunset over mountains, dramatic lighting, 8k, detailed"
  5. Enter your negative prompt: "blurry, low quality, watermark, text, ugly, deformed"

Step 3: Create Empty Latent

  1. Add Empty Latent Image node
  2. Set width and height (512x512 for SD 1.5, 1024x1024 for SDXL)
  3. batch_size: 1 (or more for multiple images)

Step 4: Add the Sampler

  1. Add KSampler node
  2. Connect: MODEL → model, Positive CONDITIONING → positive, Negative CONDITIONING → negative, LATENT → latent_image
  3. Set: steps: 20, cfg: 7, sampler_name: euler, scheduler: normal, denoise: 1.0

Step 5: Decode and Save

  1. Add VAE Decode node
  2. Connect: KSampler LATENT output → samples, Load Checkpoint VAE → vae
  3. Add Save Image node
  4. Connect: VAE Decode IMAGE → images

Complete Workflow Diagram

Load Checkpoint (ckpt_name)
  ├─ MODEL ───────────────────────────────────────────────────→ KSampler (model)
  ├─ CLIP ──→ CLIP Text Encode (positive prompt) ── CONDITIONING ──→ KSampler (positive)
  ├─ CLIP ──→ CLIP Text Encode (negative prompt) ── CONDITIONING ──→ KSampler (negative)
  └─ VAE ─────────────────────────────────────────────────────→ VAE Decode (vae)
Empty Latent Image (width 512, height 512, batch_size 1) ── LATENT ──→ KSampler (latent_image)
KSampler (seed 42, steps 20, cfg 7, sampler euler, denoise 1.0) ── LATENT ──→ VAE Decode ── IMAGE ──→ Save Image (filename_prefix)

Basic Text-to-Image Workflow

Pro Tip
Click "Queue Prompt" or press Enter to generate! Your image will be saved to the output folder.

6. Samplers & Schedulers Deep Dive

The sampler and scheduler determine how the model denoises the image. Different combinations produce different styles and qualities.

Popular Samplers

Sampler           Speed    Quality    Best For
euler             Fast     Good       Quick iterations, testing
euler_ancestral   Fast     Creative   More variation, artistic
dpmpp_2m          Medium   Excellent  General purpose, recommended
dpmpp_2m_sde      Medium   Excellent  Fine details, photorealism
dpmpp_3m_sde      Slower   Best       Maximum quality

Schedulers

  • normal: Standard linear noise schedule
  • karras: Improved schedule, often better results with fewer steps
  • exponential: Alternative schedule for some models
  • sgm_uniform: For certain fine-tuned models
Pro Tip
Start with dpmpp_2m + karras at 20-25 steps. This combination works well for most models.

CFG Scale Guide

  • 1-4: Very creative, may ignore prompt
  • 5-7: Balanced creativity and prompt following
  • 7-10: Strong prompt adherence (recommended range)
  • 10+: Very strict, can cause artifacts and oversaturation

7. Using LoRA Models

LoRA (Low-Rank Adaptation) models are small add-ons that modify your base model to achieve specific styles, characters, or concepts without replacing the entire checkpoint.

Setup

  1. Download LoRA files (.safetensors) from CivitAI or other sources
  2. Place them in ComfyUI/models/loras/
  3. Restart ComfyUI to detect new models

Using Load LoRA Node

Node: LoraLoader
Purpose: Applies a LoRA to the model and CLIP
Inputs: model, clip, lora_name, strength_model, strength_clip
Outputs: MODEL, CLIP

Insert the LoRA loader between your checkpoint loader and the rest of the workflow:

Load Checkpoint
  ├─ MODEL ──→ Load LoRA (model)
  └─ CLIP ───→ Load LoRA (clip)
Load LoRA (lora_name, strength_model 0.8, strength_clip 0.8)
  ├─ MODEL ──→ KSampler (model) ── LATENT
  └─ CLIP ───→ CLIP Text Encode ── CONDITIONING ──→ KSampler (positive)

LoRA Integration Workflow

Key Parameters

  • strength_model: How much the LoRA affects the diffusion model (0.5-1.0 typical)
  • strength_clip: How much the LoRA affects text encoding (often matches strength_model)
Warning
Using strength values above 1.0 can cause distortions. Start low (0.5-0.7) and increase gradually.

Stacking Multiple LoRAs

Chain multiple Load LoRA nodes to combine effects. Each LoRA processes the MODEL and CLIP sequentially:

Checkpoint ── MODEL + CLIP ──→ LoRA 1 (Style)
LoRA 1 ── MODEL + CLIP ──→ LoRA 2 (Character)
LoRA 2 ── MODEL + CLIP ──→ LoRA 3 (Pose)
LoRA 3 ── MODEL ──→ KSampler (model)

Stacking Multiple LoRAs

Pro Tip
Many LoRAs require specific trigger words in your prompt. Check the model page for required keywords!

8. ControlNet Integration

ControlNet lets you guide image generation using reference images—edges, depth maps, poses, and more. This gives you precise control over composition and structure.

Setup

  1. Download ControlNet models (e.g., control_v11p_sd15_canny.pth)
  2. Place in ComfyUI/models/controlnet/
  3. Install the ControlNet Auxiliary Preprocessors custom node pack (via ComfyUI Manager)
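
From the command line, the same setup looks roughly like the sketch below. The Canny model URL points at the widely used lllyasviel/ControlNet-v1-1 repository on Hugging Face and the preprocessor pack at the comfyui_controlnet_aux repository; treat both as example sources and verify them (or simply use ComfyUI Manager) before downloading.

ControlNet setup (sketch)
# 1) Download a ControlNet model into ComfyUI/models/controlnet/
wget -P models/controlnet/ \
  https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth

# 2) Install the auxiliary preprocessors pack manually (ComfyUI Manager works too)
cd custom_nodes
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
pip install -r comfyui_controlnet_aux/requirements.txt
cd ..

# 3) Restart ComfyUI so the new model and nodes are picked up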

ControlNet Types

Type       Best For                     Preprocessor
Canny      Edge detection, line art     CannyEdgePreprocessor
Depth      Spatial layout, perspective  DepthAnythingPreprocessor
OpenPose   Human poses, body position   OpenPosePreprocessor
Scribble   Rough sketches               ScribblePreprocessor
Lineart    Clean line drawings          LineartPreprocessor

Basic ControlNet Workflow

Load Image ── IMAGE ──→ Canny Preprocessor (low_threshold 100, high_threshold 200) ── IMAGE ──→ Apply ControlNet (image)
Load ControlNet (control_net_name) ── CONTROL_NET ──→ Apply ControlNet (control_net)
Load Checkpoint ── CLIP ──→ CLIP Text Encode ── CONDITIONING ──→ Apply ControlNet (conditioning)
Apply ControlNet (strength 1.0) ── CONDITIONING ──→ KSampler (positive)
Load Checkpoint ── MODEL ──→ KSampler ── LATENT ──→ VAE Decode (vae from Load Checkpoint) ── IMAGE

ControlNet Workflow

Key nodes:

  • Load ControlNet Model: Loads the ControlNet checkpoint
  • Apply ControlNet: Applies the control image to your conditioning
  • Preprocessor: Extracts control signal from your input image

ControlNet Parameters

  • strength: How strongly the control image influences generation (0.0-2.0, start at 1.0)
  • start_percent: When to start applying control (0.0 = beginning)
  • end_percent: When to stop applying control (1.0 = end)
Pro Tip
Use start_percent=0 and end_percent=0.5-0.8 to allow creative freedom in final details while maintaining structure.

9. Image-to-Image & Inpainting

Image-to-Image

Instead of starting from noise, img2img starts from an existing image. The denoise parameter controls how much of the original is preserved:

  • 0.1-0.3: Subtle changes, keeps most of original
  • 0.4-0.6: Moderate transformation
  • 0.7-1.0: Heavy changes, little of original remains

Load Image ── IMAGE ──→ VAE Encode (pixels, vae) ── LATENT ──→ KSampler (latent_image)
KSampler (model, positive, negative, denoise 0.5) ── LATENT ──→ VAE Decode ── IMAGE ──→ Save Image

Image-to-Image Workflow

Inpainting

Inpainting regenerates only masked portions of an image while preserving the rest.

  1. Use an inpainting-trained model (e.g., sd-v1-5-inpainting.safetensors)
  2. Load your image and create/load a mask (white = regenerate, black = keep)
  3. Use VAE Encode (for Inpainting) node
  4. Set grow_mask_by to expand mask edges for smoother blending

Load Image ── IMAGE ──→ VAE Encode (for Inpainting) (pixels)
Load Mask ── MASK ──→ VAE Encode (for Inpainting) (mask)
VAE Encode (for Inpainting) (vae, grow_mask_by 8) ── LATENT ──→ KSampler ── LATENT ──→ VAE Decode ── IMAGE

Inpainting Workflow

10. Upscaling Techniques

Simple Upscaling with ESRGAN

  1. Download upscale models (RealESRGAN_x4plus, etc.) to models/upscale_models/
  2. Add Load Upscale Model node
  3. Add Upscale Image (using Model) node
  4. Connect your image to the upscaler
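
The download in step 1 can also be done from the command line; RealESRGAN_x4plus is commonly fetched from the Real-ESRGAN GitHub releases (the URL below is the usual location, but verify it before use):

Upscale model download (sketch)
# Download RealESRGAN_x4plus into ComfyUI/models/upscale_models/
wget -P models/upscale_models/ \
  https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth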

Load Upscale Model (model_name) ── UPSCALE_MODEL ──→ Upscale Image (using Model) (upscale_model)
VAE Decode ── IMAGE ──→ Upscale Image (using Model) (image)
Upscale Image (using Model) ── IMAGE ──→ Save Image

ESRGAN Upscaling Workflow

Hires Fix (Two-Pass Upscaling)

For even better quality, upscale the latent and run through the sampler again:

  1. Generate at lower resolution (e.g., 512x512)
  2. Use Upscale Latent to increase latent size
  3. Run through another KSampler with low denoise (0.3-0.5)
  4. Decode and optionally apply ESRGAN for final upscale

Recommended Upscale Models

  • RealESRGAN_x4plus: Great all-purpose upscaler
  • RealESRGAN_x4plus_anime_6B: Optimized for anime/illustration
  • 4x-UltraSharp: Excellent for photos
  • NMKD Superscale: Good balance of speed and quality

11. SDXL Workflows

SDXL (Stable Diffusion XL) produces higher quality images but has different requirements.

SDXL Basics

  • Optimized for 1024x1024 (or equivalent ~1 megapixel)
  • Uses two text encoders, with separate prompt fields (text_g and text_l)
  • Requires 8GB+ VRAM (12GB+ recommended)
  • Optional refiner model for enhanced details

SDXL Text Encoding

SDXL has a special CLIP encoder node that handles both text encoders with additional parameters for resolution conditioning:

Load Checkpoint (SDXL)
  ├─ MODEL, VAE (to the rest of the workflow)
  └─ CLIP ──→ CLIPTextEncodeSDXL (text_g, text_l, width 1024, height 1024, target_width 1024, target_height 1024) ── CONDITIONING

SDXL Text Encoding

Using the Refiner

The SDXL refiner adds final polish. Use it in a two-pass workflow:

Load Checkpoint (SDXL)
  ├─ MODEL ──→ KSampler (Base) and KSampler (Refiner)
  ├─ CLIP ───→ CLIPTextEncodeSDXL (text_g, text_l, width 1024, height 1024) ── CONDITIONING ──→ KSampler (Base) (positive)
  └─ VAE ────→ VAE Decode
KSampler (Base) (steps 25, cfg 7) ── LATENT ──→ KSampler (Refiner) (latent_image, steps 10, denoise 0.3) ── LATENT ──→ VAE Decode ── IMAGE

SDXL Base + Refiner Workflow

Pro Tip
For the refiner, use denoise 0.2-0.4 and fewer steps (10-15). The base model does the heavy lifting.

SDXL Resolution Guide

Aspect Ratio   Resolution    Use Case
1:1            1024 x 1024   Square, profile pictures
3:4            896 x 1152    Portrait
4:3            1152 x 896    Landscape
16:9           1344 x 768    Widescreen
9:16           768 x 1344    Mobile/Stories

12. Custom Nodes & Extensions

ComfyUI Manager

The easiest way to install custom nodes. Install it first:

cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
# Restart ComfyUI

Once installed, access Manager from the menu to search and install nodes with one click.

Essential Custom Node Packs

  • ComfyUI-Impact-Pack: Face detailing, segmentation, upscaling enhancements
  • ComfyUI-ControlNet-Aux: All ControlNet preprocessors (Canny, Depth, Pose, etc.)
  • WAS Node Suite: Extensive utility nodes for image processing
  • ComfyUI-Inpaint-Nodes: Advanced inpainting with crop/stitch
  • Efficiency Nodes: Streamlined workflows with combined nodes
  • rgthree-comfy: Quality-of-life improvements and utilities
Warning
Only install custom nodes from trusted sources. They can execute arbitrary code. Review the source before installing unknown packs.
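
If you prefer to vet and install a pack by hand, clone it into custom_nodes and install its requirements. Here is a sketch using the Impact Pack repository as an example (double-check the repository URL on the pack's page):

Manual custom node install (example)
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack.git

# Many packs ship a requirements file - install it into ComfyUI's Python environment
pip install -r ComfyUI-Impact-Pack/requirements.txt

# Restart ComfyUI to load the new nodes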

13. Troubleshooting Common Issues

CUDA Out of Memory

Your GPU doesn't have enough VRAM.

Solutions: Lower resolution, reduce batch size, use --lowvram flag, close other GPU applications.
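
These flags are passed when launching ComfyUI. --lowvram is the one mentioned above; --novram and --cpu trade more speed for even less GPU memory (flag names per the standard ComfyUI launcher; run python main.py --help to confirm what your version supports):

Low-VRAM launch options
# Offload model parts to system RAM to fit in less VRAM (slower)
python main.py --lowvram

# More aggressive offloading, or run entirely on the CPU as a last resort
python main.py --novram
python main.py --cpu

# List every available flag
python main.py --help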

Black or Noisy Images

VAE mismatch or corrupted model.

Solutions: Use the correct VAE for your model, re-download the checkpoint, try a different VAE.

Model Not in Dropdown

ComfyUI hasn't detected your model.

Solutions: Check file is in correct folder, restart ComfyUI, verify file isn't corrupted.

Node Shows Red/Error State

Missing connection or invalid input.

Solutions: Check all required inputs are connected, ensure data types match (colors should align).

LoRA Has No Effect

Missing trigger word or incompatible model.

Solutions: Add the LoRA's trigger word to your prompt, verify LoRA is trained for your base model type (SD1.5 vs SDXL).

14. Resources & Next Steps

Model Sources

  • CivitAI: community checkpoints, LoRAs, and embeddings
  • Hugging Face: official Stability AI releases and research models

Workflow Collections

  • The example workflows linked from the official ComfyUI repository
  • Workflows shared alongside models on CivitAI

What to Learn Next

  1. Experiment with different models and LoRAs
  2. Master ControlNet for precise composition control
  3. Learn advanced inpainting for seamless edits
  4. Explore video generation workflows (AnimateDiff, etc.)
  5. Build your own reusable workflow templates
  6. Train your own LoRAs for custom styles