FastSD CPU :sparkles: Mentioned in Awesome OpenVINO

Fast stable diffusion on CPU

FastSD CPU is a faster version of Stable Diffusion on CPU, based on Latent Consistency Models and Adversarial Diffusion Distillation.

FastSD CPU screenshot

The following interfaces are available:

• Desktop GUI, basic text-to-image generation (Qt, faster)
• Web UI (advanced features, LoRA, ControlNet, etc.)
• CLI (command-line interface)

🚀 Using OpenVINO (SDXS-512-0.9), it took 0.82 seconds (820 milliseconds) to create a single 512x512 image on a Core i7-12700.

📰 News

• 2025-04-20 - Added MCP server support, faster uv-based installation, Claude Desktop and Open WebUI support
• 2024-11-03 - Added Intel Core Ultra Series 2 (Lunar Lake) NPU support
• 2024-10-02 - Added GGUF diffusion model (Flux) support
• 2024-09-03 - Added Intel AI PC GPU and NPU support 🚀


Supported platforms ⚡️

FastSD CPU works on the following platforms:

• Windows
• Linux
• Mac
• Android + Termux
• Raspberry Pi 4

Dependencies 📦

Memory requirements

Minimum system RAM requirements for FastSD CPU.

Model (LCM, OpenVINO): SD Turbo, 1 step, 512x512

Model (LCM-LoRA): DreamShaper v8, 3 steps, 512x512

| Mode     | Min RAM |
| -------- | ------- |
| LCM      | 2 GB    |
| LCM-LoRA | 4 GB    |
| OpenVINO | 11 GB   |

If we enable the tiny decoder (TAESD) we can save some memory (approximately 2 GB); for example, in OpenVINO mode memory usage drops to about 9 GB.

:exclamation: Please note that a guidance scale > 1 increases RAM usage and slows inference speed.
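
For context, the "tiny decoder" mentioned above is the TAESD tiny autoencoder. As a rough illustration (this is plain diffusers usage, not FastSD CPU's internal code, and the model IDs are assumptions), swapping it in looks like:

    # Illustrative sketch: replace a pipeline's full VAE with the tiny
    # autoencoder (TAESD) to cut decode memory and time. Not FastSD internals.
    import torch
    from diffusers import StableDiffusionPipeline, AutoencoderTiny

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
    )
    pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd")
    image = pipe("a cute cat").images[0]
    image.save("out.png")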

Features ✨

• Desktop GUI, Web UI and CLI
• Supports 256, 512, 768 and 1024 image sizes
• Supports Windows, Linux and Mac
• Saves images and the diffusion settings used to generate them
• Settings to control steps, guidance and seed
• Added safety checker setting
• Maximum inference steps increased to 25
• Added OpenVINO support
• Fixed OpenVINO image reproducibility issue
• Fixed OpenVINO high RAM usage, thanks deinferno
• Added multiple image generation support
• Application settings
• Added Tiny Auto Encoder for SD (TAESD) support, 1.4x speed boost (fast, moderate quality)
• Safety checker disabled by default
• Added SDXL and SSD-1B LCM models
• Added LCM-LoRA support; works well with fine-tuned Stable Diffusion 1.5 or SDXL models
• Added negative prompt support in LCM-LoRA mode
• LCM-LoRA models can be configured using a text configuration file
• Added support for custom models for OpenVINO (LCM-LoRA baked)
• OpenVINO models now support negative prompts (set guidance > 1.0)
• Real-time inference support, generates images while you type (experimental)
• Fast 2-3 step inference
• LCM-LoRA fused models for faster inference
• Supports integrated GPU (iGPU) using OpenVINO (export DEVICE=GPU)
• 5.7x speedup using OpenVINO (2 steps, tiny autoencoder)
• Image-to-image support (use the Web UI)
• OpenVINO image-to-image support
• Fast 1-step inference (SDXL Turbo)
• Added SD Turbo support
• Added image-to-image support for Turbo models (PyTorch and OpenVINO)
• Added image variations support
• Added 2x upscalers (EDSR and tiled SD upscale (experimental)), thanks monstruosoft for the SD upscale
• Works on Android + Termux + PRoot
• Added interactive CLI, thanks monstruosoft
• Added basic LoRA support to CLI and Web UI
• ONNX EDSR 2x upscale
• Added SDXL-Lightning support
• Added SDXL-Lightning OpenVINO support (int8)
• Added multi-LoRA support, thanks monstruosoft
• Added basic ControlNet v1.1 support (LCM-LoRA mode), thanks monstruosoft
• Added ControlNet annotators (canny, depth, lineart, mlsd, normalbae, pose, softedge, shuffle)
• Added SDXS-512-0.9 support
• Added SDXS-512-0.9 OpenVINO support, fast 1-step inference (0.8 seconds to generate a 512x512 image)
• Default model changed to SDXS-512-0.9
• Faster real-time image generation
• Added NPU device check
• Reverted default model to SD Turbo
• Updated real-time UI
• Added Hyper-SD support
• 1-step fast inference support for SDXL and SD 1.5
• Experimental support for single-file safetensors SD 1.5 models (Civitai models); simply add the local model path to the configs/stable-diffusion-models.txt file
• Added REST API support
• Added AURA-SR (4x) / GigaGAN-based upscaler support
• Added AURA-SR v2 upscaler support
• Added FLUX.1-schnell OpenVINO int4 support
• Added CLIP skip support
• Added token merging support
• Added Intel AI PC support
• AI PC NPU (power-efficient inference using OpenVINO) supports text to image, image to image and image variations
• Added TAEF1 (Tiny Autoencoder for FLUX.1) OpenVINO support
• Added image-to-image and image variations Qt GUI support, thanks monstruosoft

Fast inference benchmarks

🚀 Fast 1-step inference with Hyper-SD

Stable Diffusion 1.5

Works with LCM-LoRA mode. Fast 1-step inference is supported on the runwayml/stable-diffusion-v1-5 model; select the rupeshs/hypersd-sd1-5-1-step-lora LCM-LoRA model from the settings.

Stable Diffusion XL

Works with LCM and LCM-OpenVINO modes.

Inference speed

Tested on Core i7-12700 to generate a 768x768 image (1 step).

| Diffusion pipeline | Latency |
| ------------------ | ------- |
| PyTorch            | 19s     |
| OpenVINO           | 13s     |
| OpenVINO + TAESDXL | 6.3s    |

Fastest 1-step inference (SDXS-512-0.9)

:exclamation: This is an experimental model; only the text-to-image workflow is supported.

Inference speed

Tested on Core i7-12700 to generate a 512x512 image (1 step).

SDXS-512-0.9

| Diffusion pipeline | Latency |
| ------------------ | ------- |
| PyTorch            | 4.8s    |
| OpenVINO           | 3.8s    |
| OpenVINO + TAESD   | 0.82s   |

🚀 Fast 1-step inference (SD/SDXL Turbo - Adversarial Diffusion Distillation, ADD)

Added support for ultra-fast 1-step inference using the sdxl-turbo model.

:exclamation: These SD Turbo models are intended for research purposes only.

Inference speed

Tested on Core i7-12700 to generate a 512x512 image (1 step).

SD Turbo

| Diffusion pipeline | Latency |
| ------------------ | ------- |
| PyTorch            | 7.8s    |
| OpenVINO           | 5s      |
| OpenVINO + TAESD   | 1.7s    |

SDXL Turbo

| Diffusion pipeline | Latency |
| ------------------ | ------- |
| PyTorch            | 10s     |
| OpenVINO           | 5.6s    |
| OpenVINO + TAESDXL | 2.5s    |

🚀 Fast 2-step inference (SDXL-Lightning - Adversarial Diffusion Distillation)

SDXL-Lightning works with LCM and LCM-OpenVINO modes. You can select these models from the app settings.

Tested on Core i7-12700 to generate a 768x768 image (2 steps).

| Diffusion pipeline | Latency |
| ------------------ | ------- |
| PyTorch            | 18s     |
| OpenVINO           | 12s     |
| OpenVINO + TAESDXL | 10s     |

2-step fast inference (LCM)

FastSD CPU supports 2-3 step fast inference using the LCM-LoRA workflow. It works well with SD 1.5 models.

2-step inference screenshot

FLUX.1-schnell OpenVINO support

Flux schnell OpenVINO screenshot

:exclamation: Important - please note the following points for the Flux workflow:

• As of now, only the text-to-image generation mode is supported
• Use OpenVINO mode
• Use the int4 model - rupeshs/flux.1-schnell-openvino-int4
• 512x512 image generation needs around 30 GB of system RAM

Tested on an Intel Core i7-12700 to generate a 512x512 image (3 steps).

| Diffusion pipeline | Latency      |
| ------------------ | ------------ |
| OpenVINO           | 4 min 30 sec |

Benchmark scripts

To benchmark, run the following batch files on Windows:

• benchmark.bat - to benchmark PyTorch
• benchmark-openvino.bat - to benchmark OpenVINO

Alternatively, you can run benchmarks by passing the -b command-line argument in CLI mode.

OpenVINO support

FastSD CPU utilizes OpenVINO to speed up inference. Thanks to deinferno for the OpenVINO model contribution. We can get up to a 2x speed improvement when using OpenVINO. Thanks to Disty0 for the conversion script.

OpenVINO SDXL models

These models have been converted for direct use with FastSD CPU. They are compressed to int8 to reduce the file size (from 10 GB to 4.4 GB) using NNCF.

OpenVINO SD Turbo models

We have converted SD/SDXL Turbo models to OpenVINO for fast inference on CPU. These models are intended for research purposes only. We have also converted the TAESDXL model to OpenVINO.

You can use these models directly in FastSD CPU.

Convert SD 1.5 models to OpenVINO LCM-LoRA fused models

We first create an LCM-LoRA baked-in model, replace the scheduler with LCM, and then convert it into an OpenVINO model. For more details check the LCM OpenVINO converter; you can use this tool to convert any Stable Diffusion 1.5 fine-tuned model to OpenVINO.
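
As a rough illustration of those three steps, here is a minimal sketch using diffusers and optimum-intel (not the converter tool itself; the model IDs and output paths are examples):

    import torch
    from diffusers import StableDiffusionPipeline, LCMScheduler

    # Step 1: bake the LCM-LoRA into a fine-tuned SD 1.5 model.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
    )
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
    pipe.fuse_lora()

    # Step 2: replace the scheduler with the LCM scheduler.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.save_pretrained("./sd15-lcm-fused")

    # Step 3: export the fused pipeline to OpenVINO using optimum-intel.
    from optimum.intel import OVStableDiffusionPipeline

    ov_pipe = OVStableDiffusionPipeline.from_pretrained("./sd15-lcm-fused", export=True)
    ov_pipe.save_pretrained("./sd15-lcm-openvino")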

Real-time text to image (experimental)

We can generate images in real time from text using FastSD CPU.

CPU (OpenVINO)

Near real-time inference on CPU using OpenVINO: run the start-realtime.bat batch file and open the link in a browser (resolution: 512x512, latency: 0.82s on an Intel Core i7).

Watch the YouTube video.

Models

To use single-file safetensors SD 1.5 models (Civitai), follow this YouTube tutorial. Use LCM-LoRA mode for single-file safetensors.

FastSD supports LCM models and LCM-LoRA models.

LCM models

These models can be configured in the configs/lcm-models.txt file.

OpenVINO models

These are LCM-LoRA baked-in models. They can be configured in the configs/openvino-lcm-models.txt file.

LCM-LoRA models

These models can be configured in the configs/lcm-lora-models.txt file.

These models are used with the Stable Diffusion base models listed in configs/stable-diffusion-models.txt.

:exclamation: Currently there is no support for OpenVINO LCM-LoRA models.

How to add new LCM-LoRA models

To add a new model, follow these steps. As an example we will add wavymulder/collage-diffusion; you can use any Stable Diffusion 1.5 or SDXL/SSD-1B fine-tuned model.

1. Open the configs/stable-diffusion-models.txt file in a text editor.
2. Add the model ID wavymulder/collage-diffusion or a locally cloned path.

The updated file is shown below:

    Fictiverse/Stable_Diffusion_PaperCut_Model
    stabilityai/stable-diffusion-xl-base-1.0
    runwayml/stable-diffusion-v1-5
    segmind/SSD-1B
    stablediffusionapi/anything-v5
    wavymulder/collage-diffusion
    

Similarly, we can update the configs/lcm-lora-models.txt file with an LCM-LoRA model ID.

How to use LCM-LoRA models offline

Please follow these steps to run LCM-LoRA models offline:

• In the settings, ensure that the "Use locally cached model" setting is ticked.
• Download the model, for example latent-consistency/lcm-lora-sdv1-5, by running the following commands (or see the Python alternative sketched after this list):
    git lfs install
    git clone https://huggingface.co/latent-consistency/lcm-lora-sdv1-5
    

Copy the cloned model folder path, for example "D:\demo\lcm-lora-sdv1-5", and update the configs/lcm-lora-models.txt file as shown below:

    D:\demo\lcm-lora-sdv1-5
    latent-consistency/lcm-lora-sdxl
    latent-consistency/lcm-lora-ssd-1b
    
• Open the app and select the newly added local folder in the combo box menu.
• That's all!
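
If you prefer Python over git-lfs, a small alternative sketch using the huggingface_hub library (an extra dependency, not required by FastSD itself) downloads the same snapshot:

    # Optional alternative to git-lfs for fetching the LoRA snapshot.
    # Assumes `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id="latent-consistency/lcm-lora-sdv1-5",
        local_dir="lcm-lora-sdv1-5",  # add this folder path to configs/lcm-lora-models.txt
    )
    print(local_path)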

How to use LoRA models

Place your LoRA models in the "lora_models" folder. Use LCM or LCM-LoRA mode. You can download LoRA models (.safetensors) from Civitai or Hugging Face, e.g. cutecartoonredmond.
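
For intuition, applying a downloaded .safetensors LoRA in plain diffusers looks roughly like the sketch below (illustrative only, not FastSD's internal code; the weight file name is hypothetical):

    # Rough illustration of applying a .safetensors LoRA with diffusers.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    # The weight file name below is hypothetical; use your downloaded file.
    pipe.load_lora_weights("lora_models", weight_name="my_style_lora.safetensors")
    image = pipe("a cute cartoon cat").images[0]
    image.save("lora-out.png")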

ControlNet support

We can use ControlNet in LCM-LoRA mode.

Download ControlNet models from controlnet-v1-1 and place them in the "controlnet_models" folder.

Use the medium-size models (723 MB) (for example: https://huggingface.co/comfyanonymous/controlnet-v1-1_fp16_safetensors/blob/main/control_v11p_sd15_canny_fp16.safetensors)

Installation

FastSD CPU on Windows

FastSD CPU Desktop GUI screenshot

:exclamation: You must have a working Python and uv installation (recommended: Python 3.10 or higher).

To install FastSD CPU on Windows, follow these steps:

• Clone/download this repo or download a release.
• Double-click install.bat (it will take some time to install, depending on your internet speed).
• You can run in Desktop GUI mode or Web UI mode.

Desktop GUI

• To start the Desktop GUI, double-click start.bat

Web UI

• To start the Web UI, double-click start-webui.bat

FastSD CPU on Linux

:exclamation: Ensure that you have Python 3.10 and uv installed.

• Clone/download this repo or download a release.

• In the terminal, enter the fastsdcpu directory.

• Run the following commands:

    chmod +x install.sh

    ./install.sh

To start the Desktop GUI:

    ./start.sh

To start the Web UI:

    ./start-webui.sh

FastSD CPU on Mac

FastSD CPU running on Mac

:exclamation: Ensure that you have Python 3.9, 3.10 or 3.11 installed.

Run the following commands to install FastSD CPU on Mac:

• Clone/download this repo or download a release.

• In the terminal, enter the fastsdcpu directory.

• Run the following commands:

    chmod +x install-mac.sh

    ./install-mac.sh

To start the Desktop GUI:

    ./start.sh

To start the Web UI:

    ./start-webui.sh

Thanks to autantpourmoi for Mac testing.

:exclamation: We don't support OpenVINO on Macs with M1/M2/M3 chips, but it does work on Intel chips.

If you want to increase image generation speed on a Mac (M1/M2 chip), try this:

    export DEVICE=mps

and start the app with start.sh.

Web UI screenshot

FastSD CPU Web UI screenshot

Google Colab

Due to the limitations of using CPU/OpenVINO inside Colab, we use a GPU with Colab. Open in Colab.

CLI mode (advanced users)

FastSD CPU CLI screenshot

Open the terminal and enter the fastsdcpu folder. Activate the virtual environment using the command:

Windows users

(assuming FastSD CPU is available in the directory "D:\fastsdcpu")

    D:\fastsdcpu\env\Scripts\activate.bat

Linux users

    source env/bin/activate

Start the CLI:

    src/app.py -h

Android (Termux + PRoot)

FastSD CPU running on a Google Pixel 7 Pro.

FastSD CPU Android Termux screenshot

Install FastSD CPU on Android

Follow this guide to install FastSD CPU on Android + Termux: "How to install and run FastSD CPU on Android + Termux - Step by Step Guide [Updated]".

Raspberry Pi 4 support

Thanks to [wgnw_mgm] for Raspberry Pi 4 testing. FastSD CPU worked without problems. System configuration: Raspberry Pi 4 with 4 GB RAM and 8 GB of swap memory.

API support

FastSD CPU API documentation screenshot

FastSD CPU supports basic API endpoints. The following API endpoints are available:

• /api/info - get system information
• /api/config - get configuration
• /api/models - list all available models
• /api/generate - generate images (text to image, image to image)

To start FastAPI in web server mode run: python src/app.py --api

or use start-webserver.sh for Linux and start-webserver.bat for Windows.

Access the API documentation locally at http://localhost:8000/api/docs.

The generated image is a JPEG image encoded as a base64 string. In image-to-image mode the input image should also be encoded as a base64 string.

To generate an image, a minimal request is POST /api/generate with body:

    {
        "prompt": "a cute cat",
        "use_openvino": true
    }
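
As a convenience, here is a minimal Python client sketch for this endpoint. The exact name of the base64 field in the JSON response is an assumption here (check http://localhost:8000/api/docs for the real schema):

    # Minimal sketch of calling the FastSD CPU REST API with the requests library.
    # The response field name "images" is an assumption; consult /api/docs.
    import base64
    import requests

    resp = requests.post(
        "http://localhost:8000/api/generate",
        json={"prompt": "a cute cat", "use_openvino": True},
    )
    resp.raise_for_status()
    data = resp.json()
    # The server returns the generated JPEG encoded as a base64 string.
    jpeg_bytes = base64.b64decode(data["images"][0])
    with open("generated.jpg", "wb") as f:
        f.write(jpeg_bytes)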
    

GGUF support - Flux

GGUF Flux models are supported via the stablediffusion.cpp shared library. Currently the Flux schnell model is supported.

To use a GGUF model, use the Web UI and select GGUF mode.

Tested on Windows and Linux.

:exclamation: The main advantage here is that the minimum system RAM required for the Flux workflow is reduced to around 12 GB.

Supported mode - text to image

How to run the Flux GGUF model

• Download the stablediffusion.cpp prebuilt shared library and place it inside the fastsdcpu folder. Windows users, download stable-diffusion.dll;

  Linux users, download libstable-diffusion.so.

  You can also build the library manually by following the guide "Build stablediffusion.cpp shared library for GGUF Flux model support".

• Download the diffusion model flux1-schnell-q4_0.gguf and place it inside the models/gguf/diffusion directory.

• Download the CLIP model clip_l_q4_0.gguf and place it inside the models/gguf/clip directory.

• Download the T5-XXL model t5xxl_q4_0.gguf and place it inside the models/gguf/t5xxl directory.

• Download the VAE model ae.safetensors and place it inside the models/gguf/vae directory.

• Start the Web UI and select GGUF mode.

• Select the Models settings tab and select the GGUF diffusion, clip_l, t5xxl and VAE models.

• Enter your prompt and generate the image.

Build stablediffusion.cpp shared library for GGUF Flux model support (optional)

To build the stablediffusion.cpp library, follow these steps:

• git clone https://github.com/leejet/stable-diffusion.cpp
• cd stable-diffusion.cpp
• git pull origin master
• git submodule init
• git submodule update
• git checkout 14206fd48832ab600d9db75f15acb5062ae2c296
• cmake . -DSD_BUILD_SHARED_LIBS=ON
• cmake --build . --config Release
• Copy the built stablediffusion DLL/so file to the fastsdcpu folder

Intel AI PC support - OpenVINO (CPU, GPU, NPU)

FastSD now supports AI PCs with Intel® Core™ Ultra processors. Learn more about AI PC and OpenVINO.

GPU

For GPU mode, set DEVICE=GPU and run the Web UI. A FastSD GPU benchmark on an AI PC is shown below.

FastSD AI PC Arc GPU benchmark

NPU

FastSD CPU now supports the power-efficient NPU (Neural Processing Unit) that comes with Intel Core Ultra processors.

FastSD has been tested with the NPUs of the following Intel processors:

• Intel Core Ultra Series 1 (Meteor Lake)
• Intel Core Ultra Series 2 (Lunar Lake)

Currently FastSD supports this model for NPU: rupeshs/sd15-lcm-square-openvino-int8.

The following modes are supported on NPU:

• Text to image
• Image to image
• Image variations

To run a model on the NPU, follow these steps (please make sure that your AI PC's NPU driver is the latest; a quick device check is sketched after these steps):

• Start the Web UI
• Select LCM-OpenVINO mode
• Select the Models settings tab and select the OpenVINO model rupeshs/sd15-lcm-square-openvino-int8
• Set the device environment variable: set DEVICE=NPU
• Now it will run on the NPU
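
To verify that OpenVINO can actually see the NPU (and GPU) before launching, a quick check with the openvino Python package (assuming a recent OpenVINO release is installed) is:

    # Quick sanity check: list the devices the OpenVINO runtime can see.
    # Expect something like ['CPU', 'GPU', 'NPU'] on an Intel AI PC.
    import openvino as ov

    core = ov.Core()
    print(core.available_devices)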

This is heterogeneous computing: the text encoder and UNet will use the NPU, and the VAE will use the GPU for processing, thanks to OpenVINO.

Please note that the tiny autoencoder will not work in NPU mode.

Thanks to Intel for providing an AI PC dev kit and Tiber cloud access to test FastSD. Special thanks to Pooja Baraskar and Dmitriy Pastushenkov.

MCP server support

FastSD CPU now supports an MCP (Model Context Protocol) server.

• Start the FastSD CPU MCP server: python src/app.py --mcp, or run start-mcpserver.sh for Linux and start-mcpserver.bat for Windows.

The FastSD CPU MCP server will be running at http://127.0.0.1:8000/mcp

It can be used with AI apps that support the MCP protocol, for example Claude Desktop.

Note: Open WebUI does not directly use the MCP protocol; it is based on the OpenAPI protocol.

Claude Desktop

To connect with the FastSD MCP server, first configure Claude Desktop:

• Open File -> Settings -> Developer -> Edit Config
• Add the config below (also ensure that Node.js is installed on your machine):
    {
      "mcpservers": {
        "fastsdcpu": {
          "command": "npx",
          "args": [
            "mcp-remote",
            "http://127.0.0.1:8000/mcp"
          ]
        }
      }
    }
    
• Restart Claude Desktop
• Give a sample prompt to generate an image, e.g. "create image of a cat"

Screenshot of Claude Desktop accessing an Intel AI PC NPU to generate an image using the FastSD MCP server:

Claude Desktop FastSD CPU AI PC NPU screenshot

Open WebUI support

FastSD CPU can be used with Open WebUI for local image generation using an LLM and tool calling.

Follow the steps below to use FastSD with Open WebUI:

• Start the FastSD CPU MCP server: python src/app.py --mcp, or run start-mcpserver.sh for Linux and start-mcpserver.bat for Windows.

• Update the server URL in the settings page as shown below.

Open WebUI settings screenshot

• Change the chat controls setting "Function calling" to "Native".

• Generate an image using a text prompt (the Qwen 2.5 7B model was used for the demo).

Open WebUI FastSD MCP server screenshot

Known issues

• TAESD will not work with the OpenVINO image-to-image workflow

License

The FastSD CPU project is available as open source under the terms of the MIT license.

Disclaimer

Users are granted the freedom to create images using this tool, but they are obligated to comply with local laws and to utilize it responsibly. The developers will not assume any responsibility for potential misuse by users.

Thanks to all our contributors

Original author & maintainer: Rupesh Sreeraman

We thank all contributors for their time and hard work!
