Cua

    c/ua is the Docker Container for Computer-Use AI Agents.

    1.7

    GitHub Stats

    Stars: 8937
    Forks: 404
    Release Date: 6/8/2025

    Detailed Description


    c/ua ("koo-ah") is Docker for Computer-Use Agents - it enables AI agents to control full operating systems in virtual containers and deploy them locally or to the cloud.

    🚀 Quick Start with a Computer-Use Agent UI

    Need to automate desktop tasks? Launch the Computer-Use Agent UI with a single command.

    Option 1: Fully-managed install with Docker (recommended)

    Docker-based guided install for quick use

    macOS/Linux/Windows (via WSL):

    # Requires Docker
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/scripts/playground-docker.sh)"
    

    This script will guide you through setup using Docker containers and launch the Computer-Use Agent UI.
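
    The install command above notes that Docker is required and is itself run through bash and curl. As a minimal sketch (the helper name missing_tools is hypothetical and not part of the project), a short pre-flight check can confirm those tools are on PATH before the script is launched:

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # docker, curl and bash are the tools the one-line install relies on.
    missing = missing_tools(["docker", "curl", "bash"])
    if missing:
        print("Install these before running the script:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

    This only checks that the executables exist; it does not verify that the Docker daemon is running.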


    Option 2: Dev Container

    Best for contributors and development

    This repository includes a Dev Container configuration that simplifies setup to a few steps:

    1. Install the Dev Containers extension (VS Code or WindSurf)
    2. Open the repository in the Dev Container:
      • Press Ctrl+Shift+P (or ⌘+Shift+P on macOS)
      • Select Dev Containers: Clone Repository in Container Volume... and paste the repository URL: https://github.com/trycua/cua.git (if not cloned) or Dev Containers: Open Folder in Container... (if git cloned).

      Note: On WindSurf, the post-install hook might not run automatically. If it does not, run /bin/bash .devcontainer/post-install.sh manually.

    3. Open the VS Code workspace: Once post-install.sh has finished running, open the .vscode/py.code-workspace workspace and press Open Workspace.
    4. Run the Agent UI example: Click Run Agent UI to start the Gradio UI. If prompted to install debugpy (Python Debugger) to enable remote debugging, select 'Yes' to proceed.
    5. Access the Gradio UI: The Gradio UI will be available at http://localhost:7860 and will automatically forward to your host machine.
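
    To confirm that the forwarded Gradio port from step 5 is reachable from the host, a small TCP probe is usually enough. The sketch below uses only the standard library; the function name is_port_open is hypothetical, and the default port 7860 is taken from the step above:

```python
import socket


def is_port_open(host="localhost", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable yet"
    print(f"Gradio UI on http://localhost:7860 is {state}")
```

    A successful connection only shows that something is listening on the port, not that it is the Agent UI.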

    Option 3: PyPI

    Direct Python package installation

    # conda create -yn cua python==3.12
    
    pip install -U "cua-computer[all]" "cua-agent[all]"
    python -m agent.ui # Start the agent UI
    

    Or check out the Usage Guide to learn how to use our Python SDK in your own code.


    Supported Agent Loops

    🖥️ Compatibility

    For detailed compatibility information including host OS support, VM emulation capabilities, and model provider compatibility, see the Compatibility Matrix.

    🐍 Usage Guide

    Follow these steps to use C/ua in your own Python code. See Developer Guide for building from source.

    Step 1: Install Lume CLI

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
    

    Lume CLI manages high-performance macOS/Linux VMs with near-native speed on Apple Silicon.

    Step 2: Pull the macOS CUA Image

    lume pull macos-sequoia-cua:latest
    

    The macOS CUA image contains the default Mac apps and the Computer Server for easy automation.

    Step 3: Install Python SDK

    pip install "cua-computer[all]" "cua-agent[all]"
    

    Step 4: Use in Your Code

    import asyncio
    
    from computer import Computer
    from agent import ComputerAgent, LLM
    
    async def main():
        # Start a local macOS VM
        computer = Computer(os_type="macos")
        await computer.run()
    
        # Or with C/ua Cloud Container
        computer = Computer(
          os_type="linux",
          api_key="your_cua_api_key_here",
          name="your_container_name_here"
        )
    
        # Example: Direct control of a macOS VM with Computer
        computer.interface.delay = 0.1 # Wait 0.1 seconds between kb/m actions
        await computer.interface.left_click(100, 200)
        await computer.interface.type_text("Hello, world!")
        screenshot_bytes = await computer.interface.screenshot()
    
        # Example: Create and run an agent locally using mlx-community/UI-TARS-1.5-7B-6bit
        agent = ComputerAgent(
          computer=computer,
          loop="uitars",
          model=LLM(provider="mlxvlm", name="mlx-community/UI-TARS-1.5-7B-6bit")
        )
        async for result in agent.run("Find the trycua/cua repository on GitHub and follow the quick start guide"):
            print(result)
    
    if __name__ == "__main__":
        asyncio.run(main())
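
    The example above keeps the screenshot in memory as screenshot_bytes. A minimal sketch for persisting such bytes to disk follows. It assumes the bytes are an encoded image (PNG is an assumption here, not something the text above states), and the helper names save_screenshot and looks_like_png are hypothetical:

```python
import tempfile
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if `data` starts with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def save_screenshot(data: bytes, path: str) -> Path:
    """Write raw screenshot bytes to `path`, creating parent directories."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    # Stand-in bytes; in real use this would be the value of screenshot_bytes.
    fake = PNG_SIGNATURE + b"placeholder"
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_screenshot(fake, f"{tmp}/shots/step1.png")
        print(saved.name, looks_like_png(saved.read_bytes()))
```

    Checking the signature before choosing a file extension avoids mislabeling the file if the server returns a different image format.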
    

    For ready-to-use examples, check out our Notebooks collection.

    Lume CLI Reference

    # Install Lume CLI and background service
    curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
    
    # List all VMs
    lume ls
    
    # Pull a VM image
    lume pull macos-sequoia-cua:latest
    
    # Create a new VM
    lume create my-vm --os macos --cpu 4 --memory 8GB --disk-size 50GB
    
    # Run a VM (creates and starts if it doesn't exist)
    lume run macos-sequoia-cua:latest
    
    # Stop a VM
    lume stop macos-sequoia-cua_latest
    
    # Delete a VM
    lume delete macos-sequoia-cua_latest
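
    When driving these commands from a script, it is often safer to build the argument list explicitly than to format a shell string. The sketch below only constructs the arguments and does not run lume. The flags are the ones shown in the reference above; the helper names are hypothetical, and default_vm_name encodes an inference from the reference (the image macos-sequoia-cua:latest is stopped as macos-sequoia-cua_latest), not documented behaviour:

```python
import shlex


def lume_create_args(name, os_type="macos", cpu=4, memory="8GB", disk_size="50GB"):
    """Build the argv for `lume create` using the flags from the reference."""
    return [
        "lume", "create", name,
        "--os", os_type,
        "--cpu", str(cpu),
        "--memory", memory,
        "--disk-size", disk_size,
    ]


def default_vm_name(image):
    """Guess the VM name for an image by replacing ':' with '_' (inferred)."""
    return image.replace(":", "_")


if __name__ == "__main__":
    print(shlex.join(lume_create_args("my-vm")))
    print(default_vm_name("macos-sequoia-cua:latest"))
```

    The resulting list can be passed to subprocess.run without shell=True, which sidesteps quoting problems in VM names.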
    

    Lumier CLI Reference

    For advanced container-like virtualization, check out Lumier - a Docker interface for macOS and Linux VMs.

    # Install Lume CLI and background service
    curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
    
    # Run macOS in a Docker container
    docker run -it --rm \
        --name lumier-vm \
        -p 8006:8006 \
        -v $(pwd)/storage:/storage \
        -v $(pwd)/shared:/shared \
        -e VM_NAME=lumier-vm \
        -e VERSION=ghcr.io/trycua/macos-sequoia-cua:latest \
        -e CPU_CORES=4 \
        -e RAM_SIZE=8192 \
        -e HOST_STORAGE_PATH=$(pwd)/storage \
        -e HOST_SHARED_PATH=$(pwd)/shared \
        trycua/lumier:latest
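
    The same docker run invocation can be assembled in Python, which makes the repeated $(pwd) paths and environment variables easier to keep consistent. This is a sketch that only builds the argument list. The function name lumier_run_args and its parameters are hypothetical; the flags, port, environment variable names and image tags are copied from the command above:

```python
from pathlib import Path


def lumier_run_args(vm_name="lumier-vm",
                    version="ghcr.io/trycua/macos-sequoia-cua:latest",
                    cpu_cores=4, ram_size=8192, base_dir="."):
    """Build the argv for the `docker run` command shown above."""
    base = Path(base_dir).resolve()
    storage, shared = base / "storage", base / "shared"
    env = {
        "VM_NAME": vm_name,
        "VERSION": version,
        "CPU_CORES": cpu_cores,
        "RAM_SIZE": ram_size,
        "HOST_STORAGE_PATH": storage,
        "HOST_SHARED_PATH": shared,
    }
    args = ["docker", "run", "-it", "--rm", "--name", vm_name,
            "-p", "8006:8006",
            "-v", f"{storage}:/storage",
            "-v", f"{shared}:/shared"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args.append("trycua/lumier:latest")
    return args


if __name__ == "__main__":
    print(" ".join(lumier_run_args()))
```

    Deriving both the volume mounts and the HOST_* variables from one base directory keeps them from drifting apart when the command is edited.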
    

    Resources

    Modules

    | Module | Description | Installation |
    |--------|-------------|---------------|
    | Lume | VM management for macOS/Linux using Apple's Virtualization.Framework | curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh \| bash |
    | Lumier | Docker interface for macOS and Linux VMs | docker pull trycua/lumier:latest |
    | Computer (Python) | Python interface for controlling virtual machines | pip install "cua-computer[all]" |
    | Computer (TypeScript) | TypeScript interface for controlling virtual machines | npm install @trycua/computer |
    | Agent | AI agent framework for automating tasks | pip install "cua-agent[all]" |
    | MCP Server | MCP server for using CUA with Claude Desktop | pip install cua-mcp-server |
    | SOM | Set-of-Mark library for Agent | pip install cua-som |
    | Computer Server | Server component for Computer | pip install cua-computer-server |
    | Core (Python) | Python Core utilities | pip install cua-core |
    | Core (TypeScript) | TypeScript Core utilities | npm install @trycua/core |
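
    To see which of the Python modules listed above are installed in the current environment, the standard library's importlib.metadata can be queried by distribution name. A minimal sketch, where the package names come from the pip commands in the table and the helper name installed_versions is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names taken from the pip commands in the table.
CUA_PYPI_PACKAGES = [
    "cua-computer",
    "cua-agent",
    "cua-mcp-server",
    "cua-som",
    "cua-computer-server",
    "cua-core",
]


def installed_versions(packages=CUA_PYPI_PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```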

    Computer Interface Reference

    For complete examples, see computer_examples.py or computer_nb.ipynb

    # Shell Actions
    result = await computer.interface.run_command(cmd)       # Run shell command
    # result.stdout, result.stderr, result.returncode
    
    # Mouse Actions
    await computer.interface.left_click(x, y)       # Left click at coordinates
    await computer.interface.right_click(x, y)      # Right click at coordinates
    await computer.interface.double_click(x, y)     # Double click at coordinates
    await computer.interface.move_cursor(x, y)      # Move cursor to coordinates
    await computer.interface.drag_to(x, y, duration)  # Drag to coordinates
    await computer.interface.get_cursor_position()  # Get current cursor position
    await computer.interface.mouse_down(x, y, button="left")  # Press and hold a mouse button
    await computer.interface.mouse_up(x, y, button="left")    # Release a mouse button
    
    # Keyboard Actions
    await computer.interface.type_text("Hello")     # Type text
    await computer.interface.press_key("enter")     # Press a single key
    await computer.interface.hotkey("command", "c") # Press key combination
    await computer.interface.key_down("command")    # Press and hold a key
    await computer.interface.key_up("command")      # Release a key
    
    # Scrolling Actions
    await computer.interface.scroll(x, y)           # Scroll the mouse wheel
    await computer.interface.scroll_down(clicks)    # Scroll down
    await computer.interface.scroll_up(clicks)      # Scroll up
    
    # Screen Actions
    await computer.interface.screenshot()           # Take a screenshot
    await computer.interface.get_screen_size()      # Get screen dimensions
    
    # Clipboard Actions
    await computer.interface.set_clipboard(text)    # Set clipboard content
    await computer.interface.copy_to_clipboard()    # Get clipboard content
    
    # File System Operations
    await computer.interface.file_exists(path)      # Check if file exists
    await computer.interface.directory_exists(path) # Check if directory exists
    await computer.interface.read_text(path, encoding="utf-8")        # Read file content
    await computer.interface.write_text(path, content, encoding="utf-8") # Write file content
    await computer.interface.read_bytes(path)       # Read file content as bytes
    await computer.interface.write_bytes(path, content) # Write file content as bytes
    await computer.interface.delete_file(path)      # Delete file
    await computer.interface.create_dir(path)       # Create directory
    await computer.interface.delete_dir(path)       # Delete directory
    await computer.interface.list_dir(path)         # List directory contents
    
    # Accessibility
    await computer.interface.get_accessibility_tree() # Get accessibility tree
    
    # Delay Configuration
    # Set default delay between all actions (in seconds)
    computer.interface.delay = 0.5  # 500ms delay between actions
    
    # Or specify delay for individual actions
    await computer.interface.left_click(x, y, delay=1.0)     # 1 second delay after click
    await computer.interface.type_text("Hello", delay=0.2)   # 200ms delay after typing
    await computer.interface.press_key("enter", delay=0.5)   # 500ms delay after key press
    
    # Python Virtual Environment Operations
    await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment
    await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'") # Run a shell command in a virtual environment
    await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception
    
    # Example: Use sandboxed functions to execute code in a C/ua Container
    from computer.helpers import sandboxed
    
    @sandboxed("demo_venv")
    def greet_and_print(name):
        """Get the HTML of the current Safari tab"""
        import PyXA
        safari = PyXA.Application("Safari")
        html = safari.current_document.source()
        print(f"Hello from inside the container, {name}!")
        return {"greeted": name, "safari_html": html}
    
    # When a @sandboxed function is called, it will execute in the container
    result = await greet_and_print("C/ua")
    # Result: {"greeted": "C/ua", "safari_html": "<html>...</html>"}
    # stdout and stderr are also captured and printed / raised
    print("Result from sandboxed function:", result)
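
    The Shell Actions entry above notes that run_command returns an object with stdout, stderr and returncode. One way to handle it is to turn a non-zero return code into an exception. The sketch below is an illustration, not part of the library: check_result and CommandFailed are hypothetical names, and SimpleNamespace objects stand in for real results so the snippet runs without a VM:

```python
from types import SimpleNamespace


class CommandFailed(RuntimeError):
    """Raised when a command result carries a non-zero return code."""


def check_result(result):
    """Return stdout when returncode is 0, otherwise raise CommandFailed."""
    if result.returncode != 0:
        raise CommandFailed(f"exit {result.returncode}: {result.stderr.strip()}")
    return result.stdout


if __name__ == "__main__":
    # Stand-ins with the three attributes named in the reference above.
    ok = SimpleNamespace(stdout="hello\n", stderr="", returncode=0)
    bad = SimpleNamespace(stdout="", stderr="no such file\n", returncode=1)
    print(check_result(ok), end="")
    try:
        check_result(bad)
    except CommandFailed as exc:
        print("failed:", exc)
```

    In real use the argument would be the value returned by await computer.interface.run_command(cmd).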
    

    ComputerAgent Reference

    For complete examples, see agent_examples.py or agent_nb.ipynb

    # Import necessary components
    from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
    
    # UI-TARS-1.5 agent for local execution with MLX
    ComputerAgent(loop=AgentLoop.UITARS, model=LLM(provider=LLMProvider.MLXVLM, name="mlx-community/UI-TARS-1.5-7B-6bit"))
    # OpenAI Computer-Use agent using OPENAI_API_KEY
    ComputerAgent(loop=AgentLoop.OPENAI, model=LLM(provider=LLMProvider.OPENAI, name="computer-use-preview"))
    # Anthropic Claude agent using ANTHROPIC_API_KEY
    ComputerAgent(loop=AgentLoop.ANTHROPIC, model=LLM(provider=LLMProvider.ANTHROPIC))
    
    # OmniParser loop for UI control using Set-of-Marks (SOM) prompting and any vision LLM
    ComputerAgent(loop=AgentLoop.OMNI, model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:12b-it-q4_K_M"))
    # OpenRouter example using OAICOMPAT provider
    ComputerAgent(
        loop=AgentLoop.OMNI,
        model=LLM(
            provider=LLMProvider.OAICOMPAT,
            name="openai/gpt-4o-mini",
            provider_base_url="https://openrouter.ai/api/v1"
        ),
        api_key="your-openrouter-api-key"
    )
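
    The configurations above differ mainly in which API key they need. A small selector can choose one based on the environment. This sketch returns plain strings rather than importing the agent package, so it runs anywhere. The function name pick_agent_config is hypothetical; the strings mirror the AgentLoop and LLMProvider member names and the model names used above, and the fallback to the local MLX model is an assumption about what suits a machine without API keys:

```python
import os


def pick_agent_config(env=None):
    """Choose loop, provider and model name based on available API keys."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return {"loop": "OPENAI", "provider": "OPENAI", "name": "computer-use-preview"}
    if env.get("ANTHROPIC_API_KEY"):
        # The Anthropic example above passes no model name, so none is set here.
        return {"loop": "ANTHROPIC", "provider": "ANTHROPIC", "name": None}
    # No cloud key found: fall back to the local UI-TARS model via MLX.
    return {"loop": "UITARS", "provider": "MLXVLM",
            "name": "mlx-community/UI-TARS-1.5-7B-6bit"}


if __name__ == "__main__":
    print(pick_agent_config())
```

    The returned names would still need to be mapped onto the AgentLoop and LLMProvider members of the installed version before constructing a ComputerAgent.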
    

    Community

    Join our Discord community to discuss ideas, get assistance, or share your demos!

    License

    Cua is open-sourced under the MIT License - see the LICENSE file for details.

    Microsoft's OmniParser, which is used in this project, is licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0) - see the OmniParser LICENSE file for details.

    Contributing

    We welcome contributions to CUA! Please refer to our Contributing Guidelines for details.

    Trademarks

    Apple, macOS, and Apple Silicon are trademarks of Apple Inc. Ubuntu and Canonical are registered trademarks of Canonical Ltd. Microsoft is a registered trademark of Microsoft Corporation. This project is not affiliated with, endorsed by, or sponsored by Apple Inc., Canonical Ltd., or Microsoft Corporation.

    Stargazers

    Thank you to all our supporters!

