What is Application Hub?
Application Hub is the central component of Cordatus for deploying, configuring, and managing AI applications and containerized workloads at scale.
It provides a unified interface to launch LLM inference engines, NVIDIA AI frameworks, and custom Docker containers across your devices with minimal configuration.
Application Hub transforms complex Docker deployments into simple, guided workflows — allowing you to run production-grade AI models without dealing with command-line complexity or infrastructure details.
Who is it for?
- AI Engineers & Data Scientists who need to deploy and test LLM models across different hardware configurations
- DevOps Teams managing AI infrastructure and container orchestration at scale
- ML Operations Engineers who need to optimize GPU utilization and model performance
- Organizations running distributed AI workloads across multiple devices and locations
- Researchers & Developers experimenting with different models, quantizations, and inference engines
What can I do with Application Hub?
With Application Hub you can:
Deploy AI Applications
Launch pre-configured applications such as:
- vLLM, TensorRT-LLM, Ollama (LLM inference engines)
- NVIDIA AI Dynamo (distributed LLM runtime)
- NVIDIA VSS (Video Search & Summarization)
- Custom Docker containers
Configure Advanced Settings
- GPU Selection: Choose specific GPUs or allocate all available GPUs
- Resource Limits: Set CPU core and RAM limits with Host Reserved protection
- Model Selection: Use Cordatus Models, Custom Models, or User Models
- Docker Options: Configure ports, volumes, networks, and environment variables (see the launch sketch after this list)
- Engine Arguments: Fine-tune inference parameters (batch size, quantization, etc.)
Calculate VRAM Requirements
Use the VRAM Calculator to:
- Predict GPU memory requirements before deployment
- Test different configurations (quantization, sequence length, batch size)
- Determine optimal hardware for your models
- Plan multi-GPU deployments
Manage Models Across Devices
With User Models:
- Add models from your devices to Cordatus
- Transfer models between devices on the same network
- Use custom models not available on the internet
- Automatically configure volume mappings for different inference engines
Monitor & Control Containers
- View real-time container status and logs
- Start, stop, or delete containers individually or in groups
- Generate public URLs for deployed applications
- Create Open Web UI interfaces for LLM models
- Duplicate existing containers with modified configurations
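Under the hood, every launch resolves to a Docker container configuration. As a rough, non-authoritative sketch of what a single launch maps to at the Docker level, here is an example using the Docker SDK for Python; the image tag, model argument, ports, paths, token, and names are illustrative placeholders, not Cordatus defaults:

```python
# Minimal sketch of the kind of docker run that a launch resolves to. The
# image tag, model argument, ports, paths, and token below are illustrative
# placeholders, not Cordatus defaults.
import docker

client = docker.from_env()

container = client.containers.run(
    "vllm/vllm-openai:latest",                 # application image (example)
    command=["--model", "facebook/opt-125m"],  # engine arguments (example model)
    detach=True,
    name="llm-demo",                           # environment name
    device_requests=[                          # GPU selection: GPU 0 only
        docker.types.DeviceRequest(device_ids=["0"], capabilities=[["gpu"]])
    ],
    nano_cpus=8 * 10**9,                       # CPU core limit: 8 cores
    mem_limit="32g",                           # RAM limit
    ports={"8000/tcp": 8000},                  # publish container port 8000 on host port 8000
    volumes={"/data/models": {"bind": "/root/.cache/huggingface", "mode": "rw"}},  # volume binding
    environment={"HF_TOKEN": "<your-token>"},  # environment variable / token
)
print(container.name, container.status)
```

Application Hub assembles this kind of configuration for you from the Advanced Settings form, so you never write it by hand.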
Key concepts
Application — A pre-configured Docker image registered in Cordatus (e.g., vLLM, TensorRT-LLM, NVIDIA Dynamo, NVIDIA VSS).
See details → Application Launch Guide | Standard Applications | NVIDIA VSS Guide | NVIDIA Dynamo Guide
Container — A running instance of an application with a specific configuration (GPU assignment, model, parameters).
See details → Container Management Guide
Container Group — Multiple containers deployed together as a single unit (e.g., VSS + VLM + LLM + Embed + Rerank).
See details → Container Management Guide | NVIDIA VSS Guide
Model Types
- Cordatus Models: Pre-tested models registered in the Cordatus system
- Custom Models: Models specified by name or URL (downloaded during deployment)
- User Models: Models you've added from your devices
Inference Engine — The runtime framework that executes LLM models:
- vLLM: High-throughput inference with PagedAttention
- TensorRT-LLM: NVIDIA's optimized inference engine
- Ollama: Simple, local model deployment
- NVIDIA NIM: Enterprise-grade microservices
Quantization — Model weight compression format (a worked memory example follows this list):
- BF16/FP16: Full precision (16-bit)
- INT8/FP8: Half the memory usage (8-bit)
- INT4/FP4: A quarter of the memory usage (4-bit)
Resource Limits
- CPU Core Limit: Maximum CPU cores the container can use
- RAM Limit: Maximum memory allocation
- Host Reserved: Resources automatically reserved for system stability
- No Limit: Override Host Reserved (use with caution)
VRAM Components
- Model Weights: Memory occupied by model parameters
- KV Cache: Key-Value cache for transformer models
- Overhead/Activation: System overhead or activation memory
- Free VRAM: Remaining available memory
See details → VRAM Calculator User Guide
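For the Model Weights component, the effect of quantization reduces to simple arithmetic: memory is roughly parameter count times bytes per parameter. A back-of-the-envelope sketch for an 8B-parameter model (weights only; KV cache and overhead come on top, which the VRAM Calculator also accounts for):

```python
# Back-of-the-envelope weight memory for an 8B-parameter model under each
# quantization format (weights only; KV cache and overhead are extra).
params = 8e9
bytes_per_param = {"BF16/FP16": 2.0, "INT8/FP8": 1.0, "INT4/FP4": 0.5}

for fmt, b in bytes_per_param.items():
    gib = params * b / 1024**3
    print(f"{fmt}: ~{gib:.1f} GiB of weights")
# Prints roughly 14.9, 7.5, and 3.7 GiB respectively.
```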
How does Application Hub work?
1. Select an Application
- Browse available applications from Containers > Applications
- View application details, versions, and supported platforms
- Check which Docker images are already downloaded on your device
See details → Application Launch Guide
2. Choose Device & Version
- Select the target device for deployment
- Choose the Docker image version
- Cordatus shows whether the image needs to be downloaded
3. Configure Advanced Settings
General Settings:
- Assign environment name
- Select GPUs (or use "All GPU")
- Set CPU core and RAM limits
- Enable Open Web UI (for LLM applications)
Model Selection (LLM Applications):
- Choose from Cordatus Models, Custom Models, or User Models
- Cordatus handles model transfer if needed
- Automatic volume configuration based on inference engine
Docker Options:
- Configure port mappings (auto-assigned or manual)
- Set up volume bindings with visual file explorer
- Define network settings and restart policies
Environment Variables:
- Use pre-defined variables for the application
- Add custom variables or select tokens from your account
Engine Arguments:
- Configure inference parameters such as batch size and quantization (see the vLLM sketch after these steps)
- For NVIDIA Dynamo: Configure processing mode, router, connector, workers
- For NVIDIA VSS: Configure VLM, LLM, Embed, and Rerank components
4. Launch & Monitor
- Review configuration and click Start Environment
- Enter sudo password for authorization
- Monitor deployment progress and container status
- Access containers via Containers page or Applications > Containers tab
See details → Container Management Guide
5. Manage Running Containers
- View logs and parameters in real-time
- Generate public URLs for external access
- Start, stop, or delete containers
- Create Open Web UI for LLM models
- Duplicate containers with modified settings
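The Engine Arguments step ultimately maps to the chosen inference engine's own parameters. Purely as an illustration of the kind of knobs involved, here is a comparable configuration in vLLM's offline Python API; the model name and values are placeholders, and the exact arguments Cordatus exposes may differ:

```python
# Illustrative vLLM configuration showing the kind of engine arguments the
# Engine Arguments step tunes. Model name and values are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",   # example model
    quantization=None,                   # e.g. "awq" or "gptq" for quantized checkpoints
    dtype="bfloat16",                    # BF16 weights and activations
    max_model_len=8192,                  # maximum sequence length
    max_num_seqs=64,                     # effective batch-size limit
    gpu_memory_utilization=0.90,         # fraction of VRAM the engine may claim
    tensor_parallel_size=1,              # number of GPUs to shard across
)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```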
Application Types
Standard Applications
Simple Docker containers with basic GPU, CPU, and RAM configuration:
- Single container deployment
- Direct GPU assignment
- Standard Docker options
See details → Standard Application Creation Guide
LLM Engine Applications
Advanced inference engines with model management:
- Model selection (Cordatus, Custom, User Models)
- Automatic volume configuration
- Model transfer between devices
- Optional Open Web UI creation
- Engine-specific arguments
See details → Standard Application Creation Guide | User Models Guide
NVIDIA AI Dynamo
Distributed LLM runtime for multi-GPU deployments:
- Processing modes (Aggregated/Disaggregated)
- Router configuration (KV-Aware, Round Robin, Random)
- Connector setup (KVBM, NIXL, LM Cache)
- Worker creation and GPU assignment per worker
- Multi-container orchestration
See details → NVIDIA AI Dynamo Creation Guide
NVIDIA VSS (Video Search & Summarization)
Complex multi-component pipeline:
- Main VSS container with Event Reviewer option
- VLM (Vision Language Model) component
- LLM (Large Language Model) component
- Embed (Embedding Model) component
- Rerank (Rerank Model) component
- Each component can be a new, existing, or remote application
See details → NVIDIA VSS Creation Guide
User Models & Model Transfer
Model Path Configuration
Define model paths on your devices for:
- Hugging Face models
- Ollama models
- NVIDIA NIM models
Adding Models
Two methods to add models:
- Start Scanning: Automatic detection of models in defined paths
- Add Manually: Select specific model directories
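Cordatus's scanning implementation isn't shown here, but conceptually Start Scanning amounts to walking the defined model paths and recognizing model directories by their marker files. A hypothetical sketch for Hugging Face-style layouts (the path and marker files are assumptions for illustration):

```python
# Hypothetical sketch of what automatic scanning conceptually does for
# Hugging Face-style models: walk a configured model path and report
# directories that contain a config plus weight files.
from pathlib import Path

def scan_hf_models(model_path: str) -> list[Path]:
    found = []
    for config in Path(model_path).rglob("config.json"):
        folder = config.parent
        has_weights = any(folder.glob("*.safetensors")) or any(folder.glob("*.bin"))
        if has_weights:
            found.append(folder)
    return found

for model_dir in scan_hf_models("/data/models"):   # example model path
    print(model_dir)
```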
Model Transfer
- Transfer models between devices on the same network
- Resume interrupted transfers automatically
- Track transfer progress in real-time
- Automatic volume configuration after transfer
Model Deployment
- Click Deploy next to any User Model
- Select inference engine (vLLM, TensorRT-LLM, etc.)
- System redirects to Application Launch interface
- Model is automatically configured with correct paths
Learn more: User Models and Model Transfer Guide
VRAM Calculator
Calculate Before You Deploy
Learn more: VRAM Calculator User Guide
Avoid deployment failures by calculating VRAM requirements first:
1. Select Model: Choose from registered models or search Hugging Face
2. Choose GPU: Select a GPU model or use the device's actual GPUs
3. Configure Parameters:
- Quantization (BF16, FP16, INT8, INT4)
- Sequence Length (1K - 256K tokens)
- Batch Size (1 - 512+)
- GPU Count (Standalone mode)
- GPU Memory Utilization (0% - 100%)
- Calculation Type (Overhead vs Activation)
4. View Results:
- Visual doughnut chart showing memory distribution
- Detailed metrics for each component
- Sufficient/Insufficient status indicator
- Usage percentage bar
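The calculator's exact formulas are internal to Cordatus, but the dominant terms can be approximated from the parameters above: weights scale with quantization, while the KV cache grows linearly with sequence length, batch size, layer count, and KV heads. A simplified estimate, with architecture constants hard-coded for a Llama-3-8B-style model purely as an assumption:

```python
# Simplified VRAM estimate (GiB) combining the components above. The
# architecture constants approximate a Llama-3-8B-style model and the
# overhead figure is a rough assumption, not the calculator's formula.
GiB = 1024**3

def estimate_vram_gib(params=8e9, bytes_per_param=2.0,   # BF16 weights
                      layers=32, kv_heads=8, head_dim=128,
                      seq_len=8192, batch_size=4, kv_bytes=2.0,
                      overhead_gib=1.5):
    weights = params * bytes_per_param
    # K and V tensors per layer: 2 * kv_heads * head_dim * seq_len * batch
    kv_cache = 2 * layers * kv_heads * head_dim * seq_len * batch_size * kv_bytes
    total = weights + kv_cache + overhead_gib * GiB
    return weights / GiB, kv_cache / GiB, total / GiB

w, kv, total = estimate_vram_gib()
print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.1f} GiB, total ~{total:.1f} GiB")
# Compare the total against gpu_count * vram_per_gpu * memory_utilization.
```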
Use Cases
- Test hardware requirements before purchasing GPUs
- Optimize quantization and batch size for existing hardware
- Plan multi-GPU deployments
- Compare different model configurations
Container Management
Container Operations
- Start: Launch stopped containers individually or in groups
- Stop: Stop running containers individually or in groups
- Delete: Remove containers (type "DELETE" to confirm)
- Duplicate: Create new container with same configuration
Batch Operations
- Select multiple containers using checkboxes
- Delete multiple containers simultaneously
- Apply operations to entire container groups
Container Information
View detailed information for any container:
- Logs: Real-time log output
- Parameters: All configuration settings
- Ports: Local and public URLs
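At the Docker level, these operations and views correspond to standard container calls. A small sketch with the Docker SDK for Python (the container name is the example used earlier in this page, not a Cordatus convention):

```python
# Sketch of the Docker-level equivalents of the operations above:
# list containers, tail logs, stop, restart, and remove.
import docker

client = docker.from_env()

for c in client.containers.list(all=True):       # running and stopped containers
    print(c.name, c.status)

c = client.containers.get("llm-demo")            # example container name
print(c.logs(tail=50).decode())                  # recent log output
c.stop()
c.start()
c.remove(force=True)                             # Delete (irreversible)
```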
Public URL Generation
- Generate publicly accessible URLs for any container port
- Refresh, deactivate, or reassign ports
- Share access to deployed applications
Open Web UI Creation
For LLM engines:
- One-click Open Web UI deployment
- Optional public URL generation
- Direct chat interface with your model
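Cordatus creates the Open Web UI container for you, but for reference, wiring a chat UI to an OpenAI-compatible LLM endpoint by hand looks roughly like the sketch below; the open-webui image, ports, and endpoint URL are assumptions for illustration:

```python
# Rough manual equivalent of the one-click Open Web UI deployment: run the
# open-webui image and point it at an OpenAI-compatible endpoint (e.g. vLLM).
# Image tag, URL, and ports are illustrative assumptions.
import docker

client = docker.from_env()
client.containers.run(
    "ghcr.io/open-webui/open-webui:main",
    detach=True,
    name="open-webui-demo",
    ports={"8080/tcp": 3000},                                          # web UI on host port 3000
    environment={
        "OPENAI_API_BASE_URL": "http://host.docker.internal:8000/v1",  # LLM endpoint
        "OPENAI_API_KEY": "not-needed-for-local",                      # placeholder key
    },
    extra_hosts={"host.docker.internal": "host-gateway"},              # reach the host from the container
)
```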
Best Practices
Resource Management
- Always maintain Host Reserved values for system stability
- Leave 20-30% of VRAM free for unexpected loads
- Use VRAM Calculator before production deployments
- Set appropriate CPU and RAM limits based on workload
Model Management
- Organize models in consistent directory structures
- Define model paths on all devices before using User Models
- Use meaningful names for custom models
- Keep model metadata (quantization, parameters) up to date
Container Deployment
- Test configurations with small models first
- Use Activation calculation mode for accurate VRAM estimates
- Monitor container logs during initial deployment
- Create container groups for multi-component applications
Multi-GPU Deployments
- Use NVIDIA Dynamo for distributed inference
- Configure appropriate worker counts and GPU assignments
- Choose router strategy based on workload (KV-Aware for efficiency)
- Monitor GPU utilization across all workers
Getting Help
For detailed step-by-step instructions with screenshots and videos:
- Application Launch Guide: Standard application deployment
- NVIDIA AI Dynamo Guide: Distributed LLM runtime setup
- NVIDIA VSS Guide: Video analysis pipeline deployment
- User Models Guide: Model management and transfer
- Container Management Guide: Container operations and monitoring
- VRAM Calculator Guide: Memory requirement calculation