Step 1: Benchmark Your Setup

Begin by benchmarking your system to establish a performance baseline and identify potential bottlenecks.

solo benchmark

Example Output:

solo-server git:(vLLM) ✗ solo benchmark
Enter server type (ollama, vllm, llama.cpp): ollama 
Enter model name: llama3

Starting Solo Server Benchmark for ollama with model llama3...
Running benchmarks... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  0% -:--:--
╭──── Benchmark Results ────╮
│ llama3                    │
│ Response: 0.00 tokens/s   │
│ Total: 0.00 tokens/s      │
│                           │
│ Stats:                    │
│  - Response tokens: 0     │
│  - Model load time: 0.00s │
│  - Response time: 0.00s   │
│  - Total time: 0.00s      │
╰───────────────────────────╯
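The tokens-per-second figures above are the key throughput metrics. To reproduce a similar measurement directly against a local Ollama instance, a minimal Python sketch (assuming Ollama's default endpoint at localhost:11434 and the llama3 model) could look like this:

import requests

# Request a non-streamed completion so the response includes
# generation statistics for the whole run.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
    timeout=300,
)
resp.raise_for_status()
stats = resp.json()

# Ollama reports durations in nanoseconds; convert to seconds.
tokens = stats["eval_count"]
load_s = stats["load_duration"] / 1e9
response_s = stats["eval_duration"] / 1e9
total_s = stats["total_duration"] / 1e9

print(f"Response tokens: {tokens}")
print(f"Model load time: {load_s:.2f}s")
print(f"Response time:   {response_s:.2f}s")
print(f"Total time:      {total_s:.2f}s")
print(f"Response rate:   {tokens / response_s:.2f} tokens/s")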

Step 2: Generate Synthetic Fine-Tuning Data

Based on your prompts, generate the synthetic data that will be used to fine-tune your models and improve your server’s performance.

solo finetune gen

Example Output:
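The generated records depend on your prompts. For intuition, here is a rough Python sketch of this kind of generation step, assuming a local Ollama endpoint and an illustrative instruction/output JSONL schema (not necessarily the exact format Solo Server emits):

import json
import requests

# Illustrative seed prompts to expand into training examples.
prompts = [
    "Explain what a vector database is.",
    "Summarize the benefits of model quantization.",
]

with open("synthetic_data.jsonl", "w") as f:
    for prompt in prompts:
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "llama3", "prompt": prompt, "stream": False},
            timeout=300,
        )
        resp.raise_for_status()
        # Write one instruction/response pair per line.
        f.write(json.dumps({
            "instruction": prompt,
            "output": resp.json()["response"],
        }) + "\n")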

Step 3: Check Fine-Tuning Status

Before running the fine-tuning process, verify that the synthetic data is correctly formatted for fine-tuning.

solo finetune status

Example Output:
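To sanity-check the file yourself, a small validator sketch (reusing the illustrative schema from Step 2) can confirm that every line parses as JSON and carries the expected fields:

import json

REQUIRED_KEYS = {"instruction", "output"}  # assumed schema, adjust as needed

with open("synthetic_data.jsonl") as f:
    for lineno, line in enumerate(f, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise SystemExit(f"Line {lineno}: invalid JSON ({exc})")
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise SystemExit(f"Line {lineno}: missing fields {missing}")

print("All records are correctly formatted.")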

If you attempt to run solo on a port that’s already in use, it will use the next available port:

Port 5070 is already in use. Trying 5071 instead.
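This fallback is the standard pattern of probing ports until a bind succeeds. A minimal sketch of the same idea in Python:

import socket

def next_available_port(start: int = 5070, max_tries: int = 100) -> int:
    """Return the first port at or after `start` that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                print(f"Port {port} is already in use. Trying {port + 1} instead.")
                continue
            return port
    raise RuntimeError("No free port found.")

print(f"Serving on port {next_available_port()}")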

Step 4: Run the Fine-Tuning Process

Execute the fine-tuning process to apply optimizations to your Solo Server setup.

solo finetune run

Example Output:

==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
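The banner shows that fine-tuning is powered by Unsloth. For orientation, here is a condensed sketch of a comparable Unsloth LoRA fine-tuning run; the base model, LoRA settings, and training arguments are illustrative defaults (and trl argument names vary across versions), not necessarily what solo finetune run uses internally:

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit base model with Unsloth's patched loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative choice
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

# Collapse each instruction/output pair into a single training string.
dataset = load_dataset("json", data_files="synthetic_data.jsonl", split="train")
dataset = dataset.map(lambda ex: {
    "text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        fp16=True,  # matches Bfloat16 = FALSE on a Tesla T4
        output_dir="outputs",
    ),
)
trainer.train()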

Step 5: Clean Up Old Artifacts

Remove any outdated build artifacts or cache that may interfere with the new configuration.

solo rm

Example Output:
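If you prefer to clean up by hand, a rough sketch that removes the artifact paths used in the earlier sketches (substitute wherever your setup actually writes outputs and caches):

import shutil
from pathlib import Path

# Example artifact locations only; adjust to your environment.
for path in [Path("outputs"), Path("synthetic_data.jsonl")]:
    if path.is_dir():
        shutil.rmtree(path)
        print(f"Removed directory {path}")
    elif path.is_file():
        path.unlink()
        print(f"Removed file {path}")
    else:
        print(f"Nothing to remove at {path}")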

Step 6: Serve Your Models on the Cloud

Deploy your optimized models so clients can reach them for inference.

solo serve

If the deployment is successful, you should see confirmation that the server is running.
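To verify that the server is reachable from a client, a quick check (port 5070 is inferred from the port-conflict message above; the root path is an assumption and may differ in your deployment):

import requests

resp = requests.get("http://localhost:5070/", timeout=5)
print(f"Server responded with HTTP {resp.status_code}")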

This workflow ensures your Solo Server environment is optimized and ready for efficient, high-performance model deployment.