Server Usage Instructions
Better Shell Setup
This also installs some common dependencies if having sudo
access. If not, comment out the sudo
commands.
1 | git clone https://github.com/Co1lin/ML-Env-Setup |
Environment Variables
Set these environment variables in your .bashrc
and/or .zshrc
.
1 | export HF_HOME=<a shared folder on a fast storage> |
NVIDIA GPU Related
- Always monitor the status of GPUs when you plan to use or are using them.
nvitop
(pip install nvitop
) is recommended. - Only use GPUs that are needed for your task! Do NOT blindly use all GPUs! Limit visible GPUs to your processes by
export CUDA_VISIBLE_DEVICES=2,3
, for example.- Monitor the memory usage to help you decide how many GPUs to use.
- Monitor “GPU-Util”
- to make sure the GPUs are not idle. If the GPU-Util keeps at 0%, you should suspect that your processes get stuck!
- to help you optimize your code, if GPU-Util is low.
- Make sure you don’t have any dead process occupying the GPUs! One way to exit your process: https://blog.co1in.me/skills/py-embed-trick/#Exit-the-program.
- Show dead processes that cannot be shown by
nvidia-smi
:1
2# sudo needed to view processes of other users
fuser -v /dev/nvidia*
- Show dead processes that cannot be shown by
- Avoid using GPUs that are in use by other users! (Unless you are very sure it will not affect them by OOM or other reasons.)
Python Related
- https://blog.co1in.me/skills/py-embed-trick/#Exit-the-program
- Create new repos with python-template.
- Query LLMs with
litellm
, a unified interface for many LLM providers.