Server Usage Instructions

Better Shell Setup

This setup also installs some common dependencies, which requires sudo access. If you do not have sudo access, comment out the sudo commands in the script before running it.

git clone https://github.com/Co1lin/ML-Env-Setup
cd ML-Env-Setup
bash 0_basic.sh

Environment Variables

Set these environment variables in your .bashrc and/or .zshrc.

export HF_HOME=<a shared folder on fast storage>
export HF_TOKEN_PATH=<a token file path in a personal folder>
export TMPDIR=<a directory on a larger disk; set this when /tmp is on / and space is limited>
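
As a minimal sketch of how these settings might look in a .bashrc, the snippet below redirects TMPDIR only when /tmp sits on the root filesystem. The paths /data/shared/hf_cache and $HOME/scratch/tmp, the variables SHARED_HF_CACHE, PERSONAL_DIR, and SCRATCH_TMP, and the helper mount_point_of are all hypothetical names chosen for illustration; replace them with locations that exist on your server.

```shell
# Print the mount point of the filesystem that holds the given path.
# Assumes the mount point contains no spaces.
mount_point_of() {
    df -P "$1" | awk 'NR==2 {print $NF}'
}

# Hypothetical locations; adjust to your server.
SHARED_HF_CACHE="${SHARED_HF_CACHE:-/data/shared/hf_cache}"
PERSONAL_DIR="${PERSONAL_DIR:-$HOME/.cache/hf_personal}"
SCRATCH_TMP="${SCRATCH_TMP:-$HOME/scratch/tmp}"

export HF_HOME="$SHARED_HF_CACHE"
export HF_TOKEN_PATH="$PERSONAL_DIR/token"

# Redirect temporary files only when /tmp is part of the root filesystem.
if [ "$(mount_point_of /tmp)" = "/" ]; then
    mkdir -p "$SCRATCH_TMP" && export TMPDIR="$SCRATCH_TMP"
fi
```

Keeping the token path outside the shared HF_HOME means the shared cache can be readable by everyone without exposing anyone's access token.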

NVIDIA GPU Related

  • Always monitor the status of GPUs when you plan to use or are using them. nvitop (pip install nvitop) is recommended.
  • Use only the GPUs your task needs! Do NOT blindly use all GPUs! Limit the GPUs visible to your processes, for example with export CUDA_VISIBLE_DEVICES=2,3.
    • Monitor the memory usage to help you decide how many GPUs to use.
    • Monitor “GPU-Util”
      • to make sure the GPUs are not idle. If GPU-Util stays at 0%, suspect that your processes are stuck!
      • to find room to optimize your code when GPU-Util is low.
  • Make sure you don’t have any dead processes occupying the GPUs! One way to exit your process: https://blog.co1in.me/skills/py-embed-trick/#Exit-the-program.
    • List processes holding the GPU device files, including dead ones that nvidia-smi does not show:
      # sudo needed to view processes of other users
      fuser -v /dev/nvidia*
  • Avoid GPUs that other users are using, unless you are very sure you will not affect them (for example, by causing an out-of-memory error).
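
The checks above can be partly scripted. The sketch below picks a requested number of GPUs that currently look idle and exposes only those through CUDA_VISIBLE_DEVICES. The function name pick_idle_gpus and the 100 MiB default threshold are assumptions made for this example, not part of any tool. The result is only a snapshot: a GPU that looks idle now may belong to a job that is about to start, so still check nvitop before launching.

```shell
# Read lines of "index, memory_used_MiB, utilization_percent" on stdin and
# print up to $1 GPU indices, comma separated, whose memory use is at most
# $2 MiB (default 100) and whose utilization is 0%.
pick_idle_gpus() {
    awk -F', *' -v want="$1" -v max_mem="${2:-100}" '
        $2 + 0 <= max_mem && $3 + 0 == 0 && n < want {
            out = out sep $1
            sep = ","
            n++
        }
        END { print out }
    '
}

# Query the driver only when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    idle=$(nvidia-smi --query-gpu=index,memory.used,utilization.gpu \
                      --format=csv,noheader,nounits | pick_idle_gpus 2)
    if [ -n "$idle" ]; then
        export CUDA_VISIBLE_DEVICES="$idle"
    fi
fi
```

If fewer GPUs are idle than requested, the function prints only the ones it found, so check the resulting value before starting a job.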

Python Related