= Xen GPU Cluster Practice =

== Exercise 1: Assigning the Dom0 Graphics Card to a DomU ==

== Hardware ==
||Machine|| Dell !OptiPlex 755 ||
||Node|| 1 node ||
||CPU|| Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz ||
||Memory|| 6GB ||
||Storage|| 160GB ||
||Video Card|| NVIDIA !GeForce 9800GT 1GB ||

== Software ==
||OS!#1|| Ubuntu 8.10 with Kernel: 2.6.28 x86_64 (non-xen-patched kernel) ||
||OS!#2|| Ubuntu 8.10 with Kernel: 2.6.22-9 x86_64 (Xen-3.3.1+Lustre patched kernel) ||
[[BR]]

== Step 1: Connect to the Remote Host ==
'''# Use either of the following two connection methods.''' [[BR]]
rider@cloud:~$ ssh 140.xxx.xxx.xxx [[BR]]
rider@cloud:~$ vncviewer 140.xxx.xxx.xxx [[BR]]

== Step 2: Create a Virtual Machine for CUDA ==
'''# Set the specifications of the virtual machine you want.''' [[BR]]
rider@cloud:~$ sudo vim /etc/xen-tools/xen-tools.conf [[BR]]
{{{
dir = /home
install-method = debootstrap
size   = 8Gb       # Disk image size.
memory = 1024Mb    # Memory size
swap   = 128Mb     # Swap size
fs     = ext3      # use the EXT3 filesystem for the disk image.
dist   = hardy     # Default distribution to install. ---> For CUDA Support (Ubuntu 8.04)
image  = sparse    # Specify sparse vs. full disk images.
gateway   = 140.XXX.XXX.XXX
netmask   = 255.255.255.0
broadcast = 140.XXX.XXX.XXX
kernel = /boot/vmlinuz-`uname -r`
initrd = /boot/initrd.img-`uname -r`
mirror = http://gb.archive.ubuntu.com/ubuntu/
ext3_options   = noatime,nodiratime,errors=remount-ro
ext2_options   = noatime,nodiratime,errors=remount-ro
xfs_options    = defaults
reiser_options = defaults
}}}
rider@cloud:~$ sudo xen-create-image --hostname nvidia --ip 140.XXX.XXX.XXX [[BR]]

== Step 3: Inspect Your Graphics Card ==
rider@cloud:~$ lspci -vv [[BR]]
{{{
01:00.0 VGA compatible controller: nVidia Corporation GeForce 9800 GT (rev a2)
        Subsystem: ASUSTeK Computer Inc. Device 82a0
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Kernel driver in use: pciback
        Kernel modules: nvidia, nvidiafb
}}}

== Step 4: PCI Frontend Configuration (DomU) ==
rider@cloud:~$ sudo vim /etc/xen/nvidia.cfg [[BR]]
{{{
kernel      = '/boot/vmlinuz-2.6.22.9'
ramdisk     = '/boot/initrd.img-2.6.22.9'
memory      = '1024'
vcpus       = '4'

# Assign your PCIe graphics card
pci         = ['01:00.0']

root        = '/dev/sda2 ro'
disk        = [
                  'file:/home/domains/nvidia/disk.img,sda2,w',
                  'file:/home/domains/nvidia/swap.img,sda1,w',
              ]
name        = 'nvidia'

#
#  Networking
#
vif         = [ 'ip=140.xxx.xxx.xxx,mac=00:16:3E:AA:70:5C' ]

#
#  Behaviour
#
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'
}}}

== Step 5: PCI Backend Configuration (Dom0) ==
rider@cloud:~$ sudo su - [[BR]]
'''# Hide the device from dom0 so pciback can take control.''' [[BR]]
root@cloud:~$ echo -n "0000:01:00.0" > /sys/bus/pci/drivers/nvidia/unbind [[BR]]
'''# Give the dev_ids to pciback, give it a new slot, then bind.''' [[BR]]
root@cloud:~$ echo -n "0000:01:00.0" > /sys/bus/pci/drivers/pciback/new_slot [[BR]]
root@cloud:~$ echo -n "0000:01:00.0" > /sys/bus/pci/drivers/pciback/bind [[BR]]
root@cloud:~$ cat /sys/bus/pci/drivers/pciback/slots [[BR]]
{{{
0000:01:00.0
}}}
'''# Caution: Make sure that the device is not controlled by any other driver: there should be no driver symlink pointing to nvidia.''' [[BR]]
{{{
PATH: /sys/bus/pci/devices/0000:01:00.0/
driver -> ../../../../bus/pci/drivers/nvidia   ---> This symlink shouldn't exist.
}}}

== Step 6: Direct Hardware Access Settings ==
=== Permissive Flag ===
rider@cloud:~$ sudo vim /etc/xen/xend-pci-permissive.sxp [[BR]]
{{{
(unconstrained_dev_ids
     #('0123:4567:89AB:CDEF')
     ('0000:01:00.0')
)
}}}

=== User-space Quirks ===
rider@cloud:~$ sudo vim /etc/xen/xend-pci-quirks.sxp [[BR]]
{{{
(pci_ids
    # Entries are formatted as follows:
    #     <vendor>:<device>[:<subvendor>:<subdevice>]
    ('10de:0605'   # NVIDIA 9800GT
    )
)
}}}

== Step 7: Start and Log In to Your DomU ==
'''Note: First log in as root (no password is required), then create your own account and log in with it. You may also keep using root without creating a new account.''' [[BR]]
@ Dom0 [[BR]]
rider@cloud:~$ sudo xm create -c nvidia.cfg [[BR]]
@ DomU [[BR]]
root@nvidia:~# adduser username [[BR]]
root@nvidia:~# vim /etc/sudoers [[BR]]
{{{
username    ALL=(ALL) ALL
}}}

== Step 8: Set Up the Basic DomU Environment ==
'''# Configure locales (system language)''' [[BR]]
rider@nvidia:~$ sudo vim /etc/profile [[BR]]
{{{
# Locale
export LANGUAGE="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export LANG="en_US.UTF-8"
}}}
rider@nvidia:~$ source /etc/profile [[BR]]
rider@nvidia:~$ sudo dpkg-reconfigure locales [[BR]]
'''# Update the PCI ID database''' [[BR]]
rider@nvidia:~$ sudo apt-get update [[BR]]
rider@nvidia:~$ sudo apt-get install wget [[BR]]
rider@nvidia:~$ sudo update-pciids [[BR]]
'''# Check that the graphics card is listed correctly''' [[BR]]
rider@nvidia:~$ lspci [[BR]]
{{{
00:00.0 VGA compatible controller: nVidia Corporation GeForce 9800 GT (rev a2)
}}}
'''# Check that the graphics card was assigned to the DomU''' [[BR]]
rider@nvidia:~$ dmesg | grep pci [[BR]]
{{{
pcifront pci-0: Installing PCI frontend
pcifront pci-0: Creating PCI Frontend Bus 0000:00
pciback 0000:00:00.0: probing...
pciback: pcistub_init_devices_late
}}}

== Exercise 2: Running the CUDA Examples on the Virtual Machine (Dom0 / DomU) ==

== Step 9: Install the CUDA Toolkit & SDK ==
'''# Install the required packages''' [[BR]]
rider@nvidia:~$ sudo apt-get install autoconf automake build-essential gcc make mesa-common-dev libglu1-mesa-dev mesa-utils libxmu-headers libxmu6 libxmu-dev zlib1g-dev libjpeg62 libjpeg62-dev xutils-dev libxaw-headers libxaw7 libxaw7-dev libxext6 libxext-dev rxvt lwm xauth xvfb xfonts-100dpi xfonts-75dpi culmus xfonts-scalable xfonts-base libtool initramfs-tools libxi6 libxi-dev linux-kernel-devel xserver-xorg xserver-xorg-core xserver-xorg-dev [[BR]]
'''# Download the NVIDIA CUDA toolkit''' [[BR]]
rider@nvidia:~$ mkdir -p nvidia [[BR]]
rider@nvidia:~$ mkdir -p ./nvidia/cuda [[BR]]
rider@nvidia:~$ cd ./nvidia/cuda/ [[BR]]
rider@nvidia:~/nvidia/cuda$ wget !http://developer.download.nvidia.com/compute/cuda/2_1/toolkit/cudatoolkit_2.1_linux64_ubuntu8.04.run [[BR]]
'''# Download the NVIDIA CUDA SDK''' [[BR]]
rider@nvidia:~/nvidia/cuda$ wget !http://developer.download.nvidia.com/compute/cuda/2_1/SDK/cuda-sdk-linux-2.10.1215.2015-3233425.run [[BR]]
rider@nvidia:~/nvidia/cuda$ chmod a+x * [[BR]]
rock@cloud:~$ sudo apt-get install autoconf automake build-essential gcc make libtool initramfs-tools libxi6 libxi-dev libxmu6 libxmu-dev xserver-xorg-core xserver-xorg-dev [[BR]]
'''# Install the NVIDIA CUDA toolkit''' [[BR]]
rider@nvidia:~/nvidia/cuda$ sudo sh cudatoolkit_2.1_linux64_ubuntu8.04.run [[BR]]
{{{
Enter install path (default /usr/local/cuda, '/cuda' will be appended):
}}}
# Note:
{{{
* Please make sure your PATH includes /usr/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH includes /usr/local/cuda/lib
*   or add /usr/local/cuda/lib to /etc/ld.so.conf and run ldconfig as root

* Please read the release notes in /usr/local/cuda/doc/

* To uninstall CUDA, delete /usr/local/cuda
* Installation Complete
}}}
'''# Install the NVIDIA CUDA SDK''' [[BR]]
rock@cloud:~/nvidia/cuda$ sudo sh cuda-sdk-linux-2.10.1215.2015-3233425.run [[BR]]
# Note:
{{{
Enter install path (default /usr/local/cuda, '/cuda' will be appended): /usr/local/NVIDIA_CUDA_SDK
}}}
{{{
Configuring SDK Makefile (/usr/local/NVIDIA_CUDA_SDK/common/common.mk)...

* Please make sure your PATH includes /usr/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH includes /usr/local/cuda/lib

* To uninstall the NVIDIA CUDA SDK, please delete /usr/local/NVIDIA_CUDA_SDK
}}}
'''# Set up the CUDA runtime environment (single quotes keep $PATH and $LD_LIBRARY_PATH unexpanded until /etc/profile is sourced)''' [[BR]]
rider@cloud:~$ sudo su [[BR]]
root@cloud:~$ echo 'export PATH=$PATH:/usr/local/cuda/bin' >> /etc/profile [[BR]]
root@cloud:~$ echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib' >> /etc/profile [[BR]]
root@cloud:~$ source /etc/profile [[BR]]
root@cloud:~$ echo "/usr/local/cuda/lib" >> /etc/ld.so.conf [[BR]]
root@cloud:~$ ldconfig [[BR]]
root@cloud:~$ exit [[BR]]

== Step 10: Run a CUDA Example ==
'''# Switch to gcc-4.1 for compilation''' [[BR]]
rider@nvidia:~$ sudo apt-get install gcc-4.1 g++-4.1 [[BR]]
rider@nvidia:~$ sudo rm /usr/bin/gcc [[BR]]
rider@nvidia:~$ sudo ln -sf /usr/bin/gcc-4.1 /usr/bin/gcc [[BR]]
'''# Enter the CUDA SDK directory''' [[BR]]
rider@nvidia:~$ cd /usr/local/NVIDIA_CUDA_SDK/ [[BR]]
'''# Build all examples''' [[BR]]
rider@nvidia:/usr/local/NVIDIA_CUDA_SDK$ sudo make [[BR]]
'''# Pick one CUDA example''' [[BR]]
rider@nvidia:/usr/local/NVIDIA_CUDA_SDK$ cd ./projects/deviceQuery/ [[BR]]
rider@nvidia:/usr/local/NVIDIA_CUDA_SDK/projects/deviceQuery$ sudo make [[BR]]
'''# Enter the directory containing the built binaries''' [[BR]]
rider@nvidia:/usr/local/NVIDIA_CUDA_SDK/projects/deviceQuery$ cd ../../bin/linux/release/ [[BR]]
'''# Run''' [[BR]]
rider@nvidia:/usr/local/NVIDIA_CUDA_SDK/bin/linux/release$ sudo ./deviceQuery [[BR]]
'''# Output''' [[BR]]
{{{
There is 1 device supporting CUDA

Device 0: "GeForce 9800 GT"
  Major revision number:                         1
  Minor revision number:                         1
  Total amount of global memory:                 1073414144 bytes
  Number of multiprocessors:                     14
  Number of cores:                               112
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.51 GHz
  Concurrent copy and execution:                 Yes

Test PASSED

Press ENTER to exit...
}}}
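
'''# Optional: check the deviceQuery output from a script''' [[BR]]
When the DomU is recreated often, it can be handy to check the result above without reading it by eye. The following is only a minimal sketch, not part of the CUDA SDK: the function name `check_device_query` is hypothetical, and it assumes the output keeps the `Device N: "name"` and `Test PASSED` lines shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the CUDA SDK).
# Reads deviceQuery output on stdin and reports whether the expected
# device name appears and whether the run ended with "Test PASSED".
check_device_query() {
    expected="$1"    # device name to look for, e.g. "GeForce 9800 GT"
    output=$(cat)    # capture everything piped into the function

    # The device line is assumed to look like: Device 0: "GeForce 9800 GT"
    if ! printf '%s\n' "$output" | grep -q "^ *Device [0-9]*: \"$expected\""; then
        echo "FAIL: device \"$expected\" not found"
        return 1
    fi

    if ! printf '%s\n' "$output" | grep -q "Test PASSED"; then
        echo "FAIL: deviceQuery did not report Test PASSED"
        return 1
    fi

    echo "OK: $expected is visible and deviceQuery passed"
}
```

A possible invocation from the release directory is `sudo ./deviceQuery < /dev/null | check_device_query "GeForce 9800 GT"`. Redirecting stdin from /dev/null is meant to get past the final "Press ENTER to exit..." prompt; this is an assumption about how the sample reads its input.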