Version 11 (modified by rider, 15 years ago) (diff)

--

Xen GPU cluster

Hardware

Machine: Dell OptiPlex 755
Nodes: 9
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Memory: 6GB/node
Storage: 160GB/node
Video Card: NVIDIA GeForce 9800GT 1GB/node

Software

OS: Ubuntu 8.04.2, kernel 2.6.24-23-server x86_64


Part 1 Build the Essential Environment

1.1 Basic Environment

# NVIDIA CUDA driver #
rock@cloud:~/nvidia/cuda$ wget http://developer.download.nvidia.com/compute/cuda/2_1/drivers/NVIDIA-Linux-x86_64-180.22-pkg2.run
# NVIDIA CUDA toolkit #
rock@cloud:~/nvidia/cuda$ wget http://developer.download.nvidia.com/compute/cuda/2_1/toolkit/cudatoolkit_2.1_linux64_ubuntu8.04.run
# NVIDIA CUDA SDK #
rock@cloud:~/nvidia/cuda$ wget http://developer.download.nvidia.com/compute/cuda/2_1/SDK/cuda-sdk-linux-2.10.1215.2015-3233425.run
rock@cloud:~$ sudo apt-get install autoconf automake build-essential gcc make libtool initramfs-tools libxi6 libxi-dev libxmu6 libxmu-dev linux-kernel-devel linux-headers-2.6.24-23-server
rock@cloud:~$ sudo ln -sf /usr/src/linux-headers-2.6.24-23 /usr/src/linux
rock@cloud:~/nvidia/cuda$ sudo sh NVIDIA-Linux-x86_64-180.22-pkg2.run
rock@cloud:~$ sudo mkdir /opt/cuda
rock@cloud:~/nvidia/cuda$ sudo sh cudatoolkit_2.1_linux64_ubuntu8.04.run

Enter install path (default /usr/local/cuda, '/cuda' will be appended): /opt

# Note:

* Please make sure your PATH includes /opt/cuda/bin
* Please make sure your LD_LIBRARY_PATH includes /opt/cuda/lib
*   or add /opt/cuda/lib to /etc/ld.so.conf and run ldconfig as root

* Please read the release notes in /opt/cuda/doc/

* To uninstall CUDA, delete /opt/cuda
* Installation Complete

rock@cloud:~$ sudo mkdir /opt/NVIDIA_CUDA_SDK
rock@cloud:~/nvidia/cuda$ sudo sh cuda-sdk-linux-2.10.1215.2015-3233425.run

# Note:

Enter install path (default /usr/local/cuda, '/cuda' will be appended): /opt/cuda

Configuring SDK Makefile (/opt/NVIDIA_CUDA_SDK/common/common.mk)...

* Please make sure your PATH includes /opt/cuda/bin
* Please make sure your LD_LIBRARY_PATH includes /opt/cuda/lib

* To uninstall the NVIDIA CUDA SDK, please delete /opt/NVIDIA_CUDA_SDK

rock@cloud:~$ sudo vim /etc/profile

Add:
export PATH=$PATH:/opt/cuda/bin

rock@cloud:~$ source /etc/profile
rock@cloud:~$ sudo vim /etc/ld.so.conf

Add:
/opt/cuda/lib

rock@cloud:~$ sudo ldconfig

/opt/cuda/lib:
	libcublasemu.so.2 -> libcublasemu.so.2.1
	libcufftemu.so.2 -> libcufftemu.so.2.1
	libcublas.so.2 -> libcublas.so.2.1
	libcudart.so.2 -> libcudart.so.2.1
	libcufft.so.2 -> libcufft.so.2.1
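
The two manual edits above (adding /opt/cuda/bin to PATH in /etc/profile and /opt/cuda/lib to /etc/ld.so.conf) have to be repeated on every node. A minimal sketch of doing this from a script follows. The helper name append_once is hypothetical and not part of the original setup; it only appends a line when the file does not already contain it, so re-running the script is harmless.

```shell
#!/bin/sh
# append_once LINE FILE
# Hypothetical helper: add LINE to FILE only if no identical line exists yet.
append_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage on a node (run as root, then refresh the linker cache):
#   append_once 'export PATH=$PATH:/opt/cuda/bin' /etc/profile
#   append_once '/opt/cuda/lib' /etc/ld.so.conf && ldconfig
```

The real system files are only touched by the commented usage lines, so the helper can be tried on a scratch file first.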

1.2 NVIDIA Driver HowTo

# Rock suggested that the VGA device is reported as unknown because the local pci.ids database is out of date.
Solution 1: update pci.ids with the packaged tool (this may give an older version)
rock@cloud:~$ sudo update-pciids
Solution 2: download the latest pci.ids by hand
rock@cloud:~$ wget http://pciids.sourceforge.net/v2.2/pci.ids
rock@cloud:~$ sudo cp pci.ids /usr/share/misc/
# Check the result; the subsystem still shows as "Unknown device 82a0"
rock@cloud:~$ sudo lspci -v -v

01:00.0 VGA compatible controller: nVidia Corporation GeForce 9800 GT (rev a2) (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Unknown device 82a0
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
	Region 5: I/O ports at dc80 [size=128]
	Expansion ROM at fea00000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express Endpoint IRQ 0
		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
		Device: Latency L0s <512ns, L1 <4us
		Device: AtnBtn- AtnInd- PwrInd-
		Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
		Device: RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
		Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
		Link: Supported Speed 2.5Gb/s, Width x16, ASPM L0s L1, Port 0
		Link: Latency L0s <512ns, L1 <1us
		Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
		Link: Speed 2.5Gb/s, Width x16

# The X log reports the card as "unknown chipset (0x0605) rev 162"
rock@cloud:~$ grep nVidia /var/log/Xorg.0.log

(--) PCI:*(1:0:0) nVidia Corporation unknown chipset (0x0605) rev 162, Mem @ 0xfd000000/24, 0xd0000000/28, 0xfa000000/25, I/O @ 0xdc80/7, BIOS @ 0xfea00000/17

# Is device ID 0x0605 really missing? Check the pci.ids database:

rock@cloud:~$ grep 9800 /usr/share/misc/pci.ids

	0601  GeForce 9800 GT 512
	0604  GeForce 9800 GX2
	0605  GeForce 9800 GT
	0612  GeForce 9800 GTX
	0613  GeForce 9800 GTX+
	0614  GeForce 9800 GT
	0617  GeForce 9800M GTX
	10de  GeForce 9800M GTX
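
A plain grep for "9800" matches any line containing that string, whichever vendor it belongs to. The sketch below looks up one device ID under one vendor ID instead. It assumes the usual pci.ids layout (vendor lines start in column 0, device lines start with one tab, subsystem lines with two tabs) and lower-case hex IDs; the function name pci_device_name is hypothetical.

```shell
#!/bin/sh
# pci_device_name VENDOR DEVICE [FILE]
# Hypothetical helper: print the name pci.ids gives to DEVICE under VENDOR.
# Prints nothing if the ID is not listed.
pci_device_name() {
    vendor=$1
    device=$2
    file=${3:-/usr/share/misc/pci.ids}
    awk -v v="$vendor" -v d="$device" '
        /^#/  { next }                     # skip comment lines
        /^C / { in_vendor = 0; next }      # device-class section: stop matching
        /^[0-9a-f][0-9a-f][0-9a-f][0-9a-f] / {
            # vendor line; append "" to force a string comparison
            in_vendor = (($1 "") == (v ""))
            next
        }
        # device line: exactly one leading tab, then the device ID
        in_vendor && /^\t[0-9a-f]/ && (($1 "") == (d "")) {
            sub(/^\t[0-9a-f]+[ ]+/, "")    # drop the ID, keep the name
            print
            exit
        }
    ' "$file"
}

# Example usage:
#   pci_device_name 10de 0605
```

An empty result for 10de/0605 would point to a stale pci.ids rather than a faulty card.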

rock@cloud:~$ sudo Xorg -scanpci

Probing for PCI devices (Bus:Device:Function)

(0:0:0) unknown card (0x1028/0x0211) using a Intel Corporation DRAM Controller
(0:1:0) Intel Corporation PCI Express Root Port
(0:3:0) unknown card (0x1028/0x0211) using a Intel Corporation MEI Controller
(0:3:2) unknown card (0x1028/0x0211) using a Intel Corporation PT IDER Controller
(0:3:3) unknown card (0x1028/0x0211) using a Intel Corporation Serial KT Controller
(0:25:0) unknown card (0x1028/0x0211) using a Intel Corporation 82566DM-2 Gigabit Network Connection
(0:26:0) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4
(0:26:1) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5
(0:26:7) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2
(0:27:0) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) HD Audio Controller
(0:28:0) Intel Corporation 82801I (ICH9 Family) PCI Express Port 1
(0:29:0) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1
(0:29:1) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2
(0:29:2) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3
(0:29:7) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1
(0:30:0) Intel Corporation 82801 PCI Bridge
(0:31:0) Intel Corporation LPC Interface Controller
(0:31:2) unknown card (0x1028/0x0211) using a Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller
(0:31:3) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) SMBus Controller
(0:31:5) unknown card (0x1028/0x0211) using a Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller

---> (1:0:0) unknown card (0x1043/0x82a0) using an unknown chip (DeviceId 0x0605) from nVidia Corporation

rock@cloud:~$ sudo vim /etc/X11/xorg.conf

# Set the BusID of the VGA device explicitly
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    BusID          "PCI:1:0:0"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce 9800 GT"
    Option         "RenderAccel" "True"
    Option         "UseEdidDpi" "False"
EndSection
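
lspci prints bus addresses in hexadecimal, while the BusID in xorg.conf is conventionally written in decimal, as in the Xorg -scanpci listing above (for example (0:31:2)). For the card here, 01:00.0 and PCI:1:0:0 happen to look the same, but on a bus numbered 0a or higher they would not. A minimal sketch of the conversion follows. The function name lspci_to_busid is hypothetical, and the input is assumed to have the form BB:DD.F without a PCI domain prefix.

```shell
#!/bin/sh
# lspci_to_busid BB:DD.F
# Hypothetical helper: convert an lspci address (hex) into an
# xorg.conf BusID string (decimal), e.g. 01:00.0 -> PCI:1:0:0
lspci_to_busid() {
    addr=$1
    bus=${addr%%:*}     # text before the first ':'
    rest=${addr#*:}     # text after the first ':'
    dev=${rest%%.*}     # text before the '.'
    fn=${rest#*.}       # text after the '.'
    # printf %d accepts C-style hex constants, so prefix each part with 0x
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Example usage:
#   lspci_to_busid 01:00.0
```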

rock@cloud:~$ grep NVIDIA /var/log/Xorg.0.log

(II) Module glx: vendor="NVIDIA Corporation"
(II) NVIDIA GLX Module  180.22  Tue Jan  6 09:40:07 PST 2009
(II) Module nvidia: vendor="NVIDIA Corporation"
(II) NVIDIA dlloader X Driver  180.22  Tue Jan  6 09:21:40 PST 2009
(II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
(--) Chipset NVIDIA GPU found
(**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
(==) NVIDIA(0): RGB weight 888
(==) NVIDIA(0): Default visual is TrueColor
(==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
(**) NVIDIA(0): Option "RenderAccel" "True"
(**) NVIDIA(0): Option "UseEdidDpi" "False"
(**) NVIDIA(0): Enabling RENDER acceleration
(II) NVIDIA(0): Support for GLX with the Damage and Composite X extensions is
(II) NVIDIA(0):     enabled.
(II) NVIDIA(0): NVIDIA GPU GeForce 9800 GT (G92) at PCI:1:0:0 (GPU-0)
(--) NVIDIA(0): Memory: 1048576 kBytes
(--) NVIDIA(0): VideoBIOS: 62.92.53.00.00
(II) NVIDIA(0): Detected PCI Express Link width: 16X
(--) NVIDIA(0): Interlaced video modes are supported on this GPU
(--) NVIDIA(0): Connected display device(s) on GeForce 9800 GT at PCI:1:0:0:
(--) NVIDIA(0):     ViewSonic VA721 (CRT-0)
(--) NVIDIA(0): ViewSonic VA721 (CRT-0): 400.0 MHz maximum pixel clock
(II) NVIDIA(0): Assigned Display Device: CRT-0
(==) NVIDIA(0): 
(==) NVIDIA(0): No modes were requested; the default mode "nvidia-auto-select"
(==) NVIDIA(0):     will be used as the requested mode.
(==) NVIDIA(0): 
(II) NVIDIA(0): Validated modes:
(II) NVIDIA(0):     "nvidia-auto-select"
(II) NVIDIA(0): Virtual screen size determined to be 1280 x 1024
(==) NVIDIA(0): DPI set to (75, 75); computed from built-in default
(==) NVIDIA(0): Enabling 32-bit ARGB GLX visuals.
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Setting mode "nvidia-auto-select"
(II) NVIDIA(0): NVIDIA 3D Acceleration Architecture Initialized
(==) NVIDIA(0): Disabling shared memory pixmaps
(II) NVIDIA(0): Using the NVIDIA 2D acceleration architecture
(==) NVIDIA(0): Backing store disabled
(==) NVIDIA(0): Silken mouse enabled
(**) NVIDIA(0): DPMS enabled

rock@cloud:~$ sudo glxinfo -display :0

# 3D acceleration appears to work: direct rendering is reported as "Yes".
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce 9800 GT/PCI/SSE2
OpenGL version string: 2.1.2 NVIDIA 180.22
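
With nine nodes, reading the glxinfo output by eye on each machine gets tedious. The sketch below pulls a single field out of glxinfo-style "name: value" output so that a script can compare it. The function name glx_field is hypothetical; it reads the output on stdin and relies only on the line format shown above.

```shell
#!/bin/sh
# glx_field NAME
# Hypothetical helper: read "NAME: value" lines on stdin and print the
# value of the first line whose name matches NAME exactly.
glx_field() {
    awk -v name="$1" -F': ' '
        ($1 "") == (name "") {
            # skip the name plus the ": " separator
            print substr($0, length(name) + 3)
            exit
        }
    '
}

# Example usage:
#   if [ "$(glxinfo -display :0 | glx_field 'direct rendering')" = "Yes" ]; then
#       echo "direct rendering enabled"
#   fi
```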

1.3 NVIDIA GPU Status Check

rock@cloud:~$ sudo nvidia-xconfig -query-gpu-info

# GPU Status check
Number of GPUs: 1

GPU #0:
  Name      : GeForce 9800 GT
  PCI BusID : PCI:1:0:0

  Number of Display Devices: 1

  Display Device 0 (CRT-0):
     EDID Name             : ViewSonic VA721
     Minimum HorizSync     : 30.000 kHz
     Maximum HorizSync     : 82.000 kHz
     Minimum VertRefresh   : 50 Hz
     Maximum VertRefresh   : 85 Hz
     Maximum PixelClock    : 140.000 MHz
     Maximum Width         : 1280 pixels
     Maximum Height        : 1024 pixels
     Preferred Width       : 1280 pixels
     Preferred Height      : 1024 pixels
     Preferred VertRefresh : 60 Hz
     Physical Width        : 340 mm
     Physical Height       : 270 mm
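
Each node in this cluster is expected to report exactly one GPU. A small sketch for checking that from the query output above follows. The function name gpu_count is hypothetical, and it assumes the "Number of GPUs: N" line appears exactly as shown.

```shell
#!/bin/sh
# gpu_count
# Hypothetical helper: read nvidia-xconfig query output on stdin and
# print the number from the "Number of GPUs:" line (nothing if absent).
gpu_count() {
    sed -n 's/^Number of GPUs: *\([0-9][0-9]*\).*/\1/p'
}

# Example usage:
#   n=$(nvidia-xconfig -query-gpu-info | gpu_count)
#   [ "$n" = "1" ] || echo "unexpected GPU count: ${n:-none}"
```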

rock@cloud:~$ sudo nvidia-smi

Gpus found in probe:
Found Gpuid 0x1000
Attaching all probed Gpus...OK
Getting unit information...OK
Getting all static information..

Part 2 CUDA HowTo

2.1 NVIDIA CUDA Example
