Disclaimers
-----------

The files in this directory are provided by IBM on an "AS IS" basis
without warranty of any kind. In addition, the results that you obtain
from using these files to measure the general performance of your
General Parallel File System (GPFS) file systems are "AS IS." Your
reliance on any measurements is at your own risk and IBM does not assume
any liability whatsoever from your use of these files or your use of
resultant performance measurements. The performance of GPFS file
systems is affected by many factors, including the access patterns of
application programs, the configuration and amount of memory on the SP
nodes, the number and characteristics of IBM Virtual Shared Disk (VSD)
servers, the number and speed of disks and disk adapters attached to the
VSD servers, GPFS, VSD and SP switch configuration parameters, other
traffic through the SP switch, etc. As a result, GPFS file system
performance may vary and IBM does not make any particular performance
claims for GPFS file systems.


Introduction
------------

The files in this directory serve two purposes:

- Provide a simple benchmark program (gpfsperf) that can be used to
  measure the performance of GPFS for several common file access patterns.
- Give examples of how to use some of the gpfs_fcntl hints and
  directives that are new in GPFS version 1.3.

There are four versions of the program binary built from a single set of
source files. The four versions correspond to all of the possible
combinations of single node/multiple node and with/without features that
are only supported on GPFS version 1.3. Multinode versions of the gpfsperf
program contain -mpi as part of their names, while versions that do not use
features of GPFS requiring version 1.3 have a suffix of -v12 in their names.


Parallelism
-----------

There are two independent ways to achieve parallelism in the gpfsperf
program. More than one instance of the program can be run on multiple
nodes using the Message Passing Interface (MPI) to synchronize their
execution, or a single instance of the program can execute several
threads in parallel on a single node. These two techniques can also be
combined. In the descriptions that follow, 'threads' refers to all of the
threads of all gpfsperf instances, on whichever nodes MPI runs them.

When gpfsperf runs on multiple nodes, the instances of the program
communicate using MPI to synchronize their execution and to combine
their measurements into an aggregate throughput result.

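As an illustration of how such a multinode aggregation can work, the sketch
below uses MPI to combine per-node byte counts and timestamps into a single
rate. It is a minimal, hypothetical example, not code taken from gpfsperf;
all names are invented.

    /* aggregate.c - hedged sketch of combining per-node measurements.
       Compile with an MPI C compiler, e.g. mpicc aggregate.c */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      double tStart, tEnd, bytesThisNode;
      double totalBytes, firstStart, lastEnd;
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      tStart = MPI_Wtime();
      bytesThisNode = 0.0;   /* ...each node would do its reads or writes here... */
      tEnd = MPI_Wtime();

      /* Total bytes moved by all nodes; the test interval runs from the
         earliest start to the latest finish over all nodes. */
      MPI_Reduce(&bytesThisNode, &totalBytes, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      MPI_Reduce(&tStart, &firstStart, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
      MPI_Reduce(&tEnd, &lastEnd, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

      if (rank == 0 && lastEnd > firstStart)
        printf("aggregate rate %.1f KB/sec\n",
               totalBytes / (lastEnd - firstStart) / 1000.0);

      MPI_Finalize();
      return 0;
    }
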
Access patterns
---------------

The gpfsperf program operates on a file that is assumed to consist of a
collection of records, each of the same size. It can generate three
different types of access patterns: sequential, strided, and random. The
meaning of these access patterns in some cases depends on whether or not
parallelism is employed when the benchmark is run.

The simplest access pattern is random. The gpfsperf program generates a
sequence of random record numbers and reads or writes the corresponding
records. When run on multiple nodes, or when multiple threads are used
within instances of the program, each thread of the gpfsperf program uses a
different seed for its random number generator, so each thread will access
an independent sequence of records. Two threads may access the same record
if the same random record number occurs in both sequences.

In the sequential access pattern, each gpfsperf thread reads from or writes
to a contiguous partition of a file sequentially. For example, suppose that
a 10 billion byte file consists of one million records of 10000 bytes each.
If 10 threads read the file according to the sequential access
pattern, then the first thread will read sequentially through the partition
consisting of the first 100,000 records, the next thread will read from the
next 100,000 records, and so on.

In a strided access pattern, each thread skips some number of records
between each record that it reads or writes. Reading the file from the
example above in a strided pattern, the first thread would read records 0,
10, 20, ..., 999,990. The second thread would read records 1, 11, 21, ...,
999,991, and so on. The gpfsperf program by default uses a stride, or
distance between records, equal to the total number of threads operating on
the file, in this case 10.

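To make the record numbering concrete, the hypothetical C fragment below
computes which record a given thread touches on its i-th access under each
pattern. It is only an illustration of the arithmetic described above, not
code from gpfsperf.

    /* pattern.c - sketch of the sequential, strided, and random patterns. */
    #include <stdio.h>
    #include <stdlib.h>

    /* Record read or written on the i-th access by thread t, where nThreads
       threads share a file of nRecords records. */
    long seqRecord(long t, long i, long nRecords, long nThreads)
    {
      long partition = nRecords / nThreads;   /* contiguous share per thread */
      return t * partition + i % partition;   /* wraps within its partition  */
    }

    long stridedRecord(long t, long i, long nRecords, long stride)
    {
      long perPass = nRecords / stride;       /* records visited in one pass */
      return t + (i % perPass) * stride;      /* default stride == nThreads  */
    }

    long randRecord(unsigned int *seed, long nRecords)
    {
      return rand_r(seed) % nRecords;         /* each thread gets its own seed */
    }

    int main(void)
    {
      unsigned int seed = 1;                  /* a real run would vary this per thread */
      long nRecords = 1000000, nThreads = 10;

      /* Matches the example above: thread 1 reads records 1, 11, 21, ... */
      printf("strided, thread 1, 3rd access: record %ld\n",
             stridedRecord(1, 2, nRecords, nThreads));
      printf("sequential, thread 0, 3rd access: record %ld\n",
             seqRecord(0, 2, nRecords, nThreads));
      printf("random: record %ld\n", randRecord(&seed, nRecords));
      return 0;
    }
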
Amount of data to be transferred
--------------------------------

One of the input parameters to gpfsperf is the amount of data to be
transferred. This is the total number of bytes to be read or written by all
threads of the program. If there are T threads in total, and the total
number of bytes to be transferred is N, each thread will read or write about
N/T bytes, rounded to a multiple of the record size. By default, gpfsperf
sets N to the size of the file. For the sequential or strided access
pattern, this default means that every record in the file will be read or
written exactly once.

If N is greater than the size of the file, each thread will read or write
its partition of the file repeatedly until reaching its share (N/T) of the
bytes to be transferred. For example, suppose 10 threads sequentially read
a 10 billion byte file of 10000 byte records when N is 15 billion. The
first thread will read the first 100,000 records, then reread the first
50,000 records. The second thread will read records 100,000 through
199,999, then reread records 100,000 through 149,999, and so on.

When using strided access patterns with other than the default stride, this
behavior of gpfsperf can cause unexpected results. For example, suppose
that 10 threads read the 10 billion byte file using a strided access
pattern, but instead of the default stride of 10 records gpfsperf is told to
use a stride of 10000 records. The file partition read by the first thread
will be records 0, 10000, 20000, ..., 990,000. This is only 100 distinct
records of 10000 bytes each, or a total of only 1 million bytes of data.
This data will likely remain in the GPFS buffer pool after it is read the
first time. If N is more than 10 million bytes, gpfsperf will "read" the
same buffered data multiple times, and the reported data rate will appear
anomalously high. To avoid this effect, performance tests using non-default
strides should reduce N in the same proportion as the stride was increased
from its default.

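The arithmetic above is simple enough to show in a few lines. The
hypothetical fragment below (not part of gpfsperf) computes a thread's share
of N for the example file, and how much N should shrink when the stride is
raised from its default of 10 records to 10000 records.

    /* sizing.c - worked example of the sizing rules described above. */
    #include <stdio.h>

    int main(void)
    {
      long long N = 10000000000LL;        /* total bytes to transfer (-n)   */
      long long T = 10;                   /* total number of threads        */
      long long recSize = 10000;          /* record size in bytes (-r)      */

      /* Each thread moves about N/T bytes, rounded to whole records. */
      long long perThread = (N / T / recSize) * recSize;
      printf("bytes per thread: %lld\n", perThread);

      /* A non-default stride shrinks the set of distinct records each
         thread touches, so shrink N in the same proportion. */
      long long defaultStride = T;        /* records */
      long long requestedStride = 10000;  /* records */
      long long adjustedN = N * defaultStride / requestedStride;
      printf("suggested -n for stride %lld: %lld bytes\n",
             requestedStride, adjustedN);
      return 0;
    }
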
Computation of aggregate data rate and utilization
--------------------------------------------------

The results of a run of gpfsperf are reported as an aggregate data rate.
Data rate is defined as the total number of bytes read or written by all
threads divided by the total time of the test. It is reported in units of
1000 bytes/second. The test time is measured from before any node opens the
test file until after the last node closes the test file. To ensure a
consistent environment for each test, before beginning the timed test period
gpfsperf issues the GPFS_CLEAR_FILE_CACHE hint on all nodes. This flushes
the GPFS buffer cache and releases byte-range tokens. Note that versions of
GPFS prior to v1.3 do not support GPFS file hints, so the state of the
buffer cache at the beginning of a test will be influenced by the state left
by the prior test.

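For readers who want to issue the same hint from their own code, the sketch
below shows roughly how a gpfs_fcntl hint such as GPFS_CLEAR_FILE_CACHE is
constructed. It assumes the gpfs_fcntl.h structure names from GPFS 1.3;
verify against the header on your system, since this is an illustration
rather than an excerpt from gpfsperf.

    /* clearcache.c - sketch of issuing the GPFS_CLEAR_FILE_CACHE hint. */
    #include <stdio.h>
    #include <gpfs_fcntl.h>

    int clearFileCache(int fd)
    {
      struct
      {
        gpfsFcntlHeader_t    hdr;
        gpfsClearFileCache_t inv;
      } hint;

      hint.hdr.totalLength = sizeof(hint);
      hint.hdr.fcntlVersion = GPFS_FCNTL_CURRENT_VERSION;
      hint.hdr.fcntlReserved = 0;
      hint.inv.structLen = sizeof(hint.inv);
      hint.inv.structType = GPFS_CLEAR_FILE_CACHE;

      /* Flush and invalidate this file's cached blocks on this node and
         release its byte-range tokens. */
      if (gpfs_fcntl(fd, &hint) != 0)
      {
        perror("gpfs_fcntl(GPFS_CLEAR_FILE_CACHE)");
        return -1;
      }
      return 0;
    }
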
Since all threads of a gpfsperf run do approximately the same amount of work
(read or write N/T bytes), in principle they should all run for the same
amount of time. In practice, however, variations in disk and switch
response time lead to variations in execution times among the threads. Lock
contention in GPFS further contributes to these variations in execution
times. A large degree of non-uniformity is undesirable, since it means that
some nodes are idle while they wait for threads on other nodes to finish.
To measure the degree of uniformity of thread execution times, gpfsperf
computes a quantity it calls "utilization." Utilization is the fraction of
the total number of thread-seconds in a test during which threads actively
perform reads or writes. A value of 1.0 indicates perfect overlap, while
lower values denote that some threads were idle while others still ran.

The following timeline illustrates how gpfsperf computes utilization for a
test involving one thread on each of two nodes, reading a total of 100M:

    time   event
    ----   -----
     0.0   Node 0 captures timestamp for beginning of test
     ...   both nodes open file
     0.1   Node 0 about to do first read
     0.1   Node 1 about to do first read
     ...   many file reads
     4.9   Node 0 finishes last read
     4.9   Node 1 finishes last read
     ...   both nodes close file
     5.0   Node 0 captures timestamp for end of test

The utilization is ((4.9-0.1) + (4.9-0.1)) / (2*(5.0-0.0)) = 0.96. The
reported aggregate data rate would be 100M / (5.0-0.0) = 20M/sec.

If node 0 ran significantly slower than node 1, it might have finished its
last read at time 9.9 instead of at time 4.9, and the end of test timestamp
might be 10.0 instead of 5.0. In this case the utilization would drop to
((4.9-0.1) + (9.9-0.1)) / (2*(10.0-0.0)) = 0.73, and the data rate would be
100M / (10.0-0.0) = 10M/sec.

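The same computation expressed as code may be easier to follow. The
hypothetical fragment below reproduces the first example: utilization is
busy thread-seconds divided by total thread-seconds, and the data rate is
total bytes divided by elapsed test time. It is not the gpfsperf source.

    /* util.c - utilization and data rate for the two-node example above. */
    #include <stdio.h>

    int main(void)
    {
      double testStart = 0.0, testEnd = 5.0;     /* node 0's test timestamps */
      double ioStart[2] = { 0.1, 0.1 };          /* first read, per thread   */
      double ioEnd[2]   = { 4.9, 4.9 };          /* last read, per thread    */
      double totalBytes = 100.0e6;               /* 100M read in total       */
      int    nThreads = 2, i;
      double busy = 0.0;

      for (i = 0; i < nThreads; i++)
        busy += ioEnd[i] - ioStart[i];           /* thread-seconds of I/O    */

      printf("utilization %.2f\n",
             busy / (nThreads * (testEnd - testStart)));   /* 0.96           */
      printf("data rate %.0f KB/sec\n",
             totalBytes / (testEnd - testStart) / 1000.0); /* 20000 = 20M/sec */
      return 0;
    }
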
Command line parameters
-----------------------

There are four versions of gpfsperf in this directory:
  gpfsperf-mpi     - runs on multiple nodes under MPI, requires GPFS v1.3 or later
  gpfsperf         - runs only on a single node, requires GPFS v1.3 or later
  gpfsperf-mpi-v12 - runs on multiple nodes under MPI, does not use GPFS v1.3 features
  gpfsperf-v12     - runs only on a single node, does not use GPFS v1.3 features

The command line for any of the versions of gpfsperf is:

  gpfsperf[-mpi] operation pattern fn [options]

The order of parameters on the command line is not significant.

The operation must be either "create", "read", "write", or "uncache". All
threads in a multinode or multithreaded run of gpfsperf do the same
operation to the same file, but at different offsets. Create means to
ensure that the file exists, then do a write test. The uncache operation
does not read or write the file, but only removes any buffered data for the
file from the GPFS buffer cache.

Pattern must be one of "rand", "randhint", "strided", or "seq". The
meaning of each of these was explained earlier, except for "randhint". The
"randhint" pattern is the same as "rand", except that the GPFS multiple
access range hint is used through the library functions in irreg.c to
prefetch blocks before they are accessed by gpfsperf.

The filename parameter fn should resolve to a file in a GPFS file system
that is mounted on all nodes where gpfsperf is to run. The file must
already exist unless the "create" operation was specified. Use of gpfsperf
on files not in GPFS may be meaningful in some situations, but this use has
not been tested.

Each optional parameter is described below, along with its default value.

-nolabels     - Produce a single line of output containing all parameters
                and the measured data rate. Format of the output line is 'op
                pattern fn recordSize nBytes fileSize nProcs nThreads
                strideRecs inv ds dio fsync reltoken aio osync rate util'.
                This format may be useful for importing results into a
                spreadsheet or other program for further analysis. The
                default is to produce multi-line labelled output.

-r recsize    - Record size. Defaults to the file system block size. Must
                be specified for create operations.

-n nBytes     - Number of bytes to transfer. Defaults to the file size.
                Must be specified for create operations.

-s stride     - Number of bytes between successive accesses by the
                same thread. Only meaningful for strided access patterns.
                Must be a multiple of the record size. See the earlier
                cautions about combining large values of -s with large
                values of -n. The default is the total number of threads
                (threads per process times number of processes), in records.

-th nThreads  - Number of threads per process. Default is 1. When there
                are multiple threads per process they read adjacent blocks
                of the file for the sequential and strided access patterns.
                For example, suppose a file of 60 records is being read by 3
                nodes with 2 threads per node. Under the sequential pattern,
                thread 0 on node 0 will read records 0-9, thread 1 on node 0
                will read records 10-19, thread 0 on node 1 will read records
                20-29, etc. Under a strided pattern, thread 0 on node 0 will
                read records 0, 6, 12, ..., 54, thread 1 on node 0 will read
                records 1, 7, 13, ..., 55, etc.

-noinv        - Do not clear blocks of fn from the GPFS file cache before
                starting the test. The default is to clear the cache. If
                this option is given, the results of the test can depend
                strongly on the lock and buffer state left by the last test.
                For example, a multinode sequential read with -noinv will
                run more slowly after a strided write test than after a
                sequential write test.

-ds           - Use GPFS data shipping. Data shipping avoids lock conflicts
                by partitioning the file among the nodes running gpfsperf,
                turning off byte range locking, and sending messages to the
                appropriate agent node to handle each read or write request.
                However, since (n-1)/n of the accesses are remote, the
                gpfsperf threads cannot take advantage of local block
                caching, although they may still benefit from prefetching
                and writebehind. Also, since byte range locking is not in
                effect, use of data shipping suspends the atomicity
                guarantees of X/Open file semantics. See the GPFS Guide and
                Reference manual for more details. Data shipping should show
                the largest performance benefit for strided writes that have
                small record sizes. The default is not to use data shipping.

-aio depth    - Use asynchronous I/O, prefetching to the given depth
                (default 0, max 1000). This can be used with any of the
                seq/rand/strided test patterns. An illustrative sketch of
                this style of prefetching appears below, just before the
                AIX notes.

-dio          - Use the direct I/O flag when opening the file. This allows
                sector aligned/sized buffers to be read/written directly
                from the application buffer to the disks where the blocks
                are allocated. A sketch of the open() flags behind -dio and
                -osync appears after this list of options.

-reltoken     - Release the entire file byte-range token after the file
                is newly created. In a multi-node environment (MPI), only
                the first process will create the file and all the other
                processes will wait and open the file after the creation
                has occurred. This flag tells the first process to release
                the byte-range token it automatically gets during the create.
                This may increase performance because other nodes that work
                on different ranges of the file will not need to revoke the
                range held by the node running the first process.

-fsync        - Ensure that no dirty data remain buffered at the conclusion
                of a write or create test. The time to perform the necessary
                fsync operation is included in the test time, so this option
                reduces the reported aggregate data rate. The default is not
                to fsync the file.

-osync        - Turn on the O_SYNC flag when opening the file.
                This causes every write operation to force the data to disk
                on each call. The default is not to use O_SYNC.

-v            - Verbose tracing. In a multinode test using gpfsperf, output
                from each instance of the program will be intermingled. By
                telling MPI to label the output from each node (set the
                MP_LABELIO environment variable to yes), the verbose output
                will make more sense.

-V            - Very verbose tracing. This option will display the offset
                of every read or write operation on every node. As with -v,
                labelling the output by node is suggested.

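The -dio and -osync options simply change the flags used on the open() call.
The hypothetical fragment below illustrates the idea; O_SYNC is standard,
while the direct I/O flag is platform dependent, so treat this as a sketch
rather than what gpfsperf itself does.

    /* openflags.c - sketch of the open() flags behind -osync and -dio. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
      int flags = O_RDWR;
      int fd;

      if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

      flags |= O_SYNC;        /* -osync: each write is forced to disk        */
    #ifdef O_DIRECT
      flags |= O_DIRECT;      /* -dio: bypass the buffer cache; buffers must */
    #endif                    /* then be sector aligned and sector sized     */

      fd = open(argv[1], flags);
      if (fd < 0) { perror("open"); return 1; }
      close(fd);
      return 0;
    }
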
Numbers in options can be given using K, M, or G suffixes, in upper or lower
case, to denote 2**10, 2**20, or 2**30, respectively, or can have an R or r
suffix to denote a multiple of the record size. For example, to specify a
record size of 4096 bytes and a size to read or write of 409600, one could
write "-r 4k -n 100r".

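As mentioned under -aio above, asynchronous I/O lets a thread keep several
reads in flight at once. The fragment below is a minimal, hypothetical
illustration of that idea using the POSIX AIO interface; it is not taken
from gpfsperf, whose use of AIO on AIX depends on the AIO kernel extension
described below.

    /* aiodepth.c - sketch of keeping 'depth' reads in flight with POSIX AIO. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define DEPTH    4          /* like -aio 4            */
    #define RECSIZE  65536      /* record size in bytes   */

    int main(int argc, char **argv)
    {
      struct aiocb cb[DEPTH];
      char *buf[DEPTH];
      off_t nextOffset = 0;
      int i, fd;

      if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
      fd = open(argv[1], O_RDONLY);
      if (fd < 0) { perror("open"); return 1; }

      /* Start DEPTH reads before waiting for any of them. */
      for (i = 0; i < DEPTH; i++)
      {
        buf[i] = malloc(RECSIZE);
        memset(&cb[i], 0, sizeof(cb[i]));
        cb[i].aio_fildes = fd;
        cb[i].aio_buf = buf[i];
        cb[i].aio_nbytes = RECSIZE;
        cb[i].aio_offset = nextOffset;
        nextOffset += RECSIZE;
        if (aio_read(&cb[i]) != 0) { perror("aio_read"); return 1; }
      }

      /* Harvest each read; a benchmark would issue a new read here to keep
         the pipeline full. */
      for (i = 0; i < DEPTH; i++)
      {
        const struct aiocb *list[1] = { &cb[i] };
        while (aio_error(&cb[i]) == EINPROGRESS)
          aio_suspend(list, 1, NULL);
        printf("read %zd bytes at offset %lld\n",
               aio_return(&cb[i]), (long long)cb[i].aio_offset);
      }
      return 0;
    }
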
AIX only:
If the Asynchronous I/O (AIO) kernel extension has not been loaded yet,
running the gpfsperf program will fail and display output like:

  exec(): 0509-036 Cannot load program gpfsperf because of the following errors:
          0509-130 Symbol resolution failed for /usr/lib/threads/libc.a(aio.o) because:
          0509-136 Symbol kaio_rdwr (number 0) is not exported from
                   dependent module /unix.
          0509-136 Symbol listio (number 1) is not exported from
                   dependent module /unix.
          0509-136 Symbol acancel (number 2) is not exported from
                   dependent module /unix.
          0509-136 Symbol iosuspend (number 3) is not exported from
                   dependent module /unix.
          0509-136 Symbol aio_nwait (number 4) is not exported from
                   dependent module /unix.
          0509-192 Examine .loader section symbols with the
                   'dump -Tv' command.

If you do not wish to use AIO, you can recompile the gpfsperf program to not
use the AIO calls:

  rm gpfsperf.o gpfsperf-mpi.o
  make OTHERINCL="-DNO_AIO"

Alternatively, enable AIO on your system by issuing commands like the
following:

  lsattr -El aio0
  chdev -l aio0 -P -a autoconfig=available -a minservers=10 -a maxservers=128
  mkdev -l aio0

Minservers just tells AIX how many AIO kprocs to create immediately, and
maxservers limits the total number created. On AIX 5.2 the meaning of
maxservers changed to "maximum number of servers per CPU", so a
16-way SMP should set maxservers=8 to get a total of 128 kprocs.


Examples
--------

Suppose that /gpfs is a GPFS file system that was formatted with 256K
blocks and that it has at least a gigabyte of free space. Assuming that
the gpfsperf programs have been copied into /gpfs/test, and that
/gpfs/test is the current directory, the following ksh commands
illustrate how to run gpfsperf:

  # Number of nodes on which the test will run. If this is increased, the
  # size of the test file should also be increased.
  export MP_PROCS=8

  # File containing a list of nodes on which gpfsperf will run. There are
  # other ways to specify where the test runs besides using an explicit
  # host list. See the Parallel Operating Environment documentation for
  # details.
  export MP_HOSTFILE=/etc/cluster.nodes

  # Name of test file to be manipulated by the tests that follow.
  export fn=/gpfs/test/testfile

  # Verify block size
  mmlsfs gpfs -B

  # Create test file. All write tests in these examples specify -fsync, so
  # the reported data rate includes the overhead of flushing all dirty buffers
  # to disk. The size of the test file should be increased if more than 8
  # nodes are used or if GPFS pagepool sizes have been increased from their
  # defaults. It may be necessary to increase the maximum size file the user
  # is allowed to create. See the ulimit command.
  ./gpfsperf-mpi create seq $fn -n 999m -r 256k -fsync

  # Read entire test file sequentially
  ./gpfsperf-mpi read seq $fn -r 256k

  # Rewrite test file sequentially using full block writes
  ./gpfsperf-mpi write seq $fn -r 256k -fsync

  # Rewrite test file sequentially using small writes. This requires GPFS to
  # read blocks in order to update them, so it will have worse performance
  # than the full block rewrite.
  ./gpfsperf-mpi write seq $fn -r 64k -fsync

  # Strided read using big records
  ./gpfsperf-mpi read strided $fn -r 256k

  # Strided read using medium sized records. Performance is worse because
  # average I/O size has gone down. This behavior will not be seen unless
  # the stride is larger than a block (8*50000 > 256K).
  ./gpfsperf-mpi read strided $fn -r 50000

  # Strided read using a very large stride. Reported performance is
  # misleading because each node just reads the same records over and over
  # from its GPFS buffer cache.
  ./gpfsperf-mpi read strided $fn -r 50000 -s 2400r

  # Strided write using a record size equal to the block size. Decent
  # performance, since record size matches GPFS lock granularity.
  ./gpfsperf-mpi write strided $fn -r 256k -fsync

  # Strided write using small records. Since GPFS lock granularity is
  # larger than a record, performance is much worse. Number of bytes
  # written is less than the entire file to keep test time reasonable.
  ./gpfsperf-mpi write strided $fn -r 10000 -n 100m -fsync

  # Strided write using small records and data shipping. Data shipping
  # trades additional communication overhead for less lock contention,
  # improving performance.
  ./gpfsperf-mpi write strided $fn -r 10000 -n 100m -ds -fsync

  # Random read of small records
  ./gpfsperf-mpi read rand $fn -r 10000 -n 100m

  # Random read of small records using the GPFS multiple access range hint.
  # Better performance (assuming more than MP_PROCS disks) because each node
  # has more than one disk read in progress at once due to prefetching.
  ./gpfsperf-mpi read randhint $fn -r 10000 -n 100m