## Sample configuration file
## Numbers may end with a single letter:
##   k or K meaning 1024
##   m or M meaning 1048576 (1024*1024)
##
## The '#' character is the comment character.  Any parameter
## modified herein should have any preceding '#' removed.
##

######## Memory / Shared Segment Configuration ########

## The pagepool is used for I/O buffers.  It is always pinned.
## The allowable range is 4M to 512M (AIX).
## The allowable range is 4M to 1300M (LINUX).
#pagepool 64M

## maxblocksize controls the maximum file system block size allowed.
## File systems with larger block sizes cannot be mounted or created
## unless the value of maxblocksize is increased.
## The allowable range is 16K to 16M.
## default: maxblocksize 1M
#maxblocksize

## Maximum number of files to cache.  If the number of concurrently open
## files is bigger, then the number of cached files will exceed this value.
## The allowable range is 1 to 100000.
#maxFilesToCache 1000

## The maximum number of stat cache entries.
## The default is 4 times the value of the maxFilesToCache parameter.
## The allowable range is 0 to 10000000.
#maxStatCache

######## DMAPI configuration ########

## The dmapiEventTimeout parameter controls the blocking of file operation
## threads of NFS and DFS while in the kernel waiting for the handling of
## a DMAPI synchronous event.  The parameter value is the maximum time, in
## milliseconds, the thread will block.  When this time expires, the file
## operation returns ENOTREADY, and the event continues asynchronously.
## The NFS/DFS server is expected to repeatedly retry the operation, which
## eventually will find the response of the original event and continue.
## This mechanism applies only to read, write, and truncate events, and only
## when such events come from NFS and DFS server threads.  The timeout value
## is given in milliseconds.  The value 0 indicates immediate timeout (fully
## asynchronous event).  A value greater than or equal to 86400000 (which is
## 24 hours) is considered "infinity" (no timeout, fully synchronous event).
## The default value is 86400000.
#dmapiEventTimeout 86400000

## The dmapiSessionFailureTimeout parameter controls the blocking of file
## operation threads, while in the kernel, waiting for the handling of a DMAPI
## synchronous event that is enqueued on a session that has suffered a failure.
## The parameter value is the maximum time, in seconds, the thread will wait
## for the recovery of the failed session.  When this time expires and the
## session has not yet recovered, the event is aborted and the file operation
## fails, returning the EIO error.  The timeout value is given in full seconds.
## The value 0 indicates immediate timeout (immediate failure of the file
## operation).  A value greater than or equal to 86400 (which is 24 hours) is
## considered "infinity" (no timeout, indefinite blocking until the session
## recovers).
## The default value is 0.
#dmapiSessionFailureTimeout 0

## The dmapiMountTimeout parameter controls the blocking of mount operations,
## waiting for a disposition for the mount event to be set.  This timeout is
## activated at most once on each node, by the first external mount of a
## file system which has DMAPI enabled, and only if there has never before
## been a mount disposition.  Any mount operation on this node that starts
## while the timeout period is active will wait for the mount disposition.
## The parameter value is the maximum time, in seconds, that the mount
## operation will wait for a disposition.  When this time expires and there
## is still no disposition for the mount event, the mount operation fails,
## returning the EIO error.  The timeout value is given in full seconds.
## The value 0 indicates immediate timeout (immediate failure of the mount
## operation).  A value greater than or equal to 86400 (which is 24 hours) is
## considered "infinity" (no timeout, indefinite blocking until there is a
## disposition).
## The default value is 60.
#dmapiMountTimeout 60
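## For illustration only: on a node that serves NFS and should never block
## file operation threads on DMAPI synchronous events, the event timeout
## described above could be set to 0 for fully asynchronous handling.  The
## value below is a sketch, not a recommended default:
##
##   dmapiEventTimeout 0
##
## With this setting, read, write, and truncate events arriving from NFS
## server threads return ENOTREADY immediately and complete asynchronously
## while the server retries, as described above.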
######## Prefetch tuning ########

## The value of the 'prefetchThreads' parameter controls the maximum
## possible number of threads dedicated to prefetching data for
## files that are read sequentially, or to handle sequential write-behind.
## The actual degree of parallelism for prefetching is determined
## dynamically in the daemon.
## (minimum 2, maximum 104)
#prefetchThreads 72

## The 'worker1Threads' parameter controls the maximum number of threads
## that are used to handle other operations associated with data access.
## The primary use is for random read/write requests that cannot be
## prefetched, random IO requests, or small file activity.
## (minimum 1, maximum 64)
#worker1Threads 48

## maxMBpS is an estimate of how many MB per second of data can be
## transferred in or out of a single node.  The value is used in calculating
## the amount of IO that can be done to effectively prefetch data for readers
## and/or write-behind data from writers.  The maximum number of IOs in
## progress concurrently will be 2 * min(nDisks, maxMBpS*avgIOtime/blockSize),
## where nDisks is the number of disks that make up the filesystem,
## avgIOtime is a measured average of the last 16 full block IO times, and
## blockSize is the block size for a full block in the filesystem (e.g. 256K).
## By lowering this value, you can artificially limit how much IO one node
## can put on all the VSD servers, if there are lots of nodes that
## can overrun a few VSD servers.  Setting this too high will usually
## not hurt because of other limiting factors such as the size of the
## pagepool, or the number of prefetchThreads or worker1Threads.
#maxMBpS 150
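## For illustration only, applying the formula above with hypothetical
## numbers, and assuming avgIOtime is expressed in seconds and blockSize in
## megabytes so that the units cancel: with maxMBpS 150, an avgIOtime of
## 0.01 seconds, and a 256K (0.25 MB) block size,
##
##   maxMBpS * avgIOtime / blockSize = 150 * 0.01 / 0.25 = 6
##
## so on a filesystem made up of 32 disks the limit would be
## 2 * min(32, 6) = 12 IOs in progress concurrently.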
######## Problem determination Configuration ########

## Tracing of individual classes of events/operations can be activated by
## adding "trace " lines below.
trace all 0

## The 'unmountOnDiskFail' keyword controls how the daemon will respond when
## a disk failure is detected.
##
## When it is set to "no", the daemon will mark the disk as failed and
## continue as long as it can without using the disk.  All nodes that are
## using this disk will be notified of the disk failure.  The disk can be
## made active again by using the "mmchdisk" command.  This is the
## suggested setting when metadata and data replication are used, because
## the replica can be used until the disk can be brought online again.
##
## When it is set to "yes", any disk failure will cause only the local
## node to panic (force-unmount) the filesystem that contains that disk.
## Other filesystems on this node and other nodes will continue to function
## normally (if they can).  The local node can try to remount the filesystem
## when the disk problem has been resolved.  This is the suggested setting
## when using VSD disks in large multinode configurations and replication is
## not being used.
##
## When it is set to "meta", the daemon will mark the disk as failed and
## continue as long as it can without using the disk.  All nodes that are
## using this disk will be notified of the disk failure.  The disk can be
## made active again by using the "mmchdisk" command.  This is the
## suggested setting when metadata replication is used and there are lots of
## dataOnly disks, because the replica can be used until the disk can be
## brought online again.  The filesystem will remain mounted over dataOnly
## disk failures, at the expense of user applications getting EIO errors when
## trying to use disks that have been marked down.
#unmountOnDiskFail no

## The 'dataStructureDump' keyword controls whether mmfs will produce a
## formatted dump of its internal data structures into a file named
## internaldump..signal whenever it aborts.
## The following entry can either be a directory name in which the file
## will reside, or otherwise a boolean value.  When given a positive
## boolean value the directory defaults to /tmp/mmfs.
#dataStructureDump yes

######## Node Override Configuration ########
##
## In a multi-node configuration, it may be desirable to configure some
## nodes differently than others.  This can be accomplished by placing
## separate, potentially different, copies of the mmfs.cfg file on each
## node.  However, since maintaining separate copies of the configuration
## file on each node will likely be more difficult and error prone,
## the same effect can be achieved via node overrides, wherein a single
## mmfs.cfg file is replicated on every node.
##
## A node override is introduced by a line containing a node name or list
## of node names in square brackets.  All parameter specifications
## that follow will apply only to the listed nodes.  A "[common]" line
## ends a section of node overrides.  For example, the following fragment:
##
##   pagepool 30M
##
##   [tiger5,tiger6]
##   pagepool 10M
##
##   [tiger9]
##   pagepool 64M
##
##   [common]
##   maxFilesToCache 200
##
## configures the page pool on most nodes as 30 megabytes.  However,
## on tiger5 and tiger6 the page pool is configured with a smaller
## value of 10 megabytes, and on tiger9 with a larger value of 64 megabytes.
## Lines after the "[common]" line again apply to all nodes, i.e. every node
## will have a maxFilesToCache of 200.
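## As a further illustration (the node names here are hypothetical), the same
## override mechanism applies to any parameter in this file.  For example, to
## use the "yes" setting of unmountOnDiskFail described above on two VSD
## server nodes while the rest of the cluster keeps the default:
##
##   unmountOnDiskFail no
##
##   [vsdnode1,vsdnode2]
##   unmountOnDiskFail yes
##
##   [common]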