| 1 | = 2010-07-26 = |
| 2 | |
| 3 | == File System : Data Deduplication == |
| 4 | |
| 5 | * Today I saw that Dell is acquiring Ocarina Networks to gain data deduplication capability, so I searched again for free-software solutions that can do deduplication. There currently appear to be three: (1) ZFS (2) [http://www.lessfs.com/wordpress/ lessfs] (3) [http://www.opendedup.org/ SDFS] |
| 6 | * [http://www.digitimes.com.tw/tw/dt/n/shwnws.asp?Cnlid=4&cat=400&cat1=10&cat1=&id=0000192291_E963XHUF8IVGE92T9CQPV Hardware leader Dell pushes hard to build up its software strength] |
| 7 | {{{ |
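| 8 | On the 19th (2010-07-19), Dell announced its acquisition of storage software vendor Ocarina Networks, whose software is known for compression and data deduplication. |
| 9 | With Ocarina Networks' software, storage efficiency can therefore be optimized, effectively reducing hardware, energy, and other related costs. |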
| 10 | }}} |
| 11 | |
| 12 | * [http://punetech.com/understanding-data-de-duplication/ Definition and classification of Data Deduplication] |
| 13 | * Definition: [http://en.wikipedia.org/wiki/Data_deduplication Wikipedia] |
| 14 | {{{ |
| 15 | Data deduplication or Single Instancing essentially refers to the elimination of redundant data. |
| 16 | In the deduplication process, duplicate data is deleted, leaving only one copy (single instance) |
| 17 | of the data to be stored. However, indexing of all data is still retained should that data ever |
| 18 | be required. |
| 19 | }}} |
| 20 | * 2009-09-23 : [http://www.linux-mag.com/id/7535 Deduping Storage Deduplication] |
| 21 | * 2007-07-12 : [http://www.backupcentral.com/content/view/58/47/ What is deduplication? (updated 6-08)] |
| 22 | * Classification: |
| 23 | 1. Point of Application – Source Vs Target |
| 24 | 2. Time of Application – Inline vs Post-Process |
| 25 | 3. Granularity – File vs Sub-File level |
| 26 | 4. Algorithm – Fixed size blocks Vs Variable length data segments |
| 27 | * [[Image(http://blog.druva.com/wp-content/uploads/2009/01/dedup-tree.jpg)]] |
| 28 | * '''Target-based Deduplication''' vs '''Source-based Deduplication''' |
| 29 | * [[Image(http://blog.druva.com/wp-content/uploads/2009/01/target-source-dedup.jpg)]] |
| 30 | * 2007-07-30 : [http://www.backupcentral.com/content/view/129/47/ Two different types of de-duplication] |
| 31 | * 2007-07-31 : [http://www.backupcentral.com/content/view/130/47/ De-duplication & remote restores] |
| 32 | * '''Inline-process Deduplication''' vs '''Post-process Deduplication''' |
| 33 | * [[Image(http://blog.druva.com/wp-content/uploads/2009/01/inline-post-dedup.jpg)]] |
| 34 | * '''File Level Deduplication''' vs '''Sub-file Level Deduplication''' |
| 35 | * '''Fixed-Length Blocks''' vs '''Variable-Length Data Segments''' |
| 36 | * [[Image(http://blog.druva.com/wp-content/uploads/2009/01/file-bocks.jpg)]] |
| 37 | |
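The difference between fixed-length blocks and variable-length data segments in the classification above can be illustrated with a small sketch. It is not taken from any of the linked products: the function names, the 64-byte block size, the 16-byte window, and the boundary mask are all assumptions chosen for illustration, and hashing each window with SHA-256 stands in for the rolling hash a real system would more likely use.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list:
    """Cut wherever a hash of the preceding `window` bytes has its low bits zero.

    Boundaries depend only on nearby content, not on absolute offsets.
    Hashing every window with SHA-256 keeps the sketch short (assumption);
    real systems typically use a rolling hash and enforce min/max chunk sizes.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        digest = hashlib.sha256(data[end - window:end]).digest()
        if int.from_bytes(digest[:4], "big") & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder after the last cut
    return chunks


def shared_chunks(a: list, b: list) -> int:
    """Count distinct chunks that appear in both chunk lists."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(4096))
    shifted = b"X" + original  # one byte inserted at the front

    print("fixed   :", shared_chunks(fixed_chunks(original), fixed_chunks(shifted)))
    print("variable:", shared_chunks(content_defined_chunks(original),
                                     content_defined_chunks(shifted)))
```

Inserting a single byte shifts every fixed-size block, so none of them match the original any more, whereas the content-defined boundaries move along with the data and every chunk after the first is found again. That resilience is the usual argument for variable-length segments, at the cost of extra hashing work.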
| 38 | * Paper: FAST'08 - [http://www.usenix.org/events/fast08/tech/full_papers/zhu/zhu_html/index.html Avoiding the Disk Bottleneck in the Data Domain Deduplication File System] |
| 39 | * [http://www.snia.org/education/tutorials/2009/spring/data-management/ SNIA Data Protection and Management 2009] - [http://www.snia.org/education/tutorials/2009/spring/data-management/DanielBudiansky_Understanding_Data_Deduplication.pdf Understanding Data Deduplication] - Daniel Budiansky, Larry Freeman |
| 40 | |
| 41 | * 2010-05-13 : [http://www.enterprisestorageforum.com/continuity/article.php/11568_3882106_2/Open-Source-Deduplication-Ready-for-Enterprises.htm Open Source Deduplication: Ready for Enterprises?] |
| 42 | * The article mentions [http://www.baculasystems.com/eng Bacula Systems], which is also open source; the free-software edition can only be found at http://www.bacula.org/. |
| 43 | * [http://www.zmanda.com/images/logo-index-main.png Zmanda] is a company I keep running into, whether at Linux World or recently while looking into MySQL backup. Judging from the company website, its products fall into (1) network backup based on the [http://www.amanda.org/ free software Amanda], (2) MySQL backup, and (3) cloud backup. |
| 44 | * [http://www.nexenta.com/corp/nexentastor-overview/nexentastor-releases/nexentastor-30 Nexenta Systems], for its part, does inline deduplication on top of ZFS. |
| 45 | |
| 46 | * [http://www.opendedup.org/ Opendedup's SDFS] - appears to have become free software only in early 2010 - licensed under GPLv2 - [http://code.google.com/p/opendedup/ Google Code project site] |
| 47 | * 2010-03-25 : [http://ostatic.com/blog/sdfs-a-robust-deduplication-file-system-for-linux SDFS: A Robust Deduplication File System for Linux] |
| 48 | * 2010-03-25 : [http://www.cio.com.au/article/340870/open_source_deduplication_software_released_linux/ Open source deduplication software released for Linux] |
| 49 | * [[Image(http://opendedup.googlecode.com/files/Screenshot-1.png)]] |
| 50 | |
| 51 | * 2010-02-22 : [http://searchstorage.techtarget.com.au/articles/38919-Two-open-source-data-deduplication-tools Two open-source data deduplication tools] |
| 52 | * Besides ZFS, this article also introduces a tool called [http://backuppc.sourceforge.net/ backuppc] |
| 53 | * Someone has also packaged backuppc as a virtual appliance - [http://gotitsolutions.org/2007/01/15/open-source-backup-and-data-de-duplication-virtual-appliance-2.html Open source backup and data de-duplication virtual appliance] |
| 54 | |
| 55 | * [http://code.google.com/p/ostor/ OStor] - [http://ostor.sourceforge.net/ old SourceForge site] |
| 56 | * 2009-11-01 : [http://ppraveen.wordpress.com/2009/11/01/introducing-ostor-data-deduplication-in-the-cloud-open-source-project/ Introducing OStor – data deduplication in the cloud. Open source project.] |
| 57 | * 2009-11-01 : [http://dedup.wordpress.com/2009/11/01/ostor-data-deduplication-in-the-cloud-howto/ OStor – data deduplication in the cloud – HowTo] |
| 58 | |
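| 59 | * [cloud:wiki:jazz/10-05-20 2010-05-20] Gave a talk at National Chung Hsing University. Afterwards I chatted with Mr. 陳中欣 of 麟瑞科技, the speaker of another session on virtualization, who pointed out that NetApp's deduplication works at the block level, which is why it can also deduplicate virtual machine disks. |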
| 60 | * [[Image(http://trac.nchc.org.tw/cloud/raw-attachment/wiki/jazz/10-05-20/10-05-30_NetApp_Block-level-deduplication.png)]] |
| 61 | |
| 62 | * [cloud:wiki:jazz/10-03-03 2010-03-03] Invited Sun to give a talk on ZFS and found out that ZFS also supports deduplication. Nice! |
| 63 | * 2009-12-03 : [http://hub.opensolaris.org/bin/view/Community+Group+zfs/dedup ZFS Deduplication Frequently Asked Questions (FAQ)] |
| 64 | * 2009-11-03 : [http://www.h-online.com/open/news/item/ZFS-with-data-deduplication-848638.html ZFS with data deduplication] |
| 65 | * 2009-11-02 : [http://blogs.sun.com/bonwick/entry/zfs_dedup ZFS Deduplication] |
| 66 | |
| 67 | * [wiki:jazz/09-02-11#A-SISCOW 2009-02-11 : About A-SIS & COW] |
| 68 | * [http://en.wikipedia.org/wiki/NTFS#Single_Instance_Storage_.28SIS.29 Single Instance Storage (SIS)] is a feature of NTFS |
| 69 | * SIS behaves much like [http://en.wikipedia.org/wiki/Copy-on-write Copy-on-Write (COW)], and COW is most widely used in the file systems (disk images) of virtualization technologies such as QEMU. |
| 70 | * [http://blog.scottlowe.org/2007/09/21/nfs-for-vmware-storage/ Quite a few people are discussing running VMware on NFS] |
| 71 | * iThome has also reported that [http://www.ithome.com.tw/itadm/article.php?c=45228&s=11 NetApp A-SIS] can help with [http://www.ithome.com.tw/itadm/article.php?c=52322 making the most of storage virtualization: meeting heavy data-access demand with a small amount of physical space] |
| 72 | * !NetApp's official introduction to A-SIS is [http://media.netapp.com/documents/tr-3505.pdf here]. |
| 73 | * The [http://www.redbooks.ibm.com/redpapers/pdfs/redp4320.pdf IBM RedBook] explains that IBM's own storage systems also support A-SIS. The figure below is a classic illustration of how deduplication is done at the block level. |
| 74 | * [[Image(wiki:jazz/09-02-11:SIS_Deduplication.jpg)]] |
| 75 | * Not many file systems implement deduplication so far. It may be worth looking further into how A-SIS does block-level deduplication, which might help reduce the disk space needed for DRBL AOE Windows and Xen virtualization. |
| 76 | |
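| 77 | * I have long been watching the data-overlap problem that virtualization brings. !NetApp is very strong in this area: it works at the file-system level and collapses duplicate files (deduplication). Today I happened to read the Linux Magazine article "[http://www.linux-mag.com/id/7535 Deduping Storage Deduplication]", which covers many current commercial solutions, but what about free software? So far there seems to be only [http://sourceforge.net/projects/lessfs/ lessfs], written with FUSE. Its official site http://www.lessfs.com/ does not have much material yet, and I hope more file systems of this kind will appear. The first question that comes to mind is how duplicate files inside a loop device image can be deduplicated. Likewise, can virtual disks such as vmdk be deduplicated? ([wiki:jazz/09-09-23#FileSystem:lessfs:deduplication 2009-09-23]) |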
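The closing question about loop device images and vmdk disks can be explored with a toy model of block-level deduplication. The sketch below is only an illustration under stated assumptions: the `BlockStore` class, its method names, and the 4096-byte block size are hypothetical, and it is not how lessfs, ZFS, or NetApp A-SIS are actually implemented. Each image is treated as an opaque byte stream, cut into fixed-size blocks, and every distinct block is stored once under its SHA-256 digest.

```python
import hashlib


class BlockStore:
    """Toy block-level deduplicating store (hypothetical, for illustration)."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes, each distinct block stored once
        self.images = {}  # image name -> ordered list of digests (the "recipe")

    def write_image(self, name: str, data: bytes) -> None:
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # keep only the first copy
            recipe.append(digest)
        self.images[name] = recipe

    def read_image(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.images[name])

    def logical_size(self) -> int:
        """Bytes the images would occupy without deduplication."""
        return sum(len(self.blocks[d]) for r in self.images.values() for d in r)

    def physical_size(self) -> int:
        """Bytes actually stored after deduplication."""
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    bs = 4096
    base = b"".join(bytes([i]) * bs for i in range(8))         # 8 distinct blocks
    clone = base[:3 * bs] + b"\xff" * bs + base[4 * bs:]        # one block changed

    store = BlockStore(block_size=bs)
    store.write_image("vm1", base)
    store.write_image("vm2", clone)
    print(store.logical_size(), store.physical_size())
```

In this model the store never looks inside the image, so the container format does not matter; what matters is whether identical data lands on identical block boundaries. Two virtual disks cloned from the same template share most blocks and deduplicate well, while the same file written at different, unaligned offsets inside two images would not be recognized as a duplicate by fixed-size blocks.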