Extract from zip and tar.gz

Hello
At first I only needed the DetectFileFormat tool, but now I also need to extract zip and tar.gz archives.
To get started, could you please give me samples that fully extract zip and tar.gz archives while maintaining the original folder hierarchy?
Thank you :slight_smile:

Hello @australian.dev.nerds,
Use the following sample to extract a tar.gz archive:

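// Requires: using System.IO; using Aspose.Zip.Tar;
using (FileStream gzipFile = File.Open("sample.tar.gz", FileMode.Open))
{
    // FromGZip unpacks the gzip layer and opens the tar inside it
    using (TarArchive tararchive = TarArchive.FromGZip(gzipFile))
    {
        tararchive.ExtractToDirectory("out");
    }
}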

The TarArchive.FromGZip method is concise, but under the hood it fully extracts the gzip archive into memory, so beware of memory consumption.
If memory consumption is a concern, then you have to extract the gzip to the filesystem first, and then extract the resulting tar.
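
The two-step approach could look roughly like the sketch below. It is not taken from the library documentation: the gzip layer is unpacked with GZipStream from the .NET base class library rather than with the library's own gzip support, and it assumes TarArchive can be constructed from a stream. The class name, method name and temporary-file handling are only illustrative.

```csharp
using System.IO;
using System.IO.Compression;   // GZipStream from the .NET base class library
using Aspose.Zip.Tar;

static class TwoStepExtraction
{
    // Hypothetical helper: unpack the gzip layer to a temporary .tar file on disk,
    // then extract that tar, so the whole tar never has to sit in memory.
    public static void ExtractTarGz(string tarGzPath, string outputDir)
    {
        string tempTar = Path.GetTempFileName();
        try
        {
            // Step 1: gzip -> tar on the filesystem, copied in small buffers.
            using (FileStream source = File.OpenRead(tarGzPath))
            using (GZipStream gzip = new GZipStream(source, CompressionMode.Decompress))
            using (FileStream tarFile = File.Create(tempTar))
            {
                gzip.CopyTo(tarFile);
            }

            // Step 2: extract the tar, keeping the folder hierarchy.
            // Assumption: TarArchive accepts a source stream in its constructor.
            using (FileStream tarStream = File.OpenRead(tempTar))
            using (TarArchive tar = new TarArchive(tarStream))
            {
                tar.ExtractToDirectory(outputDir);
            }
        }
        finally
        {
            File.Delete(tempTar);   // clean up the intermediate tar
        }
    }
}
```

A call such as TwoStepExtraction.ExtractTarGz("sample.tar.gz", "out") trades some extra disk I/O for a much smaller memory footprint.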

A zip extraction sample can be found here. The ExtractToDirectory method keeps the folder hierarchy.
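
For reference, a minimal sketch of full zip extraction, built only from the calls that appear elsewhere in this thread (the Archive constructor and ExtractToDirectory). The class name, method name and parameters are illustrative, not part of the library.

```csharp
using System.IO;
using Aspose.Zip;

static class ZipExtraction
{
    // Hypothetical helper: extract every entry of a zip archive into outputDir,
    // keeping the folder hierarchy stored in the archive.
    public static void ExtractZip(string zipPath, string outputDir)
    {
        using (FileStream zipFile = File.Open(zipPath, FileMode.Open))
        using (Archive archive = new Archive(zipFile))
        {
            archive.ExtractToDirectory(outputDir);
        }
    }
}
```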


Hello
Thanks, 2 questions if you don’t mind:

1. I have a zip file with one file in its root, named “Categories.xml”.
I just want to extract that single file into a MemoryStream, without extracting or reading the whole zip file.
Is there a sample?

2. If I only need to read the file and folder names (the hierarchy) of a zip file, to show them in a tree control very quickly, is that possible?

1. If your zip has a single entry, extract it to a memory stream using the following code sample:
MemoryStream memStream = new MemoryStream();
using (FileStream zipFile = File.Open("archive.zip", FileMode.Open))
{
    using (Archive archive = new Archive(zipFile))
    {
        archive.Entries[0].Extract(memStream);
    }
}


Thanks, yep, I’ve seen that sample in the docs, but I’m not sure it’s right.
Entries[0].Extract will save the first item of the archive; how do we know which file in the root that is?
I should be able to pass the file name, and perhaps specify that it is in the root of the archive.
Am I right?

2. You can access file and folder names using the Name property. The name of an entry contains its full path, respecting the hierarchy, e.g. root\subfolder\file.txt. Based on those names, compose a tree-like structure. For clarity, run this sample:
using (Archive a = new Archive("hierarchy.zip"))
{
    foreach (ArchiveEntry e in a.Entries)
        Console.WriteLine(e.Name);
}
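
Composing the tree from those names can be sketched as follows. This part is plain C# and does not touch the library; EntryNode and EntryTree are hypothetical helper types, and the code splits on both '\' and '/' as a precaution, since the separator used in entry names is an assumption here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type: one node per file or folder name.
class EntryNode
{
    public string Name;
    public SortedDictionary<string, EntryNode> Children =
        new SortedDictionary<string, EntryNode>(StringComparer.OrdinalIgnoreCase);

    public EntryNode(string name) { Name = name; }
}

static class EntryTree
{
    // Builds a tree from full entry names such as "root\subfolder\file.txt".
    public static EntryNode Build(IEnumerable<string> entryNames)
    {
        EntryNode root = new EntryNode("");
        foreach (string fullName in entryNames)
        {
            EntryNode current = root;
            string[] parts = fullName.Split(
                new[] { '/', '\\' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string part in parts)
            {
                // Reuse an existing child with this name, or create it.
                if (!current.Children.TryGetValue(part, out EntryNode child))
                {
                    child = new EntryNode(part);
                    current.Children.Add(part, child);
                }
                current = child;
            }
        }
        return root;
    }

    // Writes the tree to the console, indenting two spaces per level.
    public static void Print(EntryNode node, int depth = 0)
    {
        foreach (EntryNode child in node.Children.Values)
        {
            Console.WriteLine(new string(' ', depth * 2) + child.Name);
            Print(child, depth + 1);
        }
    }
}
```

With an open archive, the names could be passed in as archive.Entries.Select(x => x.Name) (this needs using System.Linq), and the resulting nodes mapped onto the tree control.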

The Entries collection lists entries in the order in which they are stored in the zip container; the zip format preserves that order. We have no method for looking up an archive entry by name.
You can use LINQ to find an entry by name like this:

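// Requires: using System; using System.Linq;
// The lambda parameter must not reuse the name of the variable being declared.
ArchiveEntry entry = archive.Entries.FirstOrDefault(x => string.Equals(x.Name, "desired.txt", StringComparison.OrdinalIgnoreCase));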
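
Putting the lookup together with the Extract call from the earlier answer gives a way to pull one named file, such as a root-level Categories.xml, into a MemoryStream. This is only a sketch: the class and method names are illustrative, and it assumes a file in the archive root has a Name with no folder prefix.

```csharp
using System;
using System.IO;
using System.Linq;
using Aspose.Zip;

static class SingleEntryExtraction
{
    // Hypothetical helper: returns the content of the entry called entryName,
    // or null if the archive has no entry with that name.
    public static MemoryStream ExtractByName(string zipPath, string entryName)
    {
        using (FileStream zipFile = File.Open(zipPath, FileMode.Open))
        using (Archive archive = new Archive(zipFile))
        {
            // Assumption: a root-level file is named without any folder prefix.
            ArchiveEntry entry = archive.Entries.FirstOrDefault(
                x => string.Equals(x.Name, entryName, StringComparison.OrdinalIgnoreCase));
            if (entry == null)
                return null;

            MemoryStream memStream = new MemoryStream();
            entry.Extract(memStream);   // only this entry is decompressed
            memStream.Position = 0;     // rewind so the caller can read from the start
            return memStream;
        }
    }
}
```

For example, SingleEntryExtraction.ExtractByName("archive.zip", "Categories.xml") would return the file's content, or null when no such entry exists.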