Background


In my previous blog post, "How to List All Files in a Public Azure Storage Container", I demonstrated how to use the Azure REST API to list the files in a public Azure Storage container without any keys or authentication. To recap briefly: appending ?restype=container&comp=list to the container URL returns an XML document listing the blob objects in the container (a single response holds at most 5,000 entries). With this method, you can easily write a script that downloads every file from a public Azure Storage container without installing additional tools such as Azure Storage Explorer.

Solution


In this example storage container, I have prepared some files, including a folder named img. My goal is to download all files from this container to my local computer while keeping the folder structure.
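Blob names use / as a separator (for example img/picture.jpg), so keeping the folder structure comes down to turning each blob name into a path under a local root folder. The following is only a minimal sketch of that mapping. The function name local_path_for is hypothetical, and rejecting absolute names and .. segments is my own assumption about what a careful script should do, not something the service requires.

```python
from pathlib import Path, PurePosixPath


def local_path_for(blob_name, root):
    """Map a blob name such as 'img/a.jpg' to a path under the local root folder."""
    # Blob names always use '/', regardless of the local operating system.
    parts = PurePosixPath(blob_name).parts
    # Assumption: refuse empty names, absolute names and '..' segments so that
    # a download can never be written outside the root folder.
    if not parts or blob_name.startswith("/") or ".." in parts:
        raise ValueError(f"Unsafe blob name: {blob_name!r}")
    return Path(root).joinpath(*parts)
```

With this helper, local_path_for("img/a.jpg", "graduate35") points at a.jpg inside graduate35/img, and the parent folder can be created with Path.mkdir(parents=True, exist_ok=True) before writing the file.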

The public URL of this container is https://work996.blob.core.windows.net/graduate35. Based on my previous post, the REST API URL to list all files in this container is https://work996.blob.core.windows.net/graduate35?restype=container&comp=list
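Building the list URL is simple string work, but a trailing slash or stray whitespace in the container URL is easy to miss. Here is a small sketch that normalizes the input first. The helper name build_list_url is hypothetical and it uses only the Python standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def build_list_url(container_url):
    """Return the list URL for a public container URL."""
    parts = urlsplit(container_url.strip())
    # Drop a trailing slash so the query string attaches to the container path.
    path = parts.path.rstrip("/")
    # restype=container&comp=list is the query string from the previous post.
    return urlunsplit((parts.scheme, parts.netloc, path, "restype=container&comp=list", ""))
```

Calling build_list_url("https://work996.blob.core.windows.net/graduate35/") gives the same list URL as shown above, with or without the trailing slash.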

REST API Response:

<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="https://work996.blob.core.windows.net/graduate35">
    <Blobs>
        <!-- ... -->
        <Blob>
            <Name>hello.txt</Name>
            <Url>https://work996.blob.core.windows.net/graduate35/hello.txt</Url>
            <Properties>
                <Last-Modified>Wed, 25 Sep 2024 04:27:14 GMT</Last-Modified>
                <Etag>0x8DCDD1A55023903</Etag>
                <Content-Length>1</Content-Length>
                <Content-Type>text/plain</Content-Type>
                <Content-Encoding />
                <Content-Language />
                <Content-MD5>DMF1ucDxtqgxw5niaXcmYQ==</Content-MD5>
                <Cache-Control />
                <BlobType>BlockBlob</BlobType>
                <LeaseStatus>unlocked</LeaseStatus>
            </Properties>
        </Blob>
        <Blob>
            <Name>img/_005b7748-3cb3-49b5-a074-d53d0a77cf24.jpg</Name>
            <Url>https://work996.blob.core.windows.net/graduate35/img/_005b7748-3cb3-49b5-a074-d53d0a77cf24.jpg</Url>
            <Properties>
                <Last-Modified>Wed, 25 Sep 2024 04:27:14 GMT</Last-Modified>
                <Etag>0x8DCDD1A5547EA61</Etag>
                <Content-Length>154148</Content-Length>
                <Content-Type>image/jpeg</Content-Type>
                <Content-Encoding />
                <Content-Language />
                <Content-MD5>Zz24tO6v7beQfDJOrULpaw==</Content-MD5>
                <Cache-Control />
                <BlobType>BlockBlob</BlobType>
                <LeaseStatus>unlocked</LeaseStatus>
            </Properties>
        </Blob>
        <!-- ... -->
    </Blobs>
    <NextMarker />
</EnumerationResults>
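Each Blob element in this response carries what a download script needs: Name, Url, and under Properties the Content-Length and Content-MD5 values. As a rough sketch, the snippet below reads those fields with the standard library and checks downloaded bytes against the base64-encoded Content-MD5. The names parse_blobs, md5_matches and SAMPLE are hypothetical, and SAMPLE is a trimmed copy of the response above rather than a complete one.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

# Trimmed copy of the response above (only the elements this sketch reads).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="https://work996.blob.core.windows.net/graduate35">
    <Blobs>
        <Blob>
            <Name>hello.txt</Name>
            <Url>https://work996.blob.core.windows.net/graduate35/hello.txt</Url>
            <Properties>
                <Content-Length>1</Content-Length>
                <Content-MD5>DMF1ucDxtqgxw5niaXcmYQ==</Content-MD5>
            </Properties>
        </Blob>
    </Blobs>
    <NextMarker />
</EnumerationResults>"""


def parse_blobs(xml_bytes):
    """Return one dict per <Blob> with its name, url, size and md5."""
    root = ET.fromstring(xml_bytes)
    blobs = []
    for blob in root.iter("Blob"):
        blobs.append({
            "name": blob.findtext("Name"),
            "url": blob.findtext("Url"),
            # Assumption: treat a missing Content-Length as 0.
            "size": int(blob.findtext("Properties/Content-Length", "0")),
            "md5": blob.findtext("Properties/Content-MD5"),
        })
    return blobs


def md5_matches(data, content_md5):
    """Compare the MD5 of data with a base64-encoded Content-MD5 value."""
    digest = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return digest == content_md5
```

After downloading a blob, md5_matches(downloaded_bytes, blob["md5"]) offers a cheap integrity check. Blobs without a stored MD5 have an empty Content-MD5 element, so a script would probably want to skip the check in that case.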

Now I can use any scripting language I like to download all the files to my computer.

PowerShell

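# Trim a trailing slash so the container name and the REST API URL are built correctly
$containerUrl = (Read-Host "Please enter the public container URL of Azure Blob Storage").TrimEnd('/')
# Example: https://work996.blob.core.windows.net/graduate35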

# Extract the container name from the URL
$containerName = ($containerUrl -split "/")[-1]

# Create a local folder
$localFolder = ".\$containerName"
if (-Not (Test-Path -Path $localFolder)) {
    New-Item -ItemType Directory -Path $localFolder
}

$restApiUrl = "$($containerUrl)?restype=container&comp=list"
Write-Host "REST API URL: $restApiUrl"

try {
    # Retrieve information of all files under the container
    Invoke-RestMethod -Uri $restApiUrl -OutFile .\$containerName.xml
    [xml]$xmlContent = Get-Content -Path .\$containerName.xml
}
catch {
    Write-Error "Failed to fetch container information. Error: $_"
    exit 1
}

foreach ($blob in $xmlContent.EnumerationResults.Blobs.Blob) {
    $blobUrl = $blob.Url
    $blobName = $blob.Name

    Write-Host "Processing Blob: $blobName"

    $localFilePath = Join-Path $localFolder $blobName

    $localFileDir = Split-Path $localFilePath -Parent
    if (-Not (Test-Path -Path $localFileDir)) {
        New-Item -ItemType Directory -Path $localFileDir
    }

    try {
        Invoke-WebRequest -Uri $blobUrl -OutFile $localFilePath
        Write-Host "Downloaded: $blobUrl to $localFilePath"
    }
    catch {
        Write-Error "Failed to download $blobUrl. Error: $_"
    }
}

Write-Host "All files have been downloaded to the $localFolder folder."

Note that this script first saves the XML response to a file on local disk. This is a workaround: I could not find a way to make PowerShell parse the XML directly from the HTTP response. If you are a PowerShell expert, you are welcome to fix this ugly workaround :)

Python

Thanks to Azure OpenAI GPT-4o for translating the PowerShell script to Python.

import os
import requests
import xml.etree.ElementTree as ET

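# Strip whitespace and a trailing slash so the REST API URL below is well-formed
container_url = input("Please enter the public container URL of Azure Blob Storage: ").strip().rstrip('/')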

container_name = container_url.rstrip('/').split('/')[-1]

local_folder = f"./{container_name}"
if not os.path.exists(local_folder):
    os.makedirs(local_folder)

rest_api_url = f"{container_url}?restype=container&comp=list"
print(f"REST API URL: {rest_api_url}")

try:
    response = requests.get(rest_api_url)
    response.raise_for_status()
    xml_content = response.content
except requests.RequestException as e:
    print(f"Failed to fetch container information. Error: {e}")
    raise SystemExit(1)

root = ET.fromstring(xml_content)

for blob in root.findall('.//Blob'):
    blob_url = blob.find('Url').text
    blob_name = blob.find('Name').text

    print(f"Processing Blob: {blob_name}")

    local_file_path = os.path.join(local_folder, blob_name)

    local_file_dir = os.path.dirname(local_file_path)
    if not os.path.exists(local_file_dir):
        os.makedirs(local_file_dir)

    try:
        blob_response = requests.get(blob_url)
        blob_response.raise_for_status()
        with open(local_file_path, 'wb') as file:
            file.write(blob_response.content)
        print(f"Downloaded: {blob_url} to {local_file_path}")
    except requests.RequestException as e:
        print(f"Failed to download {blob_url}. Error: {e}")

print(f"All files have been downloaded to the {local_folder} folder.")
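One limitation of both scripts: they read a single response. The sample response ends with an empty NextMarker element, which means there were no more results. For a container with more than 5,000 blobs, my understanding is that the service fills NextMarker with a continuation token, and the next page is requested by passing that token back as the marker query parameter. The sketch below follows that loop. The names fetch_bytes and iter_blobs are hypothetical, and it uses urllib from the standard library instead of requests so that it has no dependencies.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import quote


def fetch_bytes(url):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def iter_blobs(container_url, fetch=fetch_bytes):
    """Yield (name, url) for every blob, following NextMarker across pages."""
    base = container_url.rstrip("/") + "?restype=container&comp=list"
    marker = ""
    while True:
        # Only add the marker parameter when a previous page supplied one.
        url = base + ("&marker=" + quote(marker, safe="") if marker else "")
        root = ET.fromstring(fetch(url))
        for blob in root.iter("Blob"):
            yield blob.findtext("Name"), blob.findtext("Url")
        # An empty or missing NextMarker means this was the last page.
        marker = (root.findtext("NextMarker") or "").strip()
        if not marker:
            break
```

The fetch parameter exists only so the paging logic can be exercised without a network connection. In the script above, the for loop over root.findall('.//Blob') could iterate over iter_blobs(container_url) instead.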