Colab Pro stuck and disconnected every time when downloading dataset from Huggingface #5947

@Charley-xiao

Description

When I download a dataset from the Hugging Face Hub, the Colab Pro runtime always gets stuck and then disconnects after 5 minutes.

This has wasted a lot of my compute units, and I cannot work out what causes it.

Example:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bezzam/DiffuserCam-Lensless-Mirflickr-Dataset-NORM", 
    repo_type="dataset", 
    max_workers=1,  # Critical for stability
    resume_download=True
)

gives

/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py:190: UserWarning: The `resume_download` argument is deprecated and ignored in `snapshot_download`. Downloads always resume whenever possible.
  warnings.warn(
/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Downloading (incomplete total...):   0% 2.31k/302M [00:20<7:25:22, 11.3kB/s]
Fetching 22 files:   5% 1/22 [00:00<00:04,  5.08it/s]
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
WARNING:huggingface_hub.utils._http:Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.

This issue persists even after I set my HF_TOKEN.
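
To help narrow this down, here is a minimal sketch of the same download wrapped in a retry loop with exponential backoff. The helper names `retry_with_backoff` and `download_dataset` are hypothetical and not part of `huggingface_hub`. It only helps if the stall eventually surfaces as an exception (for example a timeout); it will not rescue a call that hangs forever, and it is not a confirmed fix for the disconnect.

```python
import time


def retry_with_backoff(fn, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fn() until it succeeds or `attempts` is exhausted.

    Hypothetical helper for illustration. The delay doubles after each
    failure: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)


def download_dataset():
    """Same call as in the report, minus the deprecated resume_download flag."""
    # Imported lazily so the retry helper can be exercised on its own.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="bezzam/DiffuserCam-Lensless-Mirflickr-Dataset-NORM",
        repo_type="dataset",
        max_workers=1,
    )


# In a Colab cell:
# local_path = retry_with_backoff(download_dataset)
```

Since downloads resume whenever possible (per the warning above), each retry should pick up partially downloaded files rather than start over. If the printed messages show the same exception on every attempt, that exception is probably more useful for this report than the progress bar output.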

Browser: Chrome

Runtime: A100, G4
