
Jake Witcher

Uploading and Downloading Zip Files In GCP Cloud Storage Using Python

GCP (Google Cloud Platform) cloud storage is the object storage service provided by Google for storing many data formats, from PNG files to zipped source code for web apps and cloud functions. The data is stored in a flat, key/value-like data structure where the key is your storage object's name and the value is your data.

Object storage is great for storing massive amounts of data as a single entity, data that will later be accessed all at once, as opposed to data that will be read and written in small subsets as is the case with relational and non-relational databases.

If you're looking to store a collection of files as a single unit, either to archive a large number of log files for future audits or to bundle and store code as part of an automated deployment cycle, it's likely you will do so by packing all of it together as a zip file.

Using an application to automate the process of creating, altering, or unzipping a zip file in memory is a useful skill to have; however, working with memory streams and bytes rather than integers, strings, and objects can be daunting when it is unfamiliar territory.

Whether you are specifically looking to upload and download zip files to GCP cloud storage or you simply have an interest in learning how to work with zip files in memory, this post will walk you through the process of creating a new zip file from files on your local machine and uploading them to cloud storage, as well as downloading an existing zip file in cloud storage and unzipping it to a local directory.

Establishing Credentials

Before you can begin uploading and downloading local files to cloud storage as zip files, you will need to create the client object used in your Python code to communicate with your project's cloud storage resources in GCP.

There are various ways to establish credentials that will grant the client object access to a cloud storage bucket, the most common of which is to create a service account and assign it to your application in one of two ways.

The first option is to assign the service account to a particular resource upon deployment. For example, if your code is being deployed as a GCP cloud function, you would attach the service account to the application upon deployment using either the gcloud sdk:

    # using powershell and the gcloud sdk to deploy a python cloud function
    gcloud functions deploy my-cloud-function `
      --entry-point my_function_name `
      --runtime python38 `
      --service-account my-cloud-function@my-project-id.iam.gserviceaccount.com `
      --trigger-http


Or by using an IAC (infrastructure as code) solution like Terraform:

    resource "google_service_account" "my_cloud_func_sa" {
      account_id   = "my-cloud-func"
      display_name = "Cloud Function Service Account"
    }

    resource "google_project_iam_binding" "cloud_storage_user" {
      project = "my-project-id"
      role    = "roles/storage.objectAdmin"

      members = [
        "serviceAccount:${google_service_account.my_cloud_func_sa.email}",
      ]
    }

    resource "google_cloudfunctions_function" "my_cloud_func" {
      name                  = "my-cloud-function"
      entry_point           = "my_function_name"
      runtime               = "python38"
      service_account_email = google_service_account.my_cloud_func_sa.email
      trigger_http          = true
    }


Note that the service account as defined in Terraform is also being referenced in a google_project_iam_binding resource as a member that will be assigned the role of storage.objectAdmin. You will need to assign a similar role (or ideally one with the minimal permissions required for your code to perform its tasks) if you choose to create a service account using the GCP console.

For code being deployed with an assigned service account, creating the GCP cloud storage client in Python requires only that the project id be passed as an argument to the client constructor.

    from google.cloud import storage

    client = storage.Client(
        project=GCP_PROJECT_ID,
    )


However, if you would like to upload and download to cloud storage using a CLI application or to test your cloud function before deploying it, you will want to use a locally stored JSON credentials file.

To create the file, open the GCP console and select IAM & Admin from the Navigation menu, accessed through the hamburger menu icon in the top left corner.

From the IAM & Admin menu, select the Service Accounts page and either create a new service account or click on the link of an existing one, found under the Email column of the service accounts table.

At the bottom of the Details page for the selected service account, click Add Key > Create New Key and select the JSON option.

This will download the JSON credentials file to your machine.
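If you prefer to stay on the command line, the same kind of JSON key can also be created with the gcloud sdk. A minimal sketch, assuming the service account email from the earlier examples and a hypothetical output path:

    # create a new JSON key for an existing service account and save it locally
    gcloud iam service-accounts keys create ./credentials.json `
      --iam-account my-cloud-function@my-project-id.iam.gserviceaccount.com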

Anyone with access to this file will have the credentials necessary to make changes to your cloud resources according to the permissions of this service account. Store it in a secure place and do not check this file into source control. If you do, immediately delete the key from the same menu used to create it and remove the JSON file from source control.
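If a key is ever exposed, it can also be revoked from the command line rather than through the console. A sketch, assuming a placeholder KEY_ID (which you can look up with gcloud iam service-accounts keys list) and the same service account email:

    # revoke a compromised key by its key id
    gcloud iam service-accounts keys delete KEY_ID `
      --iam-account my-cloud-function@my-project-id.iam.gserviceaccount.com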

To allow your client object to use these credentials and access GCP cloud storage, initializing the client will require a few extra steps. You will need to create a credentials object using the from_service_account_file method on the service_account.Credentials class of the google.oauth2 library. The only required argument for this method is the absolute or relative file path to your JSON credentials file.

This credentials object will be passed as a second argument to the storage.Client class constructor.

    from google.cloud import storage
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE
    )

    client = storage.Client(
        project=GCP_PROJECT_ID,
        credentials=credentials
    )
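As an alternative worth mentioning (not used in the rest of this post), the Google client libraries can also discover a key file on their own if its path is exported in the GOOGLE_APPLICATION_CREDENTIALS environment variable before the code runs, in which case storage.Client(project=GCP_PROJECT_ID) will pick up the credentials automatically. A sketch using powershell and a hypothetical file path:

    # powershell: point the google client libraries at the JSON key file
    $env:GOOGLE_APPLICATION_CREDENTIALS = "C:\path\to\credentials.json"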


Uploading Local Files to Cloud Storage as a Zip File

Now that your client object has the required permissions to access cloud storage, you can begin uploading local files as a zip file.

Assuming that the files you intend to upload are all in the same directory and are not already zipped, you will upload the files to GCP cloud storage as a zip file by creating a zip archive in memory and uploading it as bytes.

    import io
    import pathlib

    from google.cloud import storage
    from zipfile import ZipFile, ZipInfo


    def upload():
        source_dir = pathlib.Path(SOURCE_DIRECTORY)

        # build the zip archive in an in-memory binary stream
        archive = io.BytesIO()
        with ZipFile(archive, 'w') as zip_archive:
            for file_path in source_dir.iterdir():
                with open(file_path, 'r') as file:
                    zip_entry_name = file_path.name
                    zip_file = ZipInfo(zip_entry_name)
                    zip_archive.writestr(zip_file, file.read())

        # the stream position must be reset before uploading
        archive.seek(0)

        object_name = 'super-important-data-v1'
        bucket = client.bucket(BUCKET_NAME)
        blob = storage.Blob(object_name, bucket)
        blob.upload_from_file(archive, content_type='application/zip')


io.BytesIO() creates an in-memory binary stream used by the ZipFile object to store all the data from your local files as bytes.

The files in the source directory are iterated over, and for each one a ZipInfo object is created and written to the ZipFile object along with the contents of the source file. The ZipInfo object corresponds to an individual file entry within a zip file and will be labeled with whatever file name and extension you use in the constructor to instantiate the ZipInfo object. Using zip_entry_name = file_path.name as in the example above will set the file name and extension in the zip file to match the name and extension of the local file.
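The entry name passed to ZipInfo can also include a path prefix if you want the files to sit under a folder inside the archive. A minimal sketch of the same loop body, assuming a hypothetical logs/ prefix:

    # place each entry under a logs/ folder inside the zip file
    zip_entry_name = 'logs/' + file_path.name
    zip_file = ZipInfo(zip_entry_name)
    zip_archive.writestr(zip_file, file.read())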

The in-memory binary stream (the archive variable) is what you will be uploading to GCP cloud storage; however, a prerequisite for uploading an in-memory stream is that the stream position be set to the beginning of the stream. Without moving the position of the stream back to zero with archive.seek(0), you will get an error from the Google API when you try to upload the data.

With the in-memory binary stream ready to be delivered, the remaining lines of code create a new Bucket object for the specified bucket and a Blob object for the storage object. The zipped files are then uploaded to cloud storage and can later be retrieved using the storage object name you used to create the Blob instance.

A bucket in cloud storage is a user-defined partition for the logical separation of data, and a blob (as the Python class is called) is another name for a storage object.
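If you want to confirm what ended up in the bucket after an upload, the same client object can list the blobs a bucket contains. A minimal sketch, assuming the client and BUCKET_NAME from the examples above:

    # print the name of every storage object (blob) currently in the bucket
    for blob in client.list_blobs(BUCKET_NAME):
        print(blob.name)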

Downloading a Zip File Blob in Cloud Storage to a Local Directory

To download a zip file storage object and unzip it into a local directory, you will need to reverse the process by first creating a bucket object and a blob object in order to download the zip file as bytes.

    def download():
        target_dir = pathlib.Path(TARGET_DIRECTORY)

        object_name = 'super-important-data-v1'
        bucket = client.bucket(BUCKET_NAME)
        blob = storage.Blob(object_name, bucket)
        object_bytes = blob.download_as_bytes()

        # write the downloaded bytes to an in-memory binary stream
        archive = io.BytesIO()
        archive.write(object_bytes)

        # open the archive in read mode and extract its contents
        with ZipFile(archive, 'r') as zip_archive:
            zip_archive.extractall(target_dir)


Once downloaded, the bytes can be written to an in-memory stream which will in turn be used to create a ZipFile object in order to extract the files to your target directory. io.BytesIO() is again used to create the in-memory binary stream, and the write method on the BytesIO object is used to write the downloaded bytes to the stream. Note that this time the ZipFile object is opened in read mode ('r'), since the goal is to read existing entries rather than write new ones. The ZipFile object has a method for extracting all of its contents to a specified directory, making the final step a simple one.

With these two functions and the appropriate credentials, you should have everything you need to start uploading and downloading your own zip files into cloud storage using Python.
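To tie the pieces together, here is a minimal sketch of how the two functions might be wired up as a script. The constant values are placeholders, not part of the original post, and would be replaced with your own project id, bucket name, key file, and directories:

    GCP_PROJECT_ID = 'my-project-id'             # placeholder project id
    BUCKET_NAME = 'my-zip-file-bucket'           # placeholder bucket name
    SERVICE_ACCOUNT_FILE = './credentials.json'  # path to the JSON key created earlier
    SOURCE_DIRECTORY = './data/source'           # local files to be zipped and uploaded
    TARGET_DIRECTORY = './data/target'           # where the downloaded zip will be extracted

    if __name__ == '__main__':
        upload()
        download()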

And if you'd like to see all the Python code in one place, you can find it here as a Gist on my Github account.


Source: https://dev.to/jakewitcher/uploading-and-downloading-zip-files-in-gcp-cloud-storage-using-python-2l1b
