How to Upload Gutenberg.zip to a GCP Bucket
GCP (Google Cloud Platform) Cloud Storage is the object storage service provided by Google for storing many data formats, from PNG files to zipped source code for web apps and cloud functions. The data is stored in a flat, key/value-like data structure where the key is your storage object's name and the value is your data.
Object storage is great for storing massive amounts of data as a single entity, data that will later be accessed all at once as opposed to data that will be read and written in small subsets, as is the case with relational and non-relational databases.
If you're looking to store a collection of files as a single unit, either to archive a large number of log files for future audits or to bundle and store code as part of an automated deployment cycle, it's likely you will do so by packing all of it together as a zip file.
Using an application to automate the process of creating, altering, or unzipping a zip file in memory is a useful skill to have; however, working with memory streams and bytes rather than integers, strings, and objects can be daunting when it is unfamiliar territory.
Whether you are specifically looking to upload and download zip files to GCP Cloud Storage or you simply have an interest in learning how to work with zip files in memory, this post will walk you through the process of creating a new zip file from files on your local machine and uploading it to cloud storage, as well as downloading an existing zip file in cloud storage and unzipping it to a local directory.
Establishing Credentials
Before you can begin uploading and downloading local files to cloud storage as zip files, you will need to create the client object used in your Python code to communicate with your project's cloud storage resources in GCP.
There are various ways to establish credentials that will grant the client object access to a cloud storage bucket, the most common of which is to create a service account and assign it to your application in one of two ways.
The first option is to assign the service account to a particular resource upon deployment. For example, if your code is being deployed as a GCP cloud function, you would attach the service account to the application upon deployment using either the gcloud SDK:
# using powershell and the gcloud sdk to deploy a python cloud function
gcloud functions deploy my-cloud-function `
    --entry-point my_function_name `
    --runtime python38 `
    --service-account my-cloud-function@my-project-id.iam.gserviceaccount.com `
    --trigger-http
Or by using an IaC (infrastructure as code) solution like Terraform:
resource "google_service_account" "my_cloud_func_sa" { account_id = "my-cloud-role" display_name = "Cloud Function Service Account" } resources "google_project_iam_binding" "cloud_storage_user" { project = "my-project-id" role = "roles/storage.objectAdmin" members = [ "serviceAccount: ${ google_service_account . my_cloud_func_sa . email } " , ] } resource "google_cloud_functions_function" "my_cloud_func" { proper name = "my-cloud-function" entry_point = "my_function_name" runtime = "python38" service_account_email = google_service_account . my_cloud_func_sa . email trigger_http = truthful }
Note that the service account as defined in Terraform is also being referenced in a google_project_iam_binding resource as a member that will be assigned the role of storage.objectAdmin. You will need to assign a similar role (or ideally one with the minimal permissions required for your code to perform its tasks) if you choose to create a service account using the GCP console.
For code being deployed with an assigned service account, creating the GCP cloud storage client in Python requires only that the project ID be passed as an argument to the client constructor.
from google.cloud import storage

client = storage.Client(project=GCP_PROJECT_ID)
However, if you would like to upload and download to cloud storage using a CLI application, or to test your cloud function before deploying it, you will want to use a locally stored JSON credentials file.
To create the file, open the GCP console and select IAM & Admin from the Navigation menu, accessed through the hamburger menu icon in the top left corner.
From the IAM & Admin menu, select the Service Accounts page and either create a new service account or click on the link of an existing one, found under the Email column of the service accounts table.
At the bottom of the Details page for the selected service account, click Add Key > Create New Key and select the JSON option.
This will download the JSON credentials file to your machine.
Anyone with access to this file will have the credentials necessary to make changes to your cloud resources according to the permissions of this service account. Store it in a secure place and do not check this file into source control. If you do, immediately delete the key from the same menu used to create it and remove the JSON file from source control.
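One way to reduce the risk of the credentials path (or the file itself) creeping into your repository is to read its location from an environment variable instead of hard-coding it. A minimal sketch, assuming an environment variable named GCP_CREDENTIALS_PATH (the variable name is an assumption for this example, not something GCP requires):

import os

# Read the credentials file location from an environment variable so the path
# is not hard-coded in source control. GCP_CREDENTIALS_PATH is a hypothetical
# variable name chosen for this sketch.
SERVICE_ACCOUNT_FILE = os.environ['GCP_CREDENTIALS_PATH']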
To allow your client object to use these credentials and access GCP cloud storage, initializing the client will require a few extra steps. You will need to create a credentials object using the from_service_account_file method on the service_account.Credentials class of the google.oauth2 library. The only required argument for this method is the absolute or relative file path to your JSON credentials file.
This credentials object will be passed as a second argument to the storage.Client class constructor.
from google.cloud import storage
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE)

client = storage.Client(project=GCP_PROJECT_ID, credentials=credentials)
Uploading Local Files to Cloud Storage as a Zip File
Now that your client object has the required permissions to access cloud storage, you can begin uploading local files as a zip file.
Assuming that the files you intend to upload are all in the same directory and are not already zipped, you will upload the files to GCP cloud storage as a zip file by creating a zip archive in memory and uploading it as bytes.
import io
import pathlib

from google.cloud import storage
from zipfile import ZipFile, ZipInfo


def upload():
    source_dir = pathlib.Path(SOURCE_DIRECTORY)

    archive = io.BytesIO()
    with ZipFile(archive, 'w') as zip_archive:
        for file_path in source_dir.iterdir():
            with open(file_path, 'r') as file:
                zip_entry_name = file_path.name
                zip_file = ZipInfo(zip_entry_name)
                zip_archive.writestr(zip_file, file.read())

    archive.seek(0)

    object_name = 'super-important-data-v1'
    bucket = client.bucket(BUCKET_NAME)

    blob = storage.Blob(object_name, bucket)
    blob.upload_from_file(archive, content_type='application/zip')
io.BytesIO() creates an in-memory binary stream used by the ZipFile object to store all the data from your local files as bytes.
The files in the source directory are iterated over, and for each one a ZipInfo object is created and written to the ZipFile object along with the contents of the source file. The ZipInfo object corresponds to an individual file entry within a zip file and will be labeled with whatever file name and extension you use in the constructor to instantiate the ZipInfo object. Using zip_entry_name = file_path.name as in the example above will set the file name and extension in the zip file to match the name and extension of the local file.
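As a quick illustration of this behavior, the following sketch (using a made-up entry name, backup/notes.txt, that is not part of the upload function above) shows how the name passed to ZipInfo becomes the entry's path inside the archive:

import io
from zipfile import ZipFile, ZipInfo

archive = io.BytesIO()
with ZipFile(archive, 'w') as zip_archive:
    # the name given to ZipInfo becomes the entry's path inside the zip file
    zip_archive.writestr(ZipInfo('backup/notes.txt'), 'file contents')

with ZipFile(archive) as zip_archive:
    print(zip_archive.namelist())  # ['backup/notes.txt']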
The in-memory binary stream (the archive variable) is what you will be uploading to GCP cloud storage; however, a prerequisite for uploading an in-memory stream is that the stream position be set to the beginning of the stream. Without moving the position of the stream back to zero with archive.seek(0), you will get an error from the Google API when you try to upload the data.
With the in-memory binary stream ready to be delivered, the remaining lines of code create a new Bucket object for the specified bucket and a Blob object for the storage object. The zipped files are then uploaded to cloud storage and can later be retrieved using the storage object name you used to create the Blob instance.
A bucket in cloud storage is a user-defined partition for the logical separation of data, and a blob (as the Python class is called) is another name for a storage object.
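If you want to verify that the zip file landed where you expected, one simple check is to list the blobs in the bucket. A minimal sketch, reusing the client object and BUCKET_NAME from the upload example:

# print the name of every storage object (blob) in the bucket
for blob in client.list_blobs(BUCKET_NAME):
    print(blob.name)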
Downloading a Zip File Blob in Cloud Storage to a Local Directory
To download a zip file storage object and unzip it into a local directory, you will need to reverse the process by first creating a bucket object and a blob object in order to download the zip file as bytes.
def download():
    target_dir = pathlib.Path(TARGET_DIRECTORY)

    object_name = 'super-important-data-v1'
    bucket = client.bucket(BUCKET_NAME)

    blob = storage.Blob(object_name, bucket)
    object_bytes = blob.download_as_bytes()

    archive = io.BytesIO()
    archive.write(object_bytes)

    with ZipFile(archive, 'r') as zip_archive:
        zip_archive.extractall(target_dir)
Once downloaded, the bytes can be written to an in-memory stream, which will in turn be used to create a ZipFile object in order to extract the files to your target directory. io.BytesIO() is again used to create the in-memory binary stream, and the write method on the BytesIO object is used to write the downloaded bytes to the stream. The ZipFile object has a method for extracting all of its contents to a specified directory, making the final step a simple one.
With these two functions and the appropriate credentials, you should have everything you need to start uploading and downloading your own zip files into cloud storage using Python.
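For reference, here is one hypothetical way the pieces could be wired together in a single script; the constant values below are placeholders for this sketch, not values from the original functions, and would need to be defined before the client is created:

# placeholder configuration values - replace with your own project details
GCP_PROJECT_ID = 'my-project-id'
BUCKET_NAME = 'my-bucket'
SOURCE_DIRECTORY = './files-to-upload'
TARGET_DIRECTORY = './downloaded-files'

if __name__ == '__main__':
    upload()    # zip the local files and upload the archive to cloud storage
    download()  # download the zip file blob and extract it locally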
And if you'd like to see all the Python code in one place, you can find it here as a Gist on my Github account.
Source: https://dev.to/jakewitcher/uploading-and-downloading-zip-files-in-gcp-cloud-storage-using-python-2l1b