Issue
I have a number of .csv files of tabular data, imported from an external data source, stored in different folders of a Cloud Storage bucket. Every day, a new file is imported into each folder of the bucket. Each filename contains a space (" ") and ends with the ".csv" extension. I have written a Cloud Function that copies every existing file from this source bucket to a newly created cleaned bucket and modifies the filename by replacing the space " " character with a dash "-" character. Is there a way, using Cloud Functions and Pub/Sub, to have the Cloud Function process only the newly uploaded file instead of manually scanning which files are in both buckets? Essentially, I would like to send the filename and file metadata in the Pub/Sub event and read them in the function, but I do not know how to do that.
Thanks in advance!
Solution
This answer by Marc Anthony B explains how to rename files by removing square brackets []. You can follow the same approach to replace whitespace with an underscore (or with a dash, as in the question) by changing the regex pattern, as shown below.
The code follows these three steps:
- List the objects that you want to rename.
- Iterate over that list.
- For each object, change the name. Objects are not renamed in place on the backend: a rename performs a copy followed by a delete for each object.
import re
from google.cloud import storage

bucket_name = "my_bucket"
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)

pattern = r"\s"  # regex matching any whitespace character

# Materialize the listing up front so the renames cannot affect the iteration.
for blob in list(storage_client.list_blobs(bucket_name)):
    # re.search scans the whole name; re.match would only test its first character.
    if re.search(pattern, blob.name):
        fixed_name = re.sub(pattern, "_", blob.name)
        # rename_blob copies the object to the new name and deletes the original.
        bucket.rename_blob(blob, fixed_name)
        print(f"Changed: {fixed_name}")
    else:
        print(f"No change required: {blob.name}")
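A note on the pattern check: `re.match` anchors at the start of the string, so a name such as "daily report.csv" is only detected when the whole name is scanned with `re.search`. Below is a minimal sketch of the name handling on its own, without any Cloud Storage calls. The helper names `needs_rename` and `fixed_name` are hypothetical and only serve the illustration.

```python
import re

WHITESPACE = re.compile(r"\s")  # any single whitespace character


def needs_rename(name: str) -> bool:
    """Return True if the object name contains whitespace anywhere."""
    return WHITESPACE.search(name) is not None


def fixed_name(name: str, replacement: str = "_") -> str:
    """Replace every whitespace character in the name with `replacement`."""
    return WHITESPACE.sub(replacement, name)


if __name__ == "__main__":
    name = "reports/daily report.csv"
    # re.match only tests the beginning of the string, so it finds nothing here.
    print(re.match(r"\s", name))
    print(needs_rename(name))
    print(fixed_name(name))       # underscore, as in the answer
    print(fixed_name(name, "-"))  # dash, as in the question
```

Passing `"-"` as the replacement gives the dash-separated names the question asks for.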
You can also use the gsutil mv command to rename all objects with a given prefix so that they have a new prefix. Refer to this document for more information.
gsutil mv gs://my_bucket/oldprefix gs://my_bucket/newprefix
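For the event-driven part of the question (handling only the newly uploaded file), one possible approach is a Pub/Sub-triggered function fed by a Cloud Storage notification on the source bucket. The sketch below assumes a 1st gen Python Cloud Function and a notification using the default JSON payload, where the message attributes carry `bucketId`, `objectId` and `eventType`, and the base64-encoded message body holds the object metadata. The names `CLEANED_BUCKET`, `clean_name`, `parse_notification` and `on_upload` are hypothetical; this is not the original answerer's code.

```python
import base64
import json
import re

CLEANED_BUCKET = "my_cleaned_bucket"  # hypothetical destination bucket name


def clean_name(object_name: str) -> str:
    """Replace every whitespace character in the object name with a dash."""
    return re.sub(r"\s", "-", object_name)


def parse_notification(event: dict) -> tuple:
    """Return (bucket, object_name, metadata) from a Pub/Sub-triggered event."""
    attributes = event.get("attributes") or {}
    metadata = {}
    if event.get("data"):
        # The message body is the base64-encoded JSON description of the object.
        metadata = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return attributes["bucketId"], attributes["objectId"], metadata


def on_upload(event: dict, context) -> None:
    """Entry point: copy a newly finalized object to the cleaned bucket."""
    # Imported here so the helpers above can be exercised without the client library.
    from google.cloud import storage

    attributes = event.get("attributes") or {}
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return  # skip deletes, archives and metadata updates

    bucket_name, object_name, _metadata = parse_notification(event)
    client = storage.Client()
    source_bucket = client.bucket(bucket_name)
    target_bucket = client.bucket(CLEANED_BUCKET)
    # Copy under the cleaned name; the source object is left untouched.
    source_bucket.copy_blob(
        source_bucket.blob(object_name), target_bucket, clean_name(object_name)
    )
```

If the function is instead attached directly to the bucket with a Cloud Storage finalize trigger, a 1st gen function should receive the object metadata itself as `event`, so the bucket and filename would be read from `event["bucket"]` and `event["name"]` without any decoding.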
Answered By - Sathi Aiswarya Answer Checked By - Marie Seifert (WPSolving Admin)