Streaming Reads with Python and Google Cloud Storage
In data processing, efficiency and reliability are paramount. As a data engineer, you’ll often need to read files in resource constrained environments. One common approach to reading a file is to stream the file and process it in smaller chunks. I recently came across a way to accomplish this using Google Cloud Storage (GCS), Python, and a CRC32C checksum (to verify the file’s integrity). Some reasons why this approach could be useful and why this post exists: