Resumable uploads

This page discusses resumable uploads in Cloud Storage. Resumable uploads are the recommended method for uploading large files, because you don't have to restart them from the beginning if there is a network failure while the upload is underway.

Introduction

A resumable upload allows you to resume data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data. Resumable uploads work by sending multiple requests, each of which contains a portion of the object you're uploading. This is different from a single-request upload, which contains all of the object's data in one request and must restart from the beginning if it fails partway through.
How tools and APIs use resumable uploads

Depending on how you interact with Cloud Storage, resumable uploads might be managed automatically on your behalf. The following sections describe how each tool handles resumable uploads.

Console: The Cloud Console manages resumable uploads automatically on your behalf. However, if you refresh or navigate away from the Cloud Console while an upload is underway, the upload is cancelled.

gsutil: The gsutil command-line tool uses resumable uploads in the gsutil cp and gsutil rsync commands when uploading data to Cloud Storage. If your upload is interrupted, you can resume it by running the same command that you used to start the upload. When resuming a gsutil cp upload that includes multiple files, use the -n flag to prevent re-uploading files that already completed successfully. You can set a minimum size for performing resumable uploads with the resumable_threshold parameter in the boto configuration file. The default value for resumable_threshold is 8 MiB.

Client libraries:

- C++: You can toggle the use of resumable uploads as part of the WriteObject method.
- C#: You can initiate a resumable upload with CreateObjectUploader.
- Go: By default, resumable uploads occur automatically when the file is larger than 16 MiB. You can change the cutoff for performing resumable uploads with Writer.ChunkSize. Resumable uploads are always chunked when using the Go client library.
- Java: Resumable uploads are controlled through the writer method. You can alternatively use the createFrom method to specify a cutoff size for performing resumable uploads.
- Node.js: Resumable uploads are automatically managed when using the createWriteStream method.
- PHP: Resumable uploads are automatically managed on your behalf, but can be directly controlled using the resumable option.
- Python: Resumable uploads occur automatically when the file is larger than 8 MiB. Alternatively, you can use Resumable Media to manage resumable uploads on your own.
- Ruby: All uploads are treated as resumable uploads.
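The size-based cutoff that several of the client libraries apply can be sketched in plain Python. The helper below is illustrative only, not part of any Google client library; the 8 MiB figure is the Python client's default cutoff mentioned above.

```python
# Hypothetical sketch of the size-based cutoff several client libraries use
# to decide between a single-request upload and a resumable upload.
# This function is an illustrative stub, not a Google client library API.

RESUMABLE_THRESHOLD = 8 * 1024 * 1024  # 8 MiB, the Python client's default cutoff

def choose_upload_strategy(file_size: int, threshold: int = RESUMABLE_THRESHOLD) -> str:
    """Return which upload strategy a client would pick for this file size."""
    if file_size > threshold:
        # Many requests, each carrying a chunk; survives interruptions.
        return "resumable"
    # One request with all the data; restarts from scratch on failure.
    return "single-request"

print(choose_upload_strategy(4 * 1024 * 1024))   # small file
print(choose_upload_strategy(64 * 1024 * 1024))  # large file
```

A client library typically exposes this cutoff as a tunable parameter (for example, Writer.ChunkSize in Go), so applications can trade per-request overhead against recovery granularity.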
REST APIs

JSON API: The Cloud Storage JSON API uses a POST Object request that includes the query parameter uploadType=resumable to initiate the resumable upload. This request returns a session URI that you then use in one or more PUT Object requests to upload the object data. For a step-by-step guide to building your own logic for resumable uploading, see Performing resumable uploads.

XML API: The Cloud Storage XML API uses a POST Object request that includes the header x-goog-resumable: start to initiate the resumable upload. This request returns a session URI that you then use in one or more PUT Object requests to upload the object data. For a step-by-step guide to building your own logic for resumable uploading, see Performing resumable uploads.

Resumable uploads of unknown size

The resumable upload mechanism supports transfers where the file size is not known in advance. This can be useful for cases like compressing an object on-the-fly while uploading, since it's difficult to predict the exact file size for the compressed file at the start of a transfer. The mechanism is useful either if you want to stream a transfer that can be resumed after being interrupted, or if chunked transfer encoding does not work for your application. For more information, see Streaming transfers.

Considerations

This section is useful if you are building your own client that sends resumable upload requests directly to the JSON or XML API.

Session URIs

When you initiate a resumable upload, Cloud Storage returns a session URI, which you use in subsequent requests to upload the actual data.
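The two initiation requests described above can be sketched as follows. This sketch only builds the method, URL, and headers (no network call is made); the bucket name, object name, and token are placeholders, and the helper functions themselves are hypothetical, not part of any SDK.

```python
from urllib.parse import urlencode

def json_api_initiation(bucket: str, object_name: str, token: str):
    """Build the POST request that initiates a resumable upload via the JSON API."""
    query = urlencode({"uploadType": "resumable", "name": object_name})
    url = f"https://storage.googleapis.com/upload/storage/v1/b/{bucket}/o?{query}"
    headers = {
        "Authorization": f"Bearer {token}",
        # No object data is sent yet; the session URI comes back in the
        # Location header of the response.
        "Content-Length": "0",
    }
    return "POST", url, headers

def xml_api_initiation(bucket: str, object_name: str, token: str):
    """Build the POST request that initiates a resumable upload via the XML API."""
    url = f"https://storage.googleapis.com/{bucket}/{object_name}"
    headers = {
        "Authorization": f"Bearer {token}",
        "x-goog-resumable": "start",  # marks this POST as a resumable-upload initiation
        "Content-Length": "0",
    }
    return "POST", url, headers

method, url, headers = json_api_initiation("my-bucket", "my-file.jpg", "TOKEN")
print(method, url)
```

In both flavors the client would then issue PUT requests against the returned session URI to transfer the object data.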
An example of a session URI in the JSON API is:

https://storage.googleapis.com/upload/storage/v1/b/my-bucket/o?uploadType=resumable&name=my-file.jpg&upload_id=ABg5-UxlRQU75tqTINorGYDgM69mX06CzKO1NRFIMOiuTsu_mVsl3E-3uSVz65l65GYuyBuTPWWICWkinL1FWcbvvOA

An example of a session URI in the XML API is:

https://storage.googleapis.com/my-bucket/my-file.jpg?upload_id=ABg5-UxlRQU75tqTINorGYDgM69mX06CzKO1NRFIMOiuTsu_mVsl3E-3uSVz65l65GYuyBuTPWWICWkinL1FWcbvvOA

This session URI acts as an authentication token, so the requests that use it don't need to be signed and can be used by anyone to upload data to the target bucket without any further authentication. Because of this, be judicious in sharing the session URI and only share it over HTTPS. A session URI expires after one week but can be cancelled prior to expiring. If you make a request using a session URI that is no longer valid, you receive one of the following errors:
In both cases, you have to initiate a new resumable upload, obtain a new session URI, and start the upload from the beginning using the new session URI.

Upload performance

Resumable uploads are pinned in the region where you initiate them. For example, if you initiate a resumable upload in the US and give the session URI to a client in Asia, the upload still goes through the US. Continuing a resumable upload in a region where it wasn't initiated can cause slow uploads. If you use a Compute Engine instance to initiate a resumable upload, the instance should be in the same location as the Cloud Storage bucket you upload to. You can then use a geo IP service to pick the Compute Engine region to which you route customer requests, which helps keep traffic localized to a geo-region.

Integrity checks

We recommend that you request an integrity check of the final uploaded object to be sure that it matches the source file. You can do this by calculating the MD5 digest of the source file and adding it to the Content-MD5 request header. Checking the integrity of the uploaded file is particularly important if you are uploading a large file over a long period of time, because there is an increased likelihood of the source file being modified over the course of the upload operation.

Resent data

Once Cloud Storage persists bytes in a resumable upload, those bytes cannot be overwritten, and Cloud Storage ignores attempts to do so. Because of this, you should not send different data when rewinding to an offset that you sent previously. For example, say you're uploading a 100,000 byte object, and your connection is interrupted. When you check the status, you find that 50,000 bytes were successfully uploaded and persisted. If you attempt to restart the upload at byte 40,000, Cloud Storage ignores the bytes you send from 40,000 to 50,000. Cloud Storage begins persisting the data you send at byte 50,001.
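The integrity check and the resume scenario described above can be sketched with two small helpers. These are hypothetical functions for illustration: one computes the base64-encoded MD5 digest in the form the Content-MD5 header expects, and one builds the Content-Range header a client would send when resuming. No requests are sent.

```python
import base64
import hashlib

def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest, in the form the Content-MD5 header expects."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

def resume_content_range(persisted: int, total: int) -> str:
    """Content-Range header value for resuming after `persisted` bytes are stored.

    HTTP byte ranges are inclusive and zero-indexed, so resuming a
    100,000-byte upload after 50,000 persisted bytes sends the remaining
    bytes as "bytes 50000-99999/100000".
    """
    return f"bytes {persisted}-{total - 1}/{total}"

source = b"example object data"
print("Content-MD5:", content_md5(source))
print("Content-Range:", resume_content_range(50_000, 100_000))
```

For a large file you would compute the digest incrementally (feeding the hash object chunk by chunk) rather than reading the whole file into memory.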