Are you looking to level up your google sheets game with automation? In the guide below well be exploring how to use google sheets with python Show
Learning Outcomes
Module ImportsIn order to run the module imports below you will need to have installed googles recommended python client: A quick way to do that would be to run the following command and to restart your Jupyter Notebook server: pip install --upgrade google-api-python-clientAdditionally I would encourage you to download the gcloud software development kit (SDK), which will come in useful for deploying and access your Google Cloud Platform resources: You can find more information on how to do that from:
Download PygsheetsWe will be using a python package called pygsheets to easily interact with a new google sheet that we will create. If youre using anaconda you can navigate to your terminal and install the latest version of this package with: conda install pygsheetsAlternatively if youre using pip or pip3 to manage your packages you can type either: pip install pygsheets pip3 install pygsheets !pip install pygsheetsAs you can see from above Ive previously installed pygsheets, after youve installed the python package if youre working from a jupyter notebook, then please restart the Kernel: import pygsheetsUsing a previously obtained local .json file you can authenticate to your google service account. with open('service_account.json') as source: info = json.load(source) credentials = service_account.Credentials.from_service_account_info(info)Create A New Google Sheet + Obtain The Unique IDFirstly we are going to create a new google sheet, then well obtain the id of that specific google sheet which can be found within the URL: Google Sheet Wizardry With Python CommandsYou can successfully authenticate to google sheets with a .json key like so: client = pygsheets.authorize(service_account_file='service_account.json')How To Connect To a Specific Google sheetNow that weve authenticated pygsheets with our google cloud project lets connect to a specific google sheet. In order to do this we will need to grab the URL link and share the spreadsheet. spreadsheet_url = "https://docs.google.com/spreadsheets/d/1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q/edit?usp=sharing"
There are three ways to open the spreadsheet:
Lets showcase all three and you can take your pick # sheet = client.open('Sheet1') ## You will need to activate the Google Drive API as well as the spreadsheet API for this one to work sheet = client.open_by_key('1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q') sheet = client.open_by_url('https://docs.google.com/spreadsheets/d/1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q/edit?usp=sharing')print(sheet)How To Select A Specific Google WorksheetLets select the automatically created worksheet called Sheet1: After uploading a dataframe into a google sheet, you can see how many columns and rows you have with: wks.cols # To view the number of columns wks.rows # To view the number of rows print(f"There are {wks.cols} columns in the gsheet!") print(f"There are {wks.rows} rows in the gsheet!")# There are 19 columns in the gsheet! # There are 1001 rows in the gsheet!How To Upload Data To A Google Sheet From A Pandas DataFrameI am going to use the keyword data that I originally downloaded from Ahrefs, however if you have another csv thats absolutely fine. Id encourage you to simply read any csv file and to create a pandas dataframe: df = pd.read_csv('../1_Keywords/data/digital-marketing-keyword-ideas.csv', delimiter='t', encoding='UTF-16') df.drop(columns=['#'], inplace=True)To upload a dataframe to this worksheet we can use the following syntax: set_dataframe(df, start, copy_index=False, copy_head=True, extend=False, fit=False, escape_formulae=False, **kwargs) wks.set_dataframe(df, start=(1,1))Its worth remembering that when you run the above command you will overwrite any cell values. However if you prefer to progressively append new rows to the bottom of the worksheet, please use the below syntax: wks.set_dataframe(df, start=(1,1), extend=True)Pro Tip: If you dont have enough rows within your existing worksheet, then uploading a pandas dataframe might fail! In order to account for this you can check the existing number of rows within the worksheet vs the number of rows within the dataframe and add extra rows if required. We will do the same checking for the columns too! # If the number of rows within the worksheet is less than the dataframe: if wks.rows < df.shape[0]: number_of_rows_to_add = df.shape[0] - wks.rows + 1 # Adding the required number of rows wks.add_rows(number_of_rows_to_add) # If the number of cols within the worksheet is less than the dataframe: elif wks.cols < df.shape[1]: number_of_cols_to_add = df.shape[1] - wks.cols + 1 wks.add_cols(number_of_cols_to_add) else: pass How To Download A Pandas Dataframe From A Google Sheets Specific WorksheetWe can also get data directly from any google sheets worksheet in the form of a pandas dataframe: get_as_df(has_header=True, index_column=None, start=None, end=None, numerize=True, empty_value='', value_render=, **kwargs)[source] dataframe_two = wks.get_as_df()print(f"This new dataframe has {dataframe_two.shape[0]} rows and {dataframe_two.shape[1]} columns") # This new dataframe has 1000 rows and 11 columnsdataframe_two.head(6)How To Get All Of The Google Sheet Values In A Python FormatWe can also get all of the values in a list of lists: all_values = wks.get_all_values()In my case, notice how this has picked up all of the empty spaces that are located on the right hand side of the worksheet: print(all_values[0]) ['Keyword', 'Country', 'Difficulty', 'Volume', 'CPC', 'Clicks', 'CPS', 'Return Rate', 'Parent Keyword', 'Last Update', 'SERP Features', '', '', '', '', '', '', '', '']A way to quickly remove the empty strings per list would be to do a nested list comprehension: cleaned_values = [[item for item in unique_list if item ]for unique_list in all_values]How To Get Cell Ranges In Google Sheets With PythonYou can also extract specific ranges (columns and rows) similar to your excel functions: cell_range = wks.range('A1:F10', returnas='matrix')print(cell_range)[['Keyword', 'Country', 'Difficulty', 'Volume', 'CPC', 'Clicks'], ['coast', 'gb', '42', '70000', '2.5', '64715'], ['hubspot', 'gb', '67', '63000', '5', '59708'], ['digital marketing', 'gb', '74', '19000', '7', '11033'], ['digital', 'gb', '89', '16000', '2.5', '5912'], ['content meaning', 'gb', '45', '4400', '17', '622'], ['digital media', 'gb', '24', '3600', '3', '1671'], ['digital marketing agency', 'gb', '57', '3400', '18', '3925'], ['digital uk', 'gb', '24', '3100', '1.2', '2402'], ['what is digital marketing', 'gb', '74', '3100', '2.5', '2119']]How To Get A Single Row For Extracting The Column HeadersWe can get a single row with: wks.get_row(row, returnas='matrix', include_tailing_empty=True, **kwargs) headers = wks.get_row(1, include_tailing_empty=False)print(headers)['Keyword', 'Country', 'Difficulty', 'Volume', 'CPC', 'Clicks', 'CPS', 'Return Rate', 'Parent Keyword', 'Last Update', 'SERP Features']️️️ Also notice how weve changed the include_tailing_empty to False which automatically removed any empty strings in that row ️️️ How To Extract A Single ColumnSometimes you might want to select a specific column from your worksheet: first_column = wks.get_col(1) first_column_data = first_column[1:] # We are doing a python slice here to avoid # extrecting the column names from the first row (keyword)How To Sort By A ColumnYou can easily sort the google sheet by a specific column with: wks.sort_range(start, end, basecolumnindex=0, sortorder='ASCENDING') # Sorts the data in rows based on the given column index. wks.sort_range(start='A2', end='L1001',basecolumnindex=4, sortorder='DESCENDING' )Its worth knowing that np.nans (not a number) are treated as large numbers within google sheets. Therefore lets clear the sheet, remove all of the nans before uploading the dataframe, then we will perform this operation again. How To Clear A Google Sheet With PythonYou can clear all of the existing rows and columns within the sheet by referencing the same workspace variable (wks) and the .clear() syntax: Now Id recommend removing any nans before uploading your dataframe! So lets remove any np.nans within the Volume Column: df.dropna(subset=['Volume'], inplace=True) wks.set_dataframe(df, start=(1,1))Lets try sorting again by the Volume column with pygsheets: wks.sort_range(start='A2', end='L1001', basecolumnindex=4, sortorder='DESCENDING')How To Loop Over Every RowYou can iterate over row in your google sheet with a for loop: for row in wks: print(row)Updating Specific Parts Of Your Google SheetUpdating parts of your Google sheet gives you the flexibility to do a raw data dump into google sheets. Afterwards and if other situations or APIs change, you can specifically update any column, row or a range within your google sheet. How To Replace Specific Values Within Your Google SheetReplacing values within your google sheets is incredibly easy and can be done via: wks.replace(pattern, replacement=None, **kwargs)Lets for example try replacing any np.nans (not a number) in the Clicks and CPS column: wks.replace("NaN", replacement="0")Although this has indeed replaced the NaN values, the operation has occurred across all columns and rows within the worksheet. This might not be exactly what youre looking for. For example maybe youre simply looking to replace cell values within 2 columns? Introducing Find And Update Valueswks.clear() wks.set_dataframe(df, start=(1,1))We will perform the following operations:
If youd like to update a range of values use the following syntax, for this example well assign the selected nan values to Other: wks.update_values(crange=None, values=None, cell_list=None, extend=False, majordim='ROWS', parse=None) for cell in cells: cell.value = "Other" wks.update_values(cell_list=cells)As you can see, weve only updated the CPC and Clicks column NaN values with Other and the other column values remain the same! In the example above weve specifically referenced a cell_list, however you could also use the two methods below: # Update a single cell. wks.update_value('A1', "Numbers on Stuff") # Update the worksheet with the numpy array values. Beginning at cell 'A2'. wks.update_values('A2', my_numpy_array.to_list())Column Data ManipulationHow To Add Multiple ColumnsTo add columns to your existing worksheet you can use this syntax: wks.add_cols(5)How To Get The Column Names With The Column Index PositionsFirstly we will extract all of the headers on row 1, then we will wrap this variable with enumerate and convert it into a python dictionary. This will allow us to create a lookup table for index positions! Notice below how we have also changed: The reason why is that we just want to make sure that if we had empty column values between tables, we would also pick up any empty string values between columns. headers = wks.get_row(1, include_tailing_empty=True) enumerated_headers = list(enumerate(headers))print(f"{enumerated_headers}")[(0, 'Keyword'), (1, 'Country'), (2, 'Difficulty'), (3, 'Volume'), (4, 'CPC'), (5, 'Clicks'), (6, 'CPS'), (7, 'Return Rate'), (8, 'Parent Keyword'), (9, 'Last Update'), (10, 'SERP Features'), (11, ''), (12, ''), (13, ''), (14, ''), (15, ''), (16, ''), (17, ''), (18, '')] Now lets remove any of the enumerated headers that have an empty string with a list comprehension: enumerated_headers = [tuple_object for tuple_object in enumerated_headers if tuple_object[1]]print(f"These are the cleaned, enumerated headers: n n {enumerated_headers}")These are the cleaned, enumerated headers: [(0, 'Keyword'), (1, 'Country'), (2, 'Difficulty'), (3, 'Volume'), (4, 'CPC'), (5, 'Clicks'), (6, 'CPS'), (7, 'Return Rate'), (8, 'Parent Keyword'), (9, 'Last Update'), (10, 'SERP Features')]We can then convert this list of tuples into a python dictionary to create a look up table for columns versus their index positions: lookup_table = dict(enumerated_headers) lookup_table_reversed = {value: key for key, value in lookup_table.items()}print(f"From this you can see how we've created a lookup table via a python dictionary where the key is the column name and the value is the index position: nn {lookup_table_reversed}")From this you can see how we've created a lookup table via a python dictionary where the key is the column name and the value is the index position: {'Keyword': 0, 'Country': 1, 'Difficulty': 2, 'Volume': 3, 'CPC': 4, 'Clicks': 5, 'CPS': 6, 'Return Rate': 7, 'Parent Keyword': 8, 'Last Update': 9, 'SERP Features': 10}How To Delete ColumnsYou will need to pass two parameters when deleting columns via pyghseets: wks.delete_cols(index, number=1) index Index of first column to delete. number Number of columns to delete. print(f"Currently we have {wks.cols} columns in the google sheet.") # Currently we have 19 columns in the google sheet.max(lookup_table.keys()) + 1 # 11As we can see the highest index position that we have as a true column from row 1 is 10. Therefore lets delete all of the columns from index position 12 and greater: Pro tip: Now the reason why it is 12 and not 11 is because pygsheets starts its index positioning for columns and rows at 1, whilst python starts its indexing at position 0. Also pygsheets is inclusive with the starting index (1 means from 1 onwards, rather than column 2 onwards). wks.delete_cols(index=12,number=wks.cols - 10)You can now easily delete multiple columns by creating your own column index with enumerate! How To Insert ColumnsThe following syntax allows you to insert multiple columns: wks.insert_cols(col, number=1, values=None, inherit=False) Parameters: col Index of the col at which the values will be inserted. number Number of columns to be inserted. values Content to be inserted into new columns. inherit New cells will inherit properties from the column to the left (True) or to the right (False).How To Update A Single ColumnIn order to update a single column, use this syntax: wks.update_col(index, values, row_offset=0)We can use the previous lookup table that we created to search for a specific column: data = wks.update_col(index = lookup_table_reversed['CPS'] + 1 ,values = wks.rows * ["NA"], row_offset=1)Protip: If you need to create a list of a specific size in python use the following code: this_list_will_be_10_5s = [5] * 10 print([5] * 10) # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]How To Adjust Column WidthDo you need to adjust specific columns to be a certain width? No problem! wks.adjust_column_width(start, end=None, pixel_size=None)Also its worth remembering that whenever you use wks.clear(), all of the formatting will be erased from the worksheet. We can just provide pygsheets with a start index and end index for columns with a specific pixel size like so: wks.adjust_column_width(start=1, end=3, pixel_size=300)Row Data ManipulationHow To Select A Single RowTo select a single row you can just use: wks.get_row(row, returnas='matrix', include_tailing_empty=True, **kwargs) include_tailing_empty whether to include empty trailing cells/values after last non-zero value row index of row kwargs all parameters of pygsheets.Worksheet.get_values() returnas (matrix, cell, range) return as cell objects or just 2d array or range object wks.get_row(1)['Keyword', 'Country', 'Difficulty', 'Volume', 'CPC', 'Clicks', 'CPS', 'Return Rate', 'Parent Keyword', 'Last Update', 'SERP Features']How To Insert Multiple RowsSelecting multiple rows is very easy and can be done like so: wks.insert_rows(row, number=1, values=None, inherit=False)Lets find the last row to add a row, and we can insert a new row at index position 3. Pro Tip: When you insert new rows, they must be below the index number of the final row. If you need to create more rows, Id recommend using this method instead: wks.add_rows(500)How To Delete RowsTo delete rows, utilise this syntax: wks.delete_rows(index, number=1) :param index: Index of first row to delete :param number: Number of rows to delete wks.delete_rows(5, number=7)Formatting Google Sheets In PythonFrozen Rows + ColumnsYou can easily set the number of Frozen Rows + Columns with the two commands: wks.frozen_rows = 1 wks.frozen_cols = 2How To Bold CellsLets bold all of the cells in the first row: from pygsheets.datarange import DataRangemodel_cell = wks.cell('A1') model_cell.set_text_format('bold', True) DataRange('A1','K1', worksheet=wks).apply_format(model_cell)The first image below shows the BEFORE, and the second image shows the AFTER after of applying the DataRange based upon the model cell! Before: After: Batching Your API CallsIf you are going to make a lot of requests to the Google Sheets API, its recommended to batch your API requests. This reduces the chances that your API requests will be unsuccesful due to rate limits. The way that pygsheets implements API batching is as follows: wks.unlink() for i in range(10): wks.update_value((1, i), i) # wont call api wks.link() # will do all the updatesYou can easily share any connected spreadsheet with: .share(email_or_domain, role='reader', type='user', **kwargs)Additionally you can include an email message to your user! sheet.share('', role='commenter', type='user', emailMessage='Here is the spreadsheet we talked about!')How To Export A Google Sheet To A .CSVAlso its possible to easily export your google sheet to a .csv file with: wks.export(file_format=, filename=None, path='')However well need to enable the Google Drive API for this to work, so quickly go to your Google Cloud Project and activate this API for the following command to work: wks.export(filename='this_is_a_csv_file')How To Convert A Google Sheet Into JSONSometimes when communicating with another developer, they might want to receive your data in JSON format. You can easily turn your google sheets data into a JSON API and feed it directly into another REST API by chaining these two commands: wks.get_as_df().to_json()This will do two things:
How To Use Googles Python Native API For Google SheetsThe above commands solely work within Pygsheets, however its important that you learn how to connect with Googles Native Python API for google sheets. How To Connect To The APIfrom googleapiclient.discovery import build with open('service_account.json') as source: info = json.load(source) credentials = service_account.Credentials.from_service_account_info(info)Sending Your First RequestIn this tutorial, we are going to be specifically look at the batchClear, batchGet, batchUpdate as these three end points are incredibly useful. service = discovery.build('sheets', 'v4', credentials=credentials) # The ID of the spreadsheet to update. spreadsheet_id = 'my-spreadsheet-id' # TODO: Update placeholder value. batch_clear_values_request_body = { # The ranges to clear, in A1 notation. 'ranges': [], # TODO: Update placeholder value. # TODO: Add desired entries to the request body. } request = service.spreadsheets().values().batchClear(spreadsheetId=spreadsheet_id, body=batch_clear_values_request_body) response = request.execute() # TODO: Change code below to process the `response` dict: print(response)Lets Understand This Step By Step:
Clearing A Column / Range With batchClearwith open('service_account.json') as source: info = json.load(source) credentials = service_account.Credentials.from_service_account_info(info) google_ghseets_api= build('sheets', 'v4', credentials=credentials, cache_discovery=False) spreadsheet_id = '1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q' batch_clear_values_request_body = { # The ranges to clear, in A1 notation. 'ranges': ['A1:A1000'], # TODO: Update placeholder value.. } request = google_ghseets_api.spreadsheets().values().batchClear(spreadsheetId=spreadsheet_id, body=batch_clear_values_request_body) response = request.execute() print(response){'spreadsheetId': '1OBR8HEE2rjaWqZPnb8mHnp3YpxxELcy5Zqve3gtX22Q', 'clearedRanges': ['Sheet1!A1:A989']}Adding data validation boxes is relatively easy, the code below will add several data validation boxes to the second column. request = { "requests": [ { "setDataValidation": { "range": { "sheetId": 0, "startRowIndex": 1, "startColumnIndex": 1, "endColumnIndex":2 }, "rule": { "condition": { "type": "ONE_OF_LIST", "values": [ { "userEnteredValue": "EUROPE_LR" }, { "userEnteredValue": "EUROPE_HR" }, { "userEnteredValue": "WORLD_HR" }, { "userEnteredValue": "WORLD_LR" } ] } } } } ] } request = google_ghseets_api.spreadsheets().batchUpdate(spreadsheetId=spreadsheet_id, body=request) response = request.execute()How To Retrieve A Range of Cells# The A1:D1 notation of the values to retrieve. ranges = ['A1:D1000'] # TODO: Update placeholder value. request = google_ghseets_api.spreadsheets().values().batchGet(spreadsheetId=spreadsheet_id, ranges=ranges) response = request.execute()spreadsheet_data = response['valueRanges'][0]['values']print(spreadsheet_data[0:3]) # [['', 'Country', 'Difficulty', 'Volume'], ['', 'gb', '42', '70000'], ['', 'EUROPE_LR', '67', '63000']]Hopefully this guide has provided you with a comprehensive overview of how best to use Google Sheets with Python!
|