In the first episode of this lesson, we read a CSV file into a pandas DataFrame.
In this lesson, we will explore ways to access different parts of the data using indexing, slicing, and subsetting.
Loading our data

We will continue to use the surveys dataset that we worked with in the last episode. Let’s reopen and read in the data again:
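The loading step might look something like the sketch below. To stay self-contained it reads a tiny in-memory stand-in rather than the real surveys file; the column names and values are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the surveys CSV file. The column names and
# values are illustrative assumptions, not the real dataset.
csv_text = """record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
1,7,16,1977,2,NL,M,32,
2,7,16,1977,3,NL,M,33,
3,7,16,2002,2,DM,F,37,40
4,7,16,2002,7,DM,M,36,48
5,7,16,2002,3,DM,M,35,29
"""

# With a real file on disk, pass its path to pd.read_csv instead.
surveys_df = pd.read_csv(io.StringIO(csv_text))
print(surveys_df.head())
```

Empty fields, such as the missing weights in the first two rows, are read in as NaN.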
Indexing and Slicing in Python

We often want to work with subsets of a DataFrame object. There are different ways to accomplish this, including using labels (column headings), numeric ranges, or specific x,y index locations.

Selecting data using Labels (Column Headings)

We use square brackets [] to select a column of a DataFrame by its name.
We can also create a new object that contains only the data within a single column.
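As a minimal sketch of column selection by label, assuming a small made-up DataFrame whose column names are hypothetical:

```python
import pandas as pd

# Small hypothetical stand-in for the surveys data (values are made up).
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3],
    "species_id": ["NL", "DM", "DM"],
    "weight": [40.0, 48.0, 29.0],
})

# Square brackets with a column name return that column as a Series.
species = surveys_df["species_id"]

# Attribute access is an equivalent shorthand when the name is a valid
# Python identifier and does not clash with a DataFrame method.
same_species = surveys_df.species_id

print(type(species))
print(species.equals(same_species))
```

Here species is a new object holding only the data from that one column.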
We can pass a list of column names too, as an index to select columns in that order. This is useful when we need to reorganize our data. NOTE: If a column name is not contained in the DataFrame, an exception (error) will be raised.
Python tells us what type of error it is in the traceback: at the bottom it says KeyError, which means that the name we asked for is not a valid column name.
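The following sketch shows both behaviours on a made-up DataFrame: selecting a list of columns in a chosen order, and the error raised for a misspelled name. The column names, including the deliberately wrong "speciess", are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data; column names are assumptions.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3],
    "species_id": ["NL", "DM", "DM"],
    "plot_id": [2, 3, 2],
})

# A list of names selects those columns, in the order given.
reordered = surveys_df[["plot_id", "species_id"]]
print(list(reordered.columns))

# A column name that is not in the DataFrame raises KeyError.
caught = None
try:
    surveys_df["speciess"]
except KeyError as err:
    caught = err
    print("KeyError:", err)
```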
Let’s remind ourselves that Python uses 0-based indexing. This means that the first element in an object is located at position 0.
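A short illustration with a plain Python list (the list itself is just an example):

```python
# An example list of five elements.
a = [1, 2, 3, 4, 5]

# The first element sits at position 0, not 1.
first = a[0]

# The last element sits at position len(a) - 1.
last = a[len(a) - 1]

print(first, last)
```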
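Slicing Subsets of Rows in Python

Slicing using the [] operator selects a set of rows from a DataFrame. To slice out a set of rows, we use the syntax data[start:stop]. The start bound is included in the output, while the stop bound is one step beyond the last row we want to select.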
The stop bound in Python is different from what you might be used to in languages like Matlab and R.
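A minimal sketch of row slicing, using a made-up DataFrame with hypothetical column names:

```python
import pandas as pd

# Hypothetical stand-in data with five rows.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "year": [1977, 1977, 2002, 2002, 2002],
})

# Rows at positions 0, 1 and 2; the stop bound 3 is excluded.
first_three = surveys_df[0:3]

# Omitting the start bound begins at the first row.
first_two = surveys_df[:2]

# A negative start counts from the end: this is the last row.
last_row = surveys_df[-1:]

print(first_three)
```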
We can also reassign values within subsets of our DataFrame. But before we do that, let’s look at the difference between the concept of copying objects and the concept of referencing objects in Python.

Copying Objects vs Referencing Objects in Python

Let’s start with an example:
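You might think that assigning a DataFrame to a new name with the = operator creates a fresh, distinct copy of it. However, a statement such as y = x does not copy anything: it creates a new variable y that references the same object that x refers to. In contrast, the copy() method of a DataFrame creates a true copy. Let’s look at what happens when we reassign the values within a subset of the DataFrame that references another DataFrame object: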
Let’s try the following code:
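What is the difference between these two DataFrames? When we assigned a new value to the first 3 rows through the reference, the original DataFrame was modified too, because both names point to the same object.

To review and recap: a copy is made with the DataFrame’s copy() method, while a reference is created with the = operator.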
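The difference between a copy and a reference can be sketched as follows. The DataFrame, its column names, and the variable names true_copy and ref are all hypothetical, and the data are numeric only to keep the assignment simple.

```python
import pandas as pd

# Hypothetical numeric stand-in data.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "year": [1977, 1977, 2002, 2002],
    "plot_id": [2, 3, 2, 7],
})

true_copy = surveys_df.copy()  # an independent copy of the data
ref = surveys_df               # a second name for the same object

# Overwrite the first three rows through the reference.
ref[0:3] = 0

# Both names refer to one object, so the original changed as well.
print(ref is surveys_df)
print(surveys_df["year"].tolist())

# The copy made beforehand is unaffected.
print(true_copy["year"].tolist())
```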
Okay, that’s enough of that. Let’s create a brand new clean dataframe from the original data CSV file.
Slicing Subsets of Rows and Columns in Python

We can select specific ranges of our data in both the row and column directions using either label or integer-based indexing.
To select a subset of rows and columns from our DataFrame, we can use the iloc method, which takes integer positions in the order [rows, columns].
Notice that we asked for a slice from 0:3. This yielded 3 rows of data. When you ask for 0:3, you are actually telling Python to start at index 0 and select rows 0, 1, 2 up to but not including 3. Let’s explore some other ways to index and select subsets of data:
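A sketch of these selections on a made-up DataFrame follows. The column names are assumptions, and the loc calls are only some of the possible ways to select by label.

```python
import pandas as pd

# Hypothetical stand-in data; column names are assumptions.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "month": [7, 7, 7, 7, 7],
    "day": [16, 16, 16, 16, 16],
    "year": [1977, 1977, 2002, 2002, 2002],
    "species_id": ["NL", "NL", "DM", "DM", "DM"],
})

# Integer-based: rows 0, 1, 2 and columns 1, 2, 3 (stop bounds excluded).
subset = surveys_df.iloc[0:3, 1:4]

# Label-based: the rows labelled 0 and 4, with all columns.
two_rows = surveys_df.loc[[0, 4], :]

# Label-based: one row and a list of named columns.
one_row = surveys_df.loc[0, ["species_id", "year"]]

print(subset)
```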
NOTE: Labels must be found in the DataFrame or you will get a KeyError.

Indexing by labels with loc differs from indexing by integers with iloc. With loc, both the start bound and the stop bound are inclusive.

We can also select a specific data value using a row and column location within the DataFrame and iloc indexing. Remember that Python indexing begins at 0. So, the index location [2, 6] selects the element that is 3 rows down and 7 columns over in the DataFrame.
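The contrast between label and integer bounds, and the lookup of a single value, can be sketched like this. The data are a made-up stand-in with assumed column names, so the value found at [2, 6] applies to this stand-in only.

```python
import io

import pandas as pd

# Hypothetical stand-in data; names and values are assumptions.
csv_text = """record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight
1,7,16,1977,2,NL,M,32,
2,7,16,1977,3,NL,M,33,
3,7,16,2002,2,DM,F,37,40
4,7,16,2002,7,DM,M,36,48
5,7,16,2002,3,DM,M,35,29
"""
surveys_df = pd.read_csv(io.StringIO(csv_text))

# loc uses labels and includes both bounds: rows 0, 1, 2 and 3.
by_label = surveys_df.loc[0:3]

# iloc uses positions and excludes the stop bound: rows 0, 1 and 2.
by_position = surveys_df.iloc[0:3]

# A single value: third row (position 2), seventh column (position 6).
value = surveys_df.iloc[2, 6]

print(len(by_label), len(by_position), value)
```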
Subsetting Data using Criteria

We can also select a subset of our data using criteria. For example, we can select all rows that have a year value of 2002, or all rows that do not contain the year 2002. We can define sets of criteria too.
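These selections might be written as in the sketch below, which assumes a made-up DataFrame with a year column. The year range in the last example is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical stand-in data; names and values are assumptions.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "year": [1977, 1985, 2002, 2002, 1990],
    "weight": [40.0, 48.0, 29.0, None, 52.0],
})

# Rows where year equals 2002.
in_2002 = surveys_df[surveys_df.year == 2002]

# Rows where year is anything other than 2002.
not_2002 = surveys_df[surveys_df.year != 2002]

# Sets of criteria: combine conditions with & (and) or | (or).
# Each condition needs its own parentheses.
early_eighties = surveys_df[(surveys_df.year >= 1980) & (surveys_df.year <= 1985)]

print(in_2002)
```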
Python Syntax Cheat Sheet

We can use the comparison operators ==, !=, >, <, >= and <= when querying data by criteria from a DataFrame. Experiment with selecting various subsets of the “surveys” data.
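Using masks to identify a specific condition

A mask can be useful to locate where a particular subset of values exists or doesn’t exist, for example NaN, or “Not a Number”, values. To understand masks, we also need to understand Boolean values in Python. Boolean values include True and False. When we ask Python whether a comparison holds, it answers with one of these two values.

To create a boolean mask, we set a True/False criterion. Python then assesses each value in the object against it and returns an object of the same shape as the original, with True or False at each location.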
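A small sketch of Boolean values and a mask; the variable x, the Series of weights and the threshold 45 are arbitrary examples.

```python
import pandas as pd

# A comparison on a single value returns one Boolean.
x = 5
print(x > 5)   # False: 5 is not greater than 5
print(x == 5)  # True

# The same comparison on a Series returns one Boolean per element.
weights = pd.Series([40.0, 48.0, 29.0, 52.0])
mask = weights > 45

# The mask has the same shape as the data it was built from.
print(mask.tolist())

# Indexing with the mask keeps only the elements where it is True.
heavy = weights[mask]
print(heavy.tolist())
```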
Let’s try this out. Let’s identify all locations in the survey data that have null (missing or NaN) data values. We can use the isnull method to do this: it returns True wherever a value is null and False everywhere else.
To select the rows where there are null values, we can use the mask as an index to subset our data as follows:
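One way to do this is sketched below on made-up data. Reducing the mask to one Boolean per row with any(axis=1) is an assumption about what "rows where there are null values" means here, namely rows with at least one null.

```python
import pandas as pd

# Hypothetical stand-in data containing some missing values.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "sex": ["M", None, "F", "M"],
    "weight": [None, 48.0, 29.0, None],
})

# True wherever a cell is null; same shape as the DataFrame.
null_mask = pd.isnull(surveys_df)

# Collapse to one Boolean per row: True if any cell in the row is null.
rows_with_nulls = surveys_df[null_mask.any(axis=1)]

print(rows_with_nulls)
```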
We can run isnull on a particular column too.
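Let’s take a minute to look at the statement above. We are using the Boolean object as an index into the DataFrame: we are asking Python to select only the rows for which the mask is True.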
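A sketch of a mask built from a single column; the column name weight and the data are assumptions.

```python
import pandas as pd

# Hypothetical stand-in data with missing weights.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "weight": [None, 48.0, 29.0, None],
})

# One Boolean per row: True where the weight is missing.
weight_is_null = pd.isnull(surveys_df["weight"])

# Use the mask as an index: keep rows where it is True.
missing_weight = surveys_df[weight_is_null]

# The ~ operator inverts the mask: keep rows that do have a weight.
has_weight = surveys_df[~weight_is_null]

print(missing_weight)
```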
How do you slice a list at a specific index in Python?

To extract elements with specific indices from a Python list, use slicing: list[start:stop:step]. If you cannot use slicing because there is no pattern in the indices you want to access, use the list comprehension [lst[i] for i in indices], assuming your indices are stored in the variable indices.
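Both approaches in a short sketch; the list contents and the chosen indices are arbitrary examples.

```python
# An example list.
lst = ["a", "b", "c", "d", "e", "f"]

# Slicing with start:stop:step takes every second element
# from position 0 up to, but not including, position 5.
every_other = lst[0:5:2]

# With no regular pattern, pick elements by a list of indices.
indices = [0, 3, 4]
picked = [lst[i] for i in indices]

print(every_other)
print(picked)
```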
How do you select a specific item in a list in Python?

To select a single element from a Python list, index it by position, for example lst[2]. To select several elements, create a list of the indices to be accessed, loop through that index list, and add each selected element to a new list using list.append().