How To Get Column Names In Pandas Dataframe

Pandas DataFrames are two-dimensional, size-mutable tables. A DataFrame stores data in rows and columns, and each column is identified by its header name (label). In this tutorial, you will learn how to get the column names of a Pandas DataFrame using Python.

Using examples, this tutorial will cover the various methods for retrieving column names from a Pandas Dataframe.

Get Column Names In Pandas Dataframe Using Python

Consider a straightforward dataframe, which we will be employing throughout the remainder of this guide.

# import pandas library
import numpy as np
import pandas as pd
 
# create pandas DataFrame
df = pd.DataFrame({'University Names': ['Carnegie', 'Harvard', 'MIT', 'UC Berkley', 'UC Penn', 'Stanford'],
                   'USNews Ranking': [5,3,2,8,7,1],
                   'Total Graduate Courses': [20,24 ,15, 11, 18, 22],
                   'Average Tution': [80000, 75000, 85000, 70000, 90000, 100000],
                   'Admit Rate': [15, 13, 7, 18, 9, 5]
                   })
print(df)

Output:

  University Names  USNews Ranking  Total Graduate Courses  Average Tution  Admit Rate
0         Carnegie               5                      20           80000          15
1          Harvard               3                      24           75000          13
2              MIT               2                      15           85000           7
3       UC Berkley               8                      11           70000          18
4          UC Penn               7                      18           90000           9
5         Stanford               1                      22          100000           5

1. Using The columns Attribute

The columns attribute of a Pandas Dataframe is the quickest and most convenient way to get the names of the columns. Note that df.columns is an attribute, not a method, so it is written without parentheses. It returns all the column labels present in the dataframe.

The example code below uses the columns attribute to get the column names of the pandas dataframe.

# import pandas library
import numpy as np
import pandas as pd
 
# Creating A Sample Data Frame
df = pd.DataFrame({'University Names': ['Carnegie', 'Harvard', 'MIT', 'UC Berkley', 'UC Penn', 'Stanford'],
                   'USNews Ranking': [5,3,2,8,7,1],
                   'Total Graduate Courses': [20,24 ,15, 11, 18, 22],
                   'Average Tution': [80000, 75000, 85000, 70000, 90000, 100000],
                   'Admit Rate': [15, 13, 7, 18, 9, 5]
                   })

#Print All the Columns
print(df.columns)

Output:

Index(['University Names', 'USNews Ranking', 'Total Graduate Courses',
       'Average Tution', 'Admit Rate'],
      dtype='object')
  • As you can see in the above code, I was able to get all the column names using df.columns.
  • columns is an attribute of the DataFrame object that holds all the column labels present in the dataframe.
  • Note that the return value is a Pandas Index object, not a plain Python list.
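
Because df.columns is an Index, it can be used much like a sequence. The following is a minimal sketch, using a shortened version of the sample dataframe (an assumption made only to keep the example small), of a few common operations on it:

```python
import pandas as pd

# Shortened sample DataFrame (illustrative data only)
df = pd.DataFrame({'University Names': ['Harvard', 'MIT'],
                   'USNews Ranking': [3, 2],
                   'Admit Rate': [13, 7]})

cols = df.columns

# The result is a pandas Index, not a list
print(isinstance(cols, pd.Index))

# Number of columns
print(len(cols))

# Membership test on a column name
print('Admit Rate' in cols)

# Positional access to a single column name
print(cols[0])

# Iterate over the column names one by one
for name in cols:
    print(name)
```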

2. Using df.columns.values To Get Column Name In Pandas DataFrame

You can also use the df.columns.values attribute, which returns all of the column names as a NumPy array. Like df.columns, it is an attribute rather than a method, so it is written without parentheses.

The example code below uses df.columns.values to get the column names of the Pandas Dataframe.

# import pandas library
import numpy as np
import pandas as pd
 
# Creating A Sample Data Frame
df = pd.DataFrame({'University Names': ['Carnegie', 'Harvard', 'MIT', 'UC Berkley', 'UC Penn', 'Stanford'],
                   'USNews Ranking': [5,3,2,8,7,1],
                   'Total Graduate Courses': [20,24 ,15, 11, 18, 22],
                   'Average Tution': [80000, 75000, 85000, 70000, 90000, 100000],
                   'Admit Rate': [15, 13, 7, 18, 9, 5]
                   })

#Print All the Columns
print(df.columns.values)

Output:

['University Names' 'USNews Ranking' 'Total Graduate Courses'
 'Average Tution' 'Admit Rate']
  • As you can see in the above code, I was able to print only the column names using the columns.values attribute.
  • It returns a NumPy array of the column names present in the dataframe, which is why the output has no commas between the names.
  • If you need a plain Python list instead, call tolist() on the array, i.e. df.columns.values.tolist(). This does not depend on the Python version.
  • The example code below converts the column names to a Python list.
# import pandas library
import numpy as np
import pandas as pd
 
# Creating A Sample Data Frame
df = pd.DataFrame({'University Names': ['Carnegie', 'Harvard', 'MIT', 'UC Berkley', 'UC Penn', 'Stanford'],
                   'USNews Ranking': [5,3,2,8,7,1],
                   'Total Graduate Courses': [20,24 ,15, 11, 18, 22],
                   'Average Tution': [80000, 75000, 85000, 70000, 90000, 100000],
                   'Admit Rate': [15, 13, 7, 18, 9, 5]
                   })

getColumnNames = df.columns.values.tolist()

#Print All the Columns
print(getColumnNames)

Output:

['University Names', 'USNews Ranking', 'Total Graduate Courses', 'Average Tution', 'Admit Rate']
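
As a complementary sketch, the Index object itself also provides tolist() and to_numpy(), so the intermediate .values step can be skipped. The shortened dataframe below is an assumption made only to keep the example small:

```python
import numpy as np
import pandas as pd

# Shortened sample DataFrame (illustrative data only)
df = pd.DataFrame({'University Names': ['Harvard', 'MIT'],
                   'USNews Ranking': [3, 2],
                   'Admit Rate': [13, 7]})

# Column names as a plain Python list, straight from the Index
names_list = df.columns.tolist()

# Column names as a NumPy array, straight from the Index
names_array = df.columns.to_numpy()

print(names_list)
print(type(names_array).__name__)
```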

3. Using The list() Function

Python's built-in list() function can also be used to obtain a list of column headers from a Pandas Dataframe. Iterating over a DataFrame yields its column labels, so passing the Dataframe object to list() returns a list of all the column headers.

The example code below uses list() to get the column names of the pandas dataframe.

# import pandas library
import numpy as np
import pandas as pd
 
# Creating A Sample Data Frame
df = pd.DataFrame({'University Names': ['Carnegie', 'Harvard', 'MIT', 'UC Berkley', 'UC Penn', 'Stanford'],
                   'USNews Ranking': [5,3,2,8,7,1],
                   'Total Graduate Courses': [20,24 ,15, 11, 18, 22],
                   'Average Tution': [80000, 75000, 85000, 70000, 90000, 100000],
                   'Admit Rate': [15, 13, 7, 18, 9, 5]
                   })

#getting The DataFrame in List
columnList = list(df)

#Print All the Columns
print(columnList)

Output:

['University Names', 'USNews Ranking', 'Total Graduate Courses', 'Average Tution', 'Admit Rate']
  • As you can see in the above code, I was able to get the names of the columns present in the dataframe by just passing the dataframe to list().
  • This works because iterating over a dataframe yields its column names, not its rows.
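
The same iteration behaviour can be used directly in a loop, with sorted(), or through df.keys(). The sketch below uses a shortened dataframe (an assumption made only to keep the example small):

```python
import pandas as pd

# Shortened sample DataFrame (illustrative data only)
df = pd.DataFrame({'University Names': ['Harvard', 'MIT'],
                   'USNews Ranking': [3, 2],
                   'Admit Rate': [13, 7]})

# Iterating over the DataFrame yields column names
looped_names = [name for name in df]

# sorted() consumes the same iteration and returns the names alphabetically
sorted_names = sorted(df)

# keys() returns the column labels, mirroring the dict interface
key_names = list(df.keys())

print(looped_names)
print(sorted_names)
print(key_names)
```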

4. Using DataType To Get Column Name In Pandas DataFrame

In some cases, we may need to retrieve the column name along with the type of the column. In that case, the dtypes attribute can be used to our advantage. This attribute returns a Pandas Series whose index holds the column names and whose values hold the data type of each column.

The example code below uses dtypes to get the column names together with their data types.

# import pandas library
import numpy as np
import pandas as pd
 
# Creating A Sample Data Frame
df = pd.DataFrame({'University Names': ['Carnegie', 'Harvard', 'MIT', 'UC Berkley', 'UC Penn', 'Stanford'],
                   'USNews Ranking': [5,3,2,8,7,1],
                   'Total Graduate Courses': [20,24 ,15, 11, 18, 22],
                   'Average Tution': [80000, 75000, 85000, 70000, 90000, 100000],
                   'Admit Rate': [15, 13, 7, 18, 9, 5]
                   })

# Get each column name together with its data type (a Series)
columnList = df.dtypes

# Print the column names and their data types
print(columnList)

Output:

University Names          object
USNews Ranking             int64
Total Graduate Courses     int64
Average Tution             int64
Admit Rate                 int64
dtype: object
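
If only the names of columns with a particular kind of data type are needed, select_dtypes() can filter the dataframe before reading its columns. This is a minimal sketch on a shortened dataframe (an assumption made only to keep the example small):

```python
import pandas as pd

# Shortened sample DataFrame (illustrative data only)
df = pd.DataFrame({'University Names': ['Harvard', 'MIT'],
                   'USNews Ranking': [3, 2],
                   'Admit Rate': [13, 7]})

# Names of the numeric columns only
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Names of everything that is not numeric
non_numeric_cols = df.select_dtypes(exclude='number').columns.tolist()

print(numeric_cols)
print(non_numeric_cols)

# Walk over (column name, dtype) pairs from the dtypes Series
for name, dtype in df.dtypes.items():
    print(name, dtype)
```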

How To Get Column Names With NaN

Additionally, we can find the headers of the columns that contain missing values. NaN stands for "Not a Number" in Pandas, and it represents the absence of a value.

1. Using isna().any()

If some columns contain missing data, we can identify them by using the Pandas isna() or isnull() method combined with any().

The isna() method returns a boolean object of the same size as the dataframe, indicating whether each value is NA. NA values, such as None or numpy.nan, are mapped to True; all other values are mapped to False. Chaining any() then reduces the result to one boolean per column, which is True if the column contains at least one missing value.

# import pandas library
import numpy as np
import pandas as pd
 
# Creating A Sample Data Frame
df = pd.DataFrame({'University Names': ['Carnegie', 'Harvard', 'MIT', 'UC Berkley', 'UC Penn', 'Stanford'],
                   'USNews Ranking': [5,3,2,8,7,1],
                   'Total Graduate Courses': [20,24 ,15, 11, 18, 22],
                   'Average Tution': [80000, np.nan, 85000, 70000, 90000, np.nan],
                   'Admit Rate': [15, 13, 7, 18, 9, 5]
                   })

# Flag each column that contains at least one NaN
columnList = df.isna().any()

# Print the True/False flag for every column
print(columnList)

Output:

University Names          False
USNews Ranking            False
Total Graduate Courses    False
Average Tution             True
Admit Rate                False
dtype: bool
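
The result above is a boolean Series rather than a list of names. To extract only the names of the columns that contain NaN, the boolean Series can be used as a mask on df.columns. A minimal sketch, using a shortened dataframe (an assumption made only to keep the example small):

```python
import numpy as np
import pandas as pd

# Shortened sample DataFrame with missing values (illustrative data only)
df = pd.DataFrame({'University Names': ['Harvard', 'MIT', 'Stanford'],
                   'Average Tution': [np.nan, 85000, np.nan],
                   'Admit Rate': [13, 7, 5]})

# Boolean Series: True for columns with at least one NaN
has_nan = df.isna().any()

# Use the boolean Series as a mask on the column Index
cols_with_nan = df.columns[has_nan].tolist()

print(cols_with_nan)
```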

You can also use another method to obtain the columns containing null values.

2. Using isnull().any()

The isnull() method is an alias of isna(), so it behaves identically: it returns a boolean object of the same shape in which missing values (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike) are marked True. To demonstrate how it works, let's look at an example.

# import pandas library
import numpy as np
import pandas as pd
 
# Creating A Sample Data Frame
df = pd.DataFrame({'University Names': ['Carnegie', 'Harvard', 'MIT', 'UC Berkley', 'UC Penn', 'Stanford'],
                   'USNews Ranking': [5,3,2,8,7,1],
                   'Total Graduate Courses': [20,24 ,15, 11, 18, 22],
                   'Average Tution': [80000, np.nan, 85000, 70000, 90000, np.nan],
                   'Admit Rate': [15, 13, 7, 18, 9, 5]
                   })

# Flag each column that contains at least one NaN
columnList = df.isnull().any()

# Print the True/False flag for every column
print(columnList)

Output:

University Names          False
USNews Ranking            False
Total Graduate Courses    False
Average Tution             True
Admit Rate                False
dtype: bool
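
To see how many values are missing in each column, rather than just whether any are, sum() can replace any(). The sketch below also checks that isnull() and isna() give the same result. The shortened dataframe is an assumption made only to keep the example small:

```python
import numpy as np
import pandas as pd

# Shortened sample DataFrame with missing values (illustrative data only)
df = pd.DataFrame({'University Names': ['Harvard', 'MIT', 'Stanford'],
                   'Average Tution': [np.nan, 85000, np.nan],
                   'Admit Rate': [13, 7, 5]})

# isnull() is an alias of isna(), so both produce the same boolean frame
same_result = df.isnull().equals(df.isna())

# Count the missing values in every column
missing_counts = df.isnull().sum()

# Keep only the columns that have at least one missing value
missing_only = missing_counts[missing_counts > 0].to_dict()

print(same_result)
print(missing_only)
```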

Wrap Up

Pandas Dataframes are used to store information in rows and columns. Each column has a header name that identifies it.

We have used a variety of attributes and methods to obtain the column names in the Pandas Dataframe, including df.columns, df.columns.values, df.columns.values.tolist(), list(df), and so on.

Please let me know in the comments section if you are still having problems with the above example, and I will gladly assist you as soon as possible.
