DataFrame in Pandas
A Pandas DataFrame is a two-dimensional, labeled data structure similar to a spreadsheet or SQL table. It consists of rows and columns, where:
- Rows are indexed (default is integers starting from 0).
- Columns can have labels and hold data of various types (e.g., integers, strings, floats).
Features of a Pandas DataFrame:
1. Two-Dimensional: Data is organized in rows and columns.
2. Heterogeneous Data: Each column can have a different data type.
3. Indexing: Both rows and columns are indexed, making it easy to access and manipulate data.
4. Flexibility: Can be created from various sources like dictionaries, lists, NumPy arrays, or external files (CSV, Excel, SQL, etc.).
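The features listed above can be checked directly on a small DataFrame. The following is a minimal sketch; the column names and values are illustrative assumptions, not data from a real source.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed for this sketch): one column per data type
df = pd.DataFrame({
    'Name': ['Ashish', 'Katrina'],   # strings
    'Age': [25, 30],                 # integers
    'Score': [88.5, 92.0],           # floats
})

print(df.ndim)           # number of dimensions (rows and columns)
print(df.dtypes)         # each column reports its own data type
print(list(df.index))    # default row labels start at 0
print(list(df.columns))  # column labels

# A DataFrame can also be built from a NumPy array
arr_df = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=['A', 'B'])
print(arr_df.shape)      # (rows, columns)
```

Reading from external files works in a similar way through functions such as pd.read_csv, which is left out here because it needs a file on disk.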
Example: Creating a DataFrame
Let's create a DataFrame in Pandas.
a) Creating a DataFrame from a dictionary.
Python
import pandas as pd

# Dictionary keys become column labels; each list holds that column's values
data = {
    'Name': ['Ashish', 'Katrina', 'Alia'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Mumbai'],
}
df = pd.DataFrame(data)
print(df)
The output of the above code is shown below:
b) Creating a DataFrame from a list of lists.
Python
import pandas as pd

# Each inner list is one row; column labels are passed separately
newdata = [
    ['Ashish', 25, 'New York'],
    ['Esha', 30, 'Los Angeles'],
    ['Salman', 35, 'Mumbai'],
]
columnnames = ['Name', 'Age', 'City']
df = pd.DataFrame(newdata, columns=columnnames)
print(df)
The output of the above code is shown below:
Accessing Data in a DataFrame
• Access a column:
Python
The output of the above code is shown below:
• Access a row:
Python
print(df.iloc[1])  # Access the second row by integer position
The output of the above code is shown below:
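As a minimal sketch of both kinds of access, the example below rebuilds the DataFrame from the list-of-lists example. The choice of the 'Name' column is only an illustrative assumption; any column label works the same way.

```python
import pandas as pd

# Rebuild the DataFrame from the list-of-lists example
newdata = [
    ['Ashish', 25, 'New York'],
    ['Esha', 30, 'Los Angeles'],
    ['Salman', 35, 'Mumbai'],
]
df = pd.DataFrame(newdata, columns=['Name', 'Age', 'City'])

# Column access by label returns a Series ('Name' chosen for illustration)
names = df['Name']
print(names)

# Row access by integer position returns that row as a Series
second_row = df.iloc[1]
print(second_row)

# With the default integer index, label-based .loc selects the same row
print(df.loc[1].equals(second_row))
```

Note that iloc is position-based while loc is label-based; the two agree here only because the default row labels are 0, 1, 2.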
DataFrame.size
The DataFrame.size property returns the number of elements in the DataFrame. It is calculated by multiplying the total number of rows by the total number of columns:
Size = Number of rows * Number of columns
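This relationship can be verified with df.shape, which returns a (rows, columns) tuple. The sketch below uses made-up values chosen only for illustration.

```python
import pandas as pd

# Illustrative data (assumed): 3 rows and 2 columns
df = pd.DataFrame({
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Mumbai'],
})

rows, cols = df.shape          # shape is a (rows, columns) tuple
print(rows, cols)              # 3 2
print(df.size)                 # 6
print(df.size == rows * cols)  # True
```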
Example: Create a dataframe.
Python
import pandas as pd

# 3 rows and 3 columns
data = {
    'Name': ['Ashish', 'Katrina', 'Alia'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Mumbai'],
}
df = pd.DataFrame(data)
print(df)
The above code creates the DataFrame.
Python
print(df.size)  # Total number of elements: rows * columns
The above code gets the size of the DataFrame. Its output is 9 (3 rows * 3 columns). We can also see this in the image below: