Pandas is a powerful Python library for data analysis and manipulation. This document explains how to import Pandas, read CSV and Excel files, create and work with DataFrames, and access and slice data using the loc and iloc indexing methods. Readers will learn practical techniques for handling tabular data in Python.
Pandas provides pre-built classes and functions that simplify working with structured data such as tables and spreadsheets. It is loaded with Python's import statement, and by convention it is given the alias pd.
To use Pandas, ensure it is installed in your environment. Import the library as follows:
```python
import pandas as pd
```
This statement makes Pandas' functionality available under the pd prefix.
Pandas can read many file formats, including CSV and Excel files. Pass the file path to the matching reader function:
```python
# Reading a CSV file
csv_path = 'data.csv'
df = pd.read_csv(csv_path)

# Reading an Excel file (.xlsx files need an Excel engine such as openpyxl installed)
excel_path = 'data.xlsx'
df_excel = pd.read_excel(excel_path)
```
Both functions return a DataFrame, the core Pandas data structure for tabular data.
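As a minimal sketch of reading a CSV file, the example below writes a tiny table to a temporary file and reads it back. The file contents and the name df_csv are made up for illustration; only pd.read_csv and head come from the text above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, written to a temporary CSV file for illustration.
csv_text = "artist,released\nArtist1,2001\nArtist2,2002\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write(csv_text)

    # read_csv treats the first line as the header and uses it for column labels.
    df_csv = pd.read_csv(csv_path)

print(df_csv.head())          # first rows of the DataFrame (up to five by default)
print(list(df_csv.columns))   # column labels taken from the header line
```

The DataFrame is fully loaded into memory, so it remains usable after the temporary file is deleted.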
A DataFrame can be created from a dictionary, where each key becomes a column label and each value is a list holding that column's entries, one per row:
```python
data = {
    'artist': ['Artist1', 'Artist2'],
    'released': [2001, 2002]
}
df = pd.DataFrame(data)
```
| Key | Description |
|---|---|
| artist | Column label for artists |
| released | Column label for release year |
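To check how the dictionary maps onto the table, the following sketch inspects the result. It reuses the same two-column data; the variable name df_demo is only an illustrative choice.

```python
import pandas as pd

data = {
    "artist": ["Artist1", "Artist2"],
    "released": [2001, 2002],
}
df_demo = pd.DataFrame(data)

# Dictionary keys become the column labels.
print(list(df_demo.columns))

# shape is (number of rows, number of columns).
print(df_demo.shape)

# Each list supplies one column, with one entry per row.
print(df_demo["released"].tolist())
```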
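To select a single column as a one-column DataFrame, use double brackets (single brackets, df['artist'], return a Series instead):
```python
df_artist = df[['artist']]
```
To select multiple columns:
```python
df_selected = df[['artist', 'released']]
```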
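The difference between single and double brackets is easy to miss, so here is a short sketch that compares the two results. The names as_series and as_frame are illustrative, and the data matches the small example above.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "artist": ["Artist1", "Artist2"],
    "released": [2001, 2002],
})

# Single brackets with one label return a one-dimensional Series.
as_series = df_demo["artist"]

# Double brackets (a list of labels) return a DataFrame, even for one column.
as_frame = df_demo[["artist"]]

print(type(as_series).__name__)
print(type(as_frame).__name__)
```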
Slicing rows and columns can be done using indexing methods.
Pandas provides iloc and loc for accessing specific elements.
iloc uses integer-based (positional) indexing:
```python
# First row, first column
df.iloc[0, 0]
# Second row, first column
df.iloc[1, 0]
# First row, second column (the example df has only two columns,
# so column position 2 would raise an IndexError)
df.iloc[0, 1]
```
loc uses label-based indexing:
```python
# Access by row and column labels
df.loc[0, 'artist']
df.loc[1, 'artist']
```
If the index is customized (e.g., replaced with labels like 'A', 'B'), loc can access data by those labels.
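A small sketch of the custom-index case, assuming the two-row example data and the labels 'A' and 'B' mentioned above. The variable names are illustrative.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "artist": ["Artist1", "Artist2"],
    "released": [2001, 2002],
})

# Replace the default 0, 1 index with custom row labels.
df_labeled = df_demo.copy()
df_labeled.index = ["A", "B"]

# loc now addresses rows by the new labels.
first_artist = df_labeled.loc["A", "artist"]
second_year = df_labeled.loc["B", "released"]

# iloc still works by position, regardless of the labels.
same_cell = df_labeled.iloc[0, 0]

print(first_artist, second_year, same_cell)
```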
DataFrames can be sliced to create new DataFrames containing selected rows and columns:
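```python
# First two rows, first three columns (iloc excludes the stop position;
# the example df has only two columns, so both are returned)
z = df.iloc[:2, :3]

# Using loc for a range of row and column labels (loc includes both endpoints)
z = df.loc[:2, 'artist':'released']
```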
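The two slicing styles treat the end of a range differently. The sketch below makes that visible on a three-row table; the third row is a hypothetical addition so that the row ranges differ.

```python
import pandas as pd

# Three rows (the third is made up for illustration) with the default 0, 1, 2 index.
df_demo = pd.DataFrame({
    "artist": ["Artist1", "Artist2", "Artist3"],
    "released": [2001, 2002, 2003],
})

# iloc slices by position and excludes the stop: rows 0-1, column 0 only.
by_position = df_demo.iloc[0:2, 0:1]

# loc slices by label and includes both endpoints: rows 0, 1, 2 and both columns.
by_label = df_demo.loc[0:2, "artist":"released"]

print(by_position.shape)
print(by_label.shape)
```

Both results are new DataFrame objects that can be assigned to a variable and used on their own.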
Pandas streamlines data analysis in Python by providing intuitive methods for importing, manipulating, and accessing tabular data. Its DataFrame structure and indexing capabilities make it a versatile tool for handling complex datasets efficiently.
(2) Pandas is primarily used for data analysis and manipulation in Python.
(3) When creating a DataFrame from a dictionary, the keys become column labels and each value list supplies that column's entries, one per row.
| Function | Description |
|---|---|
| A. read_csv | 1. Reads data from an Excel file |
| B. read_excel | 2. Reads data from a CSV file |
| C. DataFrame | 3. Creates a DataFrame from structured data |
| D. head | 4. Displays the first few rows of a DataFrame |
A-2, B-1, C-3, D-4.
The loc method in Pandas can be used with custom index labels as well as column names.
True. The loc method allows access to data using custom index labels and column names.
(4) Slicing can assign values to a new variable, making statement 4 incorrect.
(2) The correct syntax is df[['artist', 'released']].