File Formats

July 25, 2025 4 min read Python Data-Science File-Formats Docs IBM-FSSD Csv Json Excel Data-Storage

This document explores common file formats used in data science, including their structure, advantages, and typical use cases for data storage and exchange.

On this page

This document covers the essential file formats used in data science, such as CSV, JSON, and Excel. It explains their structure, how to read and write them in Python, and the advantages and limitations of each format for data storage and exchange.

Introduction

File formats are fundamental to data science, enabling the storage, exchange, and analysis of data. Understanding the structure and use cases of different file formats is crucial for efficient data handling.

Common File Formats in Data Science

Format	Description	Typical Use Cases
CSV	Comma-separated values; plain text, tabular data	Data export/import, spreadsheets
JSON	JavaScript Object Notation; hierarchical, human-readable	APIs, config files, web data
Excel	Microsoft Excel format; supports formulas, formatting	Business data, analytics

CSV (Comma-Separated Values)

CSV files store tabular data in plain text, with each line representing a row and columns separated by commas. They are widely used for data exchange due to their simplicity and compatibility.

1name,age,score
2Alice,30,85
3Bob,25,90

JSON (JavaScript Object Notation)

JSON is a lightweight, human-readable format for representing structured data. It supports nested objects and arrays, making it suitable for complex data structures.

1[
2  { "name": "Alice", "age": 30, "score": 85 },
3  { "name": "Bob", "age": 25, "score": 90 }
4]

Excel Files

Excel files (XLS, XLSX) are binary formats that support multiple sheets, formulas, and formatting. They are commonly used in business and analytics.

Reading and Writing Files in Python

Python provides libraries for working with these formats:

csv for CSV files
json for JSON files
pandas for all formats, including Excel

Example: Reading a CSV File

1import csv
2with open('data.csv', 'r') as file:
3    reader = csv.reader(file)
4    for row in reader:
5        print(row)

Example: Reading a JSON File

1import json
2with open('data.json', 'r') as file:
3    data = json.load(file)
4    print(data)

Example: Reading an Excel File with pandas

1import pandas as pd
2df = pd.read_excel('data.xlsx')
3print(df.head())

Advantages and Limitations

Format	Advantages	Limitations
CSV	Simple, widely supported	No data types, no hierarchy
JSON	Supports complex/nested data, human-readable	Larger files, not ideal for tabular data
Excel	Rich features, formulas, formatting	Proprietary, larger file size

Conclusion

Understanding file formats is essential for effective data science workflows. Choosing the right format depends on the data structure, use case, and compatibility requirements. Python makes it easy to read and write these formats for analysis and sharing.

FAQ

A binary format for storing images
A plain text format for tabular data with comma-separated values
A markup language for web pages
A compressed archive format

(2) CSV is a plain text format where each line represents a row and columns are separated by commas, making it ideal for tabular data exchange.

It supports complex and nested data structures
It is only readable by machines
It is limited to tabular data
It cannot be used for web APIs

(1) JSON supports complex and nested data structures, making it suitable for hierarchical data and web APIs.

csv
json
pandas
xml

(3) pandas is a powerful Python library that can read and write Excel files, as well as CSV and JSON formats.

Format	Use Case
A. CSV	1. Data export/import, spreadsheets
B. JSON	2. APIs, config files, web data
C. Excel	3. Business data, analytics

A-1, B-2, C-3.

It cannot store tabular data
It does not support data types or hierarchy
It is not human-readable
It is a proprietary format

(2) CSV does not support data types or hierarchical structures, which limits its use for complex data.

JSON files are human-readable and support nested data structures.

True. JSON is designed to be both human-readable and capable of representing complex, nested data structures.

The file will be read correctly
Only plain text content will be read, losing formatting and formulas
The file will be converted to JSON
The file will not open at all

(2) The csv module can only read plain text; Excel-specific features like formatting and formulas will be lost.

The file size
The compatibility with the tools and systems involved
The color scheme of the file
The number of sheets in the file

(2) Compatibility with the tools and systems is the most important factor when selecting a file format for data exchange.

Web Scraping

Module Summary

Browse Courses

File Formats

Introduction

Common File Formats in Data Science

CSV (Comma-Separated Values)

JSON (JavaScript Object Notation)

Excel Files

Reading and Writing Files in Python

Example: Reading a CSV File

Example: Reading a JSON File

Example: Reading an Excel File with pandas

Advantages and Limitations

Conclusion

FAQ

Which of the following best describes the CSV file format?

What is a key advantage of using JSON for data storage?

Which Python library is commonly used to read Excel files?

Match the following file formats with their typical use cases

Which of the following is a limitation of the CSV format?

True or False

What is the most likely outcome if you try to open an Excel file with the csv module in Python?

Which of the following should be checked first when choosing a file format for data exchange?