
Mastering File Handling in Python: A Comprehensive Guide

File handling is an essential aspect of programming, enabling developers to store, retrieve, and manipulate data. Python, with its rich set of built-in libraries, provides robust support for file operations, ranging from basic text file interactions to advanced binary and random file access. This guide will walk you through the fundamentals of file handling in Python, step by step, from basic concepts to more advanced techniques.

1. Reading and Writing Text Files

Getting Started with Text Files

Opening Files: The first step in file handling is to open the file using the built-in open() function. This function takes the file path and, optionally, the mode of operation (e.g., 'r' for reading, 'w' for writing; 'r' is the default).

file = open('example.txt', 'r')
file.close()

However, it’s recommended to use the with statement, which ensures the file is properly closed after its suite finishes:

with open('example.txt', 'r') as file:
    # Operations with file go here
    pass

Reading from Files

Python offers multiple methods to read data from files. .read() reads the entire file, .readline() reads the next line, and .readlines() returns a list of lines:

with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
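To see how the three reading methods differ, here is a minimal sketch. It writes a small sample file into a temporary directory first (the file name and contents are made up for illustration) and then reads it back in each way.

```python
import os
import tempfile

# Hypothetical sample file, created in a temporary directory for this demo
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as file:
    file.write("first\nsecond\nthird\n")

with open(path, "r") as file:
    first_line = file.readline()   # reads up to and including the first newline
    remaining = file.readlines()   # list of the lines that are left

with open(path, "r") as file:
    whole = file.read()            # the entire file as a single string

# Iterating over the file object yields one line at a time,
# which avoids loading a large file into memory all at once
with open(path, "r") as file:
    stripped = [line.rstrip("\n") for line in file]

print(repr(first_line))
print(remaining)
print(stripped)
```

For large files, iterating line by line is usually the safer default, since .read() and .readlines() both hold the whole file in memory.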

Writing to Files

Writing to files is just as straightforward. The .write() method writes a string to the file, and .writelines() can write a list of strings:

with open('example.txt', 'w') as file:
    file.write("Hello, Python!\n")
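One detail that is easy to miss: .writelines() does not add line breaks for you. The sketch below, which uses a throwaway file in a temporary directory, shows that along with the difference between the 'w' (overwrite) and 'a' (append) modes.

```python
import os
import tempfile

# Throwaway file for the demo
path = os.path.join(tempfile.mkdtemp(), "example.txt")

lines = ["alpha", "beta", "gamma"]

# 'w' creates the file, or truncates it if it already exists
with open(path, "w") as file:
    # writelines() writes the strings as-is, so add the newlines explicitly
    file.writelines(line + "\n" for line in lines)

# 'a' appends to the end of the file instead of overwriting it
with open(path, "a") as file:
    file.write("delta\n")

with open(path, "r") as file:
    result = file.read()

print(result)
```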

2. Binary and Image File Handling

Understanding Binary Files

Binary file operations are similar to text file operations but use a binary mode ('b') appended to the file mode. This is crucial for non-text files, like images or executable files.

with open('example.bin', 'wb') as file:
    file.write(b'\x00\xFF')  # Writing bytes

Handling Image Files

Image files can be treated as binary files but manipulating them usually requires an external library like Pillow. However, for basic read/write operations, binary mode suffices.
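As a rough illustration of basic binary read/write, the sketch below copies a file in fixed-size chunks and checks whether it starts with the 8-byte PNG signature. The helper names copy_binary and looks_like_png are made up for this example, and the demo file is only a stand-in (signature plus filler bytes), not a real image.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def copy_binary(src, dst, chunk_size=4096):
    """Copy src to dst in fixed-size chunks without decoding the bytes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            fout.write(chunk)


def looks_like_png(path):
    """Return True if the file starts with the PNG signature."""
    with open(path, "rb") as file:
        return file.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


# Demo with a stand-in file: the signature plus filler bytes, not a real image
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "source.png")
dst = os.path.join(tmp_dir, "copy.png")
with open(src, "wb") as file:
    file.write(PNG_SIGNATURE + bytes(range(256)) * 40)

copy_binary(src, dst)
print(looks_like_png(dst))
```

Anything beyond copying or inspecting raw bytes (resizing, cropping, format conversion) is where a library such as Pillow comes in.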

3. Random File Access

Seeking and Telling

Random access to files allows you to jump to specific positions within a file. The .seek() method moves the file pointer to a specified position, and .tell() returns the current position.

with open('example.txt', 'rb+') as file:
    file.seek(0, 2)  # Move to the end of the file
    position = file.tell()
    print(f'Current position: {position}')
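The second argument to .seek() controls where the offset is measured from: the start of the file (the default), the current position, or the end. Here is a small sketch on a throwaway 10-byte file. Note that it uses binary mode; in text mode, seeking relative to the current position or the end is only supported with an offset of 0.

```python
import os
import tempfile

# Throwaway 10-byte file for the demo
path = os.path.join(tempfile.mkdtemp(), "example.bin")
with open(path, "wb") as file:
    file.write(b"0123456789")

with open(path, "rb+") as file:
    file.seek(3)                 # whence defaults to os.SEEK_SET (start of file)
    from_start = file.read(2)    # reads bytes 3 and 4; the pointer is now at 5

    file.seek(2, os.SEEK_CUR)    # skip 2 bytes forward from the current position
    after_skip = file.tell()

    file.seek(-3, os.SEEK_END)   # 3 bytes before the end of the file
    tail = file.read()

    file.seek(0)
    file.write(b"AB")            # overwrites bytes in place; it does not insert

with open(path, "rb") as file:
    final = file.read()

print(from_start, after_skip, tail, final)
```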

Advanced Topics

After mastering the basics, consider exploring these advanced topics:

  • File and Directory Management: Learn to navigate and manipulate the filesystem with os and shutil.
  • Context Managers: Understand how to manage resources efficiently using the with statement.
  • Serialization/Deserialization: Use pickle for object serialization and modules like json or csv for working with these formats.
  • Working with Archives: Compress and decompress files using zipfile and tarfile.
  • File System Paths: Employ pathlib for an object-oriented approach to file system path manipulation.
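To give a first taste of a few of these modules, here is a short sketch that combines pathlib, json, and zipfile. The directory layout, file names, and the sample record are all made up for illustration, and everything is written to a temporary directory.

```python
import json
import tempfile
import zipfile
from pathlib import Path

base = Path(tempfile.mkdtemp())

# pathlib: build paths with the / operator and create directories
data_dir = base / "data"
data_dir.mkdir(parents=True, exist_ok=True)

# json: serialize a dictionary to a text file and load it back
record = {"employee_id": "101", "date": "2023-03-01", "hours": 8}
json_path = data_dir / "entry.json"
json_path.write_text(json.dumps(record), encoding="utf-8")
loaded = json.loads(json_path.read_text(encoding="utf-8"))

# zipfile: compress the JSON file into an archive and list its contents
archive_path = base / "backup.zip"
with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.write(json_path, arcname=json_path.name)
with zipfile.ZipFile(archive_path) as archive:
    names = archive.namelist()

print(loaded)
print(names)
```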

Project 1 – Employee Timesheet

Let’s build a small project. It is not built for performance, but it is good enough to learn the concepts and leaves scope for further improvement.

Creating an employee timesheet application is an excellent practical use case to apply and master file handling concepts in Python. This console application will demonstrate how to add, modify, delete, and report employee timesheets using simple text file operations. Here’s how you can implement it step by step:

Step 1: Setup and Initial Design

First, let’s outline the basic functionality and design our file structure. We’ll keep a single text file (timesheets.txt) to store the data, with each line representing an entry in the format:

Format of Data: EmployeeID, Date, Hours

Sample Values:

101,2023-03-01,8
102,2023-03-01,9
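Before writing the application, it can help to see how a line in this format maps to fields. The sketch below is one possible way to parse it, using the standard csv module; the parse_entries helper and the choice to convert hours to float are assumptions for this example, not part of the project code.

```python
import csv
import io

sample = "101,2023-03-01,8\n102,2023-03-01,9\n"


def parse_entries(text_stream):
    """Parse 'EmployeeID,Date,Hours' lines into a list of dictionaries."""
    entries = []
    for row in csv.reader(text_stream):
        if not row:  # skip blank lines
            continue
        employee_id, date, hours = row
        entries.append(
            {"employee_id": employee_id, "date": date, "hours": float(hours)}
        )
    return entries


# io.StringIO stands in for an open text file here
entries = parse_entries(io.StringIO(sample))
print(entries)
```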

Step 2: Adding an Entry

We’ll start by implementing the functionality to add a new timesheet entry.

def add_entry():
    with open('timesheets.txt', 'a') as file:
        employee_id = input("Enter Employee ID: ")
        date = input("Enter Date (YYYY-MM-DD): ")
        hours = input("Enter Hours Worked: ")
        file.write(f"{employee_id},{date},{hours}\n")
        print("Entry added successfully.")

Step 3: Modifying an Entry

Modifying an entry will involve reading all entries, modifying the relevant one, and writing them back.

def modify_entry():
    employee_id = input("Enter Employee ID to modify: ")
    date = input("Enter Date of entry to modify (YYYY-MM-DD): ")
    modified = False
    with open('timesheets.txt', 'r') as file:
        lines = file.readlines()
    with open('timesheets.txt', 'w') as file:
        for line in lines:
            # The trailing comma prevents a partial date from matching a longer one
            if line.startswith(f"{employee_id},{date},"):
                hours = input("Enter new Hours Worked: ")
                file.write(f"{employee_id},{date},{hours}\n")
                modified = True
            else:
                file.write(line)
    if modified:
        print("Entry modified successfully.")
    else:
        print("Entry not found.")
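One weakness of the read-all-then-rewrite approach is that the file is already truncated while the new contents are still being written, so a crash in the middle can lose data. A common way around this, sketched below, is to write to a temporary file in the same directory and then swap it in with os.replace(). The rewrite_entry function and its parameters are hypothetical; it takes the new hours as an argument instead of prompting for them.

```python
import os
import tempfile


def rewrite_entry(path, employee_id, date, new_hours):
    """Replace the hours of a matching entry; return True if one was found."""
    prefix = f"{employee_id},{date},"
    modified = False
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives next to the original so the final swap
    # stays on the same filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                if line.startswith(prefix):
                    tmp.write(f"{prefix}{new_hours}\n")
                    modified = True
                else:
                    tmp.write(line)
        os.replace(tmp_path, path)  # swap in the new file in a single step
    except BaseException:
        os.remove(tmp_path)  # clean up the temporary file on any failure
        raise
    return modified


# Demo on a throwaway file
demo_path = os.path.join(tempfile.mkdtemp(), "timesheets.txt")
with open(demo_path, "w") as file:
    file.write("101,2023-03-01,8\n102,2023-03-01,9\n")

found = rewrite_entry(demo_path, "102", "2023-03-01", "7")
with open(demo_path) as file:
    contents = file.read()
print(found)
print(contents)
```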

Step 4: Deleting an Entry

Deleting an entry is similar to modifying, except you omit the entry you want to delete.

def delete_entry():
    employee_id = input("Enter Employee ID to delete: ")
    date = input("Enter Date of entry to delete (YYYY-MM-DD): ")
    deleted = False
    with open('timesheets.txt', 'r') as file:
        lines = file.readlines()
    with open('timesheets.txt', 'w') as file:
        for line in lines:
            # The trailing comma prevents a partial date from matching a longer one
            if not line.startswith(f"{employee_id},{date},"):
                file.write(line)
            else:
                deleted = True
    if deleted:
        print("Entry deleted successfully.")
    else:
        print("Entry not found.")

Step 5: Generating Reports

A simple report might list all entries for a given employee or date.

def generate_report():
    report_type = input("Report by (Employee/Date): ").lower()
    key = input("Enter Employee ID or Date (YYYY-MM-DD): ")
    print("Timesheet Entries:")
    with open('timesheets.txt', 'r') as file:
        for line in file:
            parts = line.strip().split(',')
            if len(parts) != 3:
                continue  # Skip blank or malformed lines
            if (report_type == "employee" and parts[0] == key) or \
               (report_type == "date" and parts[1] == key):
                print(line.strip())

Step 6: Main Function and User Interaction

Finally, let’s create a simple loop in the main function to interact with the user and call these functions based on user input.

def main():
    while True:
        print("\nEmployee Timesheet Application")
        print("1. Add Entry")
        print("2. Modify Entry")
        print("3. Delete Entry")
        print("4. Generate Report")
        print("5. Exit")
        choice = input("Enter your choice: ")
        
        if choice == '1':
            add_entry()
        elif choice == '2':
            modify_entry()
        elif choice == '3':
            delete_entry()
        elif choice == '4':
            generate_report()
        elif choice == '5':
            print("Exiting application.")
            break
        else:
            print("Invalid choice. Please enter a number between 1 and 5.")

if __name__ == "__main__":
    main()

Conclusion

This simple console application demonstrates the core concepts of file handling in Python, including reading, writing, modifying, and deleting file entries. By implementing this timesheet application, you’ll get hands-on experience with practical file operations, preparing you for more complex file handling tasks in your future projects.

Readings

  • Explore file handling in depth

Assignment

  • Validations – The code lacks input validation. Try adding some.
  • Add a feature to list all timesheet entries.
  • Use the random access concept (seek() and tell()) to locate entries for editing and deleting.
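As a starting hint for the validation task, here is one possible shape for a validation helper. The rules themselves (digits-only employee IDs, hours greater than 0 and at most 24) are assumptions chosen for this sketch; adjust them to whatever your timesheet needs.

```python
from datetime import datetime


def validate_entry(employee_id, date, hours):
    """Return a list of problems; an empty list means the entry looks valid."""
    errors = []
    # Assumed rule: employee IDs consist of digits only
    if not employee_id.isdigit():
        errors.append("Employee ID must contain digits only.")
    # strptime rejects impossible dates such as 2023-02-30
    try:
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        errors.append("Date must be a real date in YYYY-MM-DD format.")
    # Assumed rule: hours must be a number greater than 0 and at most 24
    try:
        value = float(hours)
        if not 0 < value <= 24:
            errors.append("Hours must be greater than 0 and at most 24.")
    except ValueError:
        errors.append("Hours must be a number.")
    return errors


print(validate_entry("101", "2023-03-01", "8"))
print(validate_entry("abc", "2023-02-30", "25"))
```

Calling a helper like this before file.write() in add_entry() keeps bad rows out of timesheets.txt in the first place.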

Project 2 – Timesheet with OOP

Here is a reference implementation of the timesheet application using an object-oriented approach. Study the code, identify potential issues, and drop a comment with your solution.

The timesheets.txt file uses a fixed-length format:

1         2024-01-01 40                            
2         2024-01-02 50    
       
FORMAT: 
EmployeeID: 10 characters
1 space (separator)
Date: 10 characters
1 space (separator)
Hours: 5 characters

This totals to 27 characters
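To make the layout concrete, here is a minimal sketch that formats and parses a record exactly as specified above and uses the fixed width to jump straight to the n-th record with seek(). The function names are made up for this example, and it follows the 27-character layout described here rather than any particular implementation. The file is written in binary mode so that every line ends in a single newline byte on all platforms.

```python
import os
import tempfile

ID_WIDTH, DATE_WIDTH, HOURS_WIDTH = 10, 10, 5
RECORD_WIDTH = ID_WIDTH + 1 + DATE_WIDTH + 1 + HOURS_WIDTH  # 27 characters
LINE_WIDTH = RECORD_WIDTH + 1                                # plus one newline byte


def format_record(employee_id, date, hours):
    """Pad each field to its column width and join with single spaces."""
    record = f"{employee_id:<{ID_WIDTH}} {date:<{DATE_WIDTH}} {hours:<{HOURS_WIDTH}}"
    if len(record) != RECORD_WIDTH:
        raise ValueError("a field is wider than its column")
    return record


def parse_record(record):
    """Slice the fields out by position and strip the padding."""
    employee_id = record[:ID_WIDTH].strip()
    date = record[ID_WIDTH + 1:ID_WIDTH + 1 + DATE_WIDTH].strip()
    hours = record[ID_WIDTH + DATE_WIDTH + 2:RECORD_WIDTH].strip()
    return employee_id, date, hours


def read_record(path, index):
    """Jump directly to record number `index` (0-based) and parse it."""
    with open(path, "rb") as file:
        file.seek(index * LINE_WIDTH)
        return parse_record(file.read(RECORD_WIDTH).decode("ascii"))


# Demo on a throwaway file with the two sample rows
path = os.path.join(tempfile.mkdtemp(), "timesheets.txt")
with open(path, "wb") as file:
    for fields in [("1", "2024-01-01", "40"), ("2", "2024-01-02", "50")]:
        file.write(format_record(*fields).encode("ascii") + b"\n")

second = read_record(path, 1)
print(second)
```

Because every line has the same length, the byte offset of a record is simply its index multiplied by the line width, which is what makes random access possible.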
                 

The code:
class TimesheetEntry:
    def __init__(self, employee_id, date, hours):
        self.employee_id = employee_id
        self.date = date
        self.hours = hours
    
    def __str__(self):
      # Format with explicit spacing
      return f"{self.employee_id:<10}{self.date}    {self.hours}".ljust(50) 

    @classmethod
    def from_string(cls, data_str):
        # Attempt to split by spaces first
        parts = data_str.strip().split()
        # Check if we have the expected 3 parts; if not, handle the special case
        if len(parts) == 3:
            return cls(parts[0], parts[1], parts[2])
        elif len(parts) == 2:
            # Assuming the second part is a concatenation of date and hours
            employee_id = parts[0]
            date_hours = parts[1]
            # Splitting date and hours based on expected lengths
            date = date_hours[:10]  # YYYY-MM-DD
            hours = date_hours[10:]  # Remaining part
            return cls(employee_id, date, hours)
        else:
            raise ValueError(f"Invalid data format: '{data_str}'. Expected format: 'employee_id date hours'")

    
class TimesheetFileHandler:
    def __init__(self, filepath):
        self.filepath = filepath
        self.record_length = 52  # Fixed length for each record including newline

    def add_entry(self, entry):
        with open(self.filepath, 'a') as file:
            file.write(str(entry) + '\n')

    def find_record_positions(self, employee_id=None, date=None):
        positions = []
        with open(self.filepath, 'r') as file:
            position = 0  # Track the starting position of each line
            while True:
                current_pos = file.tell()  # Get the current position before reading
                line = file.readline()
                if not line:
                    break  # End of file
                if employee_id and line.startswith(employee_id) or date and date in line:
                    positions.append(current_pos)
                position += 1
        return positions

    def read_entry(self, position):
        with open(self.filepath, 'r') as file:
            file.seek(position)
            return TimesheetEntry.from_string(file.readline().strip())

    def modify_entry(self, position, new_entry):
        # For simplicity, we'll delete the old entry and append the new one
        # A more efficient approach would require a fixed record length
        self.delete_entry(position)
        self.add_entry(new_entry)

    def delete_entry(self, position):
        with open(self.filepath, 'r+') as file:
            file.seek(0)
            lines = file.readlines()
            file.seek(0)
            file.truncate()
            for current_position, line in enumerate(lines):
                if current_position != position // self.record_length:
                    file.write(line)


class TimesheetApp:
    def __init__(self):
        self.file_handler = TimesheetFileHandler('timesheets.txt')

    def add_entry_ui(self):
        employee_id = input("Enter Employee ID: ")
        date = input("Enter Date (YYYY-MM-DD): ")
        hours = input("Enter Hours Worked: ")
        entry = TimesheetEntry(employee_id, date, hours)
        self.file_handler.add_entry(entry)
        print("Entry added successfully.")

    def modify_entry_ui(self):
        employee_id = input("Enter Employee ID to modify: ")
        date = input("Enter Date of entry to modify (YYYY-MM-DD): ")
        positions = self.file_handler.find_record_positions(employee_id, date)
        if positions:
            hours = input("Enter new Hours Worked: ")
            new_entry = TimesheetEntry(employee_id, date, hours)
            for position in positions:
                self.file_handler.modify_entry(position, new_entry)
            print("Entry modified successfully.")
        else:
            print("Entry not found.")

    def delete_entry_ui(self):
        employee_id = input("Enter Employee ID to delete: ")
        positions = self.file_handler.find_record_positions(employee_id=employee_id)
        if positions:
            for position in positions:
                self.file_handler.delete_entry(position)
            print("Entry deleted successfully.")
        else:
            print("Entry not found.")

    def generate_report_ui(self):
        choice = input("Report by (Employee ID/Date): ").lower()
        if choice == "employee id":
            employee_id = input("Enter Employee ID: ")
            positions = self.file_handler.find_record_positions(employee_id=employee_id)
        elif choice == "date":
            date = input("Enter Date (YYYY-MM-DD): ")
            positions = self.file_handler.find_record_positions(date=date)
        else:
            print("Invalid choice.")
            return
        if positions:
            print("\nTimesheet Entries:")
            for position in positions:
                entry = self.file_handler.read_entry(position)
                print(entry)
        else:
            print("No entries found.")

    def run(self):
        actions = {'1': self.add_entry_ui, '2': self.modify_entry_ui, '3': self.delete_entry_ui, '4': self.generate_report_ui}
        while True:
            print("\nEmployee Timesheet Application")
            print("1. Add Entry")
            print("2. Modify Entry")
            print("3. Delete Entry")
            print("4. Generate Report")
            print("5. Exit")
            choice = input("Enter your choice: ")
            if choice == '5':
                print("Exiting application.")
                break
            action = actions.get(choice)
            if action:
                action()
            else:
                print("Invalid choice. Please enter a number between 1 and 5.")



def main():
    app = TimesheetApp()
    app.run()

if __name__ == "__main__":
    main()

The OOP code also uses the random file access concept.
