File handling is an essential aspect of programming, enabling developers to store, retrieve, and manipulate data. Python, with its rich set of built-in libraries, provides robust support for file operations, ranging from basic text file interactions to advanced binary and random file access. This guide will walk you through the fundamentals of file handling in Python, step by step, from basic concepts to more advanced techniques.
1. Reading and Writing Text Files
Getting Started with Text Files
Opening Files: The first step in file handling is to open the file using the built-in open() function. It takes two main arguments: the file path and the mode of operation (e.g., 'r' for reading, 'w' for writing). The mode is optional and defaults to 'r'.
file = open('example.txt', 'r')
file.close()
However, it’s recommended to use the with statement, which ensures the file is properly closed after its suite finishes, even if an exception is raised:
with open('example.txt', 'r') as file:
    content = file.read()  # Operations with file
Reading from Files
Python offers multiple methods to read data from files: .read() reads the entire file, .readline() reads the next line, and .readlines() returns a list of lines:
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
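The three read methods differ in how much of the file they consume. The following is a minimal sketch, assuming a throwaway file called sample.txt (a made-up name) created in a temporary directory so the example runs anywhere:

```python
import os
import tempfile

# Create a small sample file in a temporary directory (the name is arbitrary).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

with open(path, "r", encoding="utf-8") as f:
    first_line = f.readline()   # reads up to and including the first newline
    remaining = f.readlines()   # list of the lines that are left

with open(path, "r", encoding="utf-8") as f:
    whole = f.read()            # the entire file as a single string

# Iterating over the file object yields one line at a time,
# which avoids loading a large file into memory at once.
with open(path, "r", encoding="utf-8") as f:
    stripped = [line.rstrip("\n") for line in f]

print(repr(first_line))
print(remaining)
print(stripped)
```

For large files, iterating over the file object is usually the better default, since .read() and .readlines() hold the whole content in memory.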
Writing to Files
Writing to files is just as straightforward. The .write() method writes a string to the file, and .writelines() can write a list of strings:
with open('example.txt', 'w') as file:
    file.write("Hello, Python!\n")
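Two details are easy to miss here: mode 'w' truncates an existing file, and .writelines() does not add line endings on its own. A short sketch, assuming a temporary file named notes.txt (hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file name

lines = ["alpha", "beta", "gamma"]

# 'w' creates the file, or truncates it if it already exists.
with open(path, "w", encoding="utf-8") as f:
    # writelines() writes the strings as-is, so add the newlines explicitly.
    f.writelines(line + "\n" for line in lines)

# 'a' appends to the end instead of truncating.
with open(path, "a", encoding="utf-8") as f:
    f.write("delta\n")

with open(path, "r", encoding="utf-8") as f:
    result = f.read()

print(result)
```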
2. Binary and Image File Handling
Understanding Binary Files
Binary file operations are similar to text file operations but use a binary mode ('b') appended to the file mode. This is crucial for non-text files, like images or executable files.
with open('example.bin', 'wb') as file:
    file.write(b'\x00\xFF')  # Writing bytes
Handling Image Files
Image files can be treated as binary files but manipulating them usually requires an external library like Pillow. However, for basic read/write operations, binary mode suffices.
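As an illustration of a basic read/write operation on an image-like file, the sketch below copies a file in fixed-size chunks using binary mode. The helper name copy_binary and the file names are assumptions made for this example, and stand-in bytes are used instead of a real image:

```python
import os
import tempfile

def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in fixed-size chunks; return the number of bytes copied."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied

# Demonstration with stand-in bytes rather than a real image.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "picture.bin")
dst = os.path.join(tmp_dir, "picture_copy.bin")
payload = bytes(range(256)) * 10
with open(src, "wb") as f:
    f.write(payload)

n = copy_binary(src, dst, chunk_size=1000)
with open(dst, "rb") as f:
    copied_bytes = f.read()

print(n, copied_bytes == payload)
```

In practice the standard library's shutil.copyfile does the same job; the loop is spelled out here only to show the binary read and write calls.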
3. Random File Access
Seeking and Telling
Random access to files allows you to jump to specific positions within a file. The .seek() method moves the file pointer to a specified position, and .tell() returns the current position.
with open('example.txt', 'rb+') as file:
    file.seek(0, 2)  # Move to the end of the file
    position = file.tell()
    print(f'Current position: {position}')
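The second argument to .seek() selects the reference point: the start of the file (0, the default), the current position (1), or the end (2). A minimal sketch on a ten-byte temporary file (the file name is arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb+") as f:
    f.seek(3)                  # absolute offset from the start (whence=0)
    a = f.read(2)              # reads b"34"; the pointer is now at 5
    f.seek(2, os.SEEK_CUR)     # skip 2 bytes relative to the current position
    pos_after_skip = f.tell()  # 7
    f.seek(-1, os.SEEK_END)    # one byte before the end
    last = f.read(1)           # b"9"
    f.seek(0)
    f.write(b"AB")             # overwrites in place; the file does not grow
    f.seek(0)
    final = f.read()

print(a, pos_after_skip, last, final)
```

Binary mode is used deliberately: in text mode, Python only supports seeking relative to the start of the file (apart from a zero offset), and offsets should come from .tell().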
Advanced Topics
After mastering the basics, consider exploring these advanced topics:
- File and Directory Management: Learn to navigate and manipulate the filesystem with os and shutil.
- Context Managers: Understand how to manage resources efficiently using the with statement.
- Serialization/Deserialization: Use pickle for object serialization and modules like json or csv for working with these formats.
- Working with Archives: Compress and decompress files using zipfile and tarfile.
- File System Paths: Employ pathlib for an object-oriented approach to file system path manipulation.
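To give a flavor of a few of these modules, here is a small sketch that combines pathlib, json, and shutil. The directory layout and file names are invented for the example:

```python
import json
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
data_dir = base / "data"  # the / operator joins path components
data_dir.mkdir()

# Serialize a dictionary to JSON and read it back.
record = {"employee_id": "101", "date": "2023-03-01", "hours": 8}
json_path = data_dir / "entry.json"
json_path.write_text(json.dumps(record), encoding="utf-8")
loaded = json.loads(json_path.read_text(encoding="utf-8"))

# Copy the file with shutil and list the directory with pathlib.
backup_path = data_dir / "entry_backup.json"
shutil.copy(json_path, backup_path)
names = sorted(p.name for p in data_dir.iterdir())

print(loaded == record)
print(names)
```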
Project 1 – Employee Timesheet
Let’s build a small project. It is not a performant one, but it is good enough to learn the concepts and leaves scope for further improvement.
Creating an employee timesheet application is an excellent practical use case to apply and master file handling concepts in Python. This console application will demonstrate how to add, modify, delete, and report employee timesheets using simple text file operations. Here’s how you can implement it step by step:
Step 1: Setup and Initial Design
First, let’s outline the basic functionality and design our file structure. We’ll keep a single text file (timesheets.txt) to store the data, with each line representing an entry in the format:
Format of Data: EmployeeID,Date,Hours
Sample Values:
101,2023-03-01,8
102,2023-03-01,9
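Because each line is comma-separated, the standard csv module can parse it. The sketch below is one possible way to turn the sample values into tuples; parse_entries is a hypothetical helper, and converting hours to float is an assumption about how the values will be used:

```python
import csv
import io

sample = "101,2023-03-01,8\n102,2023-03-01,9\n"

def parse_entries(text_stream):
    """Yield (employee_id, date, hours) tuples from comma-separated lines."""
    for row in csv.reader(text_stream):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        employee_id, date, hours = (field.strip() for field in row)
        yield employee_id, date, float(hours)

# io.StringIO stands in for an open file so the example is self-contained.
entries = list(parse_entries(io.StringIO(sample)))
print(entries)
```

The same function works on a real file object, for example one returned by open('timesheets.txt', newline='').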
Step 2: Adding an Entry
We’ll start by implementing the functionality to add a new timesheet entry.
def add_entry():
    with open('timesheets.txt', 'a') as file:
        employee_id = input("Enter Employee ID: ")
        date = input("Enter Date (YYYY-MM-DD): ")
        hours = input("Enter Hours Worked: ")
        file.write(f"{employee_id},{date},{hours}\n")
    print("Entry added successfully.")
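The function above writes whatever the user types. As one possible starting point for input checking, the sketch below validates the three fields before they are written. The helper name validate_entry, the numeric-ID rule, and the 0 to 24 hour range are all assumptions for illustration, not requirements stated by the project:

```python
from datetime import datetime

def validate_entry(employee_id, date, hours):
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    # Assumption: employee IDs are purely numeric, as in the sample data.
    if not employee_id.strip().isdigit():
        problems.append("Employee ID must be numeric.")
    try:
        # strptime rejects impossible dates such as 2023-02-30.
        datetime.strptime(date.strip(), "%Y-%m-%d")
    except ValueError:
        problems.append("Date must be a real date in YYYY-MM-DD format.")
    try:
        value = float(hours)
        # Assumption: a single entry covers at most one day.
        if not 0 <= value <= 24:
            problems.append("Hours must be between 0 and 24.")
    except ValueError:
        problems.append("Hours must be a number.")
    return problems

print(validate_entry("101", "2023-03-01", "8"))
print(validate_entry("abc", "2023-02-30", "25"))
```

A caller could loop on input() until validate_entry returns an empty list and only then append the line to the file.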
Step 3: Modifying an Entry
Modifying an entry will involve reading all entries, modifying the relevant one, and writing them back.
def modify_entry():
    employee_id = input("Enter Employee ID to modify: ")
    date = input("Enter Date of entry to modify (YYYY-MM-DD): ")
    modified = False
    with open('timesheets.txt', 'r') as file:
        lines = file.readlines()
    with open('timesheets.txt', 'w') as file:
        for line in lines:
            # The trailing comma keeps a partial date from matching a longer one
            if line.startswith(f"{employee_id},{date},"):
                hours = input("Enter new Hours Worked: ")
                file.write(f"{employee_id},{date},{hours}\n")
                modified = True
            else:
                file.write(line)
    if modified:
        print("Entry modified successfully.")
    else:
        print("Entry not found.")
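Opening the file with 'w' truncates it immediately, so a crash in the middle of the rewrite would lose data. One common alternative, sketched below under the assumption that the temporary file can live in the same directory, is to write the new content to a temporary file and swap it in with os.replace. The names rewrite_lines, transform, and set_hours are hypothetical:

```python
import os
import tempfile

def rewrite_lines(path, transform):
    """Rewrite path line by line.

    transform(line) returns the replacement line, or None to drop the line.
    The original file is replaced only after the new content is fully written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                new_line = transform(line)
                if new_line is not None:
                    tmp.write(new_line)
        os.replace(tmp_path, path)  # swaps the new file in place of the old one
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise

# Demonstration on a temporary copy of the sample data.
demo = os.path.join(tempfile.mkdtemp(), "timesheets.txt")
with open(demo, "w") as f:
    f.write("101,2023-03-01,8\n102,2023-03-01,9\n")

def set_hours(line):
    if line.startswith("101,2023-03-01,"):
        return "101,2023-03-01,7.5\n"
    return line

rewrite_lines(demo, set_hours)
with open(demo) as f:
    updated = f.read()
print(updated)
```

The same helper covers deletion: a transform that returns None for the matching line removes it.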
Step 4: Deleting an Entry
Deleting an entry is similar to modifying, except you omit the entry you want to delete.
def delete_entry():
    employee_id = input("Enter Employee ID to delete: ")
    date = input("Enter Date of entry to delete (YYYY-MM-DD): ")
    deleted = False
    with open('timesheets.txt', 'r') as file:
        lines = file.readlines()
    with open('timesheets.txt', 'w') as file:
        for line in lines:
            # The trailing comma keeps a partial date from matching a longer one
            if not line.startswith(f"{employee_id},{date},"):
                file.write(line)
            else:
                deleted = True
    if deleted:
        print("Entry deleted successfully.")
    else:
        print("Entry not found.")
Step 5: Generating Reports
A simple report might list all entries for a given employee or date.
def generate_report():
    report_type = input("Report by (Employee/Date): ").lower()
    key = input("Enter Employee ID or Date (YYYY-MM-DD): ")
    print("Timesheet Entries:")
    with open('timesheets.txt', 'r') as file:
        for line in file:
            parts = line.strip().split(',')
            if len(parts) < 3:
                continue  # Skip blank or malformed lines
            if (report_type == "employee" and parts[0] == key) or \
               (report_type == "date" and parts[1] == key):
                print(line.strip())
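A report can also aggregate rather than just list. The sketch below totals hours per employee from an iterable of lines; total_hours_by_employee is a hypothetical helper, and silently skipping malformed lines is a design choice made for the example:

```python
from collections import defaultdict

def total_hours_by_employee(lines):
    """Sum the hours column per employee ID, ignoring malformed lines."""
    totals = defaultdict(float)
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # blank or malformed line
        employee_id, _date, hours = parts
        try:
            value = float(hours)
        except ValueError:
            continue  # hours is not a number
        totals[employee_id] += value
    return dict(totals)

sample_lines = [
    "101,2023-03-01,8\n",
    "102,2023-03-01,9\n",
    "101,2023-03-02,7.5\n",
    "\n",
    "bad line\n",
]
totals = total_hours_by_employee(sample_lines)
print(totals)
```

Since an open file is itself an iterable of lines, the function can be called directly on the file object inside a with block.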
Step 6: Main Function and User Interaction
Finally, let’s create a simple loop in the main function to interact with the user and call these functions based on user input.
def main():
    while True:
        print("\nEmployee Timesheet Application")
        print("1. Add Entry")
        print("2. Modify Entry")
        print("3. Delete Entry")
        print("4. Generate Report")
        print("5. Exit")
        choice = input("Enter your choice: ")
        if choice == '1':
            add_entry()
        elif choice == '2':
            modify_entry()
        elif choice == '3':
            delete_entry()
        elif choice == '4':
            generate_report()
        elif choice == '5':
            print("Exiting application.")
            break
        else:
            print("Invalid choice. Please enter a number between 1 and 5.")

if __name__ == "__main__":
    main()
Conclusion
This simple console application demonstrates the core concepts of file handling in Python, including reading, writing, modifying, and deleting file entries. By implementing this timesheet application, you’ll get hands-on experience with practical file operations, preparing you for more complex file handling tasks in your future projects.
Readings
- Explore file handling in depth
Assignment
- Validations: The code lacks input validation. Try adding some.
- Add a feature to list all timesheet entries.
- Use the random access concept (.seek() and .tell()) to locate entries for edit and delete.
Project 2 – Timesheet with OOP
Here is a reference implementation of the timesheet application using an object-oriented approach. Study the code, identify potential issues, and drop a comment with your solution.
The timesheets.txt file uses a fixed-length format:
1 2024-01-01 40
2 2024-01-02 50
FORMAT:
EmployeeID: 10 characters
1 space (separator)
Date: 10 characters
1 space (separator)
Hours: 5 characters
This totals 27 characters.
The code:
class TimesheetEntry:
    def __init__(self, employee_id, date, hours):
        self.employee_id = employee_id
        self.date = date
        self.hours = hours

    def __str__(self):
        # Format with explicit spacing
        return f"{self.employee_id:<10}{self.date} {self.hours}".ljust(50)

    @classmethod
    def from_string(cls, data_str):
        # Attempt to split by spaces first
        parts = data_str.strip().split()
        # Check if we have the expected 3 parts; if not, handle the special case
        if len(parts) == 3:
            return cls(parts[0], parts[1], parts[2])
        elif len(parts) == 2:
            # Assuming the second part is a concatenation of date and hours
            employee_id = parts[0]
            date_hours = parts[1]
            # Splitting date and hours based on expected lengths
            date = date_hours[:10]  # YYYY-MM-DD
            hours = date_hours[10:]  # Remaining part
            return cls(employee_id, date, hours)
        else:
            raise ValueError(f"Invalid data format: '{data_str}'. Expected format: 'employee_id date hours'")

class TimesheetFileHandler:
    def __init__(self, filepath):
        self.filepath = filepath
        self.record_length = 52  # Fixed length for each record including newline

    def add_entry(self, entry):
        with open(self.filepath, 'a') as file:
            file.write(str(entry) + '\n')

    def find_record_positions(self, employee_id=None, date=None):
        positions = []
        with open(self.filepath, 'r') as file:
            position = 0  # Track the starting position of each line
            while True:
                current_pos = file.tell()  # Get the current position before reading
                line = file.readline()
                if not line:
                    break  # End of file
                if employee_id and line.startswith(employee_id) or date and date in line:
                    positions.append(current_pos)
                position += 1
        return positions

    def read_entry(self, position):
        with open(self.filepath, 'r') as file:
            file.seek(position)
            return TimesheetEntry.from_string(file.readline().strip())

    def modify_entry(self, position, new_entry):
        # For simplicity, we'll delete the old entry and append the new one
        # A more efficient approach would require a fixed record length
        self.delete_entry(position)
        self.add_entry(new_entry)

    def delete_entry(self, position):
        with open(self.filepath, 'r+') as file:
            file.seek(0)
            lines = file.readlines()
            file.seek(0)
            file.truncate()
            for current_position, line in enumerate(lines):
                if current_position != position // self.record_length:
                    file.write(line)

class TimesheetApp:
    def __init__(self):
        self.file_handler = TimesheetFileHandler('timesheets.txt')

    def add_entry_ui(self):
        employee_id = input("Enter Employee ID: ")
        date = input("Enter Date (YYYY-MM-DD): ")
        hours = input("Enter Hours Worked: ")
        entry = TimesheetEntry(employee_id, date, hours)
        self.file_handler.add_entry(entry)
        print("Entry added successfully.")

    def modify_entry_ui(self):
        employee_id = input("Enter Employee ID to modify: ")
        date = input("Enter Date of entry to modify (YYYY-MM-DD): ")
        positions = self.file_handler.find_record_positions(employee_id, date)
        if positions:
            hours = input("Enter new Hours Worked: ")
            new_entry = TimesheetEntry(employee_id, date, hours)
            for position in positions:
                self.file_handler.modify_entry(position, new_entry)
            print("Entry modified successfully.")
        else:
            print("Entry not found.")

    def delete_entry_ui(self):
        employee_id = input("Enter Employee ID to delete: ")
        positions = self.file_handler.find_record_positions(employee_id=employee_id)
        if positions:
            for position in positions:
                self.file_handler.delete_entry(position)
            print("Entry deleted successfully.")
        else:
            print("Entry not found.")

    def generate_report_ui(self):
        choice = input("Report by (Employee ID/Date): ").lower()
        if choice == "employee id":
            employee_id = input("Enter Employee ID: ")
            positions = self.file_handler.find_record_positions(employee_id=employee_id)
        elif choice == "date":
            date = input("Enter Date (YYYY-MM-DD): ")
            positions = self.file_handler.find_record_positions(date=date)
        else:
            print("Invalid choice.")
            return
        if positions:
            print("\nTimesheet Entries:")
            for position in positions:
                entry = self.file_handler.read_entry(position)
                print(entry)
        else:
            print("No entries found.")

    def run(self):
        actions = {'1': self.add_entry_ui, '2': self.modify_entry_ui, '3': self.delete_entry_ui, '4': self.generate_report_ui}
        while True:
            print("\nEmployee Timesheet Application")
            print("1. Add Entry")
            print("2. Modify Entry")
            print("3. Delete Entry")
            print("4. Generate Report")
            print("5. Exit")
            choice = input("Enter your choice: ")
            if choice == '5':
                print("Exiting application.")
                break
            action = actions.get(choice)
            if action:
                action()
            else:
                print("Invalid choice. Please enter a number between 1 and 5.")

def main():
    app = TimesheetApp()
    app.run()

if __name__ == "__main__":
    main()
The OOP code also uses the random file access concept.
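To show why a fixed record length matters for random access, here is a separate, minimal sketch (not the code above) that follows the 27-character layout described earlier plus one newline byte. With every record the same size, record number n starts at byte n * RECORD_SIZE, so it can be read or overwritten in place without touching the rest of the file. The names pack, unpack, read_record, and update_record are hypothetical, and binary mode with ASCII text is assumed so that each character is exactly one byte:

```python
import os
import tempfile

ID_WIDTH, DATE_WIDTH, HOURS_WIDTH = 10, 10, 5
# 27 characters of data plus one newline byte per record.
RECORD_SIZE = ID_WIDTH + 1 + DATE_WIDTH + 1 + HOURS_WIDTH + 1

def pack(employee_id, date, hours):
    """Build one fixed-width record as bytes, padding each field with spaces."""
    text = f"{employee_id:<{ID_WIDTH}} {date:<{DATE_WIDTH}} {hours:<{HOURS_WIDTH}}\n"
    data = text.encode("ascii")
    if len(data) != RECORD_SIZE:
        raise ValueError("a field is wider than its column")
    return data

def unpack(data):
    """Slice a record back into (employee_id, date, hours) strings."""
    text = data.decode("ascii")
    return text[0:10].strip(), text[11:21].strip(), text[22:27].strip()

def read_record(path, index):
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)  # jump straight to the record
        return unpack(f.read(RECORD_SIZE))

def update_record(path, index, employee_id, date, hours):
    with open(path, "rb+") as f:
        f.seek(index * RECORD_SIZE)
        f.write(pack(employee_id, date, hours))  # overwrite in place

# Demonstration on a temporary file.
path = os.path.join(tempfile.mkdtemp(), "timesheets_fixed.txt")
with open(path, "wb") as f:
    f.write(pack("1", "2024-01-01", "40"))
    f.write(pack("2", "2024-01-02", "50"))

update_record(path, 1, "2", "2024-01-02", "45")
print(read_record(path, 0))
print(read_record(path, 1))
print(os.path.getsize(path))
```

Comparing this layout with the ljust(50) call and the record_length value in the code above is a useful starting point for the exercise.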