
Counting Lines in Files Across Folders and Subfolders with Python


When working with large projects or datasets, it can be useful to analyze the contents of multiple files scattered across various directories and subdirectories. For instance, you might want to quickly count how many lines of code or text are present in all files within a folder, including its nested subfolders. This task can be automated using a simple Python script that recursively traverses the folder structure, counts the lines in each file, and outputs the results.

In this article, we will guide you through the process of creating a Python program that counts the number of lines in each file across all folders and subfolders of a given directory. We will use Python’s os module for directory traversal and built-in file handling capabilities to read and count lines efficiently.

Prerequisites

Before diving into the code, make sure Python 3.6 or later is installed on your machine, since the script uses f-strings. If you’re new to Python, install the latest version from python.org.

Python Module and Built-in Functions Used:

  1. os: A standard-library module for directory traversal and file path operations.
  2. open(): A built-in function to open and read files.
  3. sum(): A built-in function, used here to tally the lines of each file.

Understanding the Problem

The task is to count the number of lines in each file, including all nested subfolders, within a given root folder. We also want to:

  • Display the folder structure as we process it.
  • Count and report the total number of lines in each folder.
  • Summarize the total number of lines across all files after processing is complete.

Plan of Action:

  1. Directory Traversal: We’ll use Python’s os.walk() function to traverse the folder structure. This function iterates through all directories, subdirectories, and files starting from a root directory.
  2. Counting Lines in Files: We’ll use the open() function to read each file and count its lines.
  3. Reporting: The program will print the number of lines for each file, the total for each folder, and a final summary for all folders.
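
The traversal step in the plan above can be tried on its own before any line counting is added. The following is a minimal sketch rather than part of the final script: list_tree is a hypothetical helper name, and the in-place sort of dirs is an optional addition that makes os.walk() visit subfolders in alphabetical order instead of the order the operating system happens to return.

```python
import os

def list_tree(folder_path):
    """Collect (directory, sorted file names) pairs in the order os.walk visits them."""
    visited = []
    for root, dirs, files in os.walk(folder_path):
        # Sorting dirs in place controls which subfolders are visited next
        # (this works because os.walk is top-down by default).
        dirs.sort()
        visited.append((root, sorted(files)))
    return visited
```

Each entry pairs a folder path with the names of the files directly inside it, which is exactly the information the line-counting step needs.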

Code Implementation

Here’s the Python code that accomplishes the task:

import os

def count_lines_in_file(file_path):
    """Count the number of lines in a single file."""
    with open(file_path, 'r', errors='ignore') as file:
        return sum(1 for line in file)

def count_lines_in_folder(folder_path):
    """Recursively count the number of lines in each file in a folder (and its subfolders)."""
    total_lines = 0
    file_count = 0
    
    # Traverse all folders and files in the given folder path
    for root, dirs, files in os.walk(folder_path):
        print(f'\nProcessing folder: {root}')
        folder_lines = 0  # To keep track of lines in this particular folder
        
        # Iterate through each file in the current directory
        for file in files:
            file_path = os.path.join(root, file)
            try:
                # Count lines in the file
                lines = count_lines_in_file(file_path)
                print(f'{file_path}: {lines} lines')
                folder_lines += lines
                file_count += 1
            except Exception as e:
                print(f"Error reading {file_path}: {e}")

        # Display folder-wise total lines count
        if folder_lines > 0:
            print(f'Total lines in {root}: {folder_lines} lines')

        total_lines += folder_lines

    # Print the overall total lines count in all files
    print(f"\nTotal lines across all files: {total_lines} lines in {file_count} files.")

# Replace 'your_folder_path' with the actual folder path you want to analyze
if __name__ == '__main__':
    count_lines_in_folder('your_folder_path')

How the Code Works

1. count_lines_in_file() function:

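This function opens a file in read mode and counts its lines by passing a generator expression to Python’s built-in sum() function. The generator yields 1 for every line, and because the file is read one line at a time, even a large file is never loaded into memory all at once.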

def count_lines_in_file(file_path):
    with open(file_path, 'r', errors='ignore') as file:
        return sum(1 for line in file)
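  • errors='ignore': This argument tells Python to silently skip any bytes it cannot decode with the default text encoding, so a binary or unusually encoded file does not raise a UnicodeDecodeError. The count reported for such files may not be meaningful.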
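
An alternative that sidesteps decoding altogether is to open the file in binary mode. This is a minimal sketch, not the approach the script uses: count_lines_binary is a hypothetical name, and the function splits only on the newline byte, so a file that uses bare carriage returns as line endings would be counted differently than in text mode.

```python
def count_lines_binary(file_path):
    """Count lines by iterating over the raw bytes of a file (no text decoding)."""
    with open(file_path, 'rb') as file:
        # Iterating a binary file yields chunks ending in b'\n';
        # a final chunk without a trailing newline still counts as a line.
        return sum(1 for _ in file)
```

Because no decoding takes place, the errors argument is not needed here.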

2. count_lines_in_folder() function:

This function traverses the directory structure using os.walk(). For every directory it visits, os.walk() yields a tuple containing the directory path (root), the names of its subdirectories (dirs), and the names of its files (files).

def count_lines_in_folder(folder_path):
    for root, dirs, files in os.walk(folder_path):
        print(f'\nProcessing folder: {root}')

For each file in a directory, the program calls count_lines_in_file() to get the number of lines and adds it to a running total for that folder. After processing all files in a folder, it prints that folder’s total, provided the total is greater than zero.
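
If the per-folder totals are needed as data rather than printed text, the same walk can fill a dictionary. The following is a minimal sketch, not part of the original script: lines_per_folder is a hypothetical name, and unlike the script above it also records folders whose total is zero.

```python
import os

def lines_per_folder(folder_path):
    """Map each folder path to the total line count of the files directly inside it."""
    totals = {}
    for root, dirs, files in os.walk(folder_path):
        folder_lines = 0
        for name in files:
            file_path = os.path.join(root, name)
            try:
                with open(file_path, 'r', errors='ignore') as file:
                    folder_lines += sum(1 for _ in file)
            except OSError as e:
                # Unreadable files are reported and skipped
                print(f"Error reading {file_path}: {e}")
        totals[root] = folder_lines
    return totals
```

The grand total is then sum(totals.values()), and the dictionary can be sorted or filtered before anything is printed.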

3. Displaying Folder and File Information:

For every file, the program prints its path and the number of lines it contains. After processing all files in a folder, the total number of lines in that folder is displayed. Finally, after processing all folders, the total line count across all files is printed.

print(f'{file_path}: {lines} lines')
print(f'Total lines in {root}: {folder_lines} lines')

4. Final Output:

Once the program has processed all the directories and files, it provides a summary of the total lines across all files and the number of files processed.

print(f"\nTotal lines across all files: {total_lines} lines in {file_count} files.")

Example of Program Output

Let’s consider a folder structure like this:

/my_folder
    /subfolder1
        file1.txt
        file2.txt
    /subfolder2
        file3.txt
    file4.txt

After running the program, the output might look like this (the order in which subfolders and files appear depends on the operating system):

Processing folder: /my_folder
/my_folder/file4.txt: 23 lines
Total lines in /my_folder: 23 lines

Processing folder: /my_folder/subfolder1
/my_folder/subfolder1/file1.txt: 45 lines
/my_folder/subfolder1/file2.txt: 30 lines
Total lines in /my_folder/subfolder1: 75 lines

Processing folder: /my_folder/subfolder2
/my_folder/subfolder2/file3.txt: 12 lines
Total lines in /my_folder/subfolder2: 12 lines

Total lines across all files: 110 lines in 4 files.

Key Takeaways:

  • Folder Traversal: The program automatically processes all subfolders and files within the provided root folder.
  • Line Counting: The number of lines in each file is counted and displayed.
  • Summary: The total number of lines across all files in the directory structure is summarized at the end.

Conclusion

This Python script is a simple but effective way to analyze and count the lines of code or text across multiple files in a directory and its subdirectories. By leveraging Python’s os module and built-in file handling capabilities, we can efficiently traverse directories and count lines in each file, providing a summary for the entire folder structure.

You can easily customize the script to suit different needs, such as filtering specific file types, handling large files more efficiently, or even saving the output to a file instead of printing it to the console.
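
Two of these customizations, filtering by file type and saving the results to a file, could look roughly like the sketch below. It is only one possible approach under stated assumptions: the function names count_lines_filtered and save_report, the default extensions, and the report layout are all hypothetical choices and not part of the original script.

```python
import os

def count_lines_filtered(folder_path, extensions=('.py', '.txt')):
    """Count lines only in files whose names end with one of the given extensions."""
    suffixes = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    results = {}
    for root, dirs, files in os.walk(folder_path):
        for name in files:
            if not name.lower().endswith(suffixes):
                continue  # skip files of other types
            file_path = os.path.join(root, name)
            try:
                with open(file_path, 'r', errors='ignore') as file:
                    results[file_path] = sum(1 for _ in file)
            except OSError as e:
                print(f"Error reading {file_path}: {e}")
    return results

def save_report(results, report_path):
    """Write one line per file plus a total to report_path."""
    with open(report_path, 'w', encoding='utf-8') as report:
        for file_path, lines in sorted(results.items()):
            report.write(f"{file_path}: {lines} lines\n")
        report.write(f"Total: {sum(results.values())} lines in {len(results)} files\n")
```

Separating the counting from the reporting keeps each function small and lets the same results be printed, saved, or processed further.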

Happy coding, and feel free to adapt the code to your use cases!
