Adjusting Excel Data Queries to Exclude Subfolders

A Comprehensive Guide to Filtering Main Folder Files in Power Query

Key Takeaways

  • Utilize the Folder.Contents Function to list only the items directly inside the main folder, so that files in subfolders are never loaded.
  • Modify the M Code in Power Query's Advanced Editor to ensure precision in data extraction.
  • Apply Specific Filters on the Folder Path to remove any residual subdirectory files from your query results.

Introduction

When managing data in Excel, especially from shared sources like SharePoint, it’s crucial to streamline your data queries for efficiency and clarity. One common requirement is to fetch data exclusively from the main folder, excluding any files residing in subfolders. This ensures that your dataset remains focused and manageable, preventing potential confusion or data redundancy.

Step-by-Step Guide to Exclude Subfolders in Excel Power Query

1. Loading Folder Data into Power Query

Begin by loading the folder data into Power Query. This forms the foundation of your data manipulation process.

  1. Navigate to the Data Tab: Open your Excel workbook and go to the Data tab on the ribbon.

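  2. Get Data from Folder: Click on Get Data > From File > From Folder. The From Folder dialog expects a file system path (local, network, or a synced library); to connect to a SharePoint library by URL, use From SharePoint Folder and enter the site URL instead.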

  3. Select the Main Folder: Browse to the main folder path, such as /sites/controlling/Shared%20Documents/Kniha%20j%C3%ADzd, and click OK.

  4. Open Power Query Editor: In the preview dialog, click Transform Data to open the Power Query Editor, which displays a table of all files found in the selected directory and its subfolders.

2. Filtering Out Subfolders

After loading the data, the next step is to filter out any files that reside within subfolders.

  1. Identify the Folder Path Column: In the Power Query Editor, locate the column named Folder Path. This column contains the full path for each file, including those in subdirectories.

  2. Apply a Path Filter: Click on the filter dropdown in the Folder Path column and keep only the value that matches the main folder path. Any longer path, with additional folder names after the main folder, belongs to a subfolder and should be deselected.

  3. Alternative Filtering Method: If the filter dropdown method is insufficient, add a custom column to facilitate more precise filtering:

    • Add a Custom Column: Go to the Add Column tab and select Custom Column.
    • Enter a Custom Formula: Use a formula like Text.Contains([Folder Path], "subfolder_name") to identify subfolders. Replace "subfolder_name" with the actual subfolder name or pattern relevant to your dataset.
    • Filter Out Subfolder Rows: Exclude rows where the custom column returns TRUE, effectively removing any files from subdirectories.
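
A subfolder-name check has to be repeated for every subfolder. A more general variant, sketched here under stated assumptions, keeps only the rows whose Folder Path equals the main folder path. The table is a hand-written stand-in for a folder listing, and the path C:\Data\Kniha jizd\ and the file names are hypothetical. In a real query, Source would be your folder step, and the comparison value must match the Folder Path text exactly as your listing shows it, including any trailing separator.

```powerquery
let
    // Hypothetical main folder path, written the way the Folder Path column shows it
    MainFolder = "C:\Data\Kniha jizd\",
    // Stand-in for the folder listing; replace with your own Source step
    Source = #table(
        {"Name", "Folder Path"},
        {
            {"january.xlsx", "C:\Data\Kniha jizd\"},
            {"february.xlsx", "C:\Data\Kniha jizd\"},
            {"2023.xlsx", "C:\Data\Kniha jizd\Archive\"}
        }
    ),
    // Keep rows located directly in the main folder, ignoring letter case
    MainFolderOnly = Table.SelectRows(
        Source,
        each Comparer.Equals(Comparer.OrdinalIgnoreCase, [Folder Path], MainFolder)
    )
in
    MainFolderOnly
```

With this sample data, the two files in the main folder remain and the file under Archive is dropped, without naming the subfolder anywhere in the filter.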

3. Modifying the M Code in Advanced Editor

For a more robust solution, especially when dealing with dynamic folder structures, modifying the M code directly can offer greater control.

  1. Access the Advanced Editor: In the Power Query Editor, navigate to the Home tab and click on Advanced Editor.

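  2. Understand the Default Function: Typically, the default M code uses Folder.Files (or SharePoint.Files for a SharePoint site), which retrieves all files from the specified location, including those in subfolders.

  3. Replace with Folder.Contents Function: Modify the M code by replacing Folder.Files with Folder.Contents. This function lists only the items (both files and folders) directly within the specified directory and does not descend into subfolders. Folder.Contents expects a file system path; for a SharePoint URL such as the one in this guide, the non-recursive counterpart is SharePoint.Contents, which takes the site URL and is then navigated down to the folder by name:

    
    let
        // Connect to the site root; SharePoint.Contents does not flatten subfolders
        Source = SharePoint.Contents("https://maproczech.sharepoint.com/sites/controlling", [ApiVersion = 15]),
        // Navigate by name: the document library first, then the main folder
        Library = Source{[Name = "Shared Documents"]}[Content],
        MainFolder = Library{[Name = "Kniha jízd"]}[Content],
        // Keep file rows only; folder rows have no extension
        #"Filtered Rows" = Table.SelectRows(MainFolder, each [Extension] <> null and [Extension] <> "")
    in
        #"Filtered Rows"
                

    In this example, a non-recursive listing replaces the recursive one, and a subsequent filter keeps only files by requiring a non-empty Extension value. The library and folder names are the decoded forms of the URL segments (Shared%20Documents and Kniha%20j%C3%ADzd); adjust them if your site displays different names.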

4. Applying Additional Filters

Even after using Folder.Contents, there might be cases where folders themselves appear in the query results. To ensure that only files are displayed:

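  1. Filter by File Type: Keep only the rows that represent files. The listing has no separate Kind column by default; a reliable signal is the Content column, which shows Binary for files and Table for folders.

  2. Remove Folders: Alternatively, you can filter out any rows where the Extension column is null or empty, as folders typically do not have extensions. Note that this also drops files that have no extension.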
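
Files can also be told apart from folders by the Content column rather than the extension: a file row holds binary content, while a folder row holds a nested table. The following is a minimal sketch of that test. The table is a hand-written stand-in for a non-recursive folder listing, and its names and values are hypothetical.

```powerquery
let
    // Stand-in for a non-recursive listing: one file row and one folder row
    Source = #table(
        {"Name", "Extension", "Content"},
        {
            {"january.xlsx", ".xlsx", #binary({1, 2, 3})},
            {"Archive", "", #table({"Name"}, {})}
        }
    ),
    // Files carry binary content; folders carry a table
    FilesOnly = Table.SelectRows(Source, each [Content] is binary)
in
    FilesOnly
```

Unlike the extension check, this test keeps files that have no extension.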

5. Loading the Final Data into Excel

After applying all necessary filters and ensuring that only main folder files are present:

  1. Close & Load: Click on Close & Load in the Power Query Editor to import the filtered data into your Excel workbook.

  2. Verify the Data: Ensure that the imported data only contains files from the main folder and excludes any subfolder contents.


Understanding Power Query Functions

Folder.Files vs. Folder.Contents

Power Query offers two primary functions for accessing folder data:

Function | Description | Use Case
Folder.Files | Retrieves all files from the specified folder and its subfolders. | When you need a comprehensive list of all files within a directory hierarchy.
Folder.Contents | Lists only the items (files and folders) directly within the specified folder, excluding any nested subfolders' contents. | When you intend to work exclusively with files in the main folder without delving into subdirectories.

Choosing the appropriate function is pivotal in controlling the scope of your data query. For scenarios where subfolder data should be excluded, Folder.Contents is the preferred choice.

Custom Filtering Techniques

Beyond the basic filtering methods, custom filtering allows for more granular control over your data query:

  • Using Custom Columns: Create custom columns that flag subfolder files based on specific patterns or criteria in the Folder Path.
  • Advanced Text Functions: Utilize functions like Text.Contains or Text.StartsWith to identify and exclude unwanted file paths.
  • Conditional Filtering: Implement conditional statements to dynamically filter files based on multiple conditions simultaneously.
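
These techniques can be combined in a single filter step. The sketch below is one possible arrangement, not a prescribed one: it keeps rows that sit directly in the main folder, have one of the wanted extensions, and are not Office lock files (whose names start with ~$). The path, the extension list, and the sample rows are all hypothetical stand-ins.

```powerquery
let
    // Hypothetical main folder path and wanted file types
    MainFolder = "C:\Data\Kniha jizd\",
    WantedExtensions = {".xlsx", ".xlsm"},
    // Stand-in for the folder listing; replace with your own Source step
    Source = #table(
        {"Name", "Extension", "Folder Path"},
        {
            {"january.xlsx", ".xlsx", "C:\Data\Kniha jizd\"},
            {"~$january.xlsx", ".xlsx", "C:\Data\Kniha jizd\"},
            {"notes.txt", ".txt", "C:\Data\Kniha jizd\"},
            {"2023.xlsx", ".xlsx", "C:\Data\Kniha jizd\Archive\"}
        }
    ),
    // All three conditions must hold for a row to be kept
    Filtered = Table.SelectRows(
        Source,
        each [Folder Path] = MainFolder
            and List.Contains(WantedExtensions, Text.Lower([Extension]))
            and not Text.StartsWith([Name], "~$")
    )
in
    Filtered
```

With this sample data, only january.xlsx passes all three conditions.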

Implementing the Solution: Example Workflow

Let’s walk through an example workflow to solidify the concepts discussed:

Scenario

You have a SharePoint folder located at /sites/controlling/Shared%20Documents/Kniha%20j%C3%ADzd containing various files and multiple subfolders. Your objective is to load only the files from the main folder into Excel using Power Query.

Step 1: Load Data Using Folder.Contents

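  1. Navigate to the Data tab and select Get Data > From File > From SharePoint Folder (From Folder applies when the library is synced or mapped to a file system path).

  2. Enter the site URL, https://maproczech.sharepoint.com/sites/controlling, click OK, and choose Transform Data.

  3. In the Power Query Editor, click on Advanced Editor and replace the recursive listing function (Folder.Files, or SharePoint.Files for a site URL) with its non-recursive counterpart (Folder.Contents, or SharePoint.Contents for a site URL) as follows:

    
    let
        // Connect to the site root; SharePoint.Contents does not flatten subfolders
        Source = SharePoint.Contents("https://maproczech.sharepoint.com/sites/controlling", [ApiVersion = 15]),
        // Navigate by name: the document library first, then the main folder
        Library = Source{[Name = "Shared Documents"]}[Content],
        MainFolder = Library{[Name = "Kniha jízd"]}[Content],
        // Keep file rows only; folder rows have no extension
        #"Filtered Files" = Table.SelectRows(MainFolder, each [Extension] <> null and [Extension] <> "")
    in
        #"Filtered Files"
                
  4. This modification ensures that only items directly within the main folder are listed.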

Step 2: Apply Filters to Exclude Folders

  1. With the data loaded, confirm that the Extension filter from Step 1 is in place (or apply it from the column's filter dropdown) so that rows with a null or empty extension are excluded. This typically removes folder entries.

  2. Ensure that the filtered data now displays only files from the main folder.

Step 3: Load the Data into Excel

  1. After verifying the filtered data, click Close & Load to import the data into your Excel worksheet.

  2. Your Excel workbook now contains a list of files solely from the main folder, devoid of any subfolder contents.


Advanced Techniques and Best Practices

Dynamic Folder Path Handling

For users dealing with multiple folders or frequently changing directory structures, dynamic handling of folder paths can enhance the flexibility of your data queries.

  • Parameterization: Use Power Query parameters to define folder paths, allowing easy updates without modifying the M code directly.
  • Relative Paths: Power Query folder functions require absolute paths, so portability across environments or teams usually comes from a parameter, or from a path read out of a worksheet cell, rather than from a true relative path.
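
One way to keep the folder path out of the query logic is to wrap the listing in a function that takes the path as an argument. The sketch below assumes a file system path (local, network, or synced library) and uses the hypothetical name ListTopLevelFiles; it is an illustration of the idea rather than a required structure.

```powerquery
let
    // Hypothetical helper: returns only the files directly inside folderPath
    ListTopLevelFiles = (folderPath as text) as table =>
        let
            // Non-recursive listing of the folder's direct children
            Items = Folder.Contents(folderPath),
            // File rows hold binary content; folder rows hold a nested table
            FilesOnly = Table.SelectRows(Items, each [Content] is binary)
        in
            FilesOnly
in
    ListTopLevelFiles
```

If this query is saved as ListTopLevelFiles and a text parameter (here given the hypothetical name MainFolderPath) holds an absolute folder path, another query can call ListTopLevelFiles(MainFolderPath), and changing the folder only requires editing the parameter.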

Error Handling and Data Validation

Ensuring the integrity of your data queries involves implementing robust error handling and validation mechanisms.

  • Check for Empty Folders: Incorporate checks to handle scenarios where the main folder might be empty or missing expected files.
  • Validate File Types: Beyond filtering by extension, validate that the files meet specific criteria (e.g., file size, modification date) relevant to your analysis.
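
Both checks can be expressed in a few steps. The sketch below filters by modification date and then raises a descriptive error if nothing is left, instead of silently loading an empty table. The cutoff date, the error text, and the sample rows are hypothetical; whether an error or an empty result is preferable depends on how the workbook is used.

```powerquery
let
    // Hypothetical cutoff: only files modified on or after this moment are accepted
    Cutoff = #datetime(2025, 1, 1, 0, 0, 0),
    // Stand-in for a files-only listing of the main folder
    Source = #table(
        {"Name", "Extension", "Date modified"},
        {
            {"january.xlsx", ".xlsx", #datetime(2025, 1, 15, 9, 30, 0)},
            {"december.xlsx", ".xlsx", #datetime(2024, 12, 20, 16, 0, 0)}
        }
    ),
    // Validation beyond the extension: keep recently modified files only
    Recent = Table.SelectRows(Source, each [Date modified] >= Cutoff),
    // Fail with a clear message when no file meets the criteria
    Checked =
        if Table.IsEmpty(Recent) then
            error Error.Record("NoFiles", "No files in the main folder meet the criteria.")
        else
            Recent
in
    Checked
```

With this sample data, january.xlsx passes the date check, so the query returns one row and no error is raised.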

Optimizing Performance

Large datasets can slow down your Excel workbook. Optimize your Power Query steps to enhance performance:

  • Minimize Steps: Consolidate multiple filtering steps into a single operation where possible.
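  • Disable Background Refresh: In the query's connection properties, clear Enable background refresh so that a refresh finishes before you continue working. This avoids overlapping refreshes, although it does not make the query itself run faster.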
  • Load to Data Model: For complex analyses, consider loading data directly to the data model instead of individual worksheets.

Troubleshooting Common Issues

Files Still Showing from Subfolders

If files from subfolders still appear in your query results after applying the steps above:

  • Re-verify Folder Path Filters: Ensure that your folder path filters accurately exclude subdirectories by checking for additional slashes or folder names.
  • Inspect Custom Column Logic: If using custom columns for filtering, double-check the logic to ensure it correctly identifies and excludes subfolders.
  • Confirm Function Replacement: Ensure that Folder.Files has been successfully replaced with Folder.Contents in the Advanced Editor.
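
A quick diagnostic for this situation is to count rows per distinct Folder Path, which shows at a glance which subfolders are still contributing files. The sketch below uses a hand-written stand-in table with hypothetical paths; in practice, Source would be the step you are inspecting.

```powerquery
let
    // Stand-in for the step being inspected; replace with your own
    Source = #table(
        {"Name", "Folder Path"},
        {
            {"january.xlsx", "C:\Data\Kniha jizd\"},
            {"february.xlsx", "C:\Data\Kniha jizd\"},
            {"2023.xlsx", "C:\Data\Kniha jizd\Archive\"}
        }
    ),
    // One row per distinct folder, with the number of files found in it
    FilesPerFolder = Table.Group(
        Source,
        {"Folder Path"},
        {{"File Count", each Table.RowCount(_), Int64.Type}}
    )
in
    FilesPerFolder
```

Any row other than the main folder path points to a filter that is not yet excluding that subfolder.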

Permissions and Access Issues

Accessing folders on SharePoint may sometimes be restricted due to permissions:

  • Verify Access Rights: Ensure that your user account has sufficient permissions to access the main folder and its contents.
  • Check SharePoint Settings: Administrators can review SharePoint settings to confirm that data access policies are correctly configured.

M Code Errors

Errors in the M code can prevent successful data loading:

  • Syntax Errors: Carefully review the M code for any syntax mistakes, such as missing commas or incorrect function names.
  • Path Accuracy: Double-check that the folder paths specified in the M code correctly point to the desired directory.

Conclusion

Effectively managing data queries in Excel, particularly when interfacing with SharePoint folders, is essential for maintaining streamlined and focused datasets. By leveraging Power Query's Folder.Contents function, modifying M code, and applying precise filtering techniques, users can effortlessly exclude subfolders from their data queries. This not only enhances data clarity but also optimizes performance, ensuring that your Excel workbooks remain efficient and easy to navigate.

Implementing these strategies requires a combination of understanding Power Query functions, meticulous filtering, and proactive troubleshooting. With these tools at your disposal, managing large and complex folder structures becomes a manageable task, empowering you to harness the full potential of Excel for your data analysis needs.


Last updated January 20, 2025