File Directory Acquisition Tool
August 22, 2024
This page covers two tools:
- `gen_folder_list`: generates only a list of files of a given format in the specified directory
- `get_files`: generates a list of files of a given format in the specified directory which, combined with other functions, lets the output files keep the original directory structure
gen_folder_list
Generates a list of all files of the specified type in the folder.
Parameters
| Parameter | Type | Default | Description | 
|---|---|---|---|
| path | str | Required | Path to the folder to process |
| mode | str | Required | File type to look for; one of 'pdf', 'img', 'md' |
| recursive | bool | False | Whether to search subdirectories recursively |
Exceptions
| Exception | Description | 
|---|---|
| ValueError | If `mode` is not 'pdf', 'img' or 'md' |
Return value
| Type | Description |
|---|---|
| list | List of full paths to the matching files |
Example code
```python
# List every PDF file under /path/to/folder, including subdirectories
files = gen_folder_list("/path/to/folder", "pdf", True)
print(files)
```
Note
- This function filters files by type according to the `mode` parameter.
- If `recursive` is `True`, files in subdirectories are also searched.
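To illustrate the filtering described above, here is a minimal standard-library sketch of a function with the same parameters. It is not the library's implementation: the name `gen_folder_list_sketch` is hypothetical, and the extension sets assumed for each `mode` (in particular `.png`, `.jpg`, `.jpeg` for `'img'`) are guesses.

```python
import os
from typing import List

# Hypothetical extension sets per mode; the real function may accept others.
_EXTENSIONS = {
    "pdf": (".pdf",),
    "img": (".png", ".jpg", ".jpeg"),
    "md": (".md",),
}


def gen_folder_list_sketch(path: str, mode: str, recursive: bool = False) -> List[str]:
    """Return absolute paths of files under `path` whose extension matches `mode`."""
    if mode not in _EXTENSIONS:
        raise ValueError("mode must be 'pdf', 'img' or 'md'")
    exts = _EXTENSIONS[mode]
    results: List[str] = []
    if recursive:
        # os.walk visits `path` itself and every subdirectory below it
        for root, _dirs, files in os.walk(path):
            for name in files:
                if name.lower().endswith(exts):
                    results.append(os.path.abspath(os.path.join(root, name)))
    else:
        # Only the top level of `path`; directories are skipped
        for name in os.listdir(path):
            full = os.path.join(path, name)
            if os.path.isfile(full) and name.lower().endswith(exts):
                results.append(os.path.abspath(full))
    # Sorted only to make the output deterministic
    return sorted(results)
```

The sorting is a choice made for this sketch; the real `gen_folder_list` may return files in a different order.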
get_files
Generates lists of the files in a folder so that the output files keep the same directory structure as the input files.
Parameters
Warning
The `out` parameter must match the `output_format` of the conversion function it is used with (e.g. the Doc2X PDF conversion function or the Doc2X image conversion function).
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | Required | Path to the folder to process |
| mode | str | Required | File type to process: 'pdf' or 'img' |
| out | str | Required | Type of the output files: 'md', 'md_dollar', 'latex', 'docx', or 'pdf' (when used in RAG) |
Return value
Returns a tuple (list1, list2) containing two lists:
- `list1` (list): list of full paths. Each element is the full path to a file (string).
- `list2` (list): list of relative paths. Each element is the relative path to a file (string).
Note
- `list1` and `list2` have the same length.
- These lists can be used as the input files and the output file names, respectively, of a conversion function whose `output_format` matches `out`.
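The sketch below shows one way the two returned lists could be built so that outputs mirror the input folder structure. It is an illustration under stated assumptions, not the library's code: `get_files_sketch` is a hypothetical name, the search is assumed to be recursive, and the extension used for each `out` value (for example `.tex` for `'latex'`) is a guess.

```python
import os
from typing import List, Tuple

# Hypothetical mappings; the real get_files may use different extensions.
_INPUT_EXTS = {"pdf": (".pdf",), "img": (".png", ".jpg", ".jpeg")}
_OUTPUT_EXTS = {
    "md": ".md",
    "md_dollar": ".md",
    "latex": ".tex",
    "docx": ".docx",
    "pdf": ".pdf",
}


def get_files_sketch(path: str, mode: str, out: str) -> Tuple[List[str], List[str]]:
    """Return (full input paths, relative output names) of equal length."""
    if mode not in _INPUT_EXTS:
        raise ValueError("mode must be 'pdf' or 'img'")
    if out not in _OUTPUT_EXTS:
        raise ValueError("unsupported out format: " + out)
    base = os.path.abspath(path)
    full_paths: List[str] = []
    rel_names: List[str] = []
    for root, _dirs, files in os.walk(base):
        for name in sorted(files):
            if not name.lower().endswith(_INPUT_EXTS[mode]):
                continue
            full = os.path.join(root, name)
            # Relative path keeps the subfolder part, e.g. "ch1/part.pdf"
            rel = os.path.relpath(full, base)
            # Swap the input extension for the assumed output extension
            stem, _ = os.path.splitext(rel)
            full_paths.append(full)
            rel_names.append(stem + _OUTPUT_EXTS[out])
    return full_paths, rel_names
```

Because each relative name is paired by index with its input file, joining it onto any output directory (e.g. `os.path.join(output_dir, rel_names[i])`) recreates the input's subfolder layout for the file at `full_paths[i]`.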