Extracting specific data from a small PDF by hand is easy enough, but doing the same for a large PDF is slow and error-prone. UiPath can automate the task. Reading and extracting PDF data using regular expressions in UiPath means reading the text of a PDF file into a string and then searching that string for patterns that match a specified regular expression. Regular expressions are a powerful tool for identifying and extracting specific patterns from a larger document or dataset.
Steps for Reading and Extracting PDF Data
Step 1. Open UiPath Studio, click Manage Packages, search for UiPath.PDF.Activities, and install it.
Step 2. Drag and drop the Read PDF Text activity if your PDF contains selectable (digitally generated) text.
Note: If your PDF is scanned or image-based, drag and drop Read PDF With OCR instead.
Step 3. In the Read PDF Text activity, enter the path to your PDF file.
Step 4. Go to Properties, and under Output, set Text to a String variable named PDFText.
Step 5. Drag and drop the Write Line activity.
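Step 6. Set the Text of the Write Line to PDFText, the output string of Read PDF Text.
To extract specific data from the PDF, first write and test a regular expression (for example, in an online tester set to the JavaScript flavor). UiPath evaluates patterns with the .NET regex engine, so keep to syntax that both flavors support. Copy the expression and follow the steps below:
Step 1. Go to UiPath Studio and drag and drop the Is Match activity.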
Step 2. Click on Configure Regular Expression… Paste the regular expression you copied in the Value box and click Save.
Step 3. Under Properties, set Input to the string variable PDFText. In Misc, set Result to a Boolean variable named isThereAnyMatch.
Step 4. Drag and drop an If activity, and type isThereAnyMatch in Condition.
Step 5. Drag and drop a Matches activity into the Then sequence.
Step 6. Under Properties, set Input to the string variable PDFText. Copy the pattern from Is Match and paste it into the Pattern of Matches. Store the Result in a variable; it is a collection of matches, as more than one piece of text can match the regular expression.
Step 7. Drag and drop a For Each activity into the Then sequence and set it to iterate over the Result of Matches. Finally, drag and drop a Write Line into the Body of the For Each and fill it with item.ToString.Split(":"c)(2).ToString.Trim. This expression splits each match on colons and prints the third segment, so it assumes every match contains at least two colons.
Step 8. Click Run under Debug File.
The Output panel now shows the data extracted from the PDF.
Conclusion
Reading and extracting PDF data using regular expressions in UiPath is a powerful way to automate the extraction of specific patterns from a large PDF document. This can save time and reduce errors in data entry, document processing, and financial analysis tasks, increasing efficiency and accuracy.
Consult Metclouds Technologies to get the latest on PDF data processing with UiPath.