#!pip install tabula-py
import tabula

# read all table data
df = tabula.read_pdf("sample.pdf", pages=[1, 2])
df[1]

#tabula.convert_into("sample.pdf", "sample.csv", output_format="csv")
Here is what the above code is doing:
1. We’re using the read_pdf function from tabula-py to read the PDF.
2. We’re specifying the pages to read as a list.
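3. We’re storing the result in a variable called df. In recent tabula-py versions, read_pdf returns a list of pandas DataFrames, one per detected table.
4. We’re selecting the second table with df[1]; in a notebook, this displays that DataFrame.
5. The last line is commented out; if enabled, the convert_into function would write the tables from the PDF to a CSV file.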
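Because read_pdf can return fewer tables than expected, indexing the result directly (as in df[1]) may raise an IndexError. The steps above could be wrapped roughly as follows. This is only a sketch: the helper names extract_tables and pick_table are hypothetical and not part of tabula-py, and it assumes tabula-py and a Java runtime are installed.

```python
def extract_tables(pdf_path, pages):
    """Return the tables tabula-py detects on the given pages.

    Hypothetical helper, not part of tabula-py. The import is done lazily
    so the module loads even where tabula-py or Java is unavailable.
    """
    import tabula  # requires `pip install tabula-py` and a Java runtime

    # multiple_tables=True asks for a list of DataFrames, one per table
    return tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True)


def pick_table(tables, index):
    """Return tables[index], or None if fewer tables were detected.

    Hypothetical helper: avoids an IndexError when the PDF yields
    fewer tables than expected.
    """
    if 0 <= index < len(tables):
        return tables[index]
    return None
```

For example, pick_table(extract_tables("sample.pdf", [1, 2]), 1) would give the same table as df[1] above, or None if only one table was found.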