extract x y coordinates from image in pdf python
import sys
import pyPdf


def extract(in_file, coords, out_file):
    with open(in_file, 'rb') as infp:
        reader = pyPdf.PdfFileReader(infp)
        page = reader.getPage(0)
        writer = pyPdf.PdfFileWriter()
        page.mediaBox.lowerLeft = coords[:2]
        page.mediaBox.upperRight = coords[2:]
        # you could do the same for page.trimBox and page.cropBox
        writer.addPage(page)
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)


if __name__ == '__main__':
    in_file = sys.argv[1]
    coords = [int(i) for i in sys.argv[2:6]]
    out_file = sys.argv[6]
    extract(in_file, coords, out_file)
Here is what the above code is doing:
1. Open the input file for reading.
2. Create a PdfFileReader object.
3. Get the first page of the PDF.
4. Create a PdfFileWriter object.
5. Set the lower-left and upper-right corners of the page's media box to the given coordinates, which crops the page to that rectangle.
6. Add the page to the writer.
7. Open the output file for writing.
8. Write the writer's contents (the cropped page) to the output file.
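
The script passes the four command-line numbers straight to the media box without checking them. A minimal sketch of a validation step that could run before step 5 follows. The helper names `parse_box` and `inches_to_points` are hypothetical and not part of pyPdf. The sketch assumes the default PDF coordinate system, where the origin is the lower-left corner of the page and one unit is 1/72 inch.

```python
from typing import Sequence, Tuple

# Default PDF user-space unit: 1 point = 1/72 inch.
POINTS_PER_INCH = 72

Point = Tuple[int, int]


def parse_box(values: Sequence[str]) -> Tuple[Point, Point]:
    """Turn four strings (x0 y0 x1 y1) into (lower_left, upper_right).

    Hypothetical helper: raises ValueError if the count is wrong, a value
    is not an integer, or the corners do not describe a non-empty box.
    """
    if len(values) != 4:
        raise ValueError("expected four coordinates: x0 y0 x1 y1")
    x0, y0, x1, y1 = (int(v) for v in values)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("lower-left must be below and to the left of upper-right")
    return (x0, y0), (x1, y1)


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to PDF points (hypothetical helper)."""
    return inches * POINTS_PER_INCH


if __name__ == "__main__":
    lower_left, upper_right = parse_box(["0", "0", "200", "300"])
    print(lower_left, upper_right)
```

In the script above, the two returned corners could be assigned to `page.mediaBox.lowerLeft` and `page.mediaBox.upperRight` in place of `coords[:2]` and `coords[2:]`.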