Image Manipulation with magick in R
When I was interviewing electoral services staff for my PhD thesis, I was told a story about how their software couldn't take two scanned pages at once, so they had to physically cut and paste two pages together to fit on one side of A4 before scanning it.
This sounded really annoying, but I couldn't think of another way around it, so I started looking into image processing in R to see whether I could automate this sort of task.
The first thing I needed was a pdf at least two pages long. You can see my interesting pdf here. I saved a few copies of it.
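Now I want to load the magick package for image processing. This package is brilliant, and the linked vignettes are very helpful. Reading pdfs with image_read_pdf() also relies on the pdftools package, so make sure that is installed too.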
library(magick)
I need to keep a close eye on where I am saving everything. If you want to follow along, I'd recommend setting your working directory to the folder that contains your pdf/ folder, since all the paths below are relative to it.
Next I want to get a list of all my pdfs.
pdf_list <- list.files(path = "pdf/", pattern = "\\.pdf$")  # only names ending in .pdf
print(pdf_list)
## [1] "2_pager_1.pdf" "2_pager_2.pdf" "2_pager_3.pdf" "2_pager_4.pdf"
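The pattern argument of list.files() is a regular expression, not a plain suffix, so a pattern such as "pdf" matches those letters anywhere in a file name. The sketch below builds a throwaway folder in tempdir() with hypothetical file names to show the difference; nothing in your real pdf/ folder is touched.

```r
# Throwaway folder with hypothetical file names, so nothing real is touched
demo_dir <- file.path(tempdir(), "pdf_pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("2_pager_1.pdf", "2_pager_2.pdf", "pdf_notes.txt")))

loose  <- list.files(demo_dir, pattern = "pdf")       # "pdf" anywhere in the name
strict <- list.files(demo_dir, pattern = "\\.pdf$")   # names ending in ".pdf" only

print(loose)   # also picks up pdf_notes.txt
print(strict)  # just the two pdfs
```

If some of your files end in an upper-case ".PDF", adding ignore.case = TRUE to list.files() should catch those as well.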
This is exactly how I saved all of my two-page pdfs. Next I need to remove the .pdf extension so the bare names are easy to reuse in a loop in a moment.
pdf_list <- sub("\\.pdf$", "", pdf_list)  # drop the trailing ".pdf" only
print(pdf_list)
## [1] "2_pager_1" "2_pager_2" "2_pager_3" "2_pager_4"
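A pattern like ".pdf" in gsub() is also a regular expression, and the dot there matches any character. The sketch below uses one hypothetical file name, notes_on_pdfs.pdf, to show how that can eat into the middle of a name, and compares it with anchoring the pattern or using base R's tools::file_path_sans_ext().

```r
files <- c("2_pager_1.pdf", "notes_on_pdfs.pdf")  # second name is hypothetical

loose    <- gsub(".pdf", "", files)          # "." is a wildcard, so "_pdf" also matches
anchored <- sub("\\.pdf$", "", files)        # only a literal ".pdf" at the end goes
helper   <- tools::file_path_sans_ext(files) # base R helper that drops the extension

print(loose)
print(anchored)
print(helper)
```

For the 2_pager files all three give the same answer; the difference only shows up once a name happens to contain "pdf" elsewhere.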
Now I can work out how to do one file, and then write a loop to do the rest.
The first step is to read in the first pdf.
two_pager <- image_read_pdf("pdf/2_pager_1.pdf")
This renders each page of the pdf as an image, giving a single magick image object with one frame per page, ready to manipulate.
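Because image_read_pdf() rasterises each page at the resolution set by its density argument, the pixel coordinates measured later in Paint only hold for pages rendered at that same density. As a hedged sketch, the two hypothetical helpers below (not part of magick) show the arithmetic: a PDF point is 1/72 inch, so pixels = points × density / 72, and coordinates measured at one density can be rescaled to another.

```r
# Hypothetical helper (not part of magick): convert a length in PDF points
# (1/72 inch) into pixels at a given rendering density in dots per inch.
pt_to_px <- function(points, density) {
  round(points * density / 72)
}

# Hypothetical helper: rescale pixel coordinates measured at one density
# so they can be reused on a page rendered at a different density.
rescale_px <- function(px, from_density, to_density) {
  round(px * to_density / from_density)
}

pt_to_px(72, 300)                   # one inch rendered at 300 dpi
rescale_px(c(500, 1900), 300, 150)  # corner x-values moved from 300 to 150 dpi
```

If you change the density you pass to image_read_pdf(), the crop and offset numbers need rescaling in the same way.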
I will now split the pdf into two pages using indexing.
page_1 <- two_pager[1]
page_2 <- two_pager[2]
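Indexing works because a magick image behaves like a vector of frames. The sketch below uses two blank frames from image_blank() as stand-ins for a rendered two-page pdf (the sizes are arbitrary assumptions), and uses length() and image_info() to confirm what each index holds.

```r
library(magick)

# Two blank frames stand in for a rendered two-page pdf (hypothetical sizes)
pages <- c(image_blank(200, 300, color = "white"),
           image_blank(200, 300, color = "grey"))

length(pages)         # number of frames, i.e. pages
image_info(pages[2])  # format, width, height etc. of the second frame only
```

Checking length() on the real two_pager is a quick way to catch a pdf that is not actually two pages long.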
The next step takes a bit of work and uses a different piece of software to pinpoint the pixels I want to select. I used Paint because it's free, which meant first writing the page out as a png so Paint could open it.
image_write(page_2, "for_paint.png", format = "png")
I then opened the image in Paint and hovered over the page while watching the pixel coordinates shown in the status bar at the bottom of the window. Don't worry, you should only need to do this once per document layout. I noted the four corners of the box I wanted to grab. The coordinates were:
Corner | x (left to right, px) | y (top to bottom, px) |
---|---|---|
Top-left | 500 | 500 |
Bottom-left | 500 | 1100 |
Top-right | 1900 | 500 |
Bottom-right | 1900 | 1100 |
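Armed with these, we can cut that box out of page_2 using image_crop(). It takes a geometry string in the form WIDTHxHEIGHT+X+Y: the width and height of the box in pixels, followed by the left-to-right (x) and top-to-bottom (y) coordinates of its top-left corner. From the table, the box is 1900 - 500 = 1400 pixels wide and 1100 - 500 = 600 pixels high, with its top-left corner at (500, 500). You can play about with these numbers as much as you like to get it exactly right.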
cutting <- image_crop(page_2, "1400x600+500+500")
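Rather than working the geometry string out by hand, it can be built from the corners in the table. The helper below is a hypothetical sketch using plain sprintf() so the WIDTHxHEIGHT+X+Y format stays explicit; magick also ships geometry helpers such as geometry_area() that could do the same job.

```r
# Hypothetical helper: turn the top-left and bottom-right corners read off
# in Paint into an ImageMagick geometry string "WIDTHxHEIGHT+X+Y".
corners_to_geometry <- function(left, top, right, bottom) {
  stopifnot(right > left, bottom > top)  # corners must describe a real box
  sprintf("%dx%d+%d+%d", right - left, bottom - top, left, top)
}

corners_to_geometry(500, 500, 1900, 1100)  # the box from the table above
```

With this, image_crop(page_2, corners_to_geometry(500, 500, 1900, 1100)) crops the same box as the hand-written string.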
Then we paste the cut-out box on top of the first page with image_composite(), again giving an offset for where the box's top-left corner should land on page 1.
pasting <- image_composite(page_1, cutting, offset = "+300+862")
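image_composite() keeps the size of the base page, so anything pasted past the edge is simply clipped. A hedged check, using a hypothetical helper and blank stand-in images (2480 × 3508 is assumed here as roughly an A4 page rendered at 300 dpi), confirms the box fits before you rely on the offset.

```r
library(magick)

# Hypothetical helper: does an image pasted at offset (x, y) stay entirely
# inside the page, so nothing is silently clipped off the edge?
fits_on_page <- function(page, piece, x, y) {
  p <- image_info(page)
  s <- image_info(piece)
  x >= 0 && y >= 0 && x + s$width <= p$width && y + s$height <= p$height
}

page  <- image_blank(2480, 3508, color = "white")  # assumed A4 at 300 dpi
piece <- image_blank(1400, 600, color = "grey")    # same size as the crop above

fits_on_page(page, piece, 300, 862)   # the offset used above
fits_on_page(page, piece, 1200, 862)  # would run off the right-hand edge
```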
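Finally we can save this and then check it. The output folder needs to exist before writing, so create it first if it isn't there.
dir.create("pdf/one_page", showWarnings = FALSE)  # make sure the output folder exists
image_write(pasting, "pdf/one_page/final_1.pdf", format = "pdf")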
Looks good to me. You can check out the whole thing here: it's a one-page pdf with all the right information on it. I'm not worried about adjusting every last pixel to get it in exactly the right place.
Now on to the part that actually makes this worth doing: automating it over lots of files in a row. We already made the pdf list, so we just need to loop the same steps over each name in it.
for (i in pdf_list) {
  two_pager <- image_read_pdf(paste0("pdf/", i, ".pdf"))  # read in the two-page pdf
  page_1 <- two_pager[1]                                  # split into its two pages
  page_2 <- two_pager[2]
  cutting <- image_crop(page_2, "1400x600+500+500")       # cut the box out of page 2
  pasting <- image_composite(page_1, cutting, offset = "+300+862")  # paste onto page 1
  image_write(pasting, paste0("pdf/one_page/final_", i, ".pdf"), format = "pdf")  # save
}
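One way to keep the loop tidy is to wrap the crop-and-paste steps in a function, with the geometry and offset as arguments so a document with a different layout only needs new numbers. The function below is a hypothetical sketch, not part of magick, and it runs on blank stand-in pages so it needs no pdf files.

```r
library(magick)

# Hypothetical wrapper around the steps above: take a two-frame magick image,
# crop a box from page 2 and composite it onto page 1.
merge_two_pages <- function(two_pager, crop_geometry = "1400x600+500+500",
                            offset = "+300+862") {
  stopifnot(length(two_pager) == 2)  # expect exactly two pages
  cutting <- image_crop(two_pager[2], crop_geometry)
  image_composite(two_pager[1], cutting, offset = offset)
}

# Stand-in pages so the sketch runs without any pdf files
demo <- c(image_blank(2000, 2000, color = "white"),
          image_blank(2000, 2000, color = "grey"))
merged <- merge_two_pages(demo)
image_info(merged)  # a single frame, the same size as page 1
```

Inside the loop, the body could then become something like image_write(merge_two_pages(image_read_pdf(file.path("pdf", paste0(i, ".pdf")))), file.path("pdf", "one_page", paste0("final_", i, ".pdf")), format = "pdf").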
This takes every two-page pdf and turns it into a single-page pdf, exactly as we did with the first one.
You can see the four pdfs all saved where I wanted them. The loop only took a few seconds to run, and as long as every two-page document has the same layout, this single for loop should convert them all in the same way.