The amount of innovation and new idea generation going on over at DIYBookscanner.org is just phenomenal, and we’re really starting to see some great development efforts, too. After a recent meetup with mathemechanical maker-genius Rob, I became a little obsessed with the idea of generating a 3-D depth map for dewarping images from scanned books.
In part, this was the goal of the Kinect hacking I’ve been doing, but for the moment, I’ve sidelined that effort to try out a bunch of other, simpler, cheaper approaches. I’m going to post some of them here to get them down on record and keep them from getting lost in the book scanner forum swell.
Although the forum post contains non-DIY Book Scanner methods, this post will only cover a few new things that we’ve developed in the forum, or that I’ve come up with myself. It’s not a complete list by any means. See the forum post for that.
Feel free to comment with new ideas or better resources.
1. Look at the lines of text or borders of images on a page and extract the page curvature from them.
Apps that do this:
Drawbacks: Not all books have clean lines to follow, and not all pages in all books have clean lines to follow. Not all lines of text are in the order you expect. It can’t work for concrete poetry or pages of drawings. However, this method does work well when it works, and it’s getting better all the time.
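None of the apps’ internals are reproduced here, but the core trick can be sketched in a few lines of NumPy (the function name and the synthetic page are mine, purely illustrative): find the vertical centroid of the ink in each column of a single text line, then fit a low-order polynomial to recover the curvature. A real page would need binarization and per-line segmentation first.

```python
import numpy as np

def fit_line_curvature(binary_page, degree=2):
    """Fit a polynomial baseline to a single line of text.

    binary_page: 2-D bool array, True where there is ink.
    Returns polynomial coefficients for y(x), highest power first.
    """
    xs, ys = [], []
    for x in range(binary_page.shape[1]):
        rows = np.nonzero(binary_page[:, x])[0]
        if rows.size:                  # column contains ink
            xs.append(x)
            ys.append(rows.mean())     # vertical centroid of the ink
    return np.polyfit(xs, ys, degree)

# Synthetic "text line" that follows a known parabola.
x = np.arange(200)
true_y = 50 + 0.001 * (x - 100) ** 2
page = np.zeros((100, 200), dtype=bool)
page[np.round(true_y).astype(int), x] = True

coeffs = fit_line_curvature(page)
# coeffs[0] recovers the 0.001 curvature to within rounding noise.
```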
2. Using the Kinect for direct depth sensing of the book surface.
Apps that do this: Not exactly an app, but the libfreenect/OpenKinect driver gives the depth image.
Rob proposed the idea here and I got the first few depth images of books here — there’s a long way to go on this project and we could use a little help to see if the data straight from the device are worthwhile. It may also be possible to get a close-range PrimeSensor. I will be contacting PrimeSense to feel out the possibilities.
Drawbacks: Right now, the Kinect’s resolution is spread across a living-room-sized space. We’d like it spread across a few inches. I’m working on this.
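For reference, turning the Kinect’s 11-bit raw disparity values into metres is a single vectorized call. The constants below are one empirical calibration that has circulated in the OpenKinect community, not gospel; every unit really needs its own fit.

```python
import numpy as np

def raw_to_metres(raw):
    """Convert Kinect 11-bit raw disparity to metres.

    The constants are one community calibration, not a per-unit truth.
    """
    raw = np.asarray(raw, dtype=float)
    return 0.1236 * np.tan(raw / 2842.5 + 1.1863)

# A depth frame from libfreenect is just a 480x640 array of raw values,
# so a whole image converts in one call. Random values stand in for
# real sensor data here.
frame = np.random.randint(400, 900, size=(480, 640))
metres = raw_to_metres(frame)
```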
3. Using Sharp sensors for extracting the curvature at several lines on a page.
Spamsickle proposed this here, and though I didn’t like the idea at first, after discussing it more with Spam and Rob I have come to really like it: it is simple, efficient, and might work (if the Sharp sensors weren’t so awfully noisy/messy). I have some Sharp sensors lying around in a box and just need to build a rig for testing. The idea right now is to have a rod extending over the book with two of these sensors. By sweeping them across the surface of the book, you’d get the distance directly.
Drawbacks: These Sharp sensors are noisy and they would need to be mechanically moved across the page to work.
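A sketch of how I imagine reading the sensors during a sweep: the inverse-voltage distance model is the usual shape for these rangers, but the constants here are illustrative (calibrate against the datasheet and your own sensor), and a median over repeated ADC samples knocks down the spikes.

```python
import statistics

def sharp_distance_cm(voltage, k=27.0, offset=0.1):
    """Inverse-voltage model for a Sharp IR ranger (illustrative constants)."""
    return k / (voltage - offset)

def denoised_reading(samples):
    """Median of repeated ADC samples suppresses the sensor's spikes."""
    return statistics.median(samples)

# One rod position during the sweep: five readings, one of them a spike.
samples = [1.02, 0.98, 1.00, 2.40, 0.99]   # volts
v = denoised_reading(samples)               # 1.00, spike rejected
distance = sharp_distance_cm(v)             # 30.0 cm with these constants
```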
4. Using a laser line to get a reliable line to follow for dewarping.
A laser pointer or diode can easily be made into a laser line by using a cylinder lens to expand the beam. The laser line, when projected on the book surface, distorts according to the page curvature. Using this laser line, we should be able to make a good guess at the 3D structure of the page and do dewarping. Or perhaps we could make a modified version of Scan Tailor that searches for bright lines. In any case, it is a promising area of research suggested by many including Rob, myself, and Vitorio.
I decided to try this out this morning (got up at 1AM, couldn’t sleep!) and the results looked very promising.
I didn’t have any cylinder lenses lying around (aaghhh!!!), so I took a piece of “turning film” from the back of a cellphone display and put it in front of the laser pointer.
Laser pointer by itself:
Laser pointer plus turning film.
Then, I pointed the laser, from the side, toward the book. From straight down, obviously the laser beam will appear straight. However, if we project it from the side, we get something like this (actually this is two photographs of two projections superimposed on each other):
Laser image by itself (it’s noisy because I used the wrong camera settings but didn’t care to take the image a second time)
Image of the book:
Laser beams superimposed on book:
OK, the laser beam is not perfect because of the nature of turning film. A brighter laser with a better lens would give much better results. If you had two lasers, you could take just two shots: a laser-line shot and a normal shot. Using the info from the two, you could obviously dewarp the page. I think this method is a winner. Cheap, handy, uses a single camera and a handful of solid-state parts. Books which can lie flat are easy targets; not so sure about books in a cradle (that’s up next).
Drawbacks: Requires two lasers in a fixed position. Requires at least two photographs per page.
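Extracting the line from the laser shot is pleasantly simple. A toy sketch, assuming a red laser and a synthetic frame standing in for a real photo: take the reddest pixel in each column, using “red minus green” so white page glare doesn’t win.

```python
import numpy as np

def extract_laser_profile(img, threshold=100):
    """Per-column row of the reddest pixel: the laser line's shape.

    img: HxWx3 uint8 photo taken with the (red) laser on.
    'Red minus green' suppresses white page glare. Columns where nothing
    is red enough come back as NaN.
    """
    redness = img[:, :, 0].astype(int) - img[:, :, 1].astype(int)
    rows = np.argmax(redness, axis=0).astype(float)
    rows[redness.max(axis=0) < threshold] = np.nan
    return rows

# Synthetic frame standing in for a photo: a red line that bows in the
# middle, the way it does on a curved page.
h, w = 100, 200
img = np.zeros((h, w, 3), dtype=np.uint8)
x = np.arange(w)
line = (30 + 20 * np.sin(np.pi * x / w)).astype(int)
img[line, x, 0] = 255

profile = extract_laser_profile(img)   # profile[w // 2] lands on row 50
```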
5. Using depth-from-defocus.
This technique is a bit subtle. Essentially, it assumes that whatever is in focus in a picture with shallow DoF lies in a single plane. By shifting the focus through a scene, the depth of each object can be recovered by watching for high-frequency information. Unfortunately, this method suffers with compact cameras because they do not have shallow DoF, and it fails in general because not all book pages contain high-frequency content. An additional problem is that it requires many photographs of a page to work. Even so, I was very, very excited to see coded aperture imaging, which is explained here. I am building a coded aperture camera for other reasons, but I honestly expect the depth resolution to be too coarse for book scanning. Among the many other drawbacks, that’s the big one.
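For the record, the focus-measure half of the idea is easy to sketch: score each slice of a focal stack by the variance of its Laplacian and take the sharpest slice as a coarse depth label. The toy stack below (the same noise texture at three blur levels) stands in for real photographs.

```python
import numpy as np

def focus_measure(img):
    """Variance of a discrete Laplacian: large where the image is sharp."""
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
           np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    return lap.var()

def sharpest_slice(stack):
    """Index of the sharpest slice in a focal stack: a coarse depth label."""
    return int(np.argmax([focus_measure(s) for s in stack]))

def box_blur(img, passes):
    """Crude neighbor-averaging blur to fake progressively defocused shots."""
    for _ in range(passes):
        img = 0.25 * (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
                      np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return img

# Toy focal stack: the same texture at three blur levels; slice 0 is sharp.
rng = np.random.default_rng(0)
texture = rng.random((64, 64))
stack = [box_blur(texture, n) for n in (0, 2, 4)]
best = sharpest_slice(stack)   # 0: the unblurred slice wins
```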
6. Using RGB lighting to get the curvature of the book.
This is an idea I had just a week or so ago. If you mix a red, green, and blue light, you get white. White light is nice for scanning books, so we’re already +1. Now, if you put your lights at different points in space, when you interrupt them, you will get colored shadows. In this way, you can make colored shadows that reflect the shape of the book edge, and also identify the orientation of the lighting relative to the book. I think pictures show this idea best, so I mocked it up in Maya:
Drawbacks: Need RGB lights that are reasonably collimated to cast a sharp shadow. Setup would likely be physically large.
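The colored shadows are also easy to decode in software. A toy sketch with made-up thresholds: a pixel shadowed from only the red light keeps strong green and blue but loses red (it looks cyan), and likewise for the other two lights.

```python
import numpy as np

def shadow_sources(img, lit=200, dark=80):
    """Label each pixel by which of the R, G, B lights is blocked there.

    Under mixed R+G+B light, a pixel shadowed from only the red light
    keeps strong green and blue but loses red (it looks cyan), and so on.
    Returns an HxWx3 bool mask; mask[..., 0] is True where the red light
    is blocked. Thresholds are made up, not calibrated.
    """
    img = np.asarray(img, dtype=int)
    mask = np.zeros(img.shape, dtype=bool)
    for c in range(3):
        a, b = [o for o in range(3) if o != c]
        mask[..., c] = ((img[..., c] < dark) &
                        (img[..., a] > lit) & (img[..., b] > lit))
    return mask

# A cyan pixel means the red light is blocked at that point on the page.
px = np.array([[[30, 230, 230]]], dtype=np.uint8)
blocked = shadow_sources(px)[0, 0]     # [True, False, False]
```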
7. Difference-based lighting. Use light control to get better depth information from photographs.
Humans use the direction of light as a cue to depth. Most of our scanning rigs have two or more lights. There’s no reason we can’t use these lights in a smarter way to get better depth information. In particular, I’m thinking of Blender’s page splitter idea. The same idea has been proposed under numerous guises before, but I think it would work a lot better if we made better use of the lights.
So imagine that we have two lights.
Turn the left one on.
Then turn the right one on.
Now take the difference between the two — the page edges are clearly highlighted:
Now, you can make a virtual third light. Add the left and right images:
Looks pretty good!
Now you can play all kinds of games. Add the difference back to each original image, for instance; the edges and the center become highlighted.
Screwing around with contrast and stuff can get you even better data:
Etc., etc. The nice thing is that these are all easy to control (it’s easy to switch lights on and off), it’s only two shots per capture, and the image math is dead simple to start with: just addition and subtraction.
Drawbacks… hard to say! I think there are some exciting possibilities here… the combination of computation + cameras is unbeatable for this kind of task.
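For anyone who wants to play along, all of the image math above fits in a few lines of NumPy. The toy pixel values below are made up; real shots would need to be aligned first.

```python
import numpy as np

def light_math(left, right):
    """Image math for a two-light rig.

    left, right: uint8 shots of the same page, one light on in each.
    The difference highlights the page edges; the clipped sum is the
    'virtual third light'.
    """
    l = left.astype(int)
    r = right.astype(int)
    diff = np.abs(l - r).astype(np.uint8)
    both = np.clip(l + r, 0, 255).astype(np.uint8)
    return diff, both

# Made-up two-pixel page: a flat area, and an edge that catches the left
# light much more than the right.
left  = np.array([[120, 200]], dtype=np.uint8)
right = np.array([[118,  60]], dtype=np.uint8)
diff, both = light_math(left, right)   # diff=[[2, 140]], both=[[238, 255]]
```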