Tag Archive: opencv


Drone on!

Updates! So, among other things, I'm staying back on the BIRD project till December (the folks at UPenn graciously accepted my request to defer my admission by a semester) to do more fancy ML magic with quadrotors! :D (I promise loads of videos.) The RISS program ended in the meantime, and I presented my very second academic research poster!

Among the things I've been working on this past July and August is the ARDrone 2.0, a very nifty off-the-shelf $300 quadrotor that can be controlled over WiFi. The first thing I set upon was reimplementing the ARDrone ROS driver for the second iteration of the drone, as the ardrone_brown ROS package wasn't compatible with the 2.0. So began my arduous struggle of getting the 2.0 SDK to compile as a C++ ROS application, what with cascading makefiles, hundreds of preprocessor directives and undocumented code. Fortunately, with Andreas' involvement in the project, we could make some sense of the SDK and got it to compile as a ROS driver. It turned out to be a complete rewrite, and we managed to expose some additional functionality of the SDK: a mode change for stable hover, where the drone uses its downward-facing camera to perform optical-flow-based stabilization, and publishing of the IMU navdata at its native frequency (~200 Hz). To demonstrate the former we formulated a novel test we like to call The Kick Test.

The first thing I attempted was to get the ARDrone to follow trajectories given as sets of waypoints. The major problem is that the drone has no way to localise itself – the only feedback we have is the IMU's readings, and hence the only way to track movement is by integration, which means unavoidable drift errors. An alternative could have been using the downward-facing camera to perform optical flow, but since we don't have access to the onboard processor, the latencies involved in streaming the video to the base station and sending modified controls back would have been prohibitive. Fortunately, the planner that sends the trajectories replans after every delta movement, so as long as the drone doesn't drift too much from the intended path, the error is accommodated in the next plan. Towards this end I implemented a PD controller that listens to tf data spewed by a tracker node. The tracker node listens to the navdata stream and continuously updates the pose of the drone's frame with respect to the origin frame. This is required since the command velocities sent to the drone are always relative to the drone's frame of reference. After a few days of endless tweaking of the gains I finally managed to get reasonable waypoint-following behaviour. You can get a short glance at the tracker tf data in RViz in this video.


Here origin is the starting location of the drone, drone_int is the drone's current location as integrated from the IMU readings, and lookahead is the point at the specified look-ahead distance along the direction to the waypoint.
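To give a flavour of the controller, here's a minimal sketch of a PD update of the sort described above (not the actual BIRD code; the gains, names and 2D simplification are all illustrative). It rotates the world-frame error into the drone's frame, since command velocities are relative to the drone, and applies proportional and derivative terms:

```python
# Minimal PD waypoint-controller sketch (illustrative, not the real code).
# Poses are simplified to 2D (x, y, yaw) in the origin frame; the command
# is returned in the drone's own frame, as the driver expects.
import numpy as np

KP, KD = 0.4, 0.1  # made-up gains; the real ones took days of tweaking


def pd_command(pose, waypoint, prev_error, dt):
    """Return (vx, vy) in the drone frame plus the new error term."""
    x, y, yaw = pose
    error_world = np.asarray(waypoint, dtype=float) - np.array([x, y])
    # Rotate the world-frame error into the drone's frame of reference.
    c, s = np.cos(-yaw), np.sin(-yaw)
    error_body = np.array([c * error_world[0] - s * error_world[1],
                           s * error_world[0] + c * error_world[1]])
    d_error = (error_body - prev_error) / dt if dt > 0 else np.zeros(2)
    return KP * error_body + KD * d_error, error_body


cmd, err = pd_command(pose=(0.0, 0.0, 0.0), waypoint=(1.0, 0.5),
                      prev_error=np.zeros(2), dt=0.05)
print(cmd)  # velocity command to send to the drone
```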

However, the behaviour was pretty jerky; whenever the drone reached a waypoint, it would suddenly speed up due to the significant change in error. Jerks are, of course, very bad, since each jerk introduces a significant error peak as we integrate the velocities to obtain position. So, to mitigate this issue I implemented Drew's suggested 'dangling carrot' approach – with the error term capped at a maximum look-ahead distance along the direction to the closest waypoint. This finally resulted in the drone following trajectories pretty smoothly.
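The clipping itself is tiny. Continuing the sketch above (same hypothetical names), the error fed to the PD controller is simply scaled back whenever it is longer than the look-ahead distance:

```python
import numpy as np

LOOKAHEAD = 0.5  # metres; illustrative value, not the one used in the tests


def carrot_error(error_body, lookahead=LOOKAHEAD):
    """Cap the error vector at the look-ahead distance so the controller
    chases a nearby 'carrot' instead of lunging at a distant waypoint."""
    dist = np.linalg.norm(error_body)
    if dist <= lookahead:
        return error_body
    return error_body * (lookahead / dist)
```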

The next thing I'm working on is providing a more intuitive set of control inputs to the classifier. The intuition is that a bird always heads towards the most open region in an image, rather than avoiding oncoming obstacles by rolling right or left. Towards this end I implemented a new controller that allows the drone to be controlled via a mouse. The idea behind the visual controller is to get the drone to move along rays in the image – thus each mouse click corresponds to a pair of yaw and pitch angles. Since the drone can't move forward while pitching upwards (it would then just move backwards), my controller instead ramps the altitude, with the climb rate proportional to the forward velocity times the tangent of the pitch angle. The next step is to interface this new controller with the classifier, see what kind of results I obtain, and possibly get a paper out (yay!).
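As a rough sketch of that mapping (the intrinsics below are made-up numbers, and the real controller lives inside the ROS node), a click at pixel (u, v) turns into a yaw angle and a climb rate so the drone moves along the ray through that pixel:

```python
import math

# Assumed pinhole intrinsics for the front camera (illustrative values only).
FX, FY, CX, CY = 560.0, 560.0, 320.0, 180.0
V_FORWARD = 0.5  # m/s, fixed forward speed while following a ray


def click_to_command(u, v):
    """Map a mouse click at pixel (u, v) to (yaw, forward speed, climb rate)."""
    yaw = math.atan2(u - CX, FX)       # clicks right of centre -> yaw right
    pitch = math.atan2(CY - v, FY)     # clicks above centre -> positive pitch
    climb_rate = V_FORWARD * math.tan(pitch)  # stay on the ray while moving forward
    return yaw, V_FORWARD, climb_rate


print(click_to_command(480, 100))  # a click up and to the right of centre
```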

I also beefed up the wxPython GUI that I worked on earlier in the summer to dynamically parse parameters from launch files (and any launch files included within them) and present all the parameters in one unified dialog, to modify and save or load as required. This makes life much easier, since a lot of parameters often need to be changed manually in the launch files during flight tests.
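The parsing itself boils down to walking the launch XML recursively. A rough sketch (the real GUI also resolves roslaunch substitution args like $(find pkg), which this leaves out):

```python
import xml.etree.ElementTree as ET


def collect_params(launch_path, params=None):
    """Gather <param> tags from a launch file and from any files pulled in
    via <include>, recursively."""
    if params is None:
        params = {}
    root = ET.parse(launch_path).getroot()
    for param in root.iter('param'):
        params[param.get('name')] = param.get('value')
    for include in root.iter('include'):
        child = include.get('file')
        if child and '$(' not in child:  # unresolved substitutions are skipped here
            collect_params(child, params)
    return params


# params = collect_params('flight_test.launch')  # hypothetical file name
```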

A sample screenshot of the parameters loaded from a launch file. The user can only launch a node if the parameters are loaded.

Exciting times ahead!

RI, CMU (And so should you!)

A lot's happened over this month. I graduated (yay!), apparently, but I don't get the degree for about 5 years. (Shakes fist at DU. :-/) I arrived at the RI in the first week of June. Also, I couldn't attend the Jed-i finals, but found a good Samaritan to present my project in my stead. So here's to you, Sanjith, for helping me at least present my (reasonably incomplete) project at the competition! There was also a little heartburn – I got selected for the national interview of the NS Scholarship for $30K, but they were adamant that I had to be present in person in Mumbai on the 15th. No amount of talking or convincing helped, and so I had to let it go. Sad.

Anyway, out here the first two and a half weeks I dabbled in the untidy art of making a comprehensive wxPython GUI for starting all the ROS nodes that need to be fired while testing the MAV. Along the way, I figured out a lot about Python and its threading issues. One of the most annoying issues with wxPython is that it requires all changes to UI members to be made from its own (main) thread. So, for instance, in my ROS image callbacks I could not simply lock the running threads and assign the buffer data of the incoming OpenCV image to the StaticBitmap – doing so let monsters run through the code, what with xlib errors and random crashes. After a lot of searching, I finally realised that ALL members should be driven ONLY through events, since the event methods are always called by wxPython's main UI thread. Another interesting tidbit was wx.CallAfter(), which fires the event only after the subscriber thread's control has left the method. It was also fun working with the subprocess module to launch the ROS node binaries, and using Python's awesomeness to look up the binary files in the ROS package path and generate dynamic drop-down lists from them.
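For anyone fighting the same monsters, this is roughly the pattern that ended up working: a stripped-down sketch using the classic wxPython API of the time, with the ROS subscriber plumbing omitted and the method names my own.

```python
import wx


class DroneFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, title="Camera feed")
        self.bitmap = wx.StaticBitmap(self)

    def image_callback(self, rgb_bytes, width, height):
        # Called from the ROS subscriber thread: do NOT touch UI members here.
        # wx.CallAfter queues the update to run on wxPython's main thread
        # once this callback has returned.
        wx.CallAfter(self._update_bitmap, rgb_bytes, width, height)

    def _update_bitmap(self, rgb_bytes, width, height):
        # Runs on the main UI thread, so touching the StaticBitmap is safe.
        image = wx.ImageFromBuffer(width, height, rgb_bytes)
        self.bitmap.SetBitmap(image.ConvertToBitmap())
```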

A first (and last) look at the GUI. The toggle buttons that are Red signify that the associated ROS node could not be fired successfully, and the Green ones are those that could. The code keeps track of the processes, so even if a node is killed externally, it is reflected in the GUI.

The next thing I worked on over the past few days was trying to determine the Maximum Likelihood Estimate (MLE) of the distribution of trees, modelled as a Poisson process. The rationale behind it, as I understand it, is to have a better sense of planning paths beyond the currently visible trees in the 2D image obtained by the MAV.

The dataset provided is of the Flag Staff Hill at CMU and is a static pointcloud of roughly 2.5 million points. The intersection of the tree trunks with the parallel plane has been highlighted.

So I had a dense pointcloud of Flag Staff Hill that I used as my dataset. After a simple ground-plane estimation, I used a parallel plane 1.5 meters above the ground plane to segment out the tree trunks, whose arcs (remember that the rangefinder can't look around a tree) I then fed into a simple contour detection process to produce a scatter plot of the trees. I then took this data and used approximate nearest neighbours to determine the distance to the farthest tree in a three-tree neighbourhood. I used that distance to characterize the area associated with each tree and its neighbours, and used it for binning to determine the MLE; the MLE was then simply the arithmetic mean of these binned areas. I also wrote up a nice little report in LaTeX on this.
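A rough sketch of the neighbourhood-area step (the ground-plane segmentation and contour detection are left out, and the random points below merely stand in for the detected trunk centres):

```python
import numpy as np
from scipy.spatial import cKDTree


def neighbourhood_areas(tree_xy, k=3):
    """For each detected tree, the distance to the farthest tree in its
    k-tree neighbourhood and the disc area that distance implies."""
    kdtree = cKDTree(tree_xy)
    # k + 1 because the nearest 'neighbour' of every point is itself.
    dists, _ = kdtree.query(tree_xy, k=k + 1)
    farthest = dists[:, -1]
    return np.pi * farthest ** 2


tree_xy = np.random.rand(200, 2) * 100.0  # stand-in for the detected trees
areas = neighbourhood_areas(tree_xy)
print(areas.mean())  # the arithmetic mean used for the estimate
```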

A plot of the recognized trees after the contour detection. Interestingly, notice the existence of trees in clumps. It was only later that I realized that the trees in the dataset were ghosted, with another tree offset by a small distance.

Not surprisingly, I've grown to like Python a LOT! I could barely enjoy writing code in C++ for PCL after working in Python for just half a month. Possible convert? Maybe. I'll just have to work more to find out its issues. \begin{rant} Also, for one of the best CS departments in the US, the SCS admin is really sluggish in granting access. \end{rant}

Picture n’ Pose

So, as I mentioned in my previous post, I am working on recreating a 3D photorealistic environment by mapping image data onto my pointcloud in real time. The camera and the laser rangefinder can be considered two separate frames of reference looking at the same point in the world. After calibration, i.e. after finding the rotational and translational relationship (a matrix) between the two frames, I can express (map) any point in one frame in the other. So here I first project the 3D pointcloud points into the camera's frame of reference. Then I keep only the indices that fall within the bounds of the image: because the laser rangefinder has a much wider field of view than the camera, the region the camera captures is a subset of the region acquired by the rangefinder, so the projection yields indices anywhere from, say, (-200,-300) to (800,600) for a 640×480 image. Hence, I only need to colour the 3D points whose projections lie between (0,0) and (640,480), using the RGB values at the corresponding image pixels. The resultant XYZRGB pointcloud is then published, which is what you see here. Obviously, since the spatial resolution of the laser rangefinder is much lower than the camera's, the resulting output is not as dense and requires interpolation, which is what I am working on right now.
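The projection-and-colouring step looks roughly like this (a numpy sketch rather than the actual PCL/C++ code; K is the camera intrinsic matrix and (R, t) the extrinsics from the calibration):

```python
import numpy as np


def colour_cloud(points, image, K, R, t):
    """Project rangefinder points into the camera and attach RGB from the
    pixels they land on. `points` is (N, 3) in the rangefinder frame and
    `image` is an (H, W, 3) array."""
    cam = points @ R.T + t                  # rangefinder frame -> camera frame
    in_front = cam[:, 2] > 0                # keep only points in front of the camera
    cam = cam[in_front]
    uvw = cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]           # perspective divide -> pixel coordinates
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    h, w = image.shape[:2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    xyz = points[in_front][inside]
    rgb = image[v[inside], u[inside]]       # colour from the corresponding pixel
    return np.hstack([xyz, rgb])            # XYZRGB rows
```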

Head on to the images for a more detailed description. They’re fun!

Screenshot of the Calibration Process

So here, I first create a range image from the point cloud and display it in a HighGUI window for the user to select corner points. As I already know how I projected the 3D laser data onto the 2D range image, I can remap from the clicked coordinates back to the actual 3D point in the laser rangefinder data. The corresponding points are selected on the camera image similarly, and the code then moves on to the next image with detected chessboard corners, until a specified number of successful grabs has been accumulated.
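The actual code is C++ against the old OpenCV API, but the final step amounts to recovering the pose between the two sensors from the accumulated 3D-to-2D correspondences; in today's Python bindings that would look something like this (function and variable names are mine):

```python
import cv2
import numpy as np


def extrinsics_from_grabs(laser_points_3d, image_points_2d, K, dist_coeffs):
    """Recover the rotation and translation between rangefinder and camera
    from matched 3D corner points (rangefinder frame) and 2D chessboard
    corners (camera image), accumulated over several grabs."""
    obj = np.asarray(laser_points_3d, dtype=np.float32)
    img = np.asarray(image_points_2d, dtype=np.float32)
    ok, rvec, tvec = cv2.solvePnP(obj, img, K, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec


# The camera-side corners come from the usual chessboard detection, e.g.:
# found, corners = cv2.findChessboardCorners(gray, (8, 6))
```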

Screenshot of the result of extrinsic calibration

Here's a screenshot of RViz showing the cloud in action. Notice the slight bleed along the left edge of the checkerboard. That's because of issues in selecting corner points in the range image. Hopefully, for the real-world calibration we'll be able to use a glossy-print checkerboard so that the rays bounce off the black squares, giving us nice holes in the range image to select. Another interesting thing to note is the 'projection' of the board on the ground. That's because the checkerboard is actually occluding the region on the ground behind it, and so the code faithfully copies whatever value it finds in the image at the pixel corresponding to the 3D coordinate.

View after VGL

So, after zooming out, everything I mentioned becomes clearer. What this does is effectively increase the perceived resolution. The projection bit on the ground is also very apparent here.

Range find(h)er!

So, for the past few weeks I've been working on this really interesting project: creating a 3D environment for a mobile car in real time, for teleop (and later autonomous movement). It starts with calibrating a 3D laser rangefinder's data against a monocular camera's feed, which, once the calibration is done, allows me to map each coordinate in my image to a 3D point in world coordinates (within the range of my rangefinder, of course). Any obstacles coming up in the real-world environs can then be rendered accurately in my immersive virtual environment. Since everything's in 3D, the operator's point of view can be moved to any required orientation. So, for instance, while parking, all that needs to be done is to shift to an overhead view. In addition, since teleop is relatively laggy, the rendered environment gives the operator a fair idea of the immediate surroundings, and the ability to continue along projected trajectories rather than stopping to wait for the connection to resume.

So, as the SICK nodder was taking some time to arrive, I decided to play around with Gazebo and simulate the camera, the 3D laser rangefinder and the checkerboard pattern for calibration within it, and thus attempt to calibrate the camera-rangefinder pair from within the simulation. In doing so, I was finally forced to shift to Ubuntu (shakes fist at lazy programmers), which, although smoother, isn't entirely as bug-free as it is made out to be. So, I've created models for the camera and rangefinder and implemented Gazebo dynamic plugins within them to simulate them. I've also spawned a checkerboard-textured box, which I move around the environment using a keyboard-controlled node. This is what it looks like right now.
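I haven't shown the KeyboardOp node here, but moving a spawned model around programmatically can be done through gazebo_ros's /gazebo/set_model_state service. A rough sketch of that part (the model name 'checkerboard' is hypothetical, and the keyboard handling is omitted):

```python
import rospy
from gazebo_msgs.msg import ModelState
from gazebo_msgs.srv import SetModelState


def move_checkerboard(set_state, x, y, z):
    """Teleport the checkerboard model to a new pose in the Gazebo world."""
    state = ModelState()
    state.model_name = 'checkerboard'   # hypothetical model name
    state.pose.position.x = x
    state.pose.position.y = y
    state.pose.position.z = z
    state.pose.orientation.w = 1.0
    set_state(state)


if __name__ == '__main__':
    rospy.init_node('checkerboard_mover')
    rospy.wait_for_service('/gazebo/set_model_state')
    set_state = rospy.ServiceProxy('/gazebo/set_model_state', SetModelState)
    move_checkerboard(set_state, 2.0, 0.0, 0.5)
```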

Screenshot of w.i.p.

So here, the environment is in the Gazebo window at the bottom, where I've (crudely) annotated the checkerboard, camera and laser rangefinder. There are other bodies to provide a more interesting scan. The point cloud of the scan can be seen in the RViz window in the center, and the camera image at the right; note the displacement between the two. The KeyboardOp node is running in the background at the left, listening for keystrokes, and a sample HighGUI window displaying the detected chessboard corners is at the top left.

Looks optimistic!

Moving on to GNU/Linux

As had been envisioned much earlier, the move to Linux has started. As expected, it looked intimidating at first, but patient reading of the build docs and fixing the network back in the lab (read: always-on internet) allowed me to get the latest version of OpenCV (2.2) and build it. It required quite a few development packages, and although I also downloaded IPP and TBB, which seem to be good things (i.e., they'll help the code run faster), I don't know yet whether they'll work on the BeagleBoard as and when it arrives. Anyway, the first priority being to get the existing OpenCV 2.0 code running, I had to go about making the OpenCV build work.

A few things that were learnt in the process:

  1. Fedora (yes, I prefer Fedora to Ubuntu – it's more cutting edge, imho) does not come with gcc installed. I had assumed it would be there in the vanilla install.
  2. gcc-c++ has to be installed as well. Even after installing gcc, cmake kept crying over a missing CXX compiler. A quick Google search got me the solution to my quandary. Makes me wonder why they aren't in the dependency list for cmake anyway.
  3. Having completed the build, I happily set about trying to compile the sample code using gcc. After repeated failures I realized that the 'apostrophes' used in the configure arguments are tilted backticks (`), which the shell treats as command substitution. Whoa. Never seen anything like that earlier.
  4. I also realized that I had installed the repository version of OpenCV, which was what the cflags and libs still pointed to. Erased it, only to realize that the PKG_CONFIG_PATH entries I added during the 2.2 install process didn't work. Went about rebuilding the OpenCV code and performing the whole OpenCV installation all over again. No dice. Haw.
  5. Figured out that the install path is /usr/local/lib. So, that's where to point PKG_CONFIG_PATH.
  6. Another annoying thing: after closing the terminal window, pkg-config's path reverted to its previous value, since the exported PKG_CONFIG_PATH only lasted for that session. Fixed it by copying the generated opencv.pc file in /usr/local/lib to /usr/lib64/pkgconfig along with the other default .pc files. Always loaded. Yay.