Exploring Sonic Pi: A Software For Live Coding Music

By Leela Li

Self-proclaimed “future of music,” Sonic Pi is open-source software aimed at helping music instructors engage students in a different way: coding to make music. It comes pre-installed on Raspbian Jessie, but can also be run on other operating systems.

Having taken a few introductory computer science classes in the past, this concept of programming music – as opposed to the traditional route of picking up a real instrument – is what pushed me to choose this as my first Raspberry Pi project.

Raspberry Pi is a small single-board computer that comes with all of the essential functionality of a computer. One only needs to insert an SD card and plug in a monitor, mouse, keyboard, and power supply to get it up and running. One of its main purposes is to give the “average Joe” something small, powerful, and flexible enough to tinker with, to build knowledge of our increasingly digital world, and hopefully to contribute to it as well. My goal today was to write a program in Sonic Pi that played “Canon in D”.

Getting started was simple enough. Once the Raspberry Pi was booted up, I opened Sonic Pi and followed the instructions listed on the Raspberry Pi site. This covered the basics, such as making sounds with MIDI note numbers and manipulating simple parameters to change the way notes sound.

Now came the hard part. My prior programming knowledge did nothing to make up for my lack of knowledge in music.

In Sonic Pi, there are 128 MIDI note numbers (0 to 127), divided into 11 octaves (0 to 10). Having only taken three months of piano lessons back in the fifth grade, the word “octave” was nothing but foreign to me.
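
For what it’s worth, the translation I was attempting by hand boils down to simple arithmetic. Here’s a rough sketch in plain Python (not Sonic Pi code), assuming the common convention in which middle C (C4) is MIDI note 60 – Sonic Pi’s own octave numbering may differ by one, so treat it as illustrative:

```python
# Rough sketch of the note-name-to-MIDI mapping: 12 semitones per octave,
# with middle C (C4) assumed to be MIDI note 60.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def to_midi(name, octave):
    """Convert a note name and octave (e.g. 'D', 4) to a MIDI note number."""
    return 12 * (octave + 1) + NOTE_OFFSETS[name]

print(to_midi("C", 4))  # 60 -- middle C
print(to_midi("D", 4))  # 62 -- the D above middle C
```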

Staring at the online sheet music for the most basic version of “Canon,” I worked to translate each note into its corresponding MIDI note number. After about an hour or two, the conclusion was obvious: this was well beyond the capacity of my now musically untalented, 22-year-old self.

Overall, Sonic Pi is an amazing platform for those interested in creating music through a different medium. Just make sure you don’t make my mistake and remember to brush up on the musical aspect prior to diving in.

If you’re still not convinced, just look below to see what you could do with Sonic Pi!

Example of what one can do with Sonic Pi:

“Aerodynamic” by Daft Punk

https://www.youtube.com/watch?v=cydH_JAgSfg

Diving in on Image Stitching

As we’ve previously discussed, the Gigamacro works by taking many (many) photos of an object, with slight offsets. All of those photos need to then be combined to give you a big, beautiful gigapixel image. That process is accomplished in two steps.

First, all of the images taken at different heights need to be combined into a single in-focus image per X-Y position. This is done with focus-stacking software, like Zerene Stacker or Helicon. After collapsing these “stacks,” all of the positions need to be stitched together into a single image.

On its surface, this might seem like a pretty simple task. After all, we’ve got a precisely aligned grid, with fixed camera settings. However, there are a number of factors that complicate this.

First off, nothing about this system is “perfect” in an absolute sense. Each lens has slightly different characteristics from side to side and top to bottom. No matter how hard we try, the flashes won’t be positioned in exactly the same place on each side, and likely won’t fire with exactly the same brightness. The object may move ever so slightly due to vibrations from the unit or the building. And, while very precise, the Gigamacro itself may not move precisely the same amount each time. Keep in mind that, at the scale we’re operating at (even with a fairly wide lens), each pixel of the image represents less than a micron. If we were to blindly stitch a grid of images, even a misalignment as small as one micron would be noticeable.

To solve this, the Gigamacro utilizes commercial panorama stitching software – primarily Autopano Giga. Stitching software works by identifying similarities between images, and then calculating the necessary warping and movement to align those images. For those interested in the technical aspects of this process, we recommend reading Automatic Panoramic Image Stitching using Invariant Features by Matthew Brown and David Lowe. In addition to matching photos precisely, these tools are able to blend images so that lighting and color differences are removed, using techniques like optimal seam finding and multi-band blending.
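
For a hands-on feel for what feature-based stitching does, OpenCV ships a general-purpose stitcher built on the same recipe (detect features, estimate a warp per image, blend the seams). This is just an open-source analogue for illustration – not Autopano Giga or the Gigamacro workflow – the filenames are hypothetical, and it’s written against a recent OpenCV 4 (older OpenCV 3 builds use a different constructor, cv2.createStitcher):

```python
import cv2

# Hypothetical image tiles; SCANS mode uses an affine model better suited to
# flat, translated captures than the default rotating-panorama model.
images = [cv2.imread(p) for p in ["tile_01.jpg", "tile_02.jpg", "tile_03.jpg"]]

stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)
status, pano = stitcher.stitch(images)

if status == cv2.Stitcher_OK:
    cv2.imwrite("stitched.jpg", pano)
else:
    print("Stitching failed with status", status)
```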

While this type of software works well in most cases, there are some limitations. All off-the-shelf stitching software currently on the market is intended for traditional panoramas – a photographer stands in one place and rotates, taking many overlapping photos. This means they assume a single nodal point – the camera doesn’t translate in space. The Gigamacro is doing the opposite – the camera doesn’t rotate, but instead translates over X and Y.

Because the software is assuming camera rotation, it automatically applies different types of distortion to attempt to make the image look “right.” In this case though, right is wrong. In addition, the software assumes we’re holding the camera by hand, and thus that the camera might wobble a bit. In reality, our camera isn’t rotating around the Z axis at all.

Typically, we fool the panorama software by telling it that we were using a very, very (impossibly) long zoom lens when taking the photos. This makes it think that each photo is an impossibly small slice of the panorama, and thus the distortion is imperceptible.

However, Dr. Griffin from our Department of Geography, Environment & Society presented us with some challenging wood core samples. These samples are very long, and very narrow. Even at a relatively high level of zoom, they can fit within a single frame along the Y axis. Essentially, we end up with a single, long row of images.

This arrangement presented a challenge to the commercial stitching software. With a single row of images, any incorrect rotation applied to one image will compound in the next image, increasing the error. In addition, the slight distortion from the software attempting to correct what it thinks is spherical distortion means the images end up slightly curved. We were getting results with wild shapes, none of them correlating to reality.

Through more fiddling, and with help from the Gigamacro team, we were able to establish a workflow that mostly solved the problem. By combining Autopano Giga with PTGui, another stitching tool, we were able to dial out the incorrect rotation values and get decently accurate-looking samples. However, the intention with these samples is to use them for very precise measurements, and we were unconvinced that we had removed enough error from the system.

As mentioned earlier, the problem appears, on its face, to be relatively simple. That got us to thinking – could we develop a custom stitching solution for a reduced problem set like this?

The challenging part of this problem is determining the overlap between images. As noted, it’s not exactly the same between images, so some form of pattern recognition is necessary. Fortunately, the open source OpenCV project implements many of the common pattern matching algorithms. Even more fortunately, many other people have implemented traditional image stitching applications using OpenCV, providing a good reference. The result is LinearStitch, a simple python image stitcher designed for a single horizontal row of images created with camera translation.

LinearStitch uses the SIFT algorithm to identify similarities between images, then uses those points to compute the X and Y translation that matches the images as closely as possible without adding any distortion. You might notice we’re translating on both X and Y: we haven’t yet fully identified why we get a slight (2-3 micron) Y-axis drift between some images, and we’re working to track down the cause.
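
To make the approach concrete, here is a minimal sketch of SIFT-based translation estimation with OpenCV. It illustrates the technique rather than reproducing the actual LinearStitch code; note that on OpenCV 3 the detector lives under cv2.xfeatures2d.

```python
import cv2
import numpy as np

def estimate_shift(img_left, img_right):
    """Estimate the (dx, dy) translation aligning img_right to img_left,
    using SIFT matches only -- no rotation or warping is applied."""
    sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() on OpenCV 3
    kp_l, des_l = sift.detectAndCompute(cv2.cvtColor(img_left, cv2.COLOR_BGR2GRAY), None)
    kp_r, des_r = sift.detectAndCompute(cv2.cvtColor(img_right, cv2.COLOR_BGR2GRAY), None)

    # Keep only confident matches (Lowe's ratio test).
    matches = cv2.BFMatcher().knnMatch(des_l, des_r, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    # Each match votes for a shift; the median discards outliers.
    shifts = [(kp_l[m.queryIdx].pt[0] - kp_r[m.trainIdx].pt[0],
               kp_l[m.queryIdx].pt[1] - kp_r[m.trainIdx].pt[1]) for m in good]
    dx, dy = np.median(np.array(shifts), axis=0)
    return dx, dy
```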

At this point, LinearStitch isn’t packaged up as a point-and-click install, but if you’re a bit familiar with python and installing dependencies, you should be able to get it running. It uses Python3 and OpenCV3. Many thanks to David Olsen for all his assistance on this project.

Getting the hang of it

We’ve had the Gigamacro and the photogrammetry capture station up and running for about a week now. While both are relatively straightforward technologies, it’s clear that both benefit from a lot of artistry to get the most out of them.

The Gigamacro is conceptually very straightforward. It’s just a camera that takes many close-up photos of an object, and then combines all those photos into a single high resolution output. The camera uses a fixed lens (rather than a zoom), and we have a variety of lenses with different levels of magnification. Two of the challenging parts of using the Gigamacro are focus and lighting. At very close distances, with the types of lenses we’re using, the “depth of field” is very narrow. Depth of field is something we’re familiar with from traditional photography – when you take a picture of a person, and the background is nicely blurred, that’s due to a shallow depth of field. If everything in the photo is in focus, we say that it has a very wide depth of field.

With the Gigamacro, and using a high magnification lens, the depth of field is typically on the order of a few hundredths of a millimeter. The Gigamacro solves this by taking many photos at different distances from the object. Each of these photos can then be combined (“depth stacked”) into a single photo in which everything is in focus. Even on a surface that appears perfectly flat, we’re finding that a few different heights are necessary. With a more organic object, it’s not unusual to need to capture 40 or 50 different heights. Not only does this greatly increase the number of photos needed (we’re currently working on a butterfly consisting of 24,000 photos) but it increases the post-processing time necessary. All of this means we’re greatly rewarded for carefully positioning and preparing objects to minimize height variation when possible.
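
As a rough sanity check on that “few hundredths of a millimeter” figure, the standard close-up depth-of-field approximation is DoF ≈ 2·N·c·(m + 1) / m², where N is the f-number, c the circle of confusion, and m the magnification. The values below are assumed examples, not the Gigamacro’s actual settings:

```python
def close_up_dof(f_number, coc_mm, magnification):
    """Approximate total depth of field (in mm) for close-up photography."""
    return 2 * f_number * coc_mm * (magnification + 1) / magnification ** 2

# Assumed example values: f/8, a 0.005 mm circle of confusion, 5x magnification.
print(round(close_up_dof(8, 0.005, 5), 3))  # ~0.019 mm -- a few hundredths of a millimeter
```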

Another issue is light. The Gigamacro has two adjustable flashes. These mean there’s plenty of light available. At very close distances though, it’s important not to cast shadows or overexpose areas. We’re starting to get a better understanding of how to adjust the lighting to deliver quality results without losing detail, but we’re still learning. In addition, the package we have includes some filters for cross-polarizing light, which in theory will allow us to capture very shiny surfaces without reflections.

So, what’s it all look like? We’re still working on building a sample gallery, but below is one example. This is a small fish fossil, captured with a 100mm lens. The original object is approximately 3 inches across. In terms of the Gigamacro, this is a very low resolution image – only approximately 400 megapixels. We’re finding that this technology is much more impressive when you’ve seen the physical object in person. Only then do you realize the scale of the resolution. We’ll be sharing more samples as we get more experience. Just click to zoom. And zoom. And zoom.

Photogrammetry

Photogrammetry is the other main technology we’re working with at this phase. Our new photogrammetry turntable is working well, and we’re continuing to explore different lighting setups and backdrop options for the space.

Because we’ve got a processing workstation near the photogrammetry station, we’re able to stream photos directly from the camera to the computer. There’s no need to manually transfer files. This, combined with the automation of the turntable, means we can do a basic photogrammetry pass on an object very quickly, then make adjustments and try again.

Below is one of our first finished objects, a small plastic dinosaur. This object is far from perfect – in particular, the tail needs additional cleanup, and there’s some “noise” around the base. This is a combination of two passes, capturing the top and the bottom of the dinosaur. It’s made of 159 individual photos, all processed with Agisoft Photoscan Pro. We’re excited to compare Photoscan with Autodesk Remake, but a recent Windows update broke Remake. Hopefully they’ll get that fixed soon.

First Light

We made good progress in the AISOS space today. The photogrammetry (and later, RTI) camera position is in place. We were lucky to inherit a nice copy stand that was already in the space, which makes camera positioning really easy. We were even able to capture our first object, albeit very roughly (the lighting needs plenty of adjustment).

Later in the day, we got the call that our GigaMacro had arrived. Anything that comes in a massive wooden crate is bound to be exciting.

We don’t have our final work surface for the GigaMacro yet, but we did some initial assembly and testing. Everything seems to be working as expected. It’s definitely going to have a learning curve as we get familiar with all the variables that can be controlled on the GigaMacro. For now, you can watch it wiggle.

All cleaned up

The AISOS space has now been cleaned, painted, and the ethernet jacks are in. Most of our equipment is on-site, with the exception of the Gigamacro, which should be arriving next week.

We’re going to begin experimenting with different arrangements for the space. We’ve also got a few more pieces of furniture to move in.

Getting Rolling

We’re really excited to be moving forward with AISOS. We’re currently in the process of preparing our space, and ordering all of our equipment. We’re hoping to have most of the equipment installed and ready to go by the end of August.

If you’re curious to learn more, send us an email. Otherwise, look for a tour soon!

Building a photogrammetry turntable

As we explained in our photogrammetry introduction, photogrammetry fundamentally involves taking many photos of an object from multiple angles.  This is pretty straightforward, but it can also be time consuming.  A typical process involves a camera on a tripod, being moved slightly between each shot.  An alternative is to put the object on a turntable, so the camera can remain in one place.  This is still pretty fiddly though – moving a turntable slightly, taking a photo, then repeating 30 or 40 times.  It also introduces opportunity for error – the camera settings might be bumped, or the object might fall over.

To us, this sounded like a great opportunity for some automation.  As part of the LATIS Summer Camp 2016, we challenged ourselves to build an automated turntable, which could move a fixed amount, then trigger a camera shutter, repeating until the object had completed a full circle.

Fortunately, we weren’t the first people to have this idea, and we were able to draw upon the work of many other projects.  In particular, we used the hardware design from the MIT Spin Project, along with some of the software and electrical design from Sparkfun’s Autodriver Getting Started Guide.  We put the pieces together and added a bit of custom code and hardware for the camera trigger.

The first step was getting all the parts.  This is our current build sheet, though we’re making some adjustments as we continue to test.

We also had to do some fabrication, using a laser cutter and 3d printer.  Fortunately, here at the University of Minnesota, we can leverage the XYZ Lab.  The equipment there made short work of our acrylic, and our 3d printed gear came out beautifully.

With the parts on hand and the enclosure fabricated, it was mostly just a matter of putting it all together.  We started with a basic electronics breadboard to do some testing and experimentation.

Our cameras use a standard 2.5mm “minijack” (like a headphone jack, but smaller) connector for camera control.  These are very easy to work with – they just have three wires (a ground and two others).  By connecting one wire to ground, you can trigger the focus function.  The other wire will trigger the shutter.  A single basic transistor is all that’s necessary to give our Arduino the ability to control these functions.
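
Our build handles this with an Arduino and a transistor per wire; purely as an illustration of the same grounding trick, here is how the focus-then-shutter sequence might look in Python with gpiozero on a Raspberry Pi (the pin numbers and delays are hypothetical):

```python
from time import sleep
from gpiozero import DigitalOutputDevice

focus = DigitalOutputDevice(17)    # drives the transistor on the focus wire
shutter = DigitalOutputDevice(27)  # drives the transistor on the shutter wire

def take_photo():
    focus.on()      # pulling the focus wire to ground = half-press
    sleep(0.5)      # give the camera a moment to settle and focus
    shutter.on()    # pulling the shutter wire to ground = full press
    sleep(0.2)
    shutter.off()
    focus.off()
```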

The basic wiring for the motor control follows the hookup guide from Sparkfun, especially the wiring diagram towards the end. The only other addition we made was a simple momentary button to start and stop the process.

Once we were confident that the equipment was working, we disassembled the whole thing and put it back together with a solder-on prototyping board and some hardware connectors.  Eventually, we’d like to fabricate a PCB for this, which would be even more robust and compact. The Spin project has PCB plans available, though their electronics setup is a little different. If anyone has experience laying out PCBs and wants to work on a project, let us know!

While building the first turntable was pretty time consuming, the next one could be put together in only a few hours.  We’re going to continue tuning the software for this one, to find the ideal settings.  If there are folks on campus who’d like to build their own, just let us know!

 


Introduction to Photogrammetry

Whether you’re working on VR, 3D printing, or innovative research, capturing real-world objects in three dimensions is an increasingly common need.  There are a lot of technologies to aid in this process.  When people think of 3D capture, the first thing that often comes to mind is a laser scanner – a laser beam that moves across an object capturing data about the surface.  They look very “Hollywood” and impressive.

Another common type of capture is structured-light capture.  In structured light 3d capture, different patterns are projected onto an object (using something like a traditional computer projector).  A camera then looks at how the patterns are stretched or blocked, and calculates the shape of the surface from there.  This is also how the Microsoft Kinect works, though it uses invisible (infrared) patterns.

Both of these approaches can deliver high precision results, and have some particular use cases.  But they require specialized equipment, and often specialized facilities.  There’s another technology that’s much more accessible: photogrammetry.

Calculated Camera Positions

In very simple terms, photogrammetry involves taking many photos of an object, from different sides and angles.  Software then uses those photos to reconstruct a 3d representation, including the surface texture.  The term photogrammetry actually encapsulates a wide variety of processes and outputs, but in the 3d space, we’re specifically interested in stereophotogrammetry.  This process involves finding the overlap between photos.  From there, the original camera positions can be calculated.  Once you know the camera positions, you can use triangulation to calculate the position in space of any given point.
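
As a toy illustration of that last step, OpenCV can triangulate a matched point once two camera projection matrices are known. Everything below – the camera poses and the pixel coordinates – is made up purely for the example:

```python
import cv2
import numpy as np

# Two hypothetical cameras with identity intrinsics; the second is shifted
# one unit along X relative to the first.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# The same surface feature observed in each image (made-up pixel coordinates).
pt1 = np.array([[320.0], [240.0]])
pt2 = np.array([[300.0], [240.0]])

homog = cv2.triangulatePoints(P1, P2, pt1, pt2)  # 4x1 homogeneous coordinates
print((homog[:3] / homog[3]).ravel())            # the point's position in 3D space
```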

The process itself is very compute-intensive, so it needs a powerful computer (or a patient user).  Photogrammetry benefits from very powerful graphics cards, so it’s currently best suited to use on customized PCs.

One of the most exciting parts of photogrammetry is that it doesn’t require photos taken in a controlled lighting situation.  You can experiment with performing photogrammetry using a mobile device and the 123D Catch application.  Photogrammetry can even be performed using still frames extracted from video – walking around an object capturing video, for example.

For users looking to get better results, we’re going to be writing some guides on optimizing the process. A good quality digital camera with a tripod, and some basic lighting equipment can dramatically improve the results.

Because it uses simple images, photogrammetry is also well suited to recovering 3D data from imaging platforms like drones or satellites.

Photogrammetry Software

There are a handful of popular tools for photogrammetry.  One of the oldest and most established tools is PhotoScan from AgiSoft.  PhotoScan is a “power user” tool, which allows for many custom interventions to optimize the photogrammetry process.  It can be pretty intimidating for new users, but in the hands of the right user it’s very powerful.

An easier (and still very powerful) alternative is Autodesk Remake.  Remake doesn’t expose the same level of control that PhotoScan has, but in many cases it can deliver a stellar result without any tweaking.  It also has sophisticated tools for touching up 3d objects after the conversion process.  An additional benefit is that it has the ability to output models for a variety of popular 3d printers.  Remake is free for educational uses as well.

There are also photogrammetry tools for specialized use cases.  We’ve been experimenting with Pix4D, which is a photogrammetry toolset designed specifically for drone imaging.  Because Pix4D knows about different models of drones, it can automatically correct for camera distortion and the types of drift that are common with drones.  Pix4D also has special drone control software, which can handle the capture side, ensuring that the right number of photos are captured, with the right amount of overlap.

CORS: The Internet’s security “bouncer”

One of the realities of the modern web is that every new technology needs to strike a balance between functionality and security.  In this article, we talk about one particular case where this balance comes into play, and how it may affect working with virtual reality technology on the web.

When building a website, it’s not unusual to embed resources from one website inside another website.  For example, an image on a webpage (loaded via the “img” tag) can point to a JPEG stored on an entirely different server.  Similarly, Javascript files or other resources might come from remote servers.  This introduces some potential for security issues for consumers of web content.  For example, if your website loads a Javascript file from another site, and a hacker is able to modify that file, you’ll be loading potentially malicious code on your website.  Browsers normally address the dangers of mixed-origin content by tightly controlling the ways in which scripts from different servers can talk to other servers on the Internet.  This area of security is called “cross origin protection.”

For example, let’s say we have a webpage, Foo.com.  This webpage loads in our browser, along with an associated Javascript file, Foo.js, which is responsible for loading additional assets and managing interactive content on the page.  This Foo.js file, then, attempts to load some image content from Bar.com and plop it into a <canvas> tag on our webpage.  This all seems innocent enough so far, right?… WRONG!

In fact, this is a major security risk.  For example, imagine Bar.com is a web server that displays scanned documents.  For illustrative purposes, let’s pretend Bar.com is actually “IRS.com”, and contains millions of users’ scanned tax records.  In the scenario above, without any security measures in place, our Foo.js file would be able to reach into Bar.com, grab a secret file, plop it into the page’s <canvas> tag, read out the contents, and store it back to the Foo.com server for further exploitation.  The server administrator at Foo.com would then have access to millions of users’ tax records that had been maliciously sniped from Bar.com.  It’s easy to see, then, that scripts like Foo.js can quickly become a security risk.  Content that “crosses origins” – that loads from separate places on the Internet – needs to be prevented from co-mingling in potentially malicious ways.

The solution?  Browsers, by default, will block this type of cross-origin content loading altogether!  If your website and script are being loaded from Foo.com, then your browser forces the website to “stick to its lane”.  Foo.com webpages will only be allowed to load other content that comes from Foo.com, and will be blocked from loading–and potentially exploiting–content from Bar.com.

Cross-origin protection and WebVR

This basic situation plays out in many different ways within web technology.  A cross-origin image can’t be read back into an HTML <canvas>, and, importantly for our conversation, can’t be used as a WebGL texture.  WebGL is the technology underlying all of the web-based virtual reality tools like AFrame and ThreeJS.  The specifics of why cross-domain images can’t be used as textures in WebGL are pretty fascinating, but also pretty complicated.

In practice, what this means is that if your virtual reality Javascript is stored on one server, it can’t easily load images or videos stored on another server.  Unfortunately, this can be pretty restrictive when trying to create WebVR content; even within the University, we often have resources split across many servers.

Cross-Origin Resource Sharing (CORS)

Fortunately, there’s a solution called Cross Origin Resource Sharing.  This is a way to tell your web servers to explicitly opt-in to cross-domain uses of their content.  It allows a webserver like Bar.com to say “I expect to send resources to scripts at Foo.com, so allow those requests to go through and load into the browser.”  It’s basically the Internet equivalent of telling the bouncer at your favorite club to put your buddy on a VIP access list, rather than leaving him standing at the door.  As long as the bouncer…erm, browser…sees that a specific source of data is vouched for, it will allow the requests to go through.
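
For a sense of what that opt-in looks like on the server side, here is a minimal sketch using Python’s built-in http.server. The allowed origin is just a stand-in matching the Foo.com/Bar.com scenario above; a real deployment would normally configure this in the web server or application framework instead:

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Bar.com's way of telling the browser: "requests from Foo.com are on the VIP list."
        self.send_header("Access-Control-Allow-Origin", "https://foo.com")
        super().end_headers()

if __name__ == "__main__":
    # Serve the current directory on port 8000 with the CORS header attached.
    HTTPServer(("", 8000), CORSRequestHandler).serve_forever()
```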

whiteboard drawing of a browser requesting content from bar.com and getting blocked by stick-figure CORS bouncer guy

Doing these CORS checks requires some extra communication between the browser and the server, so occasionally the browser skips CORS checks.  However, when creating VR content in particular, sometimes we want to explicitly ask the browser to perform a CORS check so that the content can be loaded into a secure element like an HTML <canvas> or WebGL texture for VR display.  In this case, the “crossorigin” attribute on HTML elements is necessary.  If we load an image using the HTML code <img src="http://bar.com/image.jpg" crossorigin="anonymous"/> the browser will perform a CORS check before loading the image for the user to view.  Assuming the server hosting the image (in this case, Bar.com) has CORS allowed for the target website (Foo.com), that image will be considered safe for things like loading into an HTML <canvas> or using as a WebGL texture on Foo.com.  In this way, VR websites hosted on one server can continue to load pre-approved resources from other remote servers, as long as those servers provide the necessary “OK” for CORS.

CORS troubleshooting

Even if you’re doing everything right in the browser and on the server, CORS can still provide some headaches.  When you encounter a CORS-related failure, the errors that are generated are often opaque and hard to unpack.  Things like caching within the browser can also make these errors feel sporadic and harder to track down: one minute it may look like your image is suddenly loading (or suddenly not loading) correctly, when in fact what you’re seeing is a previously-cached, older version of the image or associated Javascript that your browser has stored behind the scenes.

Even worse, some browsers have broken or unreliable implementations of CORS.  For example, Safari on Mac OS X cannot currently load videos via CORS, regardless of the browser and server settings.  As a general tip, if you see errors in your browser’s web developer console that mention any sort of security restriction, start by looking into whether you’ve come up against a CORS-related issue.

Detecting Spherical Media Files

In many ways, VR is still a case of a “wild west” as far as technology goes.  There are very few true standards, and those that do exist haven’t been implemented widely.

Recently, we’ve been looking at how to automatically identify spherical (equirectangular) photos and videos so they can be displayed properly in our Elevator digital asset management tool.  “Why is this such a problem in the first place?” you may be wondering.  Well, spherical photos and videos are packaged in a way that they resemble pretty much any other type of photo or video.  At this point, we’re working primarily with images from our Ricoh Theta spherical cameras, which save photos as .JPG files and videos as .MP4 files.  Our computers recognize these file types as being photo and video files – which they are – but don’t have an automatic way of detecting the “special sauce”: the fact that they’re spherical!  You can open up these files in your standard photo/video viewer, but they look a little odd and distorted:

R0010012

So, we clearly need some way of detecting if our photos and videos were shot with a spherical camera.  That way, when we view them, we can automatically plop them into a spherical viewer, which can project our photos and videos into a spherical shape so they can be experienced as they were intended to be experienced!  As it turns out, this gets a bit messy…

Let’s start by looking at spherical photos.  We hypothesized that there must be metadata within the files to identify them as spherical.  The best way to investigate a file in a case like this is with ExifTool, which extracts metadata from nearly every media format.

While there’s lots of metadata in an image file (camera settings, date and time information, etc.), our Ricoh Theta files had some very promising additional items:

Projection Type : equirectangular
Use Panorama Viewer : True
Pose Heading Degrees : 0.0
Pose Pitch Degrees : 5.8
Pose Roll Degrees : 2.8

Additional googling reveals that the UsePanoramaViewer attribute has its origins in Google Streetview’s panoramic metadata extensions.  This is somewhere in the “quasi-standard” category – there’s no standards body that has agreed on this as the way to flag panoramic images, but manufacturers have adopted it.
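
In practice, that makes detection of spherical photos fairly mechanical. Here is a minimal sketch that shells out to ExifTool and checks the ProjectionType tag; it assumes exiftool is installed and on the PATH, and the filename is just a placeholder:

```python
import json
import subprocess

def is_spherical(path):
    """Return True if ExifTool reports an equirectangular projection for the file."""
    out = subprocess.run(
        ["exiftool", "-json", "-ProjectionType", path],
        capture_output=True, text=True, check=True,
    ).stdout
    tags = json.loads(out)[0]
    return tags.get("ProjectionType", "").lower() == "equirectangular"

print(is_spherical("theta_photo.jpg"))  # placeholder filename
```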

Video, on the other hand, is a little harder to deal with at the moment.  Fortunately, it has the promise of becoming easier in the future.  There’s a “request for comments” with a proposed metadata standard for spherical video.  This RFC is specifically focused on storing spherical metadata in web-delivery files (WebM and MP4), using a special identifier (a “UUID”) and some XML.

Right now, reading that metadata is pretty problematic.  None of the common video tools can display it.  However, open source projects are moving quickly to adopt it, and Google is already leveraging this metadata with files uploaded to YouTube.  In the case of the Ricoh cameras we use, their desktop video conversion tool has recently been updated to incorporate this type of metadata as well.

One of the most exciting parts of working in VR right now is that the landscape is changing on a week-by-week basis.  Problems are being solved quickly, and new problems are being discovered just as quickly.