CORS: The Internet’s security “bouncer”

One of the realities of the modern web is that every new technology has to balance functionality against security.  In this article, we look at one particular case where that balance comes into play, and how it can affect working with virtual reality technology on the web.

When building a website, it’s not unusual to embed resources from one website inside another.  For example, an image on a webpage (loaded via the “img” tag) can point to a JPEG stored on an entirely different server.  Similarly, Javascript files or other resources might come from remote servers.  This introduces potential security issues for consumers of web content.  For example, if your website loads a Javascript file from another site, and a hacker is able to modify that file, you’ll be loading potentially malicious code onto your website.  Browsers address the dangers of mixed-origin content by tightly controlling the ways in which scripts from different servers can talk to other servers on the Internet.  This area of security is called “cross-origin protection.”

For example, let’s say we have a webpage, Foo.com.  This webpage loads in our browser, along with an associated Javascript file, Foo.js, which is responsible for loading additional assets and managing interactive content on the page.  This Foo.js file, then, attempts to load some image content from Bar.com and plop it into a <canvas> tag on our webpage.  This all seems innocent enough so far, right?… WRONG!

In fact, this is a major security risk.  For example, imagine Bar.com is a web server that displays scanned documents.  For illustrative purposes, let’s pretend Bar.com is actually “IRS.com”, and contains millions of users’ scanned tax records.  In the scenario above, without any security measures in place, our Foo.js file would be able to reach into Bar.com, grab a secret file, plop it into the page’s <canvas> tag, read out the contents, and store it back to the Foo.com server for further exploitation.  The server administrator at Foo.com would then have access to millions of users’ tax records that had been maliciously swiped from Bar.com.  It’s easy to see, then, that scripts like Foo.js can quickly become a security risk.  Content that “crosses origins”–that loads from separate places on the Internet–needs to be prevented from co-mingling in potentially malicious ways.

The solution?  Browsers, by default, will block this type of cross-origin content loading altogether!  If your website and script are being loaded from Foo.com, then your browser forces the website to “stick to its lane”.  Foo.com webpages will only be allowed to load other content that comes from Foo.com, and will be blocked from loading–and potentially exploiting–content from Bar.com.

Cross-origin protection and WebVR

This basic situation plays out in many different ways within web technology.  A cross-origin image can’t be read back into an HTML <canvas>, and, importantly for our conversation, can’t be used as a WebGL texture.  WebGL is the technology underlying all of the web-based virtual reality tools like AFrame and ThreeJS.  The specifics of why cross-domain images can’t be used as textures in WebGL are pretty fascinating, but also pretty complicated.

In practice, what this means is that if your virtual reality Javascript is stored on one server, it can’t easily load images or videos stored on another server.  Unfortunately, this can be pretty restrictive when trying to create WebVR content; even within the University, we often have resources split across many servers.

Cross-Origin Resource Sharing (CORS)

Fortunately, there’s a solution called Cross-Origin Resource Sharing.  This is a way to tell your web servers to explicitly opt in to cross-domain uses of their content.  It allows a webserver like Bar.com to say “I expect to send resources to scripts at Foo.com, so allow those requests to go through and load into the browser.”  It’s basically the Internet equivalent of telling the bouncer at your favorite club to put your buddy on a VIP access list, rather than leaving him standing at the door.  As long as the bouncer…erm, browser…sees that a specific source of data is vouched for, it will allow the requests to go through.

[Whiteboard drawing: a browser requests content from Bar.com and is turned away by a stick-figure CORS “bouncer”]

Doing these CORS checks requires some extra communication between the browser and the server, so the browser doesn’t perform them on every request.  However, when creating VR content in particular, sometimes we want to explicitly ask the browser to perform a CORS check so that the content can be loaded into a secure element like an HTML <canvas> or WebGL texture for VR display.  In this case, the “crossorigin” attribute on HTML elements is necessary.  If we load an image using the HTML code <img src="http://bar.com/image.jpg" crossorigin="anonymous"/> the browser will perform a CORS check before loading the image for the user to view.  Assuming the server hosting the image (in this case, Bar.com) has CORS enabled for the requesting website (Foo.com), that image will be considered safe for things like loading into an HTML <canvas> or using as a WebGL texture on Foo.com.  In this way, VR websites hosted on one server can continue to load pre-approved resources from other remote servers, as long as those servers provide the necessary “OK” for CORS.
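
On the server side, that “OK” is just an HTTP response header called Access-Control-Allow-Origin.  As a rough, hypothetical sketch – not how any particular server of ours is actually configured – here’s how a server playing the role of Bar.com could opt in using Python’s built-in http.server.  In real deployments you’d more likely set the same header in your Apache or nginx configuration; the allowed origin (“https://foo.com”) is just our running example.

# Hypothetical sketch: a static file server that opts in to CORS.
from http.server import SimpleHTTPRequestHandler, HTTPServer

class CORSRequestHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Vouch for Foo.com: browsers will then let scripts on Foo.com read these responses.
        # Use "*" instead to allow any origin (reasonable for truly public assets).
        self.send_header("Access-Control-Allow-Origin", "https://foo.com")
        SimpleHTTPRequestHandler.end_headers(self)

HTTPServer(("", 8000), CORSRequestHandler).serve_forever()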

CORS troubleshooting

Even if you’re doing everything right in the browser and on the server, CORS can still provide some headaches.  When you encounter a CORS-related failure, the errors that are generated are often opaque and hard to unpack.  Things like caching within the browser can also make these errors feel sporadic and harder to track down: one minute it may look like your image is suddenly loading (or suddenly not loading) correctly, when in fact what you’re seeing is a previously-cached, older version of the image or associated Javascript that your browser has stored behind the scenes.

Even worse, some browsers have broken or unreliable implementations of CORS.  For example, Safari on Mac OS X cannot currently load videos via CORS, regardless of the browser and server settings.  As a general tip, if you see errors in your browser’s web developer console that mention any sort of security restriction, start by looking into whether you’ve come up against a CORS-related issue.

Detecting Spherical Media Files

In many ways, VR is still a case of a “wild west” as far as technology goes.  There are very few true standards, and those that do exist haven’t been implemented widely.

Recently, we’ve been looking at how to automatically identify spherical (equirectangular) photos and videos so they can be displayed properly in our Elevator digital asset management tool.  “Why is this such a problem in the first place?” you may be wondering.  Well, spherical photos and videos are packaged in a way that resembles pretty much any other type of photo or video.  At this point, we’re working primarily with images from our Ricoh Theta spherical cameras, which save photos as .JPG files and videos as .MP4 files.  Our computers recognize these file types as being photo and video files – which they are – but don’t have an automatic way of detecting the “special sauce”: the fact that they’re spherical!  You can open up these files in your standard photo/video viewer, but they look a little odd and distorted:

[Example: a raw, distorted equirectangular photo straight from the Ricoh Theta]

So, we clearly need some way of detecting if our photos and videos were shot with a spherical camera.  That way, when we view them, we can automatically plop them into a spherical viewer, which can project our photos and videos into a spherical shape so they can be experienced as they were intended to be experienced!  As it turns out, this gets a bit messy…

Let’s start by looking at spherical photos.  We hypothesized that there must be metadata within the files to identify them as spherical.  The best way to investigate a file in a case like this is with ExifTool, which extracts metadata from nearly every media format.

While there’s lots of metadata in an image file (camera settings, date and time information, etc.), our Ricoh Theta files had some very promising additional items:

Projection Type : equirectangular
Use Panorama Viewer : True
Pose Heading Degrees : 0.0
Pose Pitch Degrees : 5.8
Pose Roll Degrees : 2.8

Additional googling reveals that the UsePanoramaViewer attribute has its origins in Google Streetview’s panoramic metadata extensions.  This is somewhere in the “quasi-standard” category – there’s no standards body that has agreed on this as the way to flag panoramic images, but manufacturers have adopted it.
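
Since those tags travel inside the file, a small script can key off of them.  Here’s a rough sketch of the kind of check we have in mind – it assumes the exiftool command-line program is installed and on your PATH, and the function name and filename are just placeholders for illustration:

import json
import subprocess

def is_spherical_photo(path):
    """Crude check: does this file carry the equirectangular/panorama tags shown above?"""
    output = subprocess.check_output(["exiftool", "-json", path])
    tags = json.loads(output)[0]
    return (tags.get("ProjectionType") == "equirectangular"
            or str(tags.get("UsePanoramaViewer")).lower() == "true")

print(is_spherical_photo("R0010012.JPG"))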

Video, on the other hand, is a little harder to deal with at the moment.  Fortunately, it has the promise of becoming easier in the future.  There’s a “request for comments” with a proposed standard for spherical video metadata.  This RFC is specifically focused on storing spherical metadata in web-delivery files (WebM and MP4), using a special identifier (a “UUID”) and some XML.

Right now, reading that metadata is pretty problematic.  None of the common video tools can display it.  However, open source projects are moving quickly to adopt it, and Google is already leveraging this metadata with files uploaded to YouTube.  In the case of the Ricoh cameras we use, their desktop video conversion tool has recently been updated to incorporate this type of metadata as well.
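
In the meantime, one crude stopgap is simply to scan a file for the GSpherical XML markers the proposal defines, without properly parsing the MP4/WebM container at all.  Here’s a quick, hypothetical sketch of that idea – the function name and filename are placeholders, and it assumes the file actually follows the draft spec:

def looks_spherical(path, chunk_size=1024 * 1024):
    """Crude check for the GSpherical XML marker from the proposed spherical metadata spec."""
    marker = b"GSpherical:Spherical"
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            if marker in tail + chunk:
                return True
            tail = chunk[-len(marker):]  # keep a little overlap between reads

print(looks_spherical("your_video.mp4"))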

One of the most exciting parts of working in VR right now is that the landscape is changing on a week-by-week basis.  Problems are being solved quickly, and new problems are being discovered just as quickly.

Sharing code in higher ed

The “sharing first” sentiment is gaining momentum across academia…

The Economist recently ran an article about commercial applications of code from higher education.  While LATIS Labs isn’t exactly planning to churn out million-dollar software to help monetize eyeballs or synergize business practices, we do want to be sharing software.

We believe that sharing software is a part of our responsibility as developers at a public institution.  Of course we’ll be releasing code – we’re in higher education.  This “sharing first” sentiment is also gaining momentum in other parts of academia, from open textbooks, to open access journals, to open data (see some related links below).

We also believe that releasing code makes for better code.  At a big institution like the University of Minnesota, it’s easy to cut corners on software development by relying on private access to databases or by making assumptions about your users.  Writing with an eye towards open source forces you to design software the right way, it forces you to document your code, and it forces you to write software you’re proud of.

As we work on things in LATIS Labs, you’ll find them at github.com/umn-latis.   Clone them, fork them, file issues on them.  We’ll keep sharing.


I’ve just seen a face…

For many of us, facial recognition in digital images may seem like one of Facebook and Google’s recent parlor tricks to make it easier for you to “tag” your friends in vacation photos.  But if you do any work in privacy law, ethics, or related fields, the spread of facial recognition technology may raise some interesting policy questions and research opportunities. Here, we dig a little deeper into how facial recognition technologies work “under the hood”.

Facebook, your iPhone, Google…they all seem to know where the faces are.  You snap a picture and upload it to Facebook?  It instantly recognizes a face and tells you to tag your friends.  And when it does that, it’s actually just being polite; it already has a pretty good sense of which friends it’s recognized–it’s just looking to you to confirm.

If you’re like me, you’ve probably had some combination of reactions, ranging from “Awesome!” to “Well, that’s kinda creepy…” to “How the heck does it do that?”  And if you do any work in privacy law, ethics, etc., the spread of facial recognition technology may be more than a mere parlor trick to you.  It has major policy implications, and will likely open up a lot of interesting research opportunities.

But how the heck does it work?  Well, we dug into this a bit recently to find out…

Fortunately, there happens to be a very nice open source library called OpenCV that we can use to explore some of the various facial recognition algorithms that are floating around out there.  OpenCV can get pretty labyrinthine pretty fast, so you may also want to dig into a few wonderful tutorials (see “Resources” below) that are emerging on the subject.

We explored an algorithm called Eigenfaces, along with a nifty little method called Haar Cascades, to get a sense of how algorithms can be trained to recognize faces in a digital image and match them to unique individuals.  These are just a few algorithms among many, but the exploration should give you a nice idea of the kinds of problems that need to be tackled in order to effectively recognize a face.

But first, let’s jump to the punchline! When it’s all said and done, the finished script watches a webcam feed, spots any face in the frame, and prints out the name of the person it recognizes (or tells us it doesn’t know them).

And here’s how it does it, in both layman’s terms and in code snippets:

First, create two sets of images.  The first will be a set of “negative” images of human faces. These are images of generic people, but not those that we want our algorithm to be able to recognize. (Note: Thanks to Samaria & Harter–see “Resources” below–for making these images available to researchers and geeks to use when experimenting with facial recognition!)

The second is a set of “positive” images of the faces that we want our algorithm to be able to recognize. We’ll need about 10-15 snapshots of each person we want to be able to recognize, and we can easily capture these using our computer’s webcam.  And for simplicity’s sake, we’ll make sure all of these images are the same size and are nicely zoomed into the center of each face, so we don’t have to take size variation or image quality into account for now.
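
For what it’s worth, capturing those snapshots doesn’t require anything fancy.  Here’s a rough sketch of how you might grab them with OpenCV – the directory layout, snapshot count, and 92x112 image size are our own conventions (chosen to match the negative training set), not anything the library dictates:

import os
import cv2

OUT_DIR = "positive/user1"            # placeholder path; one folder per person
os.makedirs(OUT_DIR)                  # assumes the folder doesn't exist yet

camera = cv2.VideoCapture(0)          # default webcam
for i in range(15):                   # roughly 10-15 snapshots per person
    ok, frame = camera.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Crop a centered square (we frame our face to fill the middle of the shot),
    # then resize to match the 92x112 images in the negative set.
    h, w = gray.shape
    crop = gray[:, (w - h) // 2:(w + h) // 2]
    cv2.imwrite(os.path.join(OUT_DIR, "%02d.pgm" % i), cv2.resize(crop, (92, 112)))
camera.release()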

Then, feed all of these images into the OpenCV Eigenfaces algorithm:


USER_LIST = os.listdir(config.POSITIVE_DIR)
for user in USER_LIST:
    for filename in walk_files(config.POSITIVE_DIR + "/" + user, '*.pgm'):
        faces.append(prepare_image(filename))
        labels.append(USER_LIST.index(user))
        pos_count += 1

# Read all negative images
for filename in walk_files(config.NEGATIVE_DIR, '*.pgm'):
    faces.append(prepare_image(filename))
    labels.append(config.NEGATIVE_LABEL)
    neg_count += 1

print 'Read', pos_count, 'positive images and', neg_count, 'negative images.'

Next, “train” the Eigenfaces algorithm to recognize whose faces are whose. It does this by mathematically figuring out all the ways the negative and positive faces are similar to each other and essentially ignoring this information as “fluff” that’s not particularly useful to identifying individuals.  Then, it focuses on all of the ways the faces are different from each other, and uses these unique variations as key information to predict whose face is whose. So, for example, if your friend has a unibrow and a mole on their chin and you don’t, the Eigenfaces algorithm would latch onto these as meaningful ways of identifying your friend.  The exact statistics of this are slightly over my head, but for those of you who are into that kind of thing, principal component analysis is the “special statistical sauce” that powers this process.

# Train model
print 'Training model...'
model = cv2.face.createEigenFaceRecognizer()
model.train(np.asarray(faces), np.asarray(labels))

When training the model, we can also examine an interesting byproduct–the “mean Eigenface”.  This is essentially an abstraction of what it means to have an entirely “average face”, according to our model:

[The “mean Eigenface” computed from our training images]

Kind of bizarre, huh?

Now, the real test: we need to be able to recognize these faces from a webcam feed.  And unlike our training images, our faces may not be well-centered in our video feed.  We may have people moving around or off kilter, so how do we deal with this?  Enter…the Haar Cascade!

The Haar Cascade will scan through our webcam feed and look for “face-like” objects.  It does this by taking a “face-like” geometric template, and scanning it across each frame in our video feed very, very quickly.  It examines the edges of the various shapes in our images to see if they match this very basic template.  It even stretches and shrinks the template between scans, so it can detect faces of different sizes, just in case our face happens to be very close up or very far away.  Note that the Haar Cascade isn’t looking for specific individuals’ faces–it’s just looking for “face-like” geometric patterns, which makes it relatively efficient to run:

haar_faces = cv2.CascadeClassifier(config.HAAR_FACES)

def detect_single(image):
"""Return bounds (x, y, width, height) of detected face in grayscale image.
If no face or more than one face are detected, None is returned.
"""
faces = haar_faces.detectMultiScale(image,
scaleFactor=config.HAAR_SCALE_FACTOR,
minNeighbors=config.HAAR_MIN_NEIGHBORS,
minSize=config.HAAR_MIN_SIZE,
flags=cv2.CASCADE_SCALE_IMAGE)
if len(faces) != 1:
return None
return faces[0]

Once the Haar Cascade has identified a “face-like” thing in the video feed, it crops off that portion of the video frame and passes it back to the Eigenfaces algorithm.  The Eigenfaces algorithm then churns this image back through its classifier.  If the image matches the unique set of statistically identifying characteristics of one of the users we trained it to recognize, it will spit out their name.  If it doesn’t recognize the face as someone from the group of users it was trained to recognize, it will tell us that, too!

# Test face against model.
label, confidence = model.predict(crop)

if label >= 0 and confidence < config.POSITIVE_THRESHOLD:
    print 'Recognized ' + USER_LIST[label]
else:
    print 'Did not recognize face!'

Interested in exploring this further with a class or as part of a research project?  Get in touch and we’re happy to help you on your way!

Related resources

  • Sobel, B. (11 June 2015). “Facial recognition technology is everywhere. It may not be legal.” The Washington Post. https://www.washingtonpost.com/news/the-switch/wp/2015/06/11/facial-recognition-technology-is-everywhere-it-may-not-be-legal/
  • Meyer, R. (24 June 2014). “Anti-Surveillance Camouflage for Your Face”. The Atlantic. http://www.theatlantic.com/technology/archive/2014/07/makeup/374929/
  • “How does Facebook suggest tags?” Facebook Help Center. https://www.facebook.com/help/122175507864081
  • “OpenCV Tutorials, Resources, and Guides”. PyImageSearch. http://www.pyimagesearch.com/opencv-tutorials-resources-guides/
  • “Face Recognition with OpenCV: Eigenfaces”. OpenCV docs 2.4.12.0. http://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial.html#eigenfaces
  • “Face Detection Using Haar Cascades”. OpenCV Docs 3.1.0. http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_face_detection.html
  • “Raspberry Pi Face Recognition Treasure Box”. Adafruit tutorials. https://learn.adafruit.com/raspberry-pi-face-recognition-treasure-box/overview
  • F. Samaria & A. Harter. “Parameterisation of a stochastic model for human face identification” 2nd IEEE Workshop on Applications of Computer Vision December 1994, Sarasota (Florida).

Interaction via Google Cardboard

While devices like the Oculus Rift and HTC Vive generate a lot of the headlines in the world of VR, for most users, lower-cost and lower-fidelity devices like the Google Cardboard and Samsung GearVR are the more typical form of interaction.

The advanced headsets have many modes of interaction; at a minimum, they support gaming controllers or hand controllers, and many of them accurately track movement within a room via infrared technology.  When users are viewing content via Google Cardboard, the options for interaction are much more limited.

Cardboard provides a few methods for getting input from your user, depending on how creative you’re willing to be.  First off, all Cardboard units have a single button.  This button translates as a “tap” on the screen.  You can’t use it to track a specific touch location on the screen, but combined with gaze detection (figuring out what the user is looking at) you can build simple interactivity.  The “gaze – tap” interaction holds some nice potential as the basic “bread and butter” interaction for Google Cardboard.

We’ve been exploring this as a way to do simple interactive walkthroughs. In addition to the obvious “pick the direction you want to go” options, we believe additional controls can often be “hidden” at the top and bottom of the view sphere.  For example, your users may be able to look down to trigger a menu, or look up to go to a map.

Google Cardboard also gives you access to the various sensors available within a smartphone, like the accelerometer and gyroscopes.  These are useful, first and foremost, for tracking the movement of the headset itself, but these could also potentially be used for gesture control.  For example, you could watch for sudden impacts – such as tapping the side of the cardboard, or having the user jump up and down – to trigger certain interactions.

If you’re building a native iOS or Android app (as opposed to a web application), you’ll also have access to the device’s microphone.  If that’s the case, speech recognition could provide a lot of interesting flexibility.  For example, even basic detection of loud noises can allow for start/stop controls.

While the Cardboard technology is limiting in many ways, the limitations can actually be exciting, because they spur you to think of new ways to leverage the technology.  We’re excited about what’s possible, and we’re excited to hear from others!

VR, 360, and 3D oh my!

“Virtual reality” is here to stay, and the tools for authoring virtual reality content are becoming increasingly easy to access.  Whether it’s a basic spherical image or a fully interactive “real” virtual reality simulation, the semantics matter less than the perspective shift it can offer to learners and researchers in the liberal arts.

There’s a new technology on the horizon!  Quick, let’s have an argument over semantics!

If you spend any time with someone embedded in the world of virtual reality, at some point they’re likely to comment that such-and-such technology “isn’t actual virtual reality, it’s just spherical video.”  (The author, in fact, has been guilty of this on several occasions.)

In this post, we’ll break out the different terminology in the space. But first, let’s be clear: “Virtual Reality” has already won the semantic smackdown.  Just like we spent the 90s arguing about the difference between “hackers” and “crackers”, this argument has already been lost.

Spherical Imaging

Spherical imaging involves capturing an image of everything around a single point.  Think of it as a panorama photo, except the panorama goes all the way around you.  For those viewing the resulting image, spherical imaging lets you place your viewer in a position, and then they decide where they want to look.  It’s a great way to give someone a sense of a place without actually being there.  Spherical imaging can capture either still images or video, and can be viewed on either a normal computer screen, or using some type of VR viewing headset.  Here’s an example of a spherical image, as it’s captured by the camera.  You’ll notice its raw form is kind of stretched and distorted.  This is called an “equirectangular” image:

[Example: an equirectangular image as captured by the camera, stretched and distorted in its raw form]

And here’s an example of how you can interact with it.  Go ahead and poke, click and drag it – it won’t bite!

[Interactive spherical image viewer]

There are a few important distinctions to think about with spherical imaging.  First off, it’s not three dimensional.  Even though you can look all around, you can’t see different sides of an object.  This is particularly noticeable when objects are close to the camera.  Additionally, your viewer can’t move around.  The perspective is stuck wherever the camera was when the image was captured.

Many folks would argue that these two facts disqualify spherical images from being considered “virtual reality.”  We disagree, but we’ll get to that later.  If you’re interested in capturing your own spherical images, LATIS currently has two Ricoh Theta360 cameras available to borrow.  These are a simple, one-button solution for capturing these types of images.  If you’d like to give them a try, get in touch!

3D Imaging

At its most basic, 3D is just a matter of putting two cameras side by side, in a position that mimics the distance between human eyes.  Then, you simply capture two sets of images or videos set apart at this distance.  When displaying, you just need to send the correct image to the correct eye, and the viewer will have a 3D experience.  However, that’s a pretty limited experience, as the “gaze” remains relatively fixed.  The viewer can’t turn their head and look elsewhere, and they certainly can’t move around.  The more interesting type of 3D combines 3D with spherical imaging.

In order to capture spherical 3D, you need two spherical images, offset just like they’d be in the human head.  It’s a lot more complicated than putting two spherical cameras next to each other, though.  If you did that, you’d only get a 3D image when looking straight ahead or straight behind. At any other position, the cameras would block each other.  This is where things get math-y.

When folks capture spherical 3D today, they often do so by combining many traditional two-dimensional cameras in an array, with lots of overlap between the images.  Afterwards, software builds two complete spherical images with the right offsets.  This is a very processing-intensive approach.  Most of the camera arrays available on the market use inexpensive cameras like the GoPro, but require many cameras to generate the 3D effect.

If you’ve got something like a Google Cardboard viewer, you can see an example of a 3D Spherical video on YouTube.

Unfortunately, we don’t currently have any equipment for this type of capture.  Later in 2016, we expect a variety of more affordable 3D spherical cameras will begin shipping, and we’re excited to explore this space further.

Virtual Reality

When purists use the term “virtual reality,” they’re thinking about a very literal interpretation of the term.  “Real” virtual reality would be an experience so real, you wouldn’t differentiate it from actual reality.  We’re obviously not there yet, but there are a few basic features that are important to think about.

The first, most important factor in “real” virtual reality is freedom of movement.  Within a given space, the viewer should be able to move wherever they want, and look at whatever they want.  In a computer generated environment, like a video game, that’s relatively easy.  If you want to provide that sort of experience using a real location, it’s a lot harder – after all, you can’t place a camera at every possible location in a room (though some advanced technology is getting close to that.)

Today, virtual reality means creating a simulation, using technology similar to what’s used when making video games or animated films.  The creator pieces together different 3D models and images, adds animation and interactivity, and then the viewer “plays” the simulation.  While free or inexpensive software like Unity3d makes that feasible, it’s still a pretty complicated process.

Another important part of the “real” virtual reality experience is the ability to manipulate objects in a natural way.  Some of the newest virtual reality viewer hardware on the market, like the Oculus Rift and HTC Vive offer hand controllers which allow you to gesture naturally in space.  Some technologies even track your movement within a room, so you can walk around.

We’re just getting started exploring these technologies, and are learning to build simulations with Unity3d.  If you’d like to work with us on this, please get in touch!

Parsing and plotting OMNIC Specta SPA files with R and PHP

This is a quick “how-to” post to describe how to parse OMNIC Specta SPA files, in case anyone goes a-google’n for a similar solution in the future.

SPA files consist of some metadata, along with the data as little endian float32. The files contain a basic manifest right near the start, including the offset and runlength for the data. The start offset is at byte 386 (two byte integer), and the run length is at 390 (another two byte int). The actual data is strictly made up of the little endian floats – no start and stop, no control characters.

These files are pretty easy to parse and plot, at least to get a simple display. Here’s some R code to read and plot an SPA:

# ggplot2 is needed for the plotting at the end
library(ggplot2);

pathToSource <- "fill_in_your_path";
to.read = file(pathToSource, "rb");

# Read the start offset
seek(to.read, 386, origin="start");
startOffset <- readBin(to.read, "int", n=1, size=2);
# Read the length
seek(to.read, 390, origin="start");
readLength <- readBin(to.read, "int", n=1, size=2);

# seek to the start
seek(to.read, startOffset, origin="start");

# we'll read four byte chunks
floatCount <- readLength/4

# read all our floats
floatData <- c(readBin(to.read,"double",floatCount, size=4))

floatDataFrame <- as.data.frame(floatData)
floatDataFrame$ID<-seq.int(nrow(floatDataFrame))
p.plot <- ggplot(data = floatDataFrame,aes(x=ID, y=floatData))
p.plot + geom_line() + theme_bw()

In my particular case, I need to plot them from PHP, and already have a pipeline that shells out to gnuplot to plot other types of data. So, in case it’s helpful to anyone, here’s the same plotting in PHP.

<?php

function generatePlotForSPA($source, $targetFile) {

    $sourceFile = fopen($source, "rb");

    fseek($sourceFile, 386);
    $targetOffset = current(unpack("v", fread($sourceFile, 2)));
    if($targetOffset > filesize($source)) {
        return false;
    }
    fseek($sourceFile, 390);
    $dataLength = current(unpack("v", fread($sourceFile, 2)));
    if($dataLength + $targetOffset > filesize($source)) {
        return false;
    }

    fseek($sourceFile, $targetOffset);

    $rawData = fread($sourceFile, $dataLength);
    $rawDataOutputPath = $source . "_raw_data";
    $outputFile = fopen($rawDataOutputPath, "w");
    fwrite($outputFile, $rawData);
    fclose($outputFile);
    $gnuScript = "set terminal png size {width},{height};
        set output '{output}';

        unset key;
        unset border;

    plot '<cat' binary filetype=bin format='%float32' endian=little array=1:0 with lines lt rgb 'black';";

    $targetScript = str_replace("{output}", $targetFile, $gnuScript);
    $targetScript = str_replace("{width}", 500, $targetScript);
    $targetScript = str_replace("{height}", 400, $targetScript);
    $gnuPath = "gnuplot";
    $outputScript = "cat \"" . $rawDataOutputPath . "\" | " . $gnuPath . " -e \"" . $targetScript . "\"";
    exec($outputScript);
    if(!file_exists($targetFile)) {
        return false;
    }
    return true;
}
?>

Transcoding Modern Formats

Since I’ve been working on a tool in this space recently, I thought I’d write something up in case it helps folks unravel how to think about transcoding these days.

The tool I’ve been working on is EditReady, a transcoding app for the Mac. But why do you want to transcode in the first place?

Dailies

After a day of shooting, there are a lot of people who need to see the footage from the day. Most of these folks aren’t equipped with editing suites or viewing stations – they want to view footage on their desktop or mobile device. That can be a problem if you’re shooting ProRes or similar.

Converting ProRes, DNxHD or MPEG2 footage with EditReady to H.264 is fast and easy. With bulk metadata editing and custom file naming, the management of all the files from the set becomes simpler and more trackable.

One common workflow would be to drop all the footage from a given shot into EditReady. Use the “set metadata for all” command to attach a consistent reel name to all of the clips. Do some quick spot-checks on the footage using the built in player to make sure it’s what you expect. Use the filename builder to tag all the footage with the reel name and the file creation date. Then, select the H.264 preset and hit convert. Now anyone who needs the footage can easily take the proxies with them on the go, without needing special codecs or players, and regardless of whether they’re working on a PC, a Mac, or even a mobile device.

If your production is being shot in the Log space, you can use the LUT feature in EditReady to give your viewers a more traditional “video levels” daily. Just load a basic Log to Video Levels LUT for the batch, and your converted files will more closely resemble graded footage.

Mezzanine Formats

Even though many modern post production tools can work natively with H.264 from a GoPro or iPhone, there are a variety of downsides to that type of workflow. First and foremost is performance. When you’re working with H.264 in an editor or color correction tool, your computer has to constantly work to decompress the H.264 footage. Those are CPU cycles that aren’t being spent generating effects, responding to user interface clicks, or drawing your previews. Even apps that endeavor to support H.264 natively often get bogged down, or have trouble with all of the “flavors” of H.264 that are in use. For example, mixing and matching H.264 from a GoPro with H.264 from a mobile phone often leads to hiccups or instability.

By using EditReady to batch transcode all of your footage to a format like ProRes or DNxHD, you get great performance throughout your post production pipeline, and more importantly, you get consistent performance. Since you’ll generally be exporting these formats from other parts of your pipeline as well – getting ProRes effects shots for example – you don’t have to worry about mix-and-match problems cropping up late in the production process either.

Just like with dailies, the ability to apply bulk or custom metadata to your footage during your initial ingest also makes management easier for the rest of your production. It also makes your final output faster – transcoding from H.264 to another format is generally slower than transcoding from a mezzanine format. Nothing takes the fun out of finishing a project like watching an “exporting” bar endlessly creep along.

Modernization

The video industry has gone through a lot of digital formats over the last 20 years. As Mac OS X has been upgraded over the years, it’s gotten harder to play some of those old formats. There’s a lot of irreplaceable footage stored in formats like Sorensen Video, Apple Intermediate Codec, or Apple Animation. It’s important that this footage be moved to a modern format like ProRes or H.264 before it becomes totally unplayable by modern computers. Because EditReady contains a robust, flexible backend with legacy support, you can bring this footage in, select a modern format, and click convert. Back when I started this blog, we were mostly talking about DV and HDV, with a bit of Apple Intermediate Codec mixed in. If you’ve still got footage like that around, it’s time to bring it forward!

Output

Finally, the powerful H.264 transcoding pipeline in EditReady means you can generate beautiful, deliverable H.264 more rapidly than ever. Just drop in your final, edited ProRes, DNxHD, or even uncompressed footage and generate a high quality H.264 for delivery. It’s never been this easy!

See for yourself

We released a free trial of EditReady so you can give it a shot yourself. Or drop me a line if you have questions.

2006 Lotus Elise For Sale

I’m selling a 2006 Lotus Elise in Magnetic Blue. It’s got 40,450 miles on it. The car has the touring package, as well as the hardtop and soft top and starshield. All the recalls are done. All the fluids (coolant, oil, clutch/brake) were done in 2013. The brakes and rear tires have about 4000 miles on them.

I bought the car from Jaguar Land Rover here in the Twin Cities in December of 2010. They sold the car originally, and then took it back on trade from the original owner so I’m the second owner of the car and it’s always been in the area.

The car is totally stock – no modifications whatsoever. No issues that I’m aware of. Cosmetically, I think it’s in very nice shape – the starshield at the front has some wax under one of the edges that kind of bothers me, but I’ve always been afraid to start picking at it.

If you’ve got questions about the car, or would like to take a look, let me know. I can be reached at cmcfadden@gmail.com or at 612-702-0779.

Asking $32,000.