phoenix /danm 210 /proposal_0.1

An augmented reality framework for layering media over the contemporary landscape

I propose creating an augmented reality framework that can layer media and text over the contemporary landscape using ubiquitous computing devices. Initially the framework will be created for the iPhone, but the library will be written in such a way that it can be ported easily to other smart phones, portable computers, and hand-held devices. The framework would allow the placement of text, imagery, sounds, and 3D media as an “illusory” overlay on the device's camera and map data. Each of these elements can be mapped onto a specific geographic place and made available to the user when they are in close proximity to the virtual object. For instance, an audio source can be placed near the entrance to a park, and as a user approaches that entrance they begin to hear music coming from that physical, real-world position. Or an image can be placed to overlay a physical location, so that the user sees the image through their device's camera as if it were physically present in the landscape. This framework has great potential for the creation of socially mediated content contextualized by geographic location data, games that interact with the physical environment, and artistic performances that are virtually attached to a real-world physical space.
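A minimal sketch of how the "close proximity" test described above might work, assuming each virtual object stores a latitude and longitude and a trigger radius in metres. The names GeoPoint, distanceMeters, and isInRange are hypothetical and not part of any existing library; the distance uses the standard haversine formula on a spherical Earth.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical types and names, used only for illustration.
struct GeoPoint {
    double latDeg;  // latitude in degrees
    double lonDeg;  // longitude in degrees
};

const double kEarthRadiusM = 6371000.0;  // mean Earth radius in metres
const double kPi = 3.14159265358979323846;

inline double degToRad(double deg) { return deg * kPi / 180.0; }

// Great-circle distance in metres between two points (haversine formula).
double distanceMeters(const GeoPoint& a, const GeoPoint& b) {
    const double dLat = degToRad(b.latDeg - a.latDeg);
    const double dLon = degToRad(b.lonDeg - a.lonDeg);
    const double sinLat = std::sin(dLat / 2.0);
    const double sinLon = std::sin(dLon / 2.0);
    const double h = sinLat * sinLat +
                     std::cos(degToRad(a.latDeg)) *
                     std::cos(degToRad(b.latDeg)) * sinLon * sinLon;
    // Clamp to 1 so rounding error cannot push asin out of its domain.
    return 2.0 * kEarthRadiusM * std::asin(std::min(1.0, std::sqrt(h)));
}

// True when the user is within radiusM metres of the virtual object.
bool isInRange(const GeoPoint& user, const GeoPoint& object, double radiusM) {
    assert(radiusM >= 0.0);
    return distanceMeters(user, object) <= radiusM;
}
```

An application could call isInRange each time a new GPS fix arrives and start or stop the audio source accordingly. GPS error of several metres is common, so a real trigger radius would probably need some slack.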

“Augmented reality” is the latest technology buzzword, although it has been around for a number of years. The term is used in many ways, but I think the most useful definition is “a live direct or indirect view of a physical real-world environment whose elements are merged with-, or augmented by virtual computer-generated imagery - creating a mixed reality.” (http://en.wikipedia.org/wiki/Augmented_reality) The one thing I would add to this definition is that imagery is not the only thing that can be used; any sort of media can be layered with the “physical real-world environment”. By this definition, the standard desktop environment is not an “augmented reality” environment, but it may well contain an application that can be defined as such. In-car navigation systems are a good example of a current “augmented reality” application: they map your current location onto the navigation data contained within the device. Perhaps one of the more interesting uses of augmented reality is the Reactable (http://mtg.upf.es/reactable/). Reactable is an audio controller in which specially designed blocks are placed on a lit table and tracked by a camera. Images are projected on the table, and audio is heard in response to the placement and movement of the blocks.

The iPhone has been a major player in the recent push towards augmented reality applications, although it is certainly neither the first nor the only device that has augmented reality possibilities. But the iPhone brings together several critical factors: a uniform, fast interface that is relatively easy to program; a large user base; hardware (accelerometers, GPS, digital compass, and camera) and software support that allow interaction with the physical world; and, of course, the marketing muscle that Apple is so well known for. Certainly other devices, like the Google Android phones, have very similar capabilities. But no other device has gained such a large reputation and following as the iPhone.

New programs with whole or partial augmented reality interfaces show up every day on the iPhone. One program, Pocket Universe, allows a user to point their device at the night sky, and the program will show the layout and names of popular stars, planets, and constellations. Numerous programs let a user point their iPhone around and be told the nearest points of interest, like the nearest coffee shops or subway station entrances. The company Acrossair (http://www.acrossair.com/) distributes a number of applications that aggregate localized data, generally from publicly available sources, and make that data available to users through a simple interface. They have an application called “Acrossair Wiki Browser”, where a user turns on their camera, and as they point it around, little windows pop up showing local places of interest that it finds on Wikipedia (http://wikipedia.org/).

Augmented reality has been around for a while, but it has only begun to gain real traction in recent years. Experimentation is happening in the gaming, social, scientific, and artistic realms. But there has yet to be any real standardization of methods or frameworks. This is a particular problem with handheld devices, as most handheld devices differ significantly not only in hardware, but also in the software and libraries they are capable of running. Additionally, developing an application for one handheld operating system does not guarantee that it will run on every handheld that uses that system, as manufacturers have significant leeway in deciding which options they offer for each device. The newest breed of smart phones, as they are now being called, is beginning to change this, but there is still far to go. The Palm Pre, Google Android, and the iPhone have very similar features (accelerometer, GPS, compass, camera), but each runs its own operating system underneath. Still, the signs are good that things will get easier, as the iPhone and Google Android can both run compiled C/C++ code, and Palm has hinted that it will support C/C++ in the future. So while a program compiled for one platform will not run on another, libraries have the potential to be made cross-platform, with only mild tweaks for differences that can be hidden behind wrapper functions.
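One way the wrapper idea could look in practice is sketched below, assuming an abstract sensor interface that each platform implements with its own native calls. All of the names here (SensorReading, SensorBackend, FixedBackend, currentHeading) are hypothetical, and FixedBackend is only a stand-in that returns fixed values where a real port would query the device.

```cpp
#include <cassert>

// Hypothetical plain-data reading shared by every platform.
struct SensorReading {
    double accel[3];    // accelerometer x, y, z in units of g
    double headingDeg;  // compass heading, degrees clockwise from north
    double latDeg;      // GPS latitude in degrees
    double lonDeg;      // GPS longitude in degrees
};

// Portable interface; each platform would supply its own subclass.
class SensorBackend {
public:
    virtual ~SensorBackend() {}
    // Fills 'out' and returns true when a reading is available.
    virtual bool read(SensorReading& out) = 0;
};

// Stand-in backend returning fixed values, e.g. for desktop testing.
class FixedBackend : public SensorBackend {
public:
    explicit FixedBackend(double headingDeg) : headingDeg_(headingDeg) {
        assert(headingDeg >= 0.0 && headingDeg < 360.0);
    }
    virtual bool read(SensorReading& out) {
        out.accel[0] = 0.0;
        out.accel[1] = 0.0;
        out.accel[2] = -1.0;  // device lying flat, gravity along -z
        out.headingDeg = headingDeg_;
        out.latDeg = 0.0;
        out.lonDeg = 0.0;
        return true;
    }
private:
    double headingDeg_;
};

// Framework code depends only on the interface, never on the platform.
// Returns -1.0 when no reading is available.
double currentHeading(SensorBackend& backend) {
    SensorReading r;
    if (!backend.read(r)) return -1.0;
    return r.headingDeg;
}
```

With this shape, porting to a new device would mean writing one new SensorBackend subclass while the rest of the library stays unchanged.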

The project I propose will be open source so that others can contribute, as well as use the framework for their own applications. The framework will include core functionality for creating views; loading and saving 3D models and text; loading and playing audio and other media; and placing any of these media at a virtual geographic location. Raw accelerometer, GPS, and compass data will be exposed. A transformation matrix for the camera will be computed automatically from the raw accelerometer and compass data. All objects will be able to change their parameters over time. For instance, an audio source can be animated so that it moves around, or a 3D model can be morphed. The framework will take care of mapping physical real-world coordinates to a virtual coordinate system used internally. This will allow a program designer to focus on content and desired functionality rather than on low-level details. The framework will be programmed in standard ANSI C/C++ for portability. Wherever there is a language-dependent issue where ANSI C/C++ cannot be used outright, appropriate wrapper functions will be created in ANSI C/C++ to encapsulate access to those features.
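The mapping from real-world coordinates to an internal coordinate system could be sketched as follows, assuming a flat local plane measured in metres east and north of a chosen origin (an equirectangular approximation that is reasonable over a few kilometres). The names GeoPoint, LocalPoint, and toLocal are hypothetical; the camera transformation matrix is not covered here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical types and names, used only for illustration.
struct GeoPoint {
    double latDeg;  // latitude in degrees
    double lonDeg;  // longitude in degrees
};

struct LocalPoint {
    double east;   // metres east of the origin
    double north;  // metres north of the origin
};

const double kEarthRadiusM = 6371000.0;  // mean Earth radius in metres
const double kPi = 3.14159265358979323846;

// Projects 'p' onto a flat plane tangent to the Earth at 'origin'.
// Accuracy degrades with distance from the origin and near the poles.
LocalPoint toLocal(const GeoPoint& origin, const GeoPoint& p) {
    assert(origin.latDeg > -90.0 && origin.latDeg < 90.0);
    const double originLatRad = origin.latDeg * kPi / 180.0;
    const double dLatRad = (p.latDeg - origin.latDeg) * kPi / 180.0;
    const double dLonRad = (p.lonDeg - origin.lonDeg) * kPi / 180.0;
    LocalPoint out;
    // A degree of longitude shrinks by cos(latitude) away from the equator.
    out.east = kEarthRadiusM * dLonRad * std::cos(originLatRad);
    out.north = kEarthRadiusM * dLatRad;
    return out;
}
```

An application might take the user's first GPS fix as the origin and convert every placed object once, so that rendering and audio code can work in plain metres.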

A scripting layer created with Lua will sit on top of the framework, providing simplified access to the more complex lower-level code. The scripting language will expose much of the functionality available in the lower-level framework, while offering simplified access to higher-level functionality. Predefined behaviours will be available for scripting such things as simple animations, event handling for button presses, displaying and manipulating 3D models, and more. The scripting language will allow complex behaviours to be added without the need to recompile and reissue the entire program whenever additional data is added to an application. Scripts can be stored in an offline database that is accessed when needed, and delivered to the device transparently.

Finally, an XML file format will be defined that allows reloadable fragments to be saved without the need for a full script. The file format will include the ability to create, define, and name new objects. The format will also include the ability to embed scripts that add functional behaviour to the objects loaded from the XML file. The XML format will be based on the Apple property list (plist) format, since it is well defined and simple to use.

This framework is a necessary step towards the creation of several applications that I have in mind. These applications all share a similar core of necessary functions, but use that core in different and interesting ways. I am fascinated with finding novel ways of using technology to interact with our environment, and with encouraging others to engage with the environment they are in. Augmented reality is one of many ways that technology and the real world can interact. My desire to keep this an open source project is a desire to encourage others to contribute and experiment as well. I wish to make building augmented reality applications on a smart phone nearly as easy as building a webpage, but a whole lot more fun.

An example of the user experience with an application that could be made with this framework may look like the following:

The user is standing at a corner and sees an old house that holds historical interest. The user starts a program, and immediately the camera comes on. The user holds the camera up and points it at the house. An image of the house taken 100 years ago appears, overlaying the house as it stands today. Buttons and text appear, allowing the user to look at further information. Clicking on one of the buttons might start a virtual tour of the surrounding area and its historical relationship with the building in question. If the user points the camera towards the ground, an arrow appears that shows them the direction to walk to continue the tour. As the user approaches the next landmark, audio begins to play of a first-person account of a significant historical event. The user presses a button on the device, and a map appears with pins showing the locations of the tour points and other points of interest. The user presses another button that brings up a list of historic maps of the area. The user presses the button marked 1856 and the map view comes back, but with a map from 1856 overlaying the current geography.

Another example:

A musical performance is composed and “placed” at a geographic location in a forest glen. The user runs a program that is created with my framework. When the user is close enough to the audio source, they begin hearing music coming from the direction of the location where the music was originally “placed”. The user is able to walk around the audio source and hear it get louder as they get closer, as well as hear the source move from the left ear, to center, to the right ear as they rotate in place. At a certain point more audio sources become audible, coming from different places near the user: one comes from behind, and another from off to the side and down the path a little way. Soon the audio sources all begin to move, sounding as if the musicians are all dancing down the path. The user begins to follow the sources as they move. Upon entering an area where the trees are much closer together, the user starts to hear whispers coming from the trees, as if the trees themselves are talking amongst each other. Eventually the audio sources stop moving at a stream with a large pool of water, and then the music slowly fades away. As the music fades, one last sound becomes apparent, coming from the center of the pool: the sound of children swimming, splashing, and playing. And yet the pool is empty, and the water still.
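The loudness and left/right behaviour in this example could be approximated as sketched below, assuming the listener sits at the origin of a local plane measured in metres east and north, and that the compass heading is given in degrees clockwise from north. The names StereoGain and spatialize are hypothetical, and the inverse-distance attenuation and constant-power panning are illustrative choices rather than a settled design. This simple model cannot distinguish a source in front from one behind.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result type: linear gain for each ear, 0.0 to 1.0.
struct StereoGain {
    double left;
    double right;
};

const double kPi = 3.14159265358979323846;

// srcEast, srcNorth: source position in metres relative to the listener.
// headingDeg: direction the listener faces, degrees clockwise from north.
// refDistM: distance at or inside which the source plays at full volume.
StereoGain spatialize(double srcEast, double srcNorth,
                      double headingDeg, double refDistM) {
    assert(refDistM > 0.0);
    const double dist = std::sqrt(srcEast * srcEast + srcNorth * srcNorth);
    // Inverse-distance attenuation, clamped to 1 inside refDistM.
    const double gain = refDistM / std::max(dist, refDistM);
    // Bearing to the source, clockwise from north, in radians.
    const double bearing = std::atan2(srcEast, srcNorth);
    // Angle of the source relative to where the listener is facing.
    const double relative = bearing - headingDeg * kPi / 180.0;
    // Pan from -1 (fully left) through 0 (center) to +1 (fully right).
    const double pan = std::sin(relative);
    // Constant-power pan law keeps perceived loudness steady while turning.
    const double angle = (pan + 1.0) * kPi / 4.0;  // 0 to pi/2
    StereoGain out;
    out.left = gain * std::cos(angle);
    out.right = gain * std::sin(angle);
    return out;
}
```

Recomputing the gains whenever the compass heading or GPS position changes would make a source appear to stay fixed in the landscape as the user turns and walks.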

