Compressing elements of an XML file on Windows Phone

I recently needed to build search into my Windows Phone application, Study The Word. In most cases, searches are performed by querying a reverse index (more commonly called an inverted index). A reverse index is basically a table that catalogs the terms found within a body of text along with their locations. One term, say “cars”, may have any number of locations. On the web, a term may be associated with a particular web address; in a book, a term may be associated with a particular chapter (where you may also want to capture the line or paragraph index). In my case, what I was indexing, the Bible, maps naturally to a book, a chapter, and a verse.
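To make the idea concrete, here is a minimal sketch of such an index in plain C#. It is not the code from Study The Word: the `VerseRef` and `ReverseIndex` types, the separator list, and the case-insensitive matching are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical location record: one occurrence of a term.
public struct VerseRef
{
    public readonly string Book;
    public readonly int Chapter;
    public readonly int Verse;

    public VerseRef(string book, int chapter, int verse)
    {
        Book = book;
        Chapter = chapter;
        Verse = verse;
    }

    public override string ToString()
    {
        return Book + " " + Chapter + ":" + Verse;
    }
}

// Hypothetical index: maps each term to every verse it appears in.
public class ReverseIndex
{
    private static readonly char[] Separators = { ' ', ',', '.', ';', ':', '!', '?' };

    private readonly Dictionary<string, List<VerseRef>> terms =
        new Dictionary<string, List<VerseRef>>(StringComparer.OrdinalIgnoreCase);

    public void AddVerse(string book, int chapter, int verse, string text)
    {
        // Record each term at most once per verse, however often it repeats.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (string word in text.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!seen.Add(word))
                continue;

            List<VerseRef> locations;
            if (!terms.TryGetValue(word, out locations))
            {
                locations = new List<VerseRef>();
                terms[word] = locations;
            }
            locations.Add(new VerseRef(book, chapter, verse));
        }
    }

    // Returns an empty list when the term was never indexed.
    public IList<VerseRef> Lookup(string term)
    {
        List<VerseRef> locations;
        return terms.TryGetValue(term, out locations) ? locations : new List<VerseRef>();
    }
}
```

After indexing Genesis 1:1 and John 1:1, a call such as `Lookup("beginning")` would return both verses, which is exactly the term-to-locations mapping described above.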

As stated above, my index is for a Windows Phone application, so you immediately run into the need to save space. After I first converted my index to XML, the file was over 25MB. I trimmed it down a bit with various tricks, but ultimately the file was still too large for a mobile device.

The next step was to find a way to compress my index, then decompress it at will. Windows Phone does not come with a built-in way to compress and decompress data. Fortunately for me, as well as other Windows Phone developers out there, there is SharpCompress, a compression library for .NET/Mono/Silverlight/WP7.

Next we have to decide how to compress the file: as a whole, or in individual parts. In my case, I didn’t want to spend the time decompressing the entire file at first start-up. Although that would make lookups faster later, the up-front decompression cost would make the user experience less enjoyable. So I settled on compressing the location lists of individual terms, based on their length, and decompressing them on the fly when needed. Compression only becomes effective at roughly three hundred characters, so I decided to compress anything over that threshold and store the result in an XML attribute of that particular term.
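The per-term decision could be sketched roughly as follows. This is an illustration under stated assumptions rather than the app's actual writer: `IndexWriter`, `BuildTerm`, and the `;` separator are hypothetical names and choices, and the compressor is passed in as a delegate so the sketch stays independent of any particular compression library.

```csharp
using System;
using System.Xml.Linq;

public static class IndexWriter
{
    // Roughly where compression starts to pay off, per the text above.
    public const int CompressionThreshold = 300;

    // Builds one <Term> element. "compress" is any string-to-string
    // compressor supplied by the caller (hypothetical parameter).
    public static XElement BuildTerm(string term, string[] locations, Func<string, string> compress)
    {
        // Assumed serialization: locations joined with ';'.
        string joined = string.Join(";", locations);

        // Short lists are stored as plain text; long ones are compressed.
        string value = joined.Length >= CompressionThreshold ? compress(joined) : joined;

        return new XElement("Term",
            new XAttribute("name", term),
            new XAttribute("locations", value));
    }
}
```

Keeping the threshold check next to the code that writes the attribute makes it easy to tune the cutoff later without touching the compressor itself.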

The resulting compressed elements file was under 3MB. Yay!

Below is the full source of my string compression class, which uses the SharpCompress tools and is based on some code I found around the net.

String Compression Source

    using System;
    using System.IO;
    using System.Text;
    // BZip2Stream and CompressionMode come from SharpCompress; the exact
    // namespace to import depends on the library version you reference.

    public class StringCompression
    {
        public static string Compress(string text)
        {
            // Below roughly 300 characters compression does not pay off,
            // so short values are stored as-is.
            if (text.Length < 300)
                return text;

            // Transform string into byte[]. UTF-8 is used here so that
            // non-ASCII characters are not truncated by a char-to-byte cast.
            byte[] input = Encoding.UTF8.GetBytes(text);

            using (MemoryStream ms = new MemoryStream())
            {
                // The third argument leaves the MemoryStream open so it can
                // still be read after the BZip2 stream has been closed.
                using (BZip2Stream bzip = new BZip2Stream(ms, CompressionMode.Compress, true))
                {
                    bzip.Write(input, 0, input.Length);
                }

                // The BZip2 stream must be closed before reading, otherwise
                // the compressed data is incomplete.
                return Convert.ToBase64String(ms.ToArray());
            }
        }

        public static string Decompress(string compressedText)
        {
            if (string.IsNullOrEmpty(compressedText))
                return compressedText;

            byte[] compressed;

            // Values under the threshold were stored as plain text, so anything
            // that is not valid Base64 is returned unchanged. Note that a short
            // plain value that happens to be valid Base64 is not detected here.
            try
            {
                compressed = Convert.FromBase64String(compressedText);
            }
            catch (FormatException)
            {
                return compressedText;
            }

            using (MemoryStream ms = new MemoryStream(compressed))
            using (BZip2Stream bzip = new BZip2Stream(ms, CompressionMode.Decompress))
            {
                // Decompress, then transform the bytes back into a string
                byte[] buffer = StreamToByteArray(bzip);
                return Encoding.UTF8.GetString(buffer, 0, buffer.Length);
            }
        }

        public static byte[] StreamToByteArray(Stream input)
        {
            byte[] buffer = new byte[16 * 1024];

            using (MemoryStream ms = new MemoryStream())
            {
                int read;

                // Copy in 16 KB chunks until the source stream is exhausted
                while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                    ms.Write(buffer, 0, read);

                return ms.ToArray();
            }
        }
    }

Compressed XML Example

<Term name="BEGINNING" locations="QlpoOTFBWSZTWU+hQPYAAD6YAAAEf/BAAgdoM3ZDU8mjRSBp5JRlAJNRTJ6IEmUlQR6fOgs+7tv3OtGj5R3mOpnPJPLkMaI0s6rWd58ru94tvqTxKSi969Ei63mZ5qHvj5gVSVxfAt1KV3QI5Q86DJgSALkrtNxw0VYy1sbVIbaLtztsKbenDXm3kvVlOTEmJMfbjhTnzRLhQwE+oOhQbQph0nMvP1//YPDl82Lo4XLyVBholYrMpeQ57mFeIiYogkYriBoZhEVCiJIPh8zRxI+SAqgF9c78mBk2sqjT8HyDuwZPz0wiBb4jHgoKFeJLyia0xLdqjuR8UL2tgAGeEj6m09WVed7zs3nd8PI7S8L9ryJHECoykU1qpjlLsLKVRyRgVu5JfPi6ehkCyBKdpBsKEBpo+JK1BWiHmZ3QJa7sIUjpyrdCwxky8Yj+LuSKcKEgn0KB7A==" />
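Reading such an element back at search time might look like the sketch below. The `IndexReader` and `FindLocations` names, the flat list of `Term` elements under one root, and the `;` separator are assumptions for illustration. The decompressor is passed in as a delegate, so a method with the shape of the `Decompress` function above could be supplied.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class IndexReader
{
    // Finds one term and decodes only that term's locations, so the
    // rest of the index never has to be decompressed.
    public static string[] FindLocations(XElement index, string term, Func<string, string> decompress)
    {
        XElement match = index.Elements("Term")
            .FirstOrDefault(e => string.Equals(
                (string)e.Attribute("name"), term, StringComparison.OrdinalIgnoreCase));

        if (match == null)
            return new string[0];

        string raw = (string)match.Attribute("locations") ?? string.Empty;

        // The delegate is expected to pass uncompressed values through unchanged.
        string decoded = decompress(raw);

        // Assumed serialization: locations joined with ';'.
        return decoded.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Because only the matched attribute is decoded, the cost of decompression is paid per search term rather than once for the whole file.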

SharpCompress

Around the Net – May 16, 2012

And the winner is… Game of Thrones, for most pirated show of 2012. Also in this edition: Thor’s bad joke, a new Tron, the XBOX gets explored and gets a subscription plan, CW gets green, DVDs and Windows 8, another Spider-Man preview, and no streaming for you!

  • Game of Thrones –projected as most pirated show of 2012
    HBO’s Game of Thrones has been all the craze since its launch last year. With a barrier to entry of subscribing to HBO through a cable or satellite provider, it wasn’t far-fetched at all to assume such mass piracy would happen. Cable and satellite providers only offer HBO or other premium channels with their most expensive tier of services, and with most people trying to trim costs these days, premium cable is probably the first thing to go. HBO, please consider this an opportunity to offer up a streaming-only plan.
    By the way, if you are as big a fan of this series as I am, and know nothing about the books or have no plan of reading them at all, you should check out the A Cast of Kings podcast. They break down each episode scene by scene, and discuss things that may be different from the books.
  • Avengers joke ignites petition
    In The Avengers movie, after Black Widow states that Loki has killed eighty people in two days, Thor replies, “He’s adopted.” A petition has actually been started in opposition to the joke, which, in my opinion, was just an attempt by Thor to distance himself from the actions of his brother. Then again, I’m not adopted (at least not that I know of), so I can’t truly know how an adopted individual would feel upon hearing this.
  • Watch the 30-Minute Tron: Uprising prelude online
    This one was not on my radar at all, but it seems interesting nonetheless. I’m not a big Tron fan, so I’m not sure how much actual story is behind the movies, which is probably why it would lend itself well to an animated series format.
  • The $99 XBOX is official, only at Microsoft stores
    The rumors from the past few weeks are true. If you are anywhere near one of the few Microsoft stores and really want to game on a budget, this may be a good solution. The package includes the core console, a Kinect, and an XBOX Live membership for a mere fifteen bucks a month. Plenty of people have done the math, and it actually is only about forty bucks more than it would cost you to purchase the same package outright. I guess the real issue here is getting locked into a contract, much like we have seen with cellular companies.
  • Green Arrow show picked up by CW
    Although I don’t give a crap about Green Arrow, this project does have some interesting names attached to it. With the relative unpopularity of Green Arrow as a brand, this project almost seems doomed from the start. This one will probably go the way of the ill-fated Wonder Woman reboot.
  • DVD/Blu-Ray Playback, not built into Windows 8
    No DVD or Blu-Ray playback in Windows 8, no worries. This is not really an issue, although that has not prevented various media outlets from sounding off. You see, a lot of people stream media these days. Netflix isn’t thirty-seven percent of internet traffic for no reason. But still, not everyone has a high-speed internet connection. Again, no worries. There are a myriad of free and paid media players that provide such capability. Let’s face it, physical media is going the way of the dodo bird.
  • Internet Explorer finally coming to Xbox
    PlayStation has had the leg up on this one for years now, and while it is currently only in an internal beta-testing phase, I hope to see this one soon. About time, Microsoft.
  • Four Minute Amazing Spider-Man Preview
    Everything I see about this movie only makes me more and more excited. This preview features a bridge scene in which Spider-Man attempts to save a boy who is about to take a swim in the Hudson.
  • C-Spire Streaming Tier Revisited
    I have been hearing rumblings that C-Spire has finally started charging for their streaming tier, which was free since the release of the iPhone late last year. Needless to say, customers are furious over this and it will be interesting to see how it all plays out. With the main feature of most handsets starting to focus more on the ability to stream media, I see that only becoming a bigger issue going forward.

Thanks for reading, see you next week. Direct all feedback email to fans@techpedition.com or leave a voicemail at 1.601.329.0636.

Around the Net – May 03, 2012

Another edition of Around the Net is here! This week’s edition includes: dirt-cheap XBOX bundles, HULU sucking more, Drawing Something less, a weird Steve Jobs video, and the Black Ops reveal trailer!

  • Microsoft readies $99 Xbox 360 Kinect bundle, with two-year subscription
    Looks like owning a console is starting to look a lot like owning a cellphone, with a $99 buy-in and a $15 monthly fee. Although I consider $15 a month a lot of money, the gaming community has been paying this for years on the massively multiplayer online game front. This is great for those wanting to game on a budget. I would love to see Apple do this with the iPad, though.
  • Rumor Mill: Hulu will soon require you to have a cable subscription
    Pressed by content partners, Hulu may soon start requiring cable subscriber authentication for content. If it hasn’t already been said, Hulu is dead.
  • Draw Something daily users rapidly declining
    Just a month ago, this game was all the craze, getting scooped up by Zynga for a reported 200 million dollars. Well, it turns out the buyout was the best move the up-and-coming development studio Omgpop could have made, as daily usage is on a steady decline. Honestly, just the other day I heard my wife say, “Nobody’s playing anymore.” I guess the drawing is on the wall for the highly popular app.
  • Steve Jobs as… Franklin D. Roosevelt?
    Oh, how those internal videos can come back to haunt you. This is just weird, but definitely worth checking out for the Apple fanboys.
  • PlayStation Smash All-Stars: Battle Royale Revealed
    Simply put, this looks like a rip-off of Nintendo’s Super Smash Brothers, at least at a glance. I’ve heard from other pundits who have had “hands-on” time that there are subtle differences that set this title apart.
  • Black Ops II Trailer
    So there was a Call of Duty: Black Ops II Trailer. Check it out!

I’m sure I missed something. Send your tech news to fans@techpedition.com.

LateView – Homefront

A review of Homefront can be summarized in just a few words: big on promise, low on return. The preceding words may sound harsh, but then again, playing this game may feel harsh at times, at least in the earlier parts of the campaign. Let me start off by explaining the promise of Homefront.

Homefront was developed by Kaos Studios and published by THQ, running on Unreal Engine 3 and slated as a triple-A first-person shooter. With a story written by John Milius (Apocalypse Now, Red Dawn), Homefront pits players in a resistance movement against the North Korean military, who have occupied the United States.

The world of Homefront is by far the most appealing part of the game. From the start of the campaign, we are met with a cinematic that crosses the bounds of fact and fiction. Live-action footage mixed with computer-generated motion graphics paints a very believable narrative of how a Korean-occupied United States could come to pass. As strange as that may sound, the story is not as farfetched as it may initially seem. These cut scenes start off great, but unfortunately they become less and less interesting as the story progresses, which undoubtedly contributes to the narrative falling apart.

Homefront controls exactly like a Call of Duty game, which is clearly the audience it aims to capture. Any player of said franchise will feel right at home with the controls, which are pretty standard first-person shooter fare. That being said, at times the controls just feel ‘a bit off’. I played on the Xbox 360, so it could have been something to do with console-specific tuning or tweaking that was never addressed. Whatever the case, you do eventually get used to the feel, but it never feels great in my opinion.

The graphics of Homefront seem average at best for the Xbox 360. I wouldn’t call this a pretty game by any means, at least in the first half of the game anyway. Graphics seem to take a turn for the better in the latter levels, concluding the game with a rather interesting battle at a historic landmark.

The initial gameplay is where Homefront begins to fall apart for me. This could be due to the guerrilla style of warfare you are taking part in, or maybe even the lack of intelligent level design. You see, I would figure a game based on a more guerrilla style of gameplay would lend itself to more player choice and fewer scripted sequences. Instead you are just shuffled from section to section, with little player choice. Although this linear gameplay is not that surprising, I kind of see it as an opportunity missed. With the premise of Homefront, a ragtag, get-it-done-how-you-want-to style would have felt more appropriate.

Fortunately, Homefront picks up tremendously by the latter levels, both in gameplay and graphics. Honestly the last level almost feels like a different game. Intentional or not, the environment chosen to conclude the game automatically lends itself to good level design, while earlier stages just feel somewhat thrown together.

Homefront also suffers from poor scripting. Instead of allowing the game to be dynamic and adjust to what the player is doing, you have to go down the linear path that has been laid out for you, oftentimes waiting for a non-player character to do a specific task before you can advance. This could have been alleviated by better cues or onscreen direction, but no such thing exists in this game.

Lastly, this game is extremely short. On normal difficulty, it can be beaten in about four to six hours. For a $59.99 price point at the time of initial retail release, that’s kind of unacceptable. People have come to expect much more out of a triple-A title. As a more-than-a-year-old, get-it-on-the-cheap title at twenty bucks, its shortness is more forgivable.

The game does have a multiplayer component, which I only briefly explored. I specifically played team deathmatch mode, and I have to say it felt pretty good. Controls felt better than they did in the actual single-player campaign. With the campaign being as short as it is, this is where anyone who purchases this title will be spending the bulk of their time. Again, any Call of Duty player should feel right at home here. If you are buying it used, be ready to ante up a few dollars to be able to play past level five or unlock guns, as THQ has implemented an online digital rights management system, much like Electronic Arts.

Ultimately, I’m really conflicted about this title. I really wanted it to be good and I was drawn into the initial promise of the story. Despite mixed reviews, overall sales were okay and THQ has a sequel in the works. And this is where it gets interesting. After the demise of Kaos Studios, the sequel has been handed off to Crytek’s Nottingham UK studio. This move bodes well for Homefront 2, as it will now be developed on the latest CryEngine technology, rather than what feels like an outdated Unreal Engine 3.

I’m not even sure if I can recommend getting this game on the cheap unless you are into alternate versions of history and what that could mean. Multiplayer feels solid, while the campaign is just unacceptably short. This title was very ambitious on story, but failed to deliver on the most important part, gameplay.

Stay away, even at twenty bucks.

Totally Random: Dreams

For some weird reason, I woke up today with a very famous Langston Hughes poem on my mind. Now I am certainly not the poetic type (although my wife may wish I was sometimes), nor am I one who is constantly consumed by the perceived American pursuit of happiness, but there is something profound about dreams and the pursuit thereof. Mr. Hughes put it best:

Hold fast to dreams
For if dreams die
Life is a broken-winged bird
That cannot fly.
Hold fast to dreams
For when dreams go
Life is a barren field
Frozen with snow.

– Langston Hughes

I was first introduced to this poem while attending Upward Bound back in the mid-1990s. For some reason this poem has stuck with me over the years: short, sweet, and concise.