You know how you put things off, and then when you finally make yourself do it you discover it's actually great fun, and you wonder why you ever waited? I had a week like that with WebGL and the FFT.
WebGL is another of the HTML5 technologies poised to make an enormous impact on how we use the internet, and on how browser code can access our hardware. It is essentially a binding of the OpenGL ES 2.0 spec within Javascript.
Web pages that can do OpenGL. Think about that for a minute.
Then consider that WebGL isn't some wonderful future standard; it's already in most major browsers. (IE people have to wait until 11.) If you have Chrome or Firefox and a 3D card, you already have it. Really. Mobile devices also support it, depending on their hardware, and that's really important. (Most new phones have a 3D chip, if only for fast 2D compositing.)
So yes, MineCraft and Quake could be written in Javascript now, and run in the browser on Android. That's great. Lots of people are writing new games, as you would expect.
There's also a second reason why you want to access the GPU from Javascript - it's an enormous chunk of optimized processing power that we can use to do cool stuff. On many machines, it's actually more powerful than the core CPU.
Texture Shaders, in particular, are special programs that are downloaded to the GPU and run per-pixel, in parallel, on perhaps hundreds of 'texture units'. They can perform massive parallel computations that are fast on a lot of useful algorithms besides drawing animation, like Fourier transforms.
The first texture shader I wrote in WebGL computed Mandelbrot sets. In a 1024x1024 canvas element, I was generating 1024-iteration Mandelbrots at real-time frame rates, over 10fps. At, say, 8 math operations per iteration, that calculation consumed 80 GFLOPS, easily. And I wasn't pushing it.
80GFLOPS. Eighty Billion Floating-point operations per second. The original Cray Supercomputers weren't that fast.
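The figure is easy to check on the back of an envelope. This sketch just multiplies out the numbers quoted above; it assumes every pixel runs all 1024 iterations (points that escape early do less work) and takes the "8 operations per iteration" guess at face value.

```javascript
// Back-of-envelope estimate using the figures quoted in the post.
const pixels = 1024 * 1024;        // canvas size
const iterationsPerPixel = 1024;   // Mandelbrot iterations (worst case)
const opsPerIteration = 8;         // rough guess from the post
const framesPerSecond = 10;

const flops = pixels * iterationsPerPixel * opsPerIteration * framesPerSecond;
const gflops = flops / 1e9;
console.log(gflops.toFixed(1) + ' GFLOPS'); // 85.9 GFLOPS
```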
And another thing... I've written OpenGL code before, and WebGL is just better. First, there's no mucking around with video capabilities or window handles or bit planes. You put a canvas element on an HTML page. Then you ask for its OpenGL context. The browser does all the rest. Compared to how it used to be done, getting a GL context is sheer ease.
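To show how little ceremony is involved, here is a minimal sketch. The helper name `getWebGLContext` and the canvas id `view` are made up for illustration; `getContext` and the two context names are the real browser API (the prefixed name was what some browsers of this era answered to).

```javascript
// Ask a canvas for a WebGL context, falling back to the prefixed name
// that some 2013-era browsers used. Returns null if neither is available.
function getWebGLContext(canvas) {
  const names = ['webgl', 'experimental-webgl'];
  for (const name of names) {
    const gl = canvas.getContext(name);
    if (gl) return gl;
  }
  return null;
}

// In a page (assumes a <canvas id="view"> element exists):
// const gl = getWebGLContext(document.getElementById('view'));
// if (gl) { gl.clearColor(0, 0, 0, 1); gl.clear(gl.COLOR_BUFFER_BIT); }
```

The canvas is passed in as a parameter so the function can also be exercised with a stub outside a browser.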
Then it all gets confusing again when you can't find how to render simple triangles, until you realize the big normalization they did in the OpenGL 'ES' specs was to take away all the 'simplified' methods and leave only the 'useful' ones wherever functionality was duplicated. You can't hand the GPU one triangle at a time with immediate-mode calls anymore. Everything goes through vertex buffers and batch draw calls such as triangle strips. (The simplest strip is a single triangle, so you can still get the same outcome, just with the 'more useful' method.)
Oddly enough, this is also a joy to use, once you get your head around it. It makes writing the equivalent of 'hello world' a little harder, but once you've coded minimal vertex and fragment shaders just to get a single quad on the screen, you're already most of the way towards your end goal. Since you're always using the 'hard shader language' and not 'easy shader language', you don't have to stop and re-code once your program exceeds the capabilities of the simple primitives.
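For a sense of what that 'hello world' looks like, here is a rough sketch of a single red quad drawn as a four-vertex triangle strip. It is not the code from my Mandelbrot experiment; the names `quadStrip`, `compile` and `drawQuad` are invented for the example, while the `gl.*` calls are standard WebGL 1.0.

```javascript
// Four corners of a full-canvas quad, ordered for gl.TRIANGLE_STRIP.
const quadStrip = new Float32Array([
  -1, -1,   1, -1,   -1, 1,   1, 1
]);

// Minimal vertex shader: pass the 2D position straight through.
const vertexSource = `
attribute vec2 position;
void main() {
  gl_Position = vec4(position, 0.0, 1.0);
}`;

// Minimal fragment shader: every pixel is opaque red.
const fragmentSource = `
precision mediump float;
void main() {
  gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}`;

// Compile one shader stage, throwing the driver's log on failure.
function compile(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error(gl.getShaderInfoLog(shader));
  }
  return shader;
}

// Link the program, upload the vertices and draw the strip.
function drawQuad(gl) {
  const program = gl.createProgram();
  gl.attachShader(program, compile(gl, gl.VERTEX_SHADER, vertexSource));
  gl.attachShader(program, compile(gl, gl.FRAGMENT_SHADER, fragmentSource));
  gl.linkProgram(program);
  gl.useProgram(program);

  gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
  gl.bufferData(gl.ARRAY_BUFFER, quadStrip, gl.STATIC_DRAW);

  const loc = gl.getAttribLocation(program, 'position');
  gl.enableVertexAttribArray(loc);
  gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0);
  gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
}
```

Swapping the body of the fragment shader for something more interesting is all it takes to go from here to per-pixel computation.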
All shaders - simple and complex - fit the same basic pattern and become more interchangeable, while shortening the full spec. In the normally expanding world of standards, (to the point they can't be implemented in a single human lifetime anymore) this is an amazing and refreshing achievement.
It does, however, come at the cost of a steeper initial learning curve. GPUs compute in a different way. (That's kind of the point.) And it's not necessarily obvious why they function the way they do... GPU cards are more a collection of "hacks that worked" than any coherent plan to create a platform. I know a lot about the history of how they evolved to this point, and I've done a lot of parallel programming on SIMD machines too, so most of the shader concepts are already familiar to me.
WebGL is getting a 'Dark Arts' reputation because of this, and not without good reason. We leave the comforting worlds of Paragraphs and CSS for matrix math and rendering pipelines.
What I'm discovering is this: it's a Dark Art that's well worth learning. You can perform some very powerful juju.
Assuming your browser can take it, look at some of these: http://glsl.heroku.com/
The Unorthodox Engineers
A one-man development team's cries of warning from the cutting edge, or a new-age corporate communications channel?
Sunday, May 12, 2013
Saturday, March 2, 2013
Busy, busy. Went to Sydney and came back to Brisbane. There was a flood in one of those places, and the other was trying very hard also.
The fire map has been quite 'successful', in the sense that most of Christmas was spent keeping it going, tweaking it for the sometimes odd conditions under which it was used. For example, Fairfax local media sites started embedding it, but their restricted column width was squishing some of the controls over each other.
There need to be times when you, as the developer, try to understand your users' dilemmas: they can't hack their CMS in one day to do special things just so my map looks better. Those limitations are pretty hard-set in their systems.
But I could change my code. Faster. It took the entire day, but by the next smoky morning I had polished the app so it fit their iframe. It even detected that it had been 'framed' and offered a new menu option to link out to the fullscreen map.
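Detecting that you have been 'framed' is a one-liner in principle. The sketch below is one common way to do it, not necessarily what the map itself does; the function name `isFramed` is made up, and the window is passed in as a parameter only so the check can be tried outside a browser.

```javascript
// True when the page is running inside a frame or iframe.
// `win` is normally the global `window`.
function isFramed(win) {
  try {
    return win.self !== win.top;
  } catch (e) {
    // Touching `top` across origins can throw in some browsers;
    // if it does, we must be inside someone else's frame.
    return true;
  }
}

// In a page: if (isFramed(window)) { /* show a "fullscreen map" link */ }
```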
The result - thousands of random people got a slightly better user experience. That's much more useful than a grumpy developer mumbling "but you're using it wrong!"
Fire season has mostly given way to flood season, and traffic has waned to almost nothing. I'll need to revisit it come next fire season, but for now it's easy to keep it running. I expect a few feeds will break over the coming months as the agencies mess with their data in the 'off' season.
First Preview of New LOGO Language
Argh. So many half-written updates that I still haven't finished. For the moment, here's a sample of something pretty that I've been working on that just came alive today:
The above is not a mock-up, but a result of running the following chunk of Javascript:
lisp(
  ["meta",
    { en: "First LOGO Program" },
    ["do",
      ["import","logo"],
      ["to","square",["fn", ["length"],
        ["pendown"],
        ["repeat",4, ["fn",[],
          ["forward",["length"]],
          ["right",90]
        ]],
        ["penup"]
      ]],
      ["canvas","drawing"],
      ["setpen",2,"red"],
      ["square",50],
      ["right",45],
      ["setpen",3,"black"],
      ["square",30]
    ]]
);
What you're seeing is:
- A LISP-like language encoded in JSON
- A LOGO-like expression library which draws SVG content
- An interpreter (written in Javascript but designed to be portable) which runs the code, generating the coloured squares.
- An IDE 'editor' which pretty-formats the code for display, intended for use on a touch screen.
- Everything running in the browser, client-side, in a sandbox.
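To give a flavour of how small such an interpreter can start out, here is a toy sketch. It is not the real interpreter: it handles only a hypothetical subset of forms (do, repeat, forward, right, penup, pendown), takes the repeat body as plain forms rather than an "fn" wrapper, and collects line segments instead of emitting SVG.

```javascript
// Evaluate a tiny subset of LOGO-like forms and collect line segments.
function runTurtle(program) {
  const turtle = { x: 0, y: 0, heading: 0, down: true };
  const segments = [];

  function evaluate(form) {
    const [op, ...args] = form;
    switch (op) {
      case 'do':
        args.forEach(evaluate);
        break;
      case 'repeat': {
        const [count, ...body] = args;
        for (let n = 0; n < count; n++) body.forEach(evaluate);
        break;
      }
      case 'forward': {
        // Move along the current heading, drawing if the pen is down.
        const rad = turtle.heading * Math.PI / 180;
        const x = turtle.x + args[0] * Math.cos(rad);
        const y = turtle.y + args[0] * Math.sin(rad);
        if (turtle.down) segments.push([turtle.x, turtle.y, x, y]);
        turtle.x = x;
        turtle.y = y;
        break;
      }
      case 'right':
        turtle.heading = (turtle.heading + args[0]) % 360;
        break;
      case 'penup':
        turtle.down = false;
        break;
      case 'pendown':
        turtle.down = true;
        break;
      default:
        throw new Error('unknown form: ' + op);
    }
  }

  evaluate(program);
  return { turtle, segments };
}

const result = runTurtle(
  ['do', ['repeat', 4, ['forward', 50], ['right', 90]]]
);
console.log(result.segments.length); // 4 sides of a square
```

Because the program is just nested arrays, it survives `JSON.stringify` and `JSON.parse` unchanged, which is most of the appeal of encoding the language this way.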
Why? Long story. "Parametric content" is about the pithiest explanation I can give. (eg: animation!) And perhaps a little pining for the 80's thrown in. These are old ideas. But they were also considered "too far ahead of their time". I think it might be time.
Monday, January 7, 2013
Now we're cooking
Man, I hate doing that. Hacking up a last-minute feature and pushing it out to the live site with nowhere near enough testing. There needs to be a damn good reason to do that.
Sometime in the last week or so the NSW Rural Fire Service added a new field to their data feed. It's a field called "Instruction" that contains rather useful text like "There is a heightened level of threat. Conditions are changing and you need to start taking action now to protect you and your family."
I swear that field wasn't there before. Looks like I'm not the only one still coding hard and fast for fire season.
It's useful and important information. My framework is built to be adaptive in the face of changing data. It's taken longer to blog about than implement the code. But I still hate pushing features out that fast. It's way too easy to find yourself in a bugfix spiral or take down the site entirely.
I think I made the right decision.
Now if only the Vic fire service would add a similar field... Or the NT Fire service would provide a feed at all.
Thursday, December 20, 2012
Just a regular expression
It's taken me a little time to really get my head around the Javascript regular expressions. You have to be pretty on the ball when you're writing a parser.
In fact I managed to double the decoder speed with a couple of deft new functions, by thinking a little more carefully about the Javascript RegEx objects and their peculiarities.
At first, Javascript's RegEx objects seem particularly bad for the task of generalized parsing, because parsing generally requires checking the next token, and the Javascript function calls seem to do either first or global searches, but not from a given index.
Then they seem like a great idea when you discover the 'sticky' option in the MDN documentation, which allows you to do exactly this. Then they seem like a terrible idea again when you discover that only Firefox implements it.
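For reference, this is roughly what the sticky option buys you in an engine that supports the "y" flag. The helper name `matchAt` is made up for the example; the flag and `lastIndex` are the standard RegExp API.

```javascript
// Does `rex` (created with the "y" flag) match `s` exactly at index `i`?
// Returns the matched text, or null if the token is not at that position.
function matchAt(rex, s, i) {
  rex.lastIndex = i;
  const m = rex.exec(s);
  return m ? m[0] : null;
}

const number = /\d+/y;
console.log(matchAt(number, 'abc123', 3)); // "123"
console.log(matchAt(number, 'abc123', 0)); // null: no digits at index 0
```

A sticky match never scans ahead, which is exactly the "is the next token here?" question a parser wants to ask.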
Most people get disheartened here and go back to parsing the string themselves character by character. But regular expressions execute as compiled, highly optimized code. You're never going to beat that with 'case' statements.
So, back to the code mines we go, and some more digging turns up the "lastIndex" property on the RegEx objects, which is one of the more badly named Javascript properties. (And there are a lot, trust me.) It has some quite unexpected behavior, and it's got at least two purposes, but this property, in combination with the standard .exec() call, turns out to be exactly what we need.
exec() seems inappropriate (or at least very inefficient), because a LALR parser wants to check if the next chunk of string is the token it wants, and standard exec() with a global option can't be stopped from zooming away into the string, looking for it anywhere. Right to the end if need be.
But that's OK, because if we let it, and then remember where (or if) it found a match, then we don't have to ask again until we get past that point. It's still very inefficient at the point we first call it, but it gives us enough information to not have to call it again for a while.
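A tiny illustration of that behavior, using a made-up pattern and input: with the global flag, exec() starts at lastIndex but is free to run ahead, and the index it reports is what lets us skip later searches.

```javascript
const word = /[a-z]+/g;
const text = '123 abc';

word.lastIndex = 0;            // start looking at index 0
const m = word.exec(text);     // scans ahead past "123 "
console.log(m.index);          // 4, not 0

// The match is not at index 0, so the answer for index 0 is "no token here".
// We have also learned there is no match anywhere in 0..3, so the same
// answer holds for those positions without searching again.
```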
It's obvious when you think about the situation when you get near the end of the file. By then, many of the match expressions will have worked out they never appear again. They can just hang up their coats and say 'nope' if they're asked if there's anything left to do. It's called having a functional memory.
Yes yes, standard RegEx optimization tricks. Nothing new. That's not the point.
Once I put the optimization in, the code doubled in speed. (Whoo!) But only on Safari (iOS) and Firefox (Windows). Chrome (Windows) and IE (Windows) continued running at exactly the same speed as before.
Now that's interesting.
What it suggests is that the Javascript engines in Chrome and IE had already implemented the identical optimization, but at a language interpreter level. I assume they detect the exact same RegEx object being applied to the exact same string object and they just re-start the matching from internally stored state.
But Safari and Firefox clearly haven't implemented this "same regex, same string" optimization, so when I explicitly do it myself it saves an enormous amount of time.
Here's the relevant bit of code. Don't worry about the lambdas.
function decode_rex(type, empty, rex) {
    // rex must carry the global ("g") flag, or exec() ignores lastIndex
    rex_registry.push(rex);
    return function(s,i) {
        var r = {ok:empty?true:false, pos:i, len:0, type:type};
        // are we past the point where we need to look for a new match?
        if(rex.lastIndex<=i) {
            // match the expression from here
            rex.lastIndex = i;
            var m = rex.exec(s);
            if(m instanceof Array) {
                // found it: remember where, so we needn't search again before then
                rex.matchIndex = m.index;
                rex.matchLength = m[0].length;
                rex.lastIndex = rex.matchIndex + 1; // safe consumer
            } else {
                // no more matches anywhere in the rest of the string
                rex.lastIndex = s.length;
                rex.matchIndex = -1;
                rex.matchLength = 0;
            }
        }
        // is the next match supposed to be now?
        if(rex.matchIndex===i) {
            // consume the match
            r.ok = true; r.len = rex.matchLength;
        }
        return r;
    };
}
Finite Loop
I was going to post more details about JSON-U yesterday, but right about then the entire concept was in pieces around my feet after discovering a few things about how the colon and semicolon are handled in reality. Don't ask.
But that's OK, because I reworked the grammar and managed to remove both special characters, plus two more, and added new capabilities besides. To the point of making URLs nearly Turing-complete. Ignore that for the moment.
There are a few last things I'm trying to figure out. I've got my encoding to reliably survive most kinds of common mangling, and it really takes some determined effort to make the parser fail, but there's always the lure of handling one more case now so you never have to worry about it again.
Oddly, by being so free to rip things out of the parser, I'm discovering ways of putting them back using only the things remaining. For example, I had an array constructor syntax and a function call syntax. Then I removed arrays as an explicit data type, which is unheard of, but I kept the named function space (now called "locallizers") and defined the "unnamed function" to "return an array consisting of the parameters passed to the function".
Boom, explicit arrays are gone. Replaced with a "pseudo-call" to a function that creates them. So functions are more primal than arrays, in a data coding. And since the functions use the round brackets, we save two characters from ever being used.
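In evaluator terms the trick looks something like the sketch below. This is only an illustration of the idea, not the JSON-U parser: the `localizers` table, the `point` function and `applyCall` are all hypothetical names, and no attempt is made to show the actual text syntax.

```javascript
// `localizers` is a hypothetical table of named functions.
const localizers = {
  point: (x, y) => ({ x, y })
};

// Apply a parsed call: an empty name means "just return the arguments",
// which is how an array falls out without any array syntax.
function applyCall(name, args) {
  if (name === '') return args;
  const fn = localizers[name];
  if (!fn) throw new Error('unknown function: ' + name);
  return fn(...args);
}

console.log(applyCall('', [1, 2, 3]));    // [ 1, 2, 3 ]
console.log(applyCall('point', [4, 5]));  // { x: 4, y: 5 }
```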
I've gone round and round, pulling things out and replacing them again. (Quoted strings are back, but only to read in user-hacked values, never as output.) It's like an infinite Rubik's cube where a couple more twists leave a scrambled mess, but then a few more moves and everything is harmonious and better matched than ever.
I'm down to the point where I have a test page that generates hundreds of random data structures, encodes them all, checks they go backwards again properly, and offers up a list for clicking. I can type random crap into the parse-as-you-type text boxes, and JSON-U tends to cope with this better (spends more time in a 'parsed' state that correctly represents the data) than JSON does.
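The shape of that test page is simple enough to sketch. The codec below is a stand-in (plain JSON plus percent-encoding), NOT JSON-U, and the function names are invented; the point is only the generate, encode, decode, compare loop.

```javascript
// Build a random nested value out of numbers, strings, arrays and objects.
function randomValue(depth) {
  const kind = Math.floor(Math.random() * (depth > 0 ? 4 : 2));
  switch (kind) {
    case 0: return Math.floor(Math.random() * 1000);
    case 1: return Math.random().toString(36).slice(2, 8);
    case 2: return Array.from({ length: 3 }, () => randomValue(depth - 1));
    default: {
      const obj = {};
      for (let i = 0; i < 3; i++) obj['k' + i] = randomValue(depth - 1);
      return obj;
    }
  }
}

// Stand-in codec: NOT JSON-U, just something with an encode/decode pair.
const codec = {
  encode: (value) => encodeURIComponent(JSON.stringify(value)),
  decode: (text) => JSON.parse(decodeURIComponent(text))
};

// Encode and decode `count` random values; return the ones that differ.
function roundTripFailures(codec, count) {
  const failures = [];
  for (let i = 0; i < count; i++) {
    const value = randomValue(3);
    const back = codec.decode(codec.encode(value));
    if (JSON.stringify(back) !== JSON.stringify(value)) failures.push(value);
  }
  return failures;
}

console.log(roundTripFailures(codec, 200).length); // 0 for this codec
```

Any codec with the same `encode`/`decode` pair can be dropped into `roundTripFailures` unchanged.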
Along the road I've made some funny choices. But they hopefully combine their powers in a good way. For example, here's one consequence:
If you repeatedly 'unescape' canonical JSON-U, you get the same string.
If you encodeURI() canonical JSON-U, you get the same string.
That's actually a major ability, and can't be said for any encoding that has a backslash, percent sign, or space in it. Ever.
("Canonical" just means "the one true blessed way, not the way we do it when we're in a rush.")
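You can see the property with any string built only from characters the escaping routines leave alone. The example string below is made up (letters, digits, round brackets and commas); it is NOT real JSON-U, just something in the same spirit, set against plain JSON for contrast.

```javascript
// A made-up URL-friendly string: not real JSON-U syntax.
const friendly = '(lat,151,zoom,(12,3))';
console.log(encodeURI(friendly) === friendly);  // true
console.log(unescape(friendly) === friendly);   // true

// Plain JSON is not a fixed point: every pass changes it again.
const json = '{"a":1}';
const once = encodeURI(json);   // "%7B%22a%22:1%7D"
const twice = encodeURI(once);  // "%257B%2522a%2522:1%257D"
console.log(once !== json && twice !== once);   // true
```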
The single remaining annoying issue turns up when someone uses encodeURIComponent() on something that ends up in the hash fragment. From what I can tell of the standard, the fragment requires only the equivalent of encodeURI() and all possible component characters, including the hash, are allowed from that point on.
Therefore, doing a blanket encodeURIComponent() or escape() on anything destined for the hash fragment is de facto wrong. But that won't stop people from doing it, because who really knows the standards that well? How far do I go to accept the incorrect behavior? I think the answer is actually "not at all". But then, it might be easy to make it work with just a few more twists of the cube.
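The difference between the two routines is easy to demonstrate. The sample string is invented for the example; both functions are standard Javascript.

```javascript
const sample = 'a,b/c?d=e';

// encodeURI leaves URL punctuation alone...
console.log(encodeURI(sample));           // a,b/c?d=e

// ...while encodeURIComponent escapes it, mangling anything that relied on it.
console.log(encodeURIComponent(sample));  // a%2Cb%2Fc%3Fd%3De
```

So a parser that wants to forgive the blanket encodeURIComponent() has to be ready to see its own separators arrive percent-escaped.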
At the moment my encoding survives most mangling. Can I make it survive all? Perhaps.
Why do I care so much? Because shortly I'll be handing out URLs in the new format that I'll be expecting my server to honor for months. Years even. While the code goes through development cycles, and I'm sure I'll want to change the parameters every few damn months. I need something like JSON-U up and running before I even hand out my first (versioned, metadata enhanced, expiry timed) links.
I have to be able to accurately predict all possible futures for the data I'll want to transmit and consume. As they say, prediction is very hard, especially about the future.
Fortunately, I am a computer scientist. And predictions of those kinds are one of my superpowers.
Sunday, December 16, 2012
A digression to the heart of the matter
I "invented" a new micro-language over the weekend. I put the word in air quotes, because I was actually trying very hard not to do any such thing, but was forced by dreadful necessity to do some inventing anyway.
Why the odd programmer self-loathing? Because there are already too many microformats in use, and I'm not sure that adding "yet another unifying standard" to the mix will help. But I need it.
Here's the problem: URLs. That's it in a nutshell. Bloody URLs.
There is a spec for them, but hardly anyone reads it. Even when they have, they usually just chop URL strings up with Regular Expressions intended to get the bit they want, and fail utterly if given anything that doesn't start with http:// or https://. So it's a minefield.
The latest crazy thing to do is to make extensive use of what they call the 'hash fragment'; everything that comes after the '#' sign in the URL, which is a special fragment for two reasons:
- It is never sent to the server during page requests, it's only available to the client browser and scripts.
- Clicking on a page link that differs only in its hash fragment from the current page does not reload the page, and in fact does nothing at all (except notify your scripts) if the fragment id doesn't match up with a real element id.
Remember, making the browser focus on an element half-way down the page was the original intention of the hash fragment, so all browsers support it and are 'tolerant' of this abuse. (ie: they don't throw errors or 'correct' the URL because a weird fragment ID doesn't actually exist in the page) We are just extending that metaphor, which just happens to work consistently on almost every browser ever made.
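The split between what the server sees and what stays in the browser can be shown with the standard URL class. The function name `splitUrl` and the example address are made up for illustration.

```javascript
// Split a URL into the part a server sees and the client-only fragment.
function splitUrl(href) {
  const url = new URL(href);
  return {
    sentToServer: url.pathname + url.search,
    fragment: url.hash.slice(1)   // drop the leading '#'
  };
}

console.log(splitUrl('http://example.com/map?layer=fire#zoom=7'));
// { sentToServer: '/map?layer=fire', fragment: 'zoom=7' }

// In a page, scripts hear about fragment changes without a reload:
// window.addEventListener('hashchange', () => console.log(location.hash));
```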
A classic modern use of the hash fragment is to put database article identifiers in it that correspond to AJAX calls to obtain that 'page fragment'. When your script detects the user has clicked on a new 'fragment link', the corresponding article is loaded. Sites like Twitter and Facebook have reached the point where there is only one 'page' for the entire site, and everything you see is dynamically loaded into that container.
A consequence has been that such AJAX-driven sites were difficult for search engines (ie: Google) to index properly, as none of the 'content' was on any real pages anymore. So they came up with a scheme: when the search engine (which acts like a big automatic browser) sees URLs with hash fragments, why not call a 'special' URL with that fragment as a normal server GET parameter (basically, juggle the URL around a bit to put the client part in the server part) and then the site can tell the search engine all about the text and paragraphs that the 'hash fragment' corresponds to.
The search engine can even publish those links, and since they still contain the article id, your site will load up that content and the user will see what they expect!
So long as everyone does their jobs right.
Just to be sure that they don't accidentally index sites which can't cope with this behavior, Google (and therefore all search engines) uses the presence of the "!" symbol at the start of the fragment to indicate "this is an indexable hash link". Since the 'bang' character must be the first one following the 'hash', it is informally referred to as a "hashbang". (For the sound made by primitive parsers mangling your parameter strings just before they crash.)
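The URL juggling can be sketched in a few lines. The parameter name `_escaped_fragment_` comes from Google's AJAX crawling scheme rather than from anything above; the function name `toCrawlerUrl` is made up, and the escaping here is simplified to a handful of characters.

```javascript
// Rewrite "#!fragment" into the "_escaped_fragment_" query parameter that
// crawlers following Google's AJAX crawling scheme request instead.
// Simplified: only escapes %, #, &, + and whitespace in the fragment.
function toCrawlerUrl(href) {
  const bang = href.indexOf('#!');
  if (bang < 0) return href;              // not an indexable hash link
  const base = href.slice(0, bang);
  const fragment = href.slice(bang + 2)
    .replace(/[%#&+\s]/g, (c) => encodeURIComponent(c));
  const joiner = base.includes('?') ? '&' : '?';
  return base + joiner + '_escaped_fragment_=' + fragment;
}

console.log(toCrawlerUrl('http://example.com/#!article=42'));
// http://example.com/?_escaped_fragment_=article=42
```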
Why does this matter? Well, let's say I have a map application (hey, I do!) and I want to record the co-ordinates of the user's map window in the URL in such a way that if they copy the link and send it to someone, or just bookmark it, then going back to that link reloads the map at that same geographic location.
These cannot be ordinary server GET parameters, because we can't change the browser's URL to that without causing a page reload. It has to stay in the hash fragment, which means the client has to do all the work decoding its own parameters.
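As a naive starting point, the map view could be packed into the fragment like this. The "lat,lng,zoom" layout and the function names are hypothetical, invented for the example; this is not the format the map actually uses.

```javascript
// Hypothetical fragment layout: "lat,lng,zoom".
function encodeView(view) {
  return [view.lat.toFixed(5), view.lng.toFixed(5), view.zoom].join(',');
}

// Returns null when the fragment is not three comma-separated numbers.
function decodeView(fragment) {
  const parts = fragment.replace(/^#/, '').split(',').map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) return null;
  return { lat: parts[0], lng: parts[1], zoom: parts[2] };
}

const fragment = encodeView({ lat: -27.4698, lng: 153.0251, zoom: 9 });
console.log(fragment);              // "-27.46980,153.02510,9"
console.log(decodeView(fragment));  // { lat: -27.4698, lng: 153.0251, zoom: 9 }

// In a page: location.hash = fragment;  (updates the URL, no reload)
```

A fixed positional layout like this falls apart the moment a fourth parameter is needed, which is where the wish for something more general comes from.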
In fact, wouldn't it be nice if we could just append an entire serialized Javascript object up there in the URL? And then deserialize it again with the same ease as JSON? Hey, why don't you just turn Javascript objects into JSON strings, then URLEncode them? Well:
- The standard javascript encode/escape routines don't quite do all the edge-case characters right.
- JSON transforms really badly when URLEncoded. As in, every bracket and apostrophe turning into three or six characters bad.
- Let's not even get into the unicode/utf8 conversion issues.
- The result is usually an unreadable mess.
- A lot of people do exactly that anyway.
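To put a number on "really badly", here is a small made-up object pushed through the standard routines. Only the sample object is invented; the calls are plain Javascript.

```javascript
const state = { a: [1, 2], b: 'x y' };
const json = JSON.stringify(state);        // {"a":[1,2],"b":"x y"}
const encoded = encodeURIComponent(json);
console.log(encoded);
// %7B%22a%22%3A%5B1%2C2%5D%2C%22b%22%3A%22x%20y%22%7D
console.log(json.length, encoded.length);  // 21 51
```

Fifteen of the twenty-one characters are JSON punctuation or spaces, and each of those triples in size.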
Well, if the standard JSON format encodes badly because of its syntax character choices, why not just make some different choices that are URL-friendly? And then deal with the other character encoding issues?
...oh, and it would be nice if the data didn't change format when in the hash fragment compared to when it gets used as a server parameter, since different characters are allowed...
... and it would be nice if we could save some bytes by leaving out some of the JSON 'syntactic sugar' (like quoted identifiers) that isn't totally necessary...
... and if there were a more efficient way of encoding binary strings than 'percent' escaping, that would be good....
... and it would be nice if the user (ie, the developer) could still hand-edit some parameters, but then hide them away without affecting anything else...
That's pretty much what I have done. I'm calling the result JSON-U.
So, that's the why. Next time, the how.
