Sunday, January 3, 2016

Web Crypto API - First Impressions

So, who knew that the W3C was, among other things, defining a Javascript API to do hard crypto?

I've had weeks messing around with the API, listening to the lectures, and reading the other blogs. First thing you notice... there's not a lot of detail, and not a lot of stories of people using it. I assume they're all quietly working hard.

The best I've found is probably Daniel Roesler's webcrypto-examples - an essential resource for anyone writing real code

And actually, there's a lot of blustery crap in blogs about how you can never possibly do real crypto in the browser. Because why? Um, because you can't trust the browser, that's why. I mean, sure, otherwise you have to trust the operating system, but that's completely different.

And besides, Javascript isn't a real language anyway, they'll grumble. Because nobody likes losing the language wars.

So first, the good. Here's what it looks like when you've got a modern (Chrome / Mozilla) browser and you can use all the HTML5 :

var crypto_subtle = window.crypto.subtle || window.crypto.webkitSubtle;

function create_identity(meta,algo) {
  meta = meta || {
        name: "anonymous",
        created: (+new Date),
  algo = algo || {
        name: "ECDH",
        namedCurve: "P-521", //can be "P-256", "P-384", or "P-521"
  var ident = { algo:algo, meta:meta };
  return crypto_subtle.generateKey(
      true, //whether the key is extractable (i.e. can be used in exportKey)
      ["deriveKey", "deriveBits"] //can be any combination of "deriveKey" and "deriveBits"
  .then( function (key) {
      return Promise.all([
          "jwk", //can be "jwk" (public or private), "raw" (public only), "spki" (public only), or "pkcs8" (private only)
          keys.privateKey //can be a publicKey or privateKey, as long as extractable was true
          ident['private'] = keydata;
          "jwk", //can be "jwk" (public or private), "raw" (public only), "spki" (public only), or "pkcs8" (private only)
          keys.publicKey //can be a publicKey or privateKey, as long as extractable was true
          ident['public'] = keydata;
    ]).then(function() {
      return ident;

That code is complete, no library dependencies. You could cut-and-paste it into any script. What does it do? It generates an Elliptic Curve public/private keypair, serializes the keys, and returns the whole chunk of JSON-y goodness in a Promise. 

It's the core operation to create a "cryptographic identity" for future operations like signatures or encryption or link security.

If you don't know about EC6 Promises that are now standard in all browsers, go read about that instead. Go, go! That's even more important!

So, what's the bad news? Well, Apple and Microsoft are being predictably slow in implementing the good and useful (the less kind would say the "not horribly broken and unsafe") algorithms that we badly need, such as the Elliptic Curve Diffie Hellman (ECDH, use above) or Elliptic Curve Digital Signature Algorithm (ECDSA) that are about 100 times faster, and yet 10 times more secure, than the previous generation of RSA-based algorithms.

You may have heard that everyone wants to deprecate SHA-1 as being utterly broken now. Guess which algorithm has near-universal support across all browsers?

There's a reason for this, and that reason is at the center of a large fight. The W3C crypto API does what the W3C usually does - it standardizes, but without mandating a standard. If that sounds odd, it's basically the process of writing down what everyone currently does (the "state of the art") and saying "It's ALL legal!".

I personally don't mind this approach. It recognizes the fact that you can't force the browser makers to do squat. No matter how many "thou shalls" you put in the spec, they won't if they don't want to. So they best you can do is get all the documentation out into the open.

After a couple of years, all the crap shakes itself off, and we get left with a minimal core of useful tools that actually achieve a purpose, and work in the real world. Then there's usually a V2 of the specification which normalizes that.

The browsers already have extensive crypto stacks, SSL layers, key vaults, etc. So the API makes it easy for the browsers to expose a lot of that existing machinery, even though most of it isn't that useful inside the sandbox. It's what they've got for now, and they get to show it off. That's why all the browsers offer SHA-1 despite it being horribly broken - 'cause it was already there.

With the Web Crypto API, that means 80% of the spec is dead and pointless on arrival, Old crap that should have been left to rot, but got dragged along because some company wanted the backwards compatibility with some obscure password system, and didn't want to spend time and money working on writing the good stuff. 

The spec is almost unreadable, even for a W3C recommendation, and I've been doing this a while. hundreds of pages long, and it still excludes a lot of the important technical detail via references to IEEE documents.

But that's OK. This is Javascript. Just dance around the landmines, and you'll be fine.

(For a full map of the landmines, try or the spreadsheet of Browser Support for the Web Cryptography API )

The browser makers that don't come up to snuff fast enough, they'll just get a polyfill. Sure, their browsers still won't be 'secure', but they'll be able to talk to our browsers, which suddenly are. The weak links will get pressure applied until they crack.

To be honest. Microsoft Edge is looking pretty good here. For most of the new HTML5 features, I'm pretty impressed with The Edge, and I'm happy to put it close 3rd after tied Chrome and Mozilla, but clearly closing the gap. They don't have ECDH yet, but it smells like they're close... certain error messages have the feeling of "not implemented yet" rather than "Not a feature!", if you know what I mean. They're even talking about protecting CryptoKeys stored in indexedDB on Edge at the OS level, which is the sort of thing that shows real thoughtfulness. 

Safari is becoming the great standards holdout, especially so on iOS. Which is a big shame, because most of these new technologies are especially useful on mobiles. The fact that you can generate a secure Elliptic Curve key in milliseconds, even on low-CPU mobiles, (rather than seconds for RSA) means they're more useful and save battery life. (And that's the last platform you want to polyfill math routines on.)

The lack of ECDH/DSA on iOS is probably the biggest thing holding back HTML5 apps from communicating securely with each other, as general practice. But Apple's walking a tightrope there... if HTML5 pages become too capable, (with offline storage and crypto and media systems) who will visit the App Store?

But I'm not worried. This standard is being implemented surprisingly quickly, and as I said, there's a core of about 20% of the API which does some wonderful, fantastic, critically important stuff that the web has been waiting for, for decades

That's the reward for anyone willing to dive deep into the API. I'm sure over time, there'll be some handy jQuery functions that make it all easy. But remember, they named it "subtle" for a reason

Sunday, December 20, 2015

You can't take the sky from me

Oh, but the FAA is trying hard!

So full disclosure first - I'm not a US citizen, and I really, really like flying robots. (I've built four of them.) I'm also a big fan of logic and consistency, which used to be a popular band back in the day but not so much anymore.

In the US, the Federal Aviation Administration has announced it will require all "drone" operators to register with the agency. Their definition of "drone" includes any remote controlled flying device over .55lb. (250 grams) That means quads, multis, foam planes, helicopters, blimps, balloons, and possibly Dune Buggies if they have too much 'hang time after sweet jumps. (it's unclear)

Paper planes are still fine. Any uncontrolled thrown thing is fine. 50-foot Frisbees are allowed. Rockets are cool. If you put foam wings on your iPhone, you're in a grey area deeper than shadow.

I'm not in-principle against enhanced safety, but this doesn't do that. The word "Overreach" is being used a lot. Congress explicitly said they couldn't do this. This moves the FAA from regulating a few dozen major airlines, to regulating the behavior of millions of private US citizens.

Quick review of what the FAA is: It got it's major "powers" in the 1960s, at a time when passenger planes were colliding over New York and dropping flaming wreckage on sleeping people in their apartments. People didn't like that. So the response was to invent the Air Traffic Control system and give the FAA powers over civil aviation, instead of letting the Airlines make up their own rules.

This has done a great deal to improve air safety. But it should be noted that planes still crash on New York quite a lot. There were the famous 11th September 2001 incidents, but who remembers Flight 587? which two months later crashed onto Queens because Air Traffic Control had told them to take off into the backwash of another plane, and some "aggressive piloting" caused the tail stabilizer to snap off. At a time when you'd think they'd be paying attention.

In fact, if you look at the big accidents (rarely deliberate), they're all caused either by pilots crashing into things they couldn't see, (like mountains) or Air Traffic Control directing them to crash into things they didn't know were there (like other planes).

Not a single aviation fatality has actually occurred because of RC hobby planes. Which have also been flying since the 60's, long before modern brushless motors and batteries. (The 'gasser' era.)

Military drones have caused crashes, it's true... but not Hobbyists. In fact, there have been 400 major accidents caused by US military drones. (which are the size of a car, and often armed) They once hit a C-130 Hercules. (literally, the broad side of a barn) But the FAA doesn't regulate military air traffic. And it likes to exaggerate the civilian threat.

One of these things is not like the other.

So, the FAA has announced it will create a "Drone Registry", so that anyone who intends to do bad things with a drone will helpfully write their details on the device, and this will help police track them down and arrest them for bad behavior.

No, really! That's their cunning plan. Some cynical observers suggest this is just stage 1, and future stages will require anyone buying an RC device to provide registration at Point Of Sale, otherwise the whole concept is utterly useless. And then they'll have to regulate batteries, motors, and computers, because otherwise you just buy the parts off eBay and build it yourself.

Or alternately, if a Policeman sees you flying in a park, they can ask for your registration and thereby keep the sky safe from bad people.

So, all we have to do to eliminate the "drone threat" is to put millions of US citizens (many of them children) into a huge database that will be publicly accessible by anyone who wants their phone number and home address. The FAA will have enforcement powers over every family in the country.

One of my favorite things is the $5 registration fee. That doesn't sound like much, true, but that's also the same cost to register a full size Boeing 747 Jumbo Jet. Another sign that the FAA doesn't really distinguish between a hundred tonnes of flying metal and a piece of motorized foam-board.

This also costs $5 to register with the FAA.
It's a real one.

Amazingly, the US congress told the FAA they couldn't do this. The FAA went and did it anyway. Despite long-standing legislation that reads:

Notwithstanding any other provision of law relating to the incorporation of unmanned aircraft systems into Federal Aviation Administration plans and policies, including this subtitle, the Administrator of the Federal Aviation Administration may not promulgate any rule or regulation regarding a model aircraft, or an aircraft being developed as a model aircraft, if—
(1) the aircraft is flown strictly for hobby or recreational use;
(2) the aircraft is operated in accordance with a community-based set of safety guidelines and within the programming of a nationwide community-based organization;
Meanwhile the AMA (The Academy of Model Aeronautics, one of those "nationwide community-based organizations" the legislation mentions) has told all it's members to hold off on drone registration while they try to sort through all the conflicting reports. Latest news is that they intend to take legal action to fight the new rules.

And now, these people might get to weigh in.
Including the Notorious RBG!

So, in summary: the FAA wants every hobbyist over 13yo to put their details in a public database, (because, y'know, privacy of the general public is important...) contrary to existing law, and the leading community organisation wants to take it to court. Hobbyists are furious. None of the new rules will make the skies any safer.

It's a path that treats RC craft purely as a threat to "real airspace users", and ignores the immense opportunities. And it also puts the FAA on a collision course with civil liberties for the American public, and that's the kind of thing that gets them hauled before the Supreme Court which might strip them of their powers as unconstitutional overreach, (you can't even force Americans to register their guns!) and we'll have no oversight, which is even worse than bad oversight.

It's a shambles. A hypocritical, pointless, mess. Years will be lost fighting the "freedom vs order" civil war, instead of just pushing for technological solutions to what are essentially technological problems. (hint: GPS broadcast beacons & official listed "crashing zones" for RC craft that need to get out of the way of emergency crews. So models can automatically go "If I sense a medivac chopper nearby, I'll crash myself in the nearest zone".)

Instead, I'm sure everyone is busy stripping off their backups, flight loggers, and safety gear - to fit under the 0.55lb weight limit. Those parachute systems are heavy, y'know.

Thursday, December 3, 2015

Don't Do This #102 - GPS and raspivid

If you're like me, you've often thought, "I really need GPS on a high-resolution camera, and probably  accelerometers too, so I can do photogrammetry."

OK, maybe you don't. Even my spellchecker doesn't like the word "photogrammetry", which is when you take a whole bunch of photos of something with the intention of creating a 3D model (or other measurement) from the imagery.

Like what land surveyors do when they fly over with cameras to create topological maps. And like that, it really helps to know exactly where you were, and how the camera was positioned. A lot of the new algorithms can get by without it, but there's a time cost, and a lot of pathologies can be avoided if we start with a good bundle estimate.

Here's what I did as a first go:

An Aerial Photograph of my Aerial Photography Machine

That's a Raspberry Pi model A, with the 2k camera and WiFi modules, connected to a UBlox Neo6 GPS I got from Hobbyking last year. Less than $90 of stuff, most of which has been used in other projects. (And will again)

Techno-periscopes Up!

So here's what you need to know first: It doesn't work.

Well, I mean all the independent bits work fine, but not all together. That's the point. To spoil the ending: When the camera is operating, so much multi-megahertz digital interference is generated by the flat cable connecting the camera module, that the GPS loses signal lock.

Thar's yer problem right thar, boyo! The big flat white thing
what's right near the little square doodad. And all bendy, too!

I'm sure I could also make a gripping yarn about how I bravely tracked down and cornered the bug, and how developing many features at once (streamed low-latency WiFi video, plus GPS) is a great way to find the problems, but leave yourself very confused about what causes them.

Close-up of the connections, showing how easy it is to wire a 3.3v GPS to the Pi.
Standard linux 'gpsd' is used to decode the signals.
The plastic cap of the left is just to protect that end of the connector from physical damage/shorts.

For a long time I assumed it was the WiFi streaming part, since it's an RF transmitter, and the GPS is an RF receiver, and all of it is cheap as beans, so logically... but no! Those parts are well engineered and stay out of each others bandwidth. You can WiFi and GPS just great. But the moment you start recording video to /dev/null, the GPS lights go out. That was the clincher.

If you're taking still photos, it's mostly fine. The GPS can stay locked on, and the brief static bursts during the camera shots are ignorable.

But I wanted video. And the moment you open up the throttle, it all fails.

Now, the obvious potential solution is to wrap a foil shield around the flat ribbon cable, especially where it bends, but that's something I'll need to do with great care, otherwise I've potentially got bits of foil flapping against the main electronic components and that's when the magic smoke comes out. There's also the question of how much of the digital path is exposed on the board. That would be harder to fix.

Perhaps a ground plane to shield them from each other; but shoving sheets of tin or copper in between is going to cause other issues, like making the WiFi directional, and other near-field effects. Argh.

So, you're saying the correct solution is a tiny Tinfoil Hat for the electron gnomes?

Also, while the GPS and cable are pretty much right next to each other for illustrative purposes, I can assure you I tried moving the modules as far as I could (cable allowing) and it didn't help. I'm sure I could manage it with a long enough GPS extension cord, but if it can't fit in the one box, It's not very convenient.

But it you have a choice, plan on spreading the pieces out. That's probably your best bet.

So, alas, I don't have any guaranteed solutions to the problem yet. But I wanted to warn 'ye anyways.

Wednesday, November 25, 2015

The Infinite Reality Show

I tend not to spend too much time talking about the long-term intentions of the work I'm trying to do with Astromech. Partly because I'm working it out as I go, and mostly because I've heard no end of "It's gonna be great!" exhortations in my life that turned out to be vaporware and I don't want to be That Guy.

I wanted Astromech to get to the point where it could provably do the essentials, before talking about the possibilities. In the last few weeks, those essentials have come together.

Data acquisition from commodity image sensors, check. Real-time DSP of the data using Fourier transforms, check. FITS Compression of the data using Harr transforms, finished a few days ago. Distributed comms working. Video comms working. Thousands of lines of '3D visualization' code.

So, what's the point? It's all about mapping reality. Let's take this in stages:


If you want to see the universe, you need to point a telescope at the sky. There's a great deal of optical and mechanical engineering involved, but you can shortcut that and buy a surprisingly good 6-inch Maksutov-Cassegrain off ebay for a few hundred bucks.

Most of science is about taking picture of fuzzy blobs.
Here's my first image of Saturn.

For most people, this is where the hobby ends. Every now and again the dust gets blown off, they observe Saturn, get their Wows, and no actual science is really achieved.

A smaller cadre of 'serious' amateur astronomers are out every night they can get, some with automated telescopes of surprising power and resolution. Some treat it like a professional photography shoot with less catwalk models and more heavenly bodies, and get quite a good income. But the vast amount of that data just sits on hard drives, not doing science either.


For science to happen, you have to write all the numbers down. Eyes are terrible scientific instruments, but it also turns out JPEG or the h264 compression algorithms are equally bad, literally smoothing out the most important data points.

It's why professional photographers currently make the best amateur astronomers, because they have access to acquisition devices (eg $4k Nikon cameras) which don't apply this consumer-grade degradation.

When you look into the details, what stands out is that the same hardware is often involved, it's the signal processing chain that's different.

Here is where we start having to consider our 'capabilities', in terms of how much CPU, memory, and bandwidth you have. If you point a high-resolution camera at the sky and just start recording raw, you will very quickly overwhelm your storage capacity, even if you have a RAID of terabyte drives.

Trust me on this. Been there, still haven't got the disk space back.

And the sad thing is, if like me you used "commodity" image capture hardware then the data is scientifically useless. Just pretty pictures.

Video imagery of the moon, taken through my telescope in '13.
You need 'raw' access to the pixel data, which is coming in a torrent. A flood. 4000x4000 images, one per few seconds, if you're using a DSLR camera and lucky imaging. Some people who look for meteors use 1024x720 video streams at high framerates. When you see a professional observatory, you're looking at a cool digital camera all right, but one that's literally sitting on it's own building-full of hard drives. That's a big memory card.

Signal Processing

If you want to turn that raw video into useful data, you have to bring some fearsome digital signal processing to bear. Just to clean up the noise. Just to run 'quality checks'. Then there's the mission-specific code (the meteor or comet detector algorithms, if that's what you're doing) and the compression you'll need to turn the torrent into a manageable flow you can actually keep.

Not just to store it to your hard drive. But also to "stream-cast" it to other observers. Video conferencing for telescopes.

Why? Because when strange things happen in the sky, the first thing astronomers do is call each other up and ask "Are you seeing this?". Some of those events are over with in seconds, and some of the greatest mysteries in astronomy persist because, basically, we can't get our asses into gear to respond fast enough.

Have we learned nothing from Twitter?

We can't wait for the data to get schlepped back home, and processed a week later. We need automated telescopes that can get excited, and call in help, while we're over the far side of the paddock having tea with the farmer's daughter.

Ground Truth

Up until now we've just been talking about slight improvements to the usual observer tasks. Stuff that's done already. Making the tools of Professional Astronomers more available to amateurs is nice (and as we've seen, that's really all in the DSP.) but what's the point?

Here we could diverge into talking about an algorithm called "Ray Bundle Adjustment", or even "Wave Phase Optics" but Ray's a complicated guy, so I'll sum it up:

If you want to reconstruct something in 3D, you need to take pictures of it from multiple angles. You probably guessed that already. There's a big chunk of math for how you combine all the images together, and reconstruct the original camera positions and errors. Those are the "bundles" that are "adjusted" until everything makes sense.

The more independent views you have on something, the better. Even for 2D imagery. Even aberrations in the sensors become useful, so long as they're consistent. It can create 'super-resolution".

Beyond that, there's "Light Field Cameras", which use a more thorough understanding of the nature of light to produce better images - specifically that traditional image sensors only record half the relevant information from the incoming photons.

Most camera sensors record - for each square 'pixel' of light - how much light fell on the sensor (intensity) and it's colour (wavelength). What you don't get is the direction of the incident photon, (it's just assumed) or its phase. 

For a very long time we thought those other components weren't important, mostly because the human eye can't resolve that information. Insects can perceive these qualities, though. Bees can see polarization, and compound eyes are naturally good at encoding photon direction. We couldn't, so we didn't build our telescopes or optical theory with that in mind.

Plus, the math is hard. You have to do the equivalent of 4D partial Fourier transforms. Who wants that?

But when you work through it, you realize that you can consider every telescope pointed at the sky to be one element of a planet-wide wave-phase optics "compound eye" with the existing hardware. (and maybe a polarization filter or two)

All we need to do (ha!) is connect together the computers of everyone currently pointing a telescope at the sky, and run a global wave-phase computation on all that data, in real-time. (I might be glossing over a few minor critical details - learn enough to prove me wrong.)

This is not beyond the capability of our machines. Not anymore. The hardware is there. The software isn't. This is what I've been working on with Astromech. A social data acquisition system that assumes you're not doing this alone.

What you get out of this is "Ground Truth", a term that mostly comes from the land-surveyors who are used to pointing fairly short-ranged flying cameras at a very nearby planet. But it's the same problem.

This is the stage we can finally say we're "Mapping." Once we got enough good photos of the asteroid Ida, we constructed a topographical map. Once we got enough information on it's orbital mechanics, we could predict where it would be.

Fundamentally, that allows us to prove our mastery of the maps by asking questions like "If I point a telescope at Ida right now, using these co-ordinates, what would I see?"

ie: Can I see their house from here?


To really answer that question means you have a 3D-engine capable of rendering the object using all known information. If we assume Ida hadn't changed much in terms of surface features, then it's pretty easy to "redraw" the asteroid at the position and orientation that the orbital mechanics says.

Then you just apply all the usual lighting equations, and you'll have a damn passable-looking asteroid on your screen.

But it's not 'real' anymore. Not exactly. It's not an image that anyone has taken in reality. It's a simulation. A computer-enhanced hallucination. A flight of the imagination.

Good simulations encode all the physically relevant parameters, and the main point of them is to provide a rigorous test of the phrase "That's funny..." ("How funny is it, exactly?")

Because by now humans are pretty good at predicting the way rocks tumble. It's kind of our thing. When rocks suddenly act in a way other than predicted (than simulated) it indicates that we've got something wrong. Or something interesting is going on.

And being wrong, or finding something interesting; that's Science!

Simulations are also the only way that most of us are ever going to "travel" to these places. Thankfully our brains are wired such that we can hang up our suspenders of disbelief long enough to forget where we are. Imagination plays tricks on us. There are people right now (in VR headsets viewing Curiosity data) who've probably forgotten they're not on Mars.

Used the right way, that's a gift beyond measure.


About the most interesting events we can see is stars blinking on and off when they shouldn't be.

Yes, this happens. A lot. Sometimes stars just explode.

Then there's all kinds of 'dimming' events that have little to do with the star itself, just something else passing in front of it. We tend to find exoplanets via transits, for example. Black holes in free space create 'gravitational lenses' that distort the stars behind them like a funhouse mirror, and we might like to know where they are, exactly.

Lets say we wanted to watch all the stars, in case they do something weird. That's a big job. How big?

Hell, if we just assign the stars in our galaxy and we get every single person on the planet trained as an astronomer, then each person has to watch vigil over 20 stars. (assuming they could see them at some wavelength.) If we're assigning galaxies too, then everyone gets 10,000 of those.

Please consider that a moment. If every human were assigned their share of known galaxies, you'd have 10,000 galaxies to watch over. How many do you think you'd get done in a night? How long 'till you checked the same one twice and noticed any upset?

We're gonna need some help on this.

And really, there's only one answer. To create little computer programs, based on all our data and simulations and task 200 billion threads to watch over the stars for us, and send a tweet when something funny happens.

We can't even keep up with NetFlix, how the hell are we going to keep up with the constantly-running terra-pixel sky-show that is our universe?

I've got a background in AI, but I'll skip the mechanics and go straight to the poetics; we will create a trillion digital dreamers -  little AIs that live on starlight, on the information it brings, who are most happy when they can see their allocated dot, and spend all their time imagining what it should look like, and comparing that against the reality. Some dreamers expect the mundane, others look for the fantastic, and bit by bit, this ocean of correlated dreamers will create our great map of the universe.

Every asteroid. Every comet. Every errant speck of light. Every solar prominence or close approach. We are on the verge of creating this map, and the sentinels who will watch over the stars for us, to keep it accurate.

There's not a lot of choice.

Tuesday, November 17, 2015

Harr Harr!

Here's today's development screenshot of Astromech: (from the virtual DSP desk.)

An image that will give your PNG decompressor conniptions, no doubt. The middle screen-full of leafy trees is a live webcam feed from out my window. The pink lines all across it are because it's a shitty webcam that cost $6 off ebay.

The left-hand screen is a 256x256 real-time Fast-Fourier Transform of the webcam luminance. That's not big news, Astromech has always done that. Its first trick.

The right-hand screen is the new thing for today. It's a 512x512 "H-Transform" which likely originally stands for "Two-Dimensional Harr Transform". I also call it the "Hubble Transform", because it's the basis of the compression format the Hubble data team invented in order to distribute 600Gb of their pretty pictures.

The full text I'm following here is Tiled Image Convention for Storing Compressed Images in FITS Binary Tables  published by NASA.

Don't let that NASA appellation fool you into thinking there's anything hard about the H-Transform. Compared to the FFT or Cosine Transform or Huffman coding it's very, very simple. And the best thing about the H-Transform is that it's parallelizable on WebGL. (as you can see.)

That's how I'm doing this in real time, (about 12 frames per second I'd guess, limited by webcam speed) in my browser, and my CPU usage is 20%.

Why the H-Transform? Why not just use something browser-supplied like h264, or V8 or a stream of JPEG/PNG images? (MJPEG!) which is built into most modern browsers? Well, in a nutshell, because nice as they are, they're not "scientific".

There a really big difference between a compressor that optimizes for the human perceptual system, and a compressor that tries to preserve the scientific integrity of the source data. The H-Transform is the second type.

Similar to a 'MIPMap', the H-Transform encodes a pyramid of lower-resolution (but higher entropy)
versions of the source image into the lower-left corner, like a fractal.
The larger 'residual' areas become easier to compress.

That's why NASA trusts data that has been stored in that format. It has certain very nice mathematical properties. It's a 'lossless' compressor, but one with a tuneable 'noise floor'. If that seems a contradiction, welcome to the magic of the quantized H-transform, where 60:1 compression ratios are possible.

There's a couple of stages to go before that image on the right is turned into a FITS file, but the hard part is done and the rest is just shuffling bits around. Well, assuming the browser will let me save a stream of data to a file. That's really tricky, it seems.

Update: 22/Nov

The whole thing provably works now, since I've also implemented the inverse H-transform. (there were a few bugs)

The inverse of the transformed cat is also a cat. Well, you'd expect that, surely?
Basically in this example, the middle webcam image is encoded into the right-hand image (which looks fractal yet empty - that's the H- transform) and then that is run through the Inverse transform (a separate bit of code that does everything in a different order, using the big mostly-empty texture) to go in the left-hand window.

It's almost too easy.

And so the fact that the two left images look boringly identical is a good thing, given the (poor quality to begin with) data has been mangled twice in between by me. Cat sitting on warm computer staring at cursor. It's a common test case around here.

Friday, November 13, 2015

Fire Map 2015

First of all, sorry to everyone who uses my Fire Map. I hadn't noticed that the NSW dots were no longer showing up. I just fixed that. Apologies.

Since I'm no longer connected with departments that produce internal reports on the coming fire season, I really don't have any idea what the forecast for this Christmas is. However, I've just noticed the fire-line across the Northern Territory:

That's a lot of red dots, big enough to be visible from space. Those are not good dots.

And from what I remember with my flawed brain, that staggered line is a trend-setter that continues south in waves over the next few months, as those heat conditions spread.

So, it's probably a good time to consider the history of the Fire Map.

The Past
When I started the map, each service ran their own state incident map (and still do) and I knew that there was no federal agency that had the remit to make a country-wide map. Google had a fairly good relationship with NSWRFS and had just added Victoria, but had no plans to add Queensland or beyond. There seemed only one thing to do...

The Present
Frankly, I've been neglecting the Fire Map for Astromech and day jobs. It just kind of sits there, working away, and really only needs occasional checks and updates when one of the services changes their servers in some way. Fortunately, you can always depend on the inertia of government departments, and I've gone for years without to much issue.

The Future
I'm not sure. I just noticed that the Northern Territory finally has their own incident map / feed, and the perfectionist in me always felt annoyed that there was one big empty chunk on the map. I may have to fix that, at which point the tapestry will be complete.

But long term, the map has no future. I nearly shut it down last year when the hotspots went away for a while, meaning there was no effective difference between my map and the Google Crisis Response map, which has come along enormously in the last few years. I was a momentary expert three years ago, but I really haven't kept up. 

I shouldn't be doing this. The only reason I do is because there doesn't seem to be anyone else with the same focus, and the needed abilities.

I'd love to hand it on to some official organization that can form relationships with the state agencies that source the data, and thereby have more warning of server changes than "crap, the dots aren't working today." But that doesn't seem likely either. I once had grand ideas about doing bushfire prediction, (which I still think is very possible) but that was when I had access to data, and the people for whom those answers were relevant.

I'd be interested in advice for what I should do in the future. Is the map still needed?

Update: NTFRS data online

It would have to be Humpty Doo.
Adding the Northern Territory to the fire map took almost 1/3 of a packet of Anzac biscuits to accomplish (and two years of waiting). That means the map is complete. Every state is on there, making it, truly, the "Australian Fire Map" at last.

My work here is done.

Why Video Games were Not Art

I always thought that computer games were a form of Art. Today, I learned why I was wrong.

First, let's start with a definition of "Art" that is mostly "That which is shown in museums and galleries and whatnot." Statues are shown in museums and galleries, therefore statues are art. Paintings, sculpture, movies, songs. Note these are all forms, rather than specific things.

So, why aren't computer games Art? Because, in a nutshell, Museums couldn't exhibit them, without fear of prosecution from the rabid armies of copyright lawyers engaged by the industry to protect their products.

That's it. QED.

It didn't matter how many soulful designers cried that they were doing more than just extracting quarters by addicting teenagers to blinky lights; they were being shivved in the back by their own legal departments, who enjoyed wielding the power of the precious DMCA.

If video games were Art, or were recognized to have cultural significance, then they would have status beyond that of mere "product" and society at large would have a different relationship, with more rights of re-use. Can't have that.

I didn't realize a lot of the internals, before I read the excellent article:
Understanding this years biggest video game copyright ruling at Gamasutra.

The good news, since the US Copyright Court (ugh, from the "why is this even a thing" category) has now said that Museums can show off old games without fear, there might one day be an exhibit of 'classic games' at your local museum, as perhaps should have been possible all along?!

Sometimes you only notice how bad the stupid got when it takes a step back. And you think "good start, and another please?"

Sunday, November 8, 2015

Astromech - The Road to Beta2

Going quiet means I've been getting things done on Astromech. I have a set of specific features I want in the next beta, and in the last week nearly all of them have reached a semi-stable point.

Probably the best single demo of this is my little homage to the Caffeine molecule:

But first, the setbacks. The big one is; I've had to seriously question my use of Google Drive.

Anyone who saw the original video noticed how heavily I relied on Google Drive for the task of storing the bulk assets in each 'level' as well as building a collaborative editor (using the Docs realtime API) to set up the scene / load script.

Here's the stage I got to with that, before some rather bad news broke:

A mobile-friendly way to edit assets and  DSP 'circuits', backed by the Google real-time collaborative API.
Shame it will probably never see the light of day.
And that's just part of it. There's also the Panel designer - a hierarchical 'box layout' editor for all those cool LCARS-like consoles that litter the Astromech levels.

But unfortunately, Google has announced that they will disable file hosting from Google Drive shortly. I ranted a little about that in my previous post.

That has two very specific impacts on the Astromech 'GUI' editor. It means the files that it creates can't be read anonymously anymore. So. any Astromech levels based on a script stored in Google Docs will not be accessible to everyone. That's bad enough on it's own to kill that part of the project dead.

What's the point of collaboratively editing a "public world" file if the world's public can't read it?

And where do you put all the resources files it references? On another service? Then what's the point of using all the Google Docs API's if the 'real' data is elsewhere?

Shoving everything into one directory made things nice and manageable, using relative links. Once you've got your resources scattered across half the internet using absolute URI's, it makes things so much harder.

It's not just that I'd have to add a "Save As..." button to the Google Drive app, I have to re-think the entire premise of how users collaboratively store and work with terabytes of data. Instead of a central dumb (but reliable) fileserver and peer-to-peer clients, I'll probably need a peer-to-peer _server_ layer as well. ie: I need to replace Google by Christmas.

The old levels still work for now, but the access they depend on is deprecated and goes away next year. But it's the wasted effort in that direction that really hurts. Hopefully I can salvage most of the UI and editors, while backing them with a different datastore.

Then, there was the whole getUserMedia http:// deprecation thing I had to deal with. Within months, "powerful browser features" (which is basically everything I use.) will not work from http:// servers. Only https://.

This really broke me for a couple of days (I even got into a discussion with the list) because it implied that running Astromech would only be possible from the Internet by paying money to Versign-derivatives. Not, for example, on your own goddamn own computer.

I settled down a bit when it was pointed out that localhost: is supposed to be considered "secure", (even if it only uses http:// without the SSL) So you will still be able to download and run Astromech on your own machine and use all the features. You can imagine how the alternative would have been maddening.

However, this still leaves the localnet in limbo. It's no longer clear how you'd run the software on your own desktop and access it, for example, from your own iPad over your own WiFi network. (Why, it's just as simple as creating your own SSL certificate authority for signing local machines, and then installing the certificates on each device, of course... why are you complaining?)

It was always in the back of my mind that a piece of Astronomy software that only really worked indoors while connected to high-speed internet might not be as useful as it could be. (instead of, say, in a dark paddock filled with amateur astronomers bristling with advanced imaging equipment and local bandwidth but poor global internet connectivity.)

I really don't want Astromech to be a "local webserver" install, for every individual user/machine. It should be more like running a minecraft server. If you need to install local servers everywhere to get a browser app to work, then what's the point of doing it in a browser? Why not just write a full application?

And besides, it seems really counter-intuitive that the only way to work with/around the 'increased browser security' is to start installing local code (eg, a node.js micro-server) with full binary access to the machine. That's just security whack-a-mole. If the machine gets boned through an exploit in my code, then it's not their fault for leading me down that path, obviously.

But the browser makers are determined to deprecate http:// and that's that. It doesn't matter that https:// is flawed, costly, inefficient, and creates barriers to entry.

OK, so now the good stuff that's made it into Astromech in the last few weeks:

iOS + Edge support
Astromech now 'works' on iOS9, to the extent it will load and render the scene using WebGL. What it doesn't do very well (or at all) is replace the keyboard/mouse control scheme with something that functions equivalently using touch.  I'm probably going the little "thumbsticks in the screen corners" route there, as soon as I get the time.

Microsoft Edge is running Astromech fairly well, better than IE did, but it also has some feature gaps (like getUserMedia / WebRTC) that effectively disable some of the more advanced features.

Chrome and Mozilla are still the 'preferred' browser, but all roads are slowly leading to HTML5 compatibility across all devices.

Improved Blender/WebGL shaders
The first-gen model loader did well with geometry, but badly with surfaces. For a start, only the first texture worked, and there was no real lighting model. So, scenes looked very different in Astromech from how they originally looked in Blender, even if the geometry was correct.

The 'shader compiler' I wrote has been extended with a full multi-source specular lighting model, with 'sun' and 'point' lights. Technically it does a Lambert/Blinn-Phong pass with fixed lights.

So, now you can export a fairly generic existing Blender model (instead of carefully building one specifically for Astromech) and it will mostly work as you expected. Common surface materials work. Multiple scene textures work. Bump maps sort-of (they have the common view-independence problem because of the lack of tangent vectors in the collada export, so the bumps always point 'up' instead of 'out', though there's probably a way to solve that. Good for floors though.)

I still don't have a great solution for transparent surfaces, but then, neither does anyone else.

Multiple Scene Models
Version 1 only properly loaded a single 3D model as the 'primary scene'. That's been fixed, so you can load an arbitrary number of collada files, and instance them multiple times within the scene at multiple locations.

eg: In the "Atomic Caffeine" demo, each of the four atoms was modelled/coloured in Blender, and then instanced into the scene as many times as the entire molecule needed.

New features sometimes magnify minor old problems; in this case the lack of a global lighting model. Since each 'scene' model carries its own lights in its own reference frame, obvious visual inconsistencies occur when you put several models together and rotate some of them. (Although, less jarring than I'd have thought.)

Fully dynamic lighting is a major overhead with diminishing returns. So, I'll probably go for a compromise, with only a few global dynamic lights.

Cannon.js Physics Engine
The other side of the multiple-model system is the ability to define a 'physics proxy' (usually a box or sphere with properties of mass and friction) to which the position of the 3D models is attached.

I've chosen the cannon.js physics system to do the heavy lifting. It can connect the proxy objects together with 'hinges', 'springs', and other physical constraints like gravity, and then model the physics over time and update the objects.

It's extremely efficient (the solver it uses is very advanced) although there are severe practical limits to just how much you can do in real-time. But a little physics is a great way to add some life to an otherwise static scene, and give the user the sense that they're there, and bumping into things.

Scriptable UI
I've just about finished exposing all the things that Astromech can do as scriptable elements - as opposed to my early examples that used lots of  hard-coded javascipt.

It's slightly less flexible than the raw javascript - at least until I create a 'module' system capable of safely loading arbitrary code. It's still just a set of pre-approved LEGO blocks you can arrange in various ways, but at least the set of blocks is getting bigger.

Scripts don't all have to run on load. The script can define UI "command buttons" which run parts of the script later... which might load more resources and create new buttons. A common use of command buttons is to provide "teleport" options which can jump you around the map.

In practice, you can already build a 'conversation tree' system which offers choices dependent on previous choices. (All the buttons would be pre-defined, just shown and hidden using commands activated by other buttons.)

Social 'presence' and messaging.
The chat system has been functioning for a couple of versions now, based on a websocket 'pub-sub' server that I'm running on an OpenShift cartridge. (Thanks, RedHat!) I've gone through a couple of revisions of this system, and it's been stable and reliable for months.

Previously, you'd get a 'chat message' when someone connected to the channel ("hailing frequencies open") but in the background the networking code always had the full list of the other participants this entire time, you just couldn't see them. Now, the right-hand of the screen is one long column of everyone else in the level with you.

This makes everything feel a lot more MMORPG, and future extensions will be things like "friends lists" and private instances that build on this social side, since there's going to be obvious problems if a 'level' gets too popular.

File Transfer & Videoconferencing
The first features the social list made possible was inter-user private chat (easy) followed by file transfer (not so easy, but mostly working) and video conferencing. (just got the prototype working)

It's a core idea of astromech that you should be able to exchange data with other people. This is an essential part of that plan.

The file transfer I'm particularly proud of. To 'transmit', the sender just drags a file out of the File Manager and drops it on the button for the intended recipient.  An 'offer' is then sent to the receiver, and turns up in their corresponding list of options for the sender.

If the receiver clicks on this offer button, the file is downloaded to the browser's "Temporary FileSystem" (you get a little progress message while the transfer is in progress) and then the recipient can either click on the button a second time which open the (now local) file in the browser, or they can drag the file icon back out of the browser to the filesystem again.

In summary, one user drags the file into their browser. The other user accepts the offer, and then can drag the file back out of their browser. (Well, Chrome) I don't think I can make it simpler.

Remember, this is peer-to-peer. Although right now all the comms goes through the chat relay, (as private messages) but I have the RTC channels working, so I intend to make that the preferred transport to make it truly peer-to-peer, and reserve the relay server as the 'fallback'.

Voice Recognition
This was nearly a 'freebie', in that I went - in one morning - from not knowing that browsers offer a full voice-recognition engine javascript API, to having it working by lunchtime.

Any 'command button' can be given an array of "speech" strings that, if heard by the engine, activates that command button. It's that easy.

It's good to have an optional prefix word that wakes up the engine but can be missed, because that happens a lot. Originally I used "computer" (duh) but soon changed it to "Scottie!" after shouting at my machine for a little while to transport me to new locations and switch on parts of the engine. Feels much more natural, somehow.

I could go on for pages about all the things I want to do next, the improvement and changes, but I think we'll just stick with what I've actually done so far.

The major features are now mostly in. There's a ton of clean-up work and major bugfixes that need doing before release, but no more super-bleeding-edge experimental features. I did the hard stuff first.

A 'Beta2' release isn't far off now. I'm trying to be quick, before the ground shifts under my feet again. It's not easy doing all this single-handed, but I'll get there.

Monday, October 26, 2015

Astromech Updates - Speculars & Physics

Time for some more pretty pictures, screenshotted just now from Astromech in the other window:

In Soviet Russia, Caffeine goes inside of you!
Recognize it? It's my second favorite molecule, the one that's powered much of Astromech's development so far. The still shot doesn't entirely do it justice, so I'll have to make a video shortly. Watching the molecule "fold up" from its usual flat schematic is a small revelation.

And yes, that's the "Starship Imagination" lurking in the background. You'll be seeing a lot of it. (You gotta have somewhere comfy to sit, when contemplating the universe.)

The colours? Blue is nitrogen, red is oxygen, grey/white is carbon. Most of what you're looking at are the 'P' orbitals overlapping. Oxygen has two coloured 'lobes' available for bonding, nitrogen has Three, Carbon of course has the full set of four. Things are scaled so a hydrogen atom's 1S orbital would be about a meter across, so the entire molecule is about the 'size' of a small building.

This is not your normal ball-and-stick molecular model, and yes it's harder to see the core structure as a result... but nature isn't as neat and tidy as our schematics would prefer. This is my own small attempt to better show the reality of what a molecule "looks like", if you could shrink yourself to its scale. Ghost-like waves of probability dominate, not little billiard balls.

Sensors are reading Wake-up-juice, Captain! We're saved!
There's been several advances in the code required to make all this happen:

- Scripting System
- Multiple scene models
- Physics Engine
- Specular Shaders (for that 'shiny' look)

In particular, the molecule is a huge "hinge and spring" system built with the cannon.js physics engine, with imported Blender 3D assets for each atom.

In fact, since I like to show the code, here's the part of the script that assembles the molecule:

{"collada":{ "id":"hydrogen", "url":"asset/collada/atomic/hydrogen.dae", "transform":"scale(0.75)" }},
{"collada":{ "id":"oxygen",   "url":"asset/collada/atomic/oxygen.dae",   "transform":"scale(0.75)" }},
{"collada":{ "id":"nitrogen", "url":"asset/collada/atomic/nitrogen.dae", "transform":"scale(0.75)" }},
{"collada":{ "id":"carbon",   "url":"asset/collada/atomic/carbon.dae",   "transform":"scale(0.75)" }},

{"collada":{ "id":"carbon",   "model":"translate(  0 -1 -20)" }}, {"physics":{ "model":"carbon.1", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"carbon",   "model":"translate(  0  9 -20)" }}, {"physics":{ "model":"carbon.2", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"carbon",   "model":"translate( 16  4 -20)" }}, {"physics":{ "model":"carbon.3", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"nitrogen", "model":"translate(  8 12 -20)" }}, {"physics":{ "model":"nitrogen.1", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"nitrogen", "model":"translate(  8 -4 -20)" }}, {"physics":{ "model":"nitrogen.2", "proxy":{ "type":"sphere", "radius":2 } }},

{"collada":{ "id":"carbon",   "model":"translate( -8 12 -20)" }}, {"physics":{ "model":"carbon.4", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"carbon",   "model":"translate(-16  0 -20)" }}, {"physics":{ "model":"carbon.5", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"nitrogen", "model":"translate( -8 -6 -20)" }}, {"physics":{ "model":"nitrogen.3", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"nitrogen", "model":"translate(-16  8 -20)" }}, {"physics":{ "model":"nitrogen.4", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"oxygen",   "model":"translate( -8 20 -20)" }}, {"physics":{ "model":"oxygen.1", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"oxygen",   "model":"translate(-20 -6 -20)" }}, {"physics":{ "model":"oxygen.2", "proxy":{ "type":"sphere", "radius":2 } }},

{"collada":{ "id":"carbon",   "model":"translate(-8 -12 -20)" }}, {"physics":{ "model":"carbon.6", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"carbon",   "model":"translate(-20 12 -20)" }}, {"physics":{ "model":"carbon.7", "proxy":{ "type":"sphere", "radius":2 } }},
{"collada":{ "id":"carbon",   "model":"translate( 16 18 -20)" }}, {"physics":{ "model":"carbon.8", "proxy":{ "type":"sphere", "radius":2 } }},

{"physics":{ "constraint":[
{"type":"point", "force":0.1, "from":{"model":"carbon.1", "point":[0,4,0]},  "to":{"model":"carbon.2", "point":[0,0,4]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.1", "point":[0,0,4]},  "to":{"model":"carbon.2", "point":[0,4,0]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.1", "point":[4,0,0]},  "to":{"model":"nitrogen.2", "point":[0,0,4]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.1", "point":[-4,0,0]}, "to":{"model":"nitrogen.3", "point":[4,0,0]}},

{"type":"point", "force":0.1, "from":{"model":"carbon.2", "point":[0,4,0]},  "to":{"model":"nitrogen.1", "point":[0,0,4]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.2", "point":[-4,0,0]}, "to":{"model":"carbon.4", "point":[4,0,0]}},

{"type":"point", "force":0.1, "from":{"model":"carbon.3", "point":[0,4,0]},  "to":{"model":"nitrogen.1", "point":[4,0,0]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.3", "point":[0,0,4]},  "to":{"model":"nitrogen.2", "point":[0,4,0]}},

{"type":"point", "force":0.1, "from":{"model":"carbon.4", "point":[0,4,0]},  "to":{"model":"oxygen.1", "point":[0,4,0]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.4", "point":[0,0,4]},  "to":{"model":"oxygen.1", "point":[4,0,0]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.4", "point":[-4,0,0]}, "to":{"model":"nitrogen.4", "point":[4,0,0]}},

{"type":"point", "force":0.1, "from":{"model":"carbon.5", "point":[0,0,4]},  "to":{"model":"nitrogen.3", "point":[0,4,0]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.5", "point":[0,4,0]},  "to":{"model":"nitrogen.4", "point":[0,0,4]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.5", "point":[-4,0,0]}, "to":{"model":"oxygen.2", "point":[4,0,0]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.5", "point":[4,0,0]},  "to":{"model":"oxygen.2", "point":[0,4,0]}},

{"type":"point", "force":0.1, "from":{"model":"carbon.6", "point":[0,4,0]},  "to":{"model":"nitrogen.3", "point":[0,4,0]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.7", "point":[0,0,4]},  "to":{"model":"nitrogen.4", "point":[0,0,4]}},
{"type":"point", "force":0.1, "from":{"model":"carbon.8", "point":[-4,0,0]},  "to":{"model":"nitrogen.1", "point":[0,4,0]}},

{"type":"spring", "length":40, "force":0.001, "from":{"model":"carbon.1"},  "to":{"model":"carbon.3"}},
{"type":"spring", "length":40, "force":0.0001, "from":{"model":"carbon.1"},  "to":{"model":"carbon.4"}},
{"type":"spring", "length":40, "force":0.0001, "from":{"model":"carbon.1"},  "to":{"model":"carbon.5"}},
{"type":"spring", "length":40, "force":0.0001, "from":{"model":"carbon.1"},  "to":{"model":"carbon.6"}},
{"type":"spring", "length":40, "force":0.0001, "from":{"model":"carbon.1"},  "to":{"model":"carbon.7"}},
{"type":"spring", "length":40, "force":0.0001, "from":{"model":"carbon.1"},  "to":{"model":"nitrogen.1"}},
{"type":"spring", "length":40, "force":0.0001, "from":{"model":"carbon.1"},  "to":{"model":"nitrogen.4"}},
{"type":"spring", "length":30, "force":0.001, "from":{"model":"carbon.6"},  "to":{"model":"oxygen.1"}},
{"type":"spring", "length":40, "force":0.001, "from":{"model":"carbon.6"},  "to":{"model":"carbon.3"}},
{"type":"spring", "length":20, "force":0.001, "from":{"model":"carbon.7"},  "to":{"model":"oxygen.1"}},
{"type":"spring", "length":20, "force":0.001, "from":{"model":"carbon.7"},  "to":{"model":"oxygen.2"}},
{"type":"spring", "length":40, "force":0.001, "from":{"model":"carbon.8"},  "to":{"model":"oxygen.2"}}

A little chunky, but that's it. Really. And yes, this is a rather fake system using springs to 'prop open' the molecule rather than simulating all the interatomic repulsive forces, but hey, you need to have room to improve.

Thursday, October 15, 2015

Google Drive Hosting? Not anymore.

So, this has been a kick in the teeth...

"Beginning August 31st, 2015, web hosting in Google Drive for users and developers will be deprecated. You can continue to use this feature for a period of one year until August 31st, 2016, when we will discontinue serving content via[doc id].

In the time since we launched web hosting in Drive, a wide variety of public web content hosting services have emerged. After careful consideration, we have decided to discontinue this feature and focus on our core user experience."

This is enormously bad. Not just for me, for everyone. I don't know how to communicate what the knock-on effects could be, but this is probably the end of "free" file hosting on the internet. It's certainly a major problem for my Astromech project, which currently uses Google Drive for users to store and publish their 3D assets.

If you're happy with an entirely corporate internet landscape, where every byte is bought and paid for by someone and the prerequisite for getting published is a credit card, then welcome to the new world.

But to me, that's not what the internet was for. The people who built the 'net were scientists and students, educators and children, a segment of the population which pretty much by definition doesn't have any money. Just time, and passion, and knowledge.

Now if you think "hang on, there's still lots of free file hosting services", then remember that none of them want you serving files out the 'back end' to all an sundry, like a website does. Filenames are obfuscated. Relative paths are broken. Logins are required. On Amazon, a whole eight machines can access the data! Eight!

Sure there's and MediaFire, but they want the chance to put splash advertising in front of the download. That doesn't work for AJAX includes.

The keyword here is "direct links". That's what lets you copy a directory of HTML and images to a server and all the hyperlinks work as intended. So /doc.html can refer to /image.png and the server knows to get it from the same place just next-door. If you copy a directory to DropBox, for example, each file gets 'stored' in what seems like a different randomly named directory, (although the 'base' filename is there) And don't even _try_ pulling down the public directory list and extracting the file paths. They're heavily obfuscated.

What really concerns me is that once Google kick everyone off their free hosting, the 'parasites' will move on to DropBox or Amazon or OneDrive and work out how to gank those systems. Then they will fall like dominoes playing a game of hot potato. And the abusers won't stop, because they have a vested interest. Basically, all the porn will just shift to the next service, and the next. Or into hacked accounts, since they're now valuable for their storage space. (And the problem will get twice as hard to fix. It's easy to blow away sock-puppet accounts, not so easy when they're hijacking real ones.)

And you can bet that other services, like blogger, will have to clamp down on their file uploads too. Or go away entirely. Otherwise it's just Google playing whack-a-mole with itself.

What will vanish is all those little one-person technical sites that served a useful purpose, but had no revenue streams. $200 a year is a lot for a vanity site. They will all just quietly turn off August 16 next year, and it will be like GeoCities shutting down, all over again. Vast chunks of unique content will disappear. Individuals have no place anymore. The porn and cat videos and corporations will remain. Sort of like a neutron bomb for the internet.

So, thanks Google. You've finally gone off the rails from your core promise of "We want more and better internet for everyone!" and added the caveat "So long as you can pay." And you'll have to go back to downloading exabytes of data that you used to store 'locally'. Hope that works out for you.

I'm sorry, but monetizing our friendship means you ain't my friend anymore. You pretended, offered me some nice tools, and used my hard work as the bait for the corporate switch, And you gave a one-paragraph transparently fictitious reason for doing so. That's not something I forgive. If you're going to carelessly destroy months of work, then I - at least - deserve the real reason.

"After careful consideration, we have decided..." Fuck you, Google.

Monday, October 12, 2015

Astromech Beta2 - coming soon

With all projects, you reach that point where it's all downhill from here. In the Sisyphean sense that you rolled that bloody rock all the way up the bloody hill in the expectation that it would, eventually, get easier all of a sudden.'re always looking for that point where you can give it a good shove, and the momentum builds on it's own, enough to maybe get you half-way over that next hill so long as you run fast enough to keep up. That next hill which you only just now can actually see, but really, you always knew was going to be there.

Well, Astromech is reaching that stage. Several highly-experimental chunks of code are coming together nicely. The first release wasn't much more than a poorly implemented asset manager, allowing a few disconnected media files to be dumped into a 3D environment from common editors.

None of that connected particularly well with the inner workings of Astromech, which has modules that do everything from Delaunay triangulation to Keplerian orbital mechanics to real-time digital signal processing of data coming from telescope sensors.

These are all hidden away in javascript modules that aren't finished, let alone documented. There's a scripting interface I haven't had time to work on that's supposed to connect to the inter-client comms system that's only used for chat, right now.

But everything is working, in it's way. All the prototype problems are solved. The Google Realtime API was giving me some grief until I rewired my head and figured out how it's meant to be used. (quick version: DO NOT store tree-like structures using the API. Flatten them into unordered tabular lists, and keep lots of lists. Otherwise, the multi-user undo/redo system will give you a bad time.)

What's coming soon is the new Astromech "editor" app, which is a near-complete rework of the old list of lists system. The new set of available objects is too large to maintain this approach, and so new ideas are needed. (Or perhaps, some old ideas need revisiting. Duh duh daaa!)

Also, the GUI-based editors for the DSP chain and Panel systems are done at last. Getting these two vast wodges available in the pointy-clicky interface has been a long road, but they are absolutely core to the idea of what Astromech is... not just a pretty face, but a tool for accomplishing some nifty science.

The DSP editor has the job of allowing the user to connect together processing blocks like 2D Fourier transforms, convolutions, etc. similar to the "node editor" in Blender, which creates complicated material and animation effects by mixing together an "algebra" of core operators.

Blender struggles to implement this functionality in a simple way, and it's got full control over the user's screen, mouse, and keyboard. My editor is usable on an iPad.

So is the Panel Layout editor, which creates hierarchical box layouts for UI and text displays that are applied to surfaces in the scene. Panels are a kind of sprite-driven micro-language for information displays. They made some early appearances in this blog, but have been missing from the recent demos, until I could make them available for everyone. In essence, chunks of JSON text are rendered to 'compressed' textures, which make use of "superpixels" (or "sprites" in the old days) that come from a common font/symbol swatch used by every panel in the system.

That means the first display console takes a fair chunk of GPU texture memory for the font swatch, but after that each (high resolution) text panel uses as much extra memory as an old-school character-based Teletext display. There's a massive global saving in texture memory, while still getting crisp-edged text on every surface.

There's more than a few cutting edges still sticking out, though, so I have to make another smoothing pass or two, a process of simply grinding away until - well, not until it's perfect - but at least has a chance to solve more problems than it would cause,  To be worth the time cost of learning how to work it.

One day, I hope to completely erase the line between "editor" and "scene", and have all these tools available while inside the 3D environment. That's a few hills over, though.

Monday, September 7, 2015

Apache Spark Overview

Let me tell you a secret.

Well, it shouldn't be a secret, but given how few people seem to know about it, we may as well be talking about Higgs-Englert physics.

It's Apache Spark.

"WTF is Spark?" you will probably say. You'll have heard about Apache, the venerable webserver software. You might not have really being paying much attention to their background projects which have been merging together over the last year like the arms and legs of some kind of Voltron. ("And I'll form the Head!")

Apache is making the transition from being the kind of software you run your old website on, to being the kind of software you run Twitter, or Facebook, or eBay on. Or Netflix, which is possibly the best case study for the software we'll be talking about, although that's more Apache Cassandra, a topic for another time.

At that level, there are new problems that are oddly different from the old ones. All these guys use cloud computing resources.. they don't really depend on physical machines. They rent them 'out of the air' for as many hours (or minutes) as they need. This is so they can increase the size of their clusters from say 20-50 to a few hundred for a couple of hours in order to handle peak loads.

eg: They don't repair servers. For most of those machines, no human will ever ssh into the box. In fact, they are usually put on a countdown for a rolling 'refresh' which shuts down the most ancient servers and replaces them with fresh ones, in the equivalent of a slow A/B testing transfer. Really clever systems stop the rollout of the new software and clone copies of the old, as automated reliability statistics come in.

But if a box is giving you trouble, you don't spend any of your time on it whatsoever. You mercilessly sent it back to the cloud, after getting a replacement.

At that level, it's all about "devops". Specifically what they call either orchestration or choreography. Depending, I suppose, on whether you listen to chamber music, or prefer dancing the samba.

Here's the Netflix devops problem: In each country, there are daily viewing peaks. There are weekly viewing peaks. These peaks are 10x the baseline, and last a couple of hours. Then most people go to bed. This is the predictable part.

Then there's the unpredictable side. When a beloved actor like Leonard Nimoy dies, there is a tendency for millions of people to go home via the bottle-shop and queue up every movie he's ever done as a kind of binge-tribute. I've heard.

And that's the kind of situation that your scalable internet service has to handle, if you're going to serve movies on demand to 100 million people. Very rarely, you have to be able to service everyone at once. And you cannot FailWhale. That was funny once, when it was Twitter. Once.

The most amusing thing about Twitter is that once they got past the FailWhale, their company value went from merely silly to completely ludicrous. We are talking 10 million to 200 million, because they proved they were finally in the big league. What was the technical miracle which banished the white Whale? It was Scala... the primary language behind Spark.

So, what's Spark? In a nutshell, it's next-gen Hadoop.

"What was hadoop again?" you probably ask, since you probably never used it. Well, it was a giant hack that allowed hundreds of computers to be ganged together and carry out various file-processing tasks across the entire cluster.

What for? Logfile processing, mainly. The daily re-indexing of Wikipedia. Places like eBay and Amazon used it for their first recommender systems ("other people also bought this!") and all because of the simple necessity of churning through more gigabytes of text than any single computer can manage.

You have to realize that, to a large extent, the billions of dollars that eBay and Amazon are worth are because of their "people also bought" recommender systems. That list of five other things (five is the optimum psychological number) absolutely must be the best possible, where "best" is defined as "most likely to buy next". This is not advertising, this is lead generation. There are metrics.

The point of lead generation is to turn each sale into an opportunity for another sale. "Accessorize, accessorize, accessorize!" and when those system break, or just degrade, then the bottom line impact is direct and palpable. Companies live and die by their ability to snowball those sales.

Netflix had this happen, and they offered a million dollars to the mathematician who could solve it for them. This was the famous "Netflix Prize". The resulting algorithm is now known as "Alternating Least Squares", and the details are a topic for another day.

Spark implements the ALS algorithm in it's standard mlib library. It's core. It's yours. You can have a million dollar algorithm, gratis. If you want to run ALS large scale, and this is most important - in real time - then Spark is the only option.

The only option, unless you want to spend about a man-century implementing the equivalent of fine-grained distributed transaction control and data storage, and that's just the infrastructure your math needs to sit on top of.

If you want to grow into the size of one of these services, you need to start with a seed capable of growing that large. Fortunately, in this analogy, those seeds are happily falling off the existing trees, and being blown about by the winds of open source software to the fertile fields of... some poetic metaphor I probably should have shut down a while ago.

That means Scala, and Cassandra. That means Zookeeper, and message queues. haproxy to spread the load. Graphite to chart the rise and fall of resources. Ansible to spin up the servers. It means dozens of support tools you've never heard of, and would never run by choice if you didn't have a pressing need to get the job done.

And these are all sub-components needed to support an overarching system like Spark - which schedules "parallel programs" across the entire cluster which are tolerant to the various slings and arrows of the internet.

There is a level above Spark which is still in formation - exemplified by the Mesos project. That one seeks to be a kind of  "distributed hypervisor" that can manage a cluster of machines and run many flavors of Spark, Hadoop, Cassandra or whatever within the single cluster. Otherwise we tend to get 'clusters of clusters' syndrome where each 'machine' is effectively only running one program.

You have the dev cluster, and the testing cluster, as well as the production cluster of course. Oh, and that's one each for the database cluster, webserver/app cluster, and the small front-end routing clusters or logging cluster that hang off the big clusters...

Yeah. Fire up the music, let the dance of clusters begin. Oh, and once you put on those magic dancing shoes, you can never ever take them off again until the company dies. This is that fairy-tale.

Spark is the answer to questions you haven't asked yet. Literally, that's the kind of algorithms it is specialized to run. And it scales all the way. That's its value. That's what Apache is doing these days, trying to close the conceptual gap so both ends, big and small, are using the same base code. I love it.

But no-one sells it, and the people who do use it in anger are too busy making billions of dollars to spend much time explaining exactly how, or writing documentation. You really gotta tease the information out of them, and watch a lot of their talks at Big Data conferences to see where all the pieces actually fit. There is an enormous learning curve.

And that's why it's still such a secret.

And given how many people read my blog, I'm sure will remain so.