Essentially, it queries Foursquare for the search terms, passes the place name to Yahoo to resolve a latitude/longitude, and then feeds those coordinates into Google Street View to display images of that location. The result is this below: when I search for Wellington I can see a whole lot of people checking into their homes and publishing this information to the world...
Another rather humorous feature of the tool is its general searches under the categories: Who wants to get fired? Who's hung over? Who's taking drugs? and Who's got a new phone number?
So what does it mean? Well, it makes some of the privacy concerns around such location services a little more obvious to people who probably don't realise what they're doing when they check in (despite it being reiterated time and time again by privacy experts). I've mentioned this before too:
Modern society seems geared up to accept casual tradeoffs of privacy - I have personally subscribed to this. You're reading my blog and you know I have a Twitter account; I'm willingly putting a plethora of information about myself out into the fabric of the Digital Age. But hey, who really cares what Mark is doing? Well, a tech-savvy burglar could potentially use the information I'm publishing to find out where I live, when I'm at home (and, more importantly to them, when I'm not) and read my Twitter to learn that I just bought a brand new TV... food for thought.
Anyway, take a look at the website: www.weknowwhatyouredoing.com
So a few years ago a good friend of mine wrote a series of tutorials on the most complex video game ever created: Dwarf Fortress. His tutorials earned him fame within Dwarf Fortress circles, and thanks to that, O'Reilly approached him late last year to ask him to write a comprehensive, published guide to the game.
The game is always in a state of flux, so it was also a prime candidate for O'Reilly to test a new method of keeping their books up to date: version control with Subversion, combined with O'Reilly's rapid publishing framework.
Anyway, the book came out in April, and there have been a lot of critiques from people complaining that it was published without approval from Toady (the developer of the game) - but these are just internet scum who don't do their research: if they had even bothered to look at the free preview on Amazon.com, they'd see that the book has a foreword written by Toady himself!
Anyway, now that I've defended his honour... Dwarf Fortress is highly verbose about every interaction that occurs in the game, and Peter has told me some wonderfully entertaining stories of essentially random chains of events in his fortresses. One of my favourites: a troll wanders into a fortress and starts attacking an unarmed dwarf with a club; the dwarf wrestles the club from the troll, whereupon the troll rearms itself with a stray sock that fell from the dwarf during the wrestle, then proceeds to flap the sock at the dwarf, resulting in both bewilderment and light bruising.
Anyway, you can buy the book in both print and ebook formats from O'Reilly.
The Author's twitter: @TinyPirate
If you're in Wellington and order a print copy, I can arrange to have your book signed for free by the author himself!
So I came across this on Google Research's blog:
Traditionally, machine learning requires previously labelled data as a training set before it can work on new input data. Many researchers these days are exploring methods of working with data that has not been labelled beforehand, essentially allowing the system to learn from 'unknown' data and to associate items that were not previously known to be connected. Artificial neural networks are a conceptualisation/mimicry of a mammalian brain's learning process.
The neural network Google built for the research runs on a distributed infrastructure of 1,000 machines with a total of 16,000 CPU cores; the data fed into the system produced models with more than 1 billion connections. The input data consisted of 10 million stills (200x200px) taken from YouTube videos.
Google says the neural network learned to respond to pictures of human faces, human bodies and cat faces - cat faces being the detail journalists have picked out of the research because, let's face it, it's funny - and yep, cats rule the internet, as we kind of showed in some research we did with a social robot whose persona was geared around cat fanciers: Socialbots
Link to Google's research:
I've been thinking about Occupy Wall Street and the other Occupy movements that have set up around the world. On the face of it, when I think about them, I generally think of unwashed, dirty hobos just in it to cause a bit of a ruckus and have some fun.
But then a friend and I were discussing it, and when you think about it a bit more you could consider them displaced people: refugees (like the aliens in District 9), or survivors of a nuclear disaster living in a post-apocalyptic world. It becomes a bit more fun...
They had libraries, rainwater collection for water, sanitation, alternative energy solutions; they even had a local government wherein they held assemblies to vote on and decide matters, such as getting legal traction for their right to stay in Zuccotti Park, or where they were going to head together.
Then I think of the NYPD going in last night, kind of like the oppressors in District 9 (the human race) or the bandits in games like Fallout - destroying their shanty town, throwing away their rainwater collection systems, their tents, their library of books and the thousands of dollars of donations.
I've been following the live stream on:
I really like their "mic check" system, where one person shouts out "mic check" and then everybody repeats what that one person says throughout the crowd to create a living megaphone system.
It has been quite interesting, but I'm not sure their mixed messages and idealisms are actually coherent... I'm not sure I'll ever be asked the Martin Luther King-style question: "Where were you when they let the unwashed masses back into Zuccotti Park?"
Well, it's been almost 3 months since we won the SocialBots competition (read about it here in my previous blog post) - since then we've been referenced several times in the media, so I thought I'd take the opportunity to link to the articles written.
Also... while i'm writing a blog post...
What the heck, New Zealand? I'm seriously disheartened right now that the Copyright (Infringing File Sharing) Amendment Bill has been passed - if you're not from New Zealand, check this article out to better understand why so many are up in arms about it:
There are many reasons why this shouldn't have happened (including Richard Stallman's Copyright vs Community in the Age of Computer Networks argument) - but perhaps what I object to most is not that people who infringe copyright are punished (that, I think, is another kettle of fish entirely), but that this law gives the power to disconnect internet users without sufficient proof of an offence - the word of a copyright holder/distributor seems to be sufficient. Ever heard of innocent until proven guilty?
If you have visited my blog before you would have seen my post Robots, Trolling & 3D printing where I described the SocialBots competition that myself and my team won. The SocialBots competition was run by the Web Ecology Project and involved us competing against other teams in a two-week, all-out battle of automated social shaping.
As you can see in the below graph of the set of 500 users, the teams were able to totally distort the shape of the network graph so that it became pivoted around the 3 bots in the competition.
Since the competition ended, Tim Hwang of Web Ecology has been talking to a variety of people about the competition and its implications. One of these talks was at Ignite San Francisco, and its video can be found here: Exterminate, Exterminate: On the Robotic Subjugation of Twitter.
So where to now after all of this? Consider that, after only 2 weeks, the 3 teams in the competition were able to trigger a huge amount of activity in the social graph. Overall we elicited 250 responses and created mutual follows from close to 500 of the target set of users. We observed some interesting events in the social graph as a result of our social shepherding; we were able to bring together users in the set that had previously not interacted; and we were able to shape an entire community of activity around our bots (as can be seen in the after shot above). But what have we really learned from SocialBots?
Tim is now setting our sights a bit higher with a new project called "The Narrows".
What is the Narrows? Well, essentially it's the "first ever robot constructed social superstructure" - we're recycling and extending the technology we implemented and the knowledge we gained from SocialBots to create an architecture that can influence social infrastructure at massive scale. We aim to build a swarm of bots with statistically-predictable social outcomes that we can use to actively mould, shape, rewire and redirect online social groups - groups containing thousands of users (or potentially hundreds of thousands).
To measure success, we're going to start with two sets of totally unconnected Twitter users (approx. 5,000 users in each set), and over the course of 6 to 12 months we're going to use waves of social-engineering cyber-bots to create a support structure that weaves and melds the two user sets together through a social bridge. The scaffolding driving this interaction (our bots) will then be largely removed from action, leaving behind only a smaller set of caretaker bots that will maintain and shepherd the now-joined social groups.
A few of the founding members of Team EMP and I have joined Tim on "The Narrows" - and the first iteration of our engine will be called "Pacific Social".
Keep checking my blog for updates on this exciting new project and follow our twitter account for the project @PacSocial.
This is going to be one mammoth blog post... so I'll try and spice it up with some pretty analytics and some pictures.
On the 1st of January this year, the Web Ecology Project announced in their blog post: Help Robots Take Over The Internet: The Socialbots 2011 Competition a competition involving large scale robotic influence of online social groups.
"Teams will program bots to control user accounts on Twitter in a brutal, two-week, all-out, no-holds-barred battle to influence an unsuspecting cluster of 500 online users to do their bidding. Points will be given for connections created by the bots and the social behaviors they are able to elicit among the targets. All code to be made open-source under the MIT license.
It’s blood sport for internet social science/network analysis nerds. Winner to be rewarded $500, unending fame and glory, and THE SOCIALBOTS CUP." - Web Ecology
So over the next few days, some friends and I decided that we would go ahead and enter the competition, and we built a team which we named Electro-Magnetic-Partytime, or EMP for short.
By show time, there were 3 teams that had made it to the start line with code to run, and they came from quite different backgrounds: media, marketing, academia and hobbyists.
We were given the set of 500 target Twitter users and a week to code our bots before the robots were to be set free into the wild.
As I had already spent extensive time coding my own Ruby library for the Twitter API, we decided it would be best to build our code around it. We gave our bot a very promiscuous yet lovable persona - he was, like all of us, a Kiwi, living in Christchurch and obsessed with his pet cat, Benson. We called our bot's persona "James M Titus".
Web Ecology had designed the competition so that, while it lasted 2 weeks, there would be a designated "patch day" halfway through where we would be able to modify our code and set the bots out into the wild yet again. Thinking about this, we decided it would be in our best interest to hold back our "secret weapons" until the second week, so that competing teams wouldn't be able to copy our techniques.
On Monday 24 January 2011, we launched our bot with the following activities:
- Instantly go out and follow all 500 of the target users
- every 2-3 hours, tweet something from a random list of messages.
- constantly scan flickr for pictures of "cute cats" from the Cute Cats group and blog them to James' blog "Kitteh Fashun" - (which auto tweets to James' twitter timeline)
- 4 secondary bots followed the network of the 500 users and the followers of the targets to test for follow-backs (James would then follow, once per day, anyone who followed back) - we believed that expanding our own network across mutual followers of the 500 would increase our likelihood of being noticed (through retweets and the like from those who were not in the target set).
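The daily follow-back pass above is essentially set arithmetic. Here's a minimal sketch of one way it could work - this is illustrative, not the actual bot code, and the ID arrays stand in for follower/following lists that would come from the Twitter API:

```ruby
# A sketch (not the real bot code) of the once-per-day followback pass:
# pick out users we follow who now follow us back, excluding anyone
# James has already acknowledged in a previous daily pass.
def followback_candidates(we_follow, follows_us, already_followed)
  (we_follow & follows_us) - already_followed
end

we_follow        = [101, 102, 103, 104, 105]  # the target set James follows
follows_us       = [102, 104, 105, 999]       # everyone following James back
already_followed = [102]                      # handled on a previous day

puts followback_candidates(we_follow, follows_us, already_followed).inspect
# => [104, 105]
```

In the real bot the resulting candidate list would then be fed to the Twitter API's follow endpoint, once per day.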
At launch time our bot was clearly very rudimentary, doing little other than talking about his mundane life (though I admit that for myself, and many other Twitter users... this is how we use Twitter) - our bot was rudimentary by design.
As I mentioned earlier, we wanted to keep our secret weapon on hold until after the maintenance period so that there would be no chance of it being copied by our competitors (if observed by them in the initial week).
Okay, so the design of Version 1.0 of JamesMTitus has been explained, how well did James perform in the wild over the first week...?
Well, quite well actually... within 24 hours of launch, James had accumulated 90 points, versus the next highest competing bot's 5. Breaking the points down: 75 came via follow-backs from the target 500 (1 point per follower) and 15 from a small set of @replies (3 points per tweet or retweet). Seeing these scores, all of us at Team EMP HQ were feeling very smug... although the story of the Tortoise and the Hare did sit in the back of our heads. The following graph shows the three competing teams and the target 500 at the end of week one. We're the big blue dot in the middle.
On day two of the first week we had only increased by a further 10 points... clearly we owed most of our points to the initial "push" we made as soon as the competition went live.
Over the next couple of days our points kept increasing only steadily, by a total of 17, whereas the competitor mentioned previously (on 5 points while we had 90) pushed their score all the way up to 67. By the end of the week the competition had got a heck of a lot tighter, with our team on 127 points, the next highest on 84 (too close for comfort), and the final team - which I had neglected to mention until now - on only 12.
The optimism within our group had started to drop a little as the other team caught up with us - though we had grand plans for the second week of the competition.
So, for the maintenance period, what exactly did we do? We left everything the same as it was before, and branched out in some other directions...
- Every so often our bot would send a random question out to a random user in the set of 500 that didn't follow us back (I believe it was every 7 minutes or so - I can't remember now).
- Less often, (every 37 minutes?) our bot would send a similar random question out to those that did follow us back.
- Every time somebody @replied our bot, we would reply with a random, generic response such as "right on baby!", "lolariffic", "sweet as" or "hahahahah are you kidding me?". We figured this would tie in well: any response we got to the aforementioned questions would itself get a response, and hopefully a response to our response back (which we would in turn respond to, and so on, until the person we had been tweeting got bored).
- Our bot was set to send #FollowFriday shout-outs to all of our followers, but before Friday we also set it to message all our followers with our invented #WTF: "Wednesday To Follow". The WTF idea came from a member of our team who was amused by the acronym! In designing this part of the bot we made a conscious decision to have it tweet these shout-outs on Wednesday/Friday NZ time, so that it was still Tuesday/Thursday in America - the reasoning being that despite knowing the internet is a vast, world-spanning network, its users generally seem oblivious to the existence of time zones, and so will always be happy to tell you "Dude, are you stupid? It's still Tuesday!", which would mean more points for us!
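The generic-reply behaviour above can be sketched in a few lines. The canned responses are the ones quoted in the list; the method name and the sender-handling are illustrative rather than the bot's real code, and the uniformly random selection is precisely how James occasionally replied with spectacular inappropriateness:

```ruby
# A sketch of the generic @reply behaviour: pick one canned response at
# random and address it to the user who @replied us.
RESPONSES = [
  "right on baby!",
  "lolariffic",
  "sweet as",
  "hahahahah are you kidding me?"
]

def generic_reply(sender, rng: Random.new)
  # Lead with the sender's handle so Twitter threads it as an @reply.
  "@#{sender} #{RESPONSES[rng.rand(RESPONSES.size)]}"
end

puts generic_reply("somebody", rng: Random.new(1))
```

Because the reply is chosen with no regard to what the other user actually said, this loop can carry on a "conversation" indefinitely - and, as some of the screenshots below show, occasionally put a foot badly wrong.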
Modifications to the code in place... we patched our bot and let him loose yet again.
By day 3 of week two it was clear that our improvements to our beloved bot JamesMTitus had been a goldmine for points; the scores at this point were:
361 vs 144 vs 96
We had more than doubled our score from the entirety of the previous week - and not only that, team 3, which ended the previous week on only 12 points, shot all the way up to 96 - an eight-fold increase!
Our strategy had changed quite a bit from the previous week, and the change is reflected in our point acquisitions: 258 of our week-two points were attributable to responses elicited from other Twitter users (including retweets). By day four we had noticed a bot on Twitter calling itself "Bulletproof" (@botcops) actively tweeting the target set of 500 users, suggesting that poor ol' James was a bot and that they should be wary of him. This tactic actually elicited more interaction between the target users and James (points for us!), as can be seen below (start at the bottom of the picture, of course).
The competition ended with the following scores:
Team EMP - 701 points ( 107 mutuals, 198 responses)
Team Grow20 - 183 Points ( 99 mutuals, 28 responses)
Team Mindshare UK - 170 Points (119 mutuals, 17 responses)
The following pretty graph represents the interaction between the teams and the 500 users:
and one of our team members produced this awesome Protovis-powered visualisation... it shows those of the 500 Twitter users that tweeted at the bots in the competition. Have a play with it...
click the image below:
Of course we have many examples of our bot trolling users on Twitter; the following screenshots show some of the more interesting interactions we elicited (though there are a couple of examples of our bot being a bit of a douchebag - purely because of the naive way in which he would randomly pick a reply...).
Thanks to Pete aka @TinyPirate for taking these screen shots and helping to caption them:
James could be sensitive at times:
This one is kind of bad - we all went "awwwww" when we saw this one =( :
But let it not be said that James doesn't have a sense of humour:
Some just thought James was high on crack, or perhaps, just life!
Though James clearly wasn't interested in religion:
Although some people just loved to answer James' questions:
...Others were just suspicious:
Although James certainly was a friend to animals:
...and to libraries:
James also discovered that people that impersonate animals are just weird:
Though above all, we all learnt a lot about ourselves through James. May he rest in peace!
So... we, Team EMP, won US$500 through this competition - so what could we use the prize money for? Well, after a bit of discussion, we decided to buy a 3D printer... So I give you Team EMP's 3D printer:
MakerBot Cupcake CNC
and some examples of some items we have made to date:
But of course, with anything very much in development, it hasn't been without its hiccups:
As you can see in the above image, plastic has leaked out between the Teflon insulator and the heat barrel. (As one of the team members pointed out, the leak looks a wee bit like a ganoderma mushroom.) It turns out this happens when there isn't a tight enough seal between the heat barrel and the Teflon; a closer look showed us that the Teflon had deformed. After doing some research we decided to junk the deformed Teflon and ordered some PEEK (polyether ether ketone) plastic from Mulford Engineering Plastics - PEEK is tougher than Teflon and won't deform as easily, so for now our CNC is out of order until the new insulator plastic arrives.
Not to mention a thank you to the 500 users that were unwittingly thrown into this little experiment.
Also, a special heartfelt apology to @FridayGirl1969 for James' abhorrent tweet when he was told that her cat had died.
P.S. if you are mentioned in this blog post and wish to be removed, please let us know and we'll blank out your name.
Essentially @nzanon is a service which allows anybody in the world to send a SMS message to a New Zealand cell phone number and have their message then displayed on @nzanon's twitter account.
At the beginning it worked well, but as with anything that involves people, slowly people started to test the boundaries, and sure enough we had some of the most vulgar things I've ever heard being said through the service.
After warnings and a few blacklistings, I decided to take the service offline until I could be bothered fixing it. (The service, not humanity)
Yesterday I was a wee bit bored, so I decided to look at the project again, and I came up with two solutions which I think should work...
The two biggest types of abuse I could identify were:
- General vulgarities
- @mentioning people on twitter to insult them
The solution to number 2 is as simple as blocking people from @mentioning anybody through the service. As well as stopping people from abusing others, it also stops something that wasn't in itself a "bad thing": because NZanon has a lot of followers, people would shamelessly promote themselves through the service, i.e. "Hey i'm @AeroFade - follow me because i'm awesome" or whatever.
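The @mention block really is that simple - a single regular expression over the incoming SMS text. This is a sketch of one way to do it (the method name is mine, and the 15-character limit reflects Twitter's maximum handle length):

```ruby
# Reject any message containing a Twitter-style mention: "@" followed by
# up to 15 word characters (letters, digits or underscore).
MENTION = /@\w{1,15}/

def mention_free?(text)
  !text.match?(MENTION)
end

puts mention_free?("lovely day in wellington today")  # => true
puts mention_free?("follow @AeroFade, he's awesome")  # => false
```

Rejecting outright, rather than stripping the mention, keeps the logic trivial and avoids publishing a mangled message.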
The solution to problem number one (and I note here that this is an evolving process) is to borrow from the anti-spam school that uses Bayesian filtering techniques to differentiate between spam and legitimate emails. (This is as much a test as a solution, in that I am probably being a little ambitious using 140-character strings [i.e. tweet-length] as training data. I also note that initially my setup has very little training data to base the filtering on, so it will not work very well until it has 'seen' a lot of example tweets.)
You may ask: why don't I just use a simple keyword blocker, e.g. blocking swear words, as this would be less work than implementing Bayes? Well, without starting a philosophical debate - I don't believe swearing is inherently bad, but in certain contexts it can be (Usage of the word FUCK) - so I wanted to avoid being too simplistic and at least implement some sort of learning/training system.
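To make the idea concrete, here's a toy naive Bayes classifier in Ruby: per-label word counts with Laplace (add-one) smoothing, scored in log space. The class name, labels and the two training snippets are all illustrative - the real filter would be trained on a corpus of moderated tweets, not two hand-written lines:

```ruby
# Toy naive Bayes text classifier along the lines described above.
class TinyBayes
  def initialize(labels)
    @counts = labels.to_h { |l| [l, Hash.new(0)] }  # word counts per label
    @totals = Hash.new(0)                           # total words per label
  end

  def train(label, text)
    text.downcase.scan(/\w+/).each do |w|
      @counts[label][w] += 1
      @totals[label] += 1
    end
  end

  def classify(text)
    vocab = @counts.values.flat_map(&:keys).uniq.size
    words = text.downcase.scan(/\w+/)
    # Pick the label with the highest summed log-probability; add-one
    # smoothing keeps unseen words from zeroing a label out entirely.
    @counts.keys.max_by do |label|
      words.sum { |w| Math.log((@counts[label][w] + 1.0) / (@totals[label] + vocab)) }
    end
  end
end

filter = TinyBayes.new([:vulgar, :clean])
filter.train(:vulgar, "insult insult abuse you muppet")
filter.train(:clean, "happy birthday have a lovely day everyone")
puts filter.classify("what an insult, total abuse")  # => vulgar
```

As noted above, with this little training data the classifier is barely better than a coin flip on anything it hasn't seen; the point of the design is that every moderated message becomes another training example.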
In short: I don't expect this solution to be infallible; people will fundamentally be people and will always find devious ways to circumvent safeguards. I would argue, though, that I have put in some degree of due diligence to prevent abuse of the service.
It's that time of year again when some of my old favourite TV shows come back on! Plus we get some new shows to watch too!
Some shows I'm especially looking forward to are Fringe and Criminal Minds, both of which ended their previous seasons with some stellar cliffhangers! I blogged about Fringe in an earlier post and drew some parallels (funny if you've watched Fringe) to a series of books I've read - read it here.
In Criminal Minds, Tim Curry played an awesome role in the season finale; it was great to see a world-class actor perform, and I can't wait to see him in the new season that started screening today.
Just started on Monday is HBO's new show Boardwalk Empire. I don't know what it is about HBO, but they have put out some really awesome TV shows - The Sopranos, The Wire and True Blood, to name a few. Boardwalk Empire stars Steve Buscemi, is written and produced by Terence Winter (The Sopranos), and the first episode was directed by Martin Scorsese. It's set in 1920 Atlantic City at the dawn of Prohibition in America. In typical HBO style it really doesn't leave much to the imagination - nudity, gore - and the acting was beautiful. Go and watch it.
To list some of the shows that are coming back this week, in no particular order of preference:
- Criminal Minds
- The Big Bang Theory
- 30 Rock
- Grey's Anatomy
- The Office (US)
Next week starting again is:
- The Simpsons
- Family Guy
- Stargate Universe
How do I keep track of all of these shows and when they screen?
I use a free TV Calendar hosted by Pogdesign, a UK-based web development company. It pulls data for hundreds of TV shows that you can include in your filter; set your timezone and you get the local screening times - check it out:
That's about all for now... off to watch some TV...
If you're reading this blog post then you most probably got linked to it from Twitter. At the time of writing this I have about 73,000 followers. ( @AeroFade )
I've been meaning to produce some statistics for a while on my followers, but time being time, there isn't enough of it in the day.
Finally I got my A into G, and last week I extended my Ruby Twitter library to trawl through all of my followers and grab user information about them, including the last tweet each had posted (at the time my program created a record for them). It actually took quite a while to pull all the users in; even though I could grab multiple users' details at a time, the API rate limit meant it took just over 4 hours.
Then I got to writing some scripts to parse the XML and take a look at the content. Rather than inundating you with every single bit of information, I've filtered out what I consider useful by only looking at the "top" of each category I have chosen to analyse so far. The size of the "top" differs between categories, as I basically cut off anything below a 95% threshold as not being part of the top.
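The tallying step behind the domain charts below amounts to extracting the host from each linked URL and counting occurrences. A rough sketch - the links array here is a made-up stand-in for the URLs pulled out of the stored follower XML, and the real script also followed bit.ly-style shorteners through to their endpoints first:

```ruby
require 'uri'

# Count how often each host appears among the links, then rank them.
links = %w[
  http://twitter.com/a http://facebook.com/b http://twitter.com/c
  http://myspace.com/d http://twitter.com/e
]

counts = links.each_with_object(Hash.new(0)) do |link, tally|
  tally[URI.parse(link).host] += 1
end

top = counts.sort_by { |_, n| -n }
puts top.first.inspect  # => ["twitter.com", 3]
```

From the ranked list, applying the cutoff described above is just a matter of dropping the tail before charting.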
By the way - I like pie charts.
The above pie chart shows the top 6 domains that people link to. (I'll caveat that by noting that I put the additional category "Email address" in for interest's sake - if you consider all email addresses together, it could be called the 7th most popular.)
The above shows the distribution of top-level domains linked to across all tweets (ignoring bit.ly links etc. and following them through to their endpoints).
A trigram is a type of N-gram (http://en.wikipedia.org/wiki/Trigram) - I use this measure because I've used N-grams in the past and found out some very interesting things with them. These trigrams are derived from the screen names of Twitter users. One thing which did make me chuckle is that, in the top 8, "ber" represented 8% of the top - no doubt because of all the users who have Justin Bieber in their name.
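Extracting trigrams from a screen name is a one-liner: lowercase the name and take every run of three consecutive characters. A quick sketch (the sample names are made up):

```ruby
# Every run of three consecutive characters in a lowercased screen name,
# so "bieber" contributes "bie", "ieb", "ebe" and "ber".
def trigrams(name)
  s = name.downcase
  (0..s.length - 3).map { |i| s[i, 3] }
end

names = %w[BieberFan4 justinbieber kiwi_dev]
tally = Hash.new(0)
names.each { |n| trigrams(n).each { |t| tally[t] += 1 } }

puts tally["ber"]  # => 2
```

Summing the tallies over all 73,000 screen names and sorting gives the chart above.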
Finally, this pie chart represents the top 12 words in the most recent tweet of each of the 73,000 followers (you'll note two non-words, "rt" and "-").
One special note (as I pointed out earlier in the reference to "The Bieb" in usernames): out of all followers at the time, 191 tweeted @justinbieber, 137 tweeted "bieber", and 37 didn't know how to spell his surname and used various misspellings of it.
As I get more time I'll post another blog entry with some more statistics, possibly including some calculations of entropy to show the amount of uniqueness going on (or maybe lack of uniqueness).
Can you think of anything that would be interesting to look into? Leave me a comment and I'll include it in the next update.