Monday, October 17, 2011

Using Google+ API - The Real One, Not The Official Ersatz. - Part 1. The Basics


Important Update


This post is now a little bit obsolete, because Google changed the way they render Google+ post pages. I am going to prepare an updated post describing the changes very soon. Meanwhile, you can go to my announcement post to grab the latest version of G+7, which fixes the problems that Google's changes caused.

Introduction
 
While working on the G+7 Windows Desktop Gadget, the most difficult problem was retrieving information that I wanted from Google+, and also, updating Google+ with stuff (like adding comments, sharing, +1-ing, etc.). When I started, there was no Google+ API available at all. People all over the world were trying to find some workarounds, but none of them was good enough to be of practical use. 

Since then, Google has released a first version of the Google+ API. Unfortunately, in its current incarnation, the API has very limited functionality. As Steve Yegge described it in his recent infamous rant, it is a "Stalker API" - it allows you to learn stuff about a particular person on Google+ - their posts, circles, comments, etc. But it does not let you do any of the things you would need to implement a proper Google+ application, such as a standalone desktop client, a mobile client, a desktop gadget, etc. There is no way to retrieve the stream of activities that you, as a Google+ user, see when you open http://plus.google.com. There is no way to comment, share posts, or +1 a post or a comment.

I hope this situation changes (Google, are you listening?), but in the meantime, I had to resort to reverse engineering the protocol that Google+ pages actually use to produce the content that you can see on them. The results of these reverse-engineering efforts will be the subject of the series of blog posts that I am beginning today. I hope to post every second day, and I hope that these posts are useful to Google+ developers out there.

Today's post will cover some very important basics, such as data formats, protocols, etc. The next installments will describe the following topics:
  • details of authentication 
  • retrieving list of your circles
  • retrieving message stream for a particular circle
  • adding +1 to a post
  • commenting on a post
  • sharing a post
The list above reflects the current functionality of the G+7 desktop gadget, which means that it reflects the current state of my reverse engineering. There is obviously more to Google+ than that, and I suppose G+7 will be updated over time, as I figure out the details of other operations.

The Basics - Authentication, Protocols, Data Formats

Google+ uses Ajax to create the pages that you see. When you open https://plus.google.com (or any other Google+ page containing at least one message), the following happens, in order:
  1. a page with some trivial HTML is loaded and displayed
  2. the page, in addition to HTML, contains a Javascript data structure that encodes the details of the currently logged in user, as well as a first page of the message stream that you see
  3. this structure is read by Javascript code and HTML that displays message stream is dynamically constructed by Javascript
  4. Javascript executes some HTTP GET and HTTP POST calls to Google+ to retrieve additional information and put it on a page
The easiest (but not the only) way to start implementing a Google+ client is to get at this first page and retrieve the initial Javascript data structure. For this, you have to
  1. issue an HTTP GET request to https://plus.google.com or to a page with one particular message
  2. read in the resulting page
  3. find the part of the page between the string 'OZ_initData = '  and ';window.jstiming.load.tick('idp');'
  4. interpret this substring as a Javascript data structure
  5. use the structure for subsequent operations
This is how it can be done using jQuery:

jQuery.support.cors = true; // this is a secret sauce, which I will describe later
jQuery.ajax({
    type: 'GET',
    url: 'https://plus.google.com/?_requid=' + new Date().getTime(),
    success: function (data, status, xhr) {
        // locate the initial data structure between the two marker strings
        var startMarker = 'OZ_initData = ';
        var endMarker = ";window.jstiming.load.tick('idp');";
        var start = data.indexOf(startMarker);
        var end = data.indexOf(endMarker);
        // indexOf returns -1 when a marker is missing
        if (start !== -1 && end > start) {
            start += startMarker.length;
            var pageData = jQuery.parseJSON(data.substring(start, end));
            // ... do stuff with pageData
        }
    }
});
You may be wondering what's the deal with jQuery.support.cors = true;. Well, to be frank - it is a hack. The problem with Ajax support in jQuery is that, by default, it only works within a single domain: only Javascript loaded from a particular web page is able to access URLs belonging to the domain of that page. This means that if you implement a desktop gadget or a desktop client, you are out of luck - jQuery will throw a "No Transport" error at you if you tell it to make an Ajax call to https://plus.google.com. Unless, of course, you force it to behave. The trick was described in this excellent post. But admittedly, this is a trick and a hack, and if the jQuery implementation ever changes, it might not be possible to do this any more.
Side note - Authentication 
As you can see, the data structure that you want to get at is contained now in a pageData variable. I will describe this data structure in more detail later, but first we must overcome one big hurdle. Namely - in order to be able to view https://plus.google.com, you need to authenticate. If you are implementing some code that will execute in a browser, in the context of an already logged in user (like a greasemonkey script, browser plugin, or somesuch), there is nothing to be done - you are already authenticated and jQuery will execute its Ajax request with proper credentials. But if you are implementing something like a standalone client - say a .NET desktop client for Google+, you must first retrieve three cookie values, resulting from authenticating to Google+. The names of cookies you want are: SID, SSID and HSID. Then you have to use these cookie values in the Cookie header of all your HTTP requests. I will cover how to get these cookies in an example .NET application in one of the subsequent posts.
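As a rough illustration of the last step, here is a minimal sketch of how a standalone client might assemble the value of the Cookie header from the three cookies named above. The helper name buildCookieHeader and the placeholder values are hypothetical; obtaining the real cookie values is not covered here, and browsers generally do not let scripts set this header themselves, so this only applies to standalone clients.

```javascript
// Hypothetical helper (not part of G+7): builds the value of the Cookie
// header from the three session cookies named in the post.
var REQUIRED_COOKIES = ['SID', 'SSID', 'HSID'];

function buildCookieHeader(cookies) {
    var parts = [];
    for (var i = 0; i < REQUIRED_COOKIES.length; i++) {
        var name = REQUIRED_COOKIES[i];
        var value = cookies[name];
        // Fail early if one of the three cookies is missing or empty
        if (typeof value !== 'string' || value.length === 0) {
            throw new Error('Missing cookie: ' + name);
        }
        parts.push(name + '=' + value);
    }
    // Cookie pairs in a request header are separated by "; "
    return parts.join('; ');
}

// Placeholder values only; real ones come from authenticating to Google+.
var cookieHeader = buildCookieHeader({ SID: 'aaa', SSID: 'bbb', HSID: 'ccc' });
// cookieHeader is now 'SID=aaa; SSID=bbb; HSID=ccc'
```

The resulting string would then be sent as the Cookie header with every request made by the client.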

On To The Good Stuff

The data structure that we retrieved into the pageData variable has been partially described in an excellent post that I used as a reference when implementing G+7. That post took a stab at figuring out the data format. It was a tremendous help, but it missed some less frequently used details. I have discovered some more information in there. Here it goes:

For a page with a list of posts (like https://plus.google.com or a page showing a stream from a circle):
post = pageData[4][0][i] - where i is the post index. There are 20 posts in the table
For a page with a single post (like this one):
post = pageData[20]
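The two cases above can be wrapped in one small function. This is only a sketch: the function name getPosts and the pageType argument are made up for illustration, and the caller is assumed to know which kind of page was fetched.

```javascript
// Hypothetical helper: returns the posts found in pageData as a plain array.
// pageType is 'single' for a page with one post, anything else for a stream.
function getPosts(pageData, pageType) {
    if (pageType === 'single') {
        // Single-post page: the post itself sits at pageData[20]
        return pageData[20] ? [pageData[20]] : [];
    }
    // Stream page: pageData[4][0] is the table of posts
    var stream = pageData[4] && pageData[4][0];
    return stream ? Array.prototype.slice.call(stream) : [];
}
```

Returning an array in both cases lets the rest of the client loop over posts without caring where they came from.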
Each post is a huge table (93 entries, each of the cells can have big sub-entries). Details of each post:

post[3] = poster name
post[4] = Full HTML text
post[5] = post timestamp (milliseconds since 01.01.1970)
post[7] = comments on the post (a table; x below is the comment index)
post[7][x][16] = comment author photo. You will want to add a '?sz=16' parameter to shrink the image
post[7][x][6] = comment author ID
post[7][x][1] = comment author name
post[7][x][3] = comment timestamp (milliseconds since 01.01.1970)
post[7][x][2] = comment text
post[8] = ID of the post
post[11] = Array of one or more links
post[11][x][3] = Title of link
post[11][x][5][1] = URL of image uploaded
post[11][x][21] = Description of link
post[11][x][24][1] = Linked URL
post[11][x][24][4] = Type: document, image, photo, video
post[11][x][41][0][1] = Thumbnail of image
post[16] = poster ID
post[18] = URL of the poster photo. You will want to add a '?sz=48' parameter to shrink the image
post[21] = Link to Google+ page for the post
post[25] = shares of the post
post[27] = location where the post was made
post[27][10] = a miniature image of Google Maps image of the location, centered at the location coordinates
post[27][3] = name of the location
post[27][8] = Google Maps URL of the location
post[44] = if the post is a share of some other message, the original message author
post[44][0] = original message author name
post[44][1] = original message author ID
post[44][4] = original message author photo URL

post[47] = if the post is a share of some other message, the message accompanying the share
post[73][9] = if you are one of the people who +1d the post, this is set to your name
post[73][16] = number of +1s for the post
post[93] = number of comments
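
Working with bare numeric indexes gets confusing quickly, so it may help to copy the interesting cells into an object with named fields. The sketch below does that for a handful of the indexes listed above; the function and field names (at, toDate, parseComment, parsePost, authorName and so on) are made up for illustration, and any cell that is missing simply comes back as undefined or null.

```javascript
// Safely reads a nested index path; returns undefined if any step is missing.
function at(value, path) {
    for (var i = 0; i < path.length; i++) {
        if (value === null || value === undefined) { return undefined; }
        value = value[path[i]];
    }
    return value;
}

// Timestamps are milliseconds since 01.01.1970, which is what Date expects.
function toDate(ms) {
    return typeof ms === 'number' ? new Date(ms) : null;
}

function parseComment(comment) {
    return {
        authorName: at(comment, [1]),
        text: at(comment, [2]),
        time: toDate(at(comment, [3])),
        authorId: at(comment, [6]),
        authorPhoto: at(comment, [16])
    };
}

function parsePost(post) {
    var rawComments = at(post, [7]) || [];
    var comments = [];
    for (var i = 0; i < rawComments.length; i++) {
        comments.push(parseComment(rawComments[i]));
    }
    return {
        id: at(post, [8]),
        authorName: at(post, [3]),
        authorId: at(post, [16]),
        authorPhoto: at(post, [18]),
        html: at(post, [4]),
        time: toDate(at(post, [5])),
        url: at(post, [21]),
        locationName: at(post, [27, 3]),
        plusOneCount: at(post, [73, 16]),
        comments: comments
    };
}
```

The at helper is there because many cells are empty for any given post, for example post[27] when no location was attached.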

Some potentially useful details for dealing with the information above

  • URLs don't necessarily start with http: or https:. This is OK if you use them in the context of the page you are currently on (e.g., if you are implementing a greasemonkey script). But if you implement a standalone client, you may want to fix these URLs by prepending https: to them before use
  • The user mugshot photos come from Picasaweb. As such, they are quite big, but if you append a '?sz=48' parameter to them, they get resized to 48x48 pixels. This also works for 16x16. I have not tried other sizes, but I suppose they work as well
  • Other image URLs can come from anywhere. If you want to display them shrunk proportionally, you will need an easy way to do it. I found that Google+ uses an internal service to do so, which I have also taken the liberty to use. The following Javascript function does the trick:

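function getSizedImage(imgUrl, size) {
    // the expression must start on the same line as "return"; otherwise
    // Javascript inserts a semicolon after it and the function returns undefined
    return 'https://images1-focus-opensocial.googleusercontent.com/gadgets/proxy?url='
        + encodeURIComponent(imgUrl)
        + '&container=focus&gadget=a&rewriteMime=image/*&refresh=31536000&resize_w='
        + size + '&no_expand=1';
}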
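
The first two tips in the list above can be sketched in the same way. The function names absoluteUrl and sizedAvatar are hypothetical, and the code assumes that the URLs lacking a scheme are protocol-relative (they start with //) and that the photo URL carries no query string of its own.

```javascript
// Prepends https: to protocol-relative URLs (those starting with //);
// any other URL is returned unchanged.
function absoluteUrl(url) {
    return url.indexOf('//') === 0 ? 'https:' + url : url;
}

// Appends the sz parameter so that the mugshot is served at size x size pixels.
function sizedAvatar(photoUrl, size) {
    return absoluteUrl(photoUrl) + '?sz=' + size;
}

// Example with a made-up photo URL:
var avatar = sizedAvatar('//example.com/photo.jpg', 48);
// avatar is now 'https://example.com/photo.jpg?sz=48'
```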


To Be Continued


Well, I guess this is enough for today - like I said, the message structure is quite huge and describing it took a lot of space.

In the next installment, I will describe how to:

  • interpret some additional information that can be retrieved using Ajax requests
  • retrieve the same information as above using Ajax request and not screen-scraping the Google+ web page
  • display more than one page of messages (20 posts)
  • comment on a message






