Deciphering Google’s mysterious ‘batchexecute’ system

Anyone trying to find or exploit vulnerabilities on the web has likely needed to pose as a client before. In order to find flaws in a web service, you need at least a basic understanding of how the client talks to the server and vice versa, so that you can later send your own crafted requests. But modern protocols and data structures aren’t always easy on the middle man.

For most of its major web apps, Google uses a batch-style RPC system that can be spotted by its common slug: batchexecute. At first glance, a request to this special API can seem hostile to anyone wanting an inside look:

POST /u/1/_/ContactsUi/data/batchexecute?rpcids=rptSGc&f.sid=-6483512770624070754&bl=boq_contactsuiserver_20201018.13_p0&hl=en&soc-app=527&soc-platform=1&soc-device=1&_reqid=502219&rt=c HTTP/1.1
Host: contacts.google.com
User-Agent: [REDACTED]
Origin: https://contacts.google.com
X-Client-Data: [REDACTED]
Referer: https://contacts.google.com/
Cookie: [REDACTED]
Content-Length: 160
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive

f.req=%5B%5B%5B%22rptSGc%22%2C%22%5B%5B%5C%22c8351307351755208604%5C%22%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D&at=ACHfmxprW1XKRktXjCXD06UGAxSR%3A1603611417347

But fear not! There is method to this madness. I’m going to walk through replicating and modifying the above request to Google Contacts, but the same concepts apply to any site that uses this system. Our request queries the server for information on a single contact.

Summarizing all of the nonsense

First, let’s take a look at the URL:

/u/1/_/ContactsUi/data/batchexecute
?rpcids=rptSGc
&f.sid=-6483512770624070754
&bl=boq_contactsuiserver_20201018.13_p0
&hl=en
&soc-app=527
&soc-platform=1
&soc-device=1
&_reqid=502219
&rt=c

The path:

  • `/u/1/` denotes which user is calling the API. This segment isn’t present on clients with only one Google account signed in, but in my case, I’m calling from the second user account in my browser.
  • `ContactsUi` is the UI name of the web app. Every app that uses this system has one.

The query string:

  • `rpcids` is a comma-separated list of IDs that defines which functions will be called on the server. This is the most important part of the request. I will explain how to find them later!
  • `f.sid` is a signed 64-bit integer that changes on every page load. It is likely an XSRF deterrent alongside `at`, which we’ll learn about later.
  • `bl` is the name and version of the backend software handling the requests.
  • `hl` is the language the response should be in. This is a two-character ISO 639-1 code, which you’ve probably seen before. `en` means English!
  • All values starting with `soc-` are optional, and seem to be for analytics purposes.
  • `_reqid` is a semi-random ID number generated by the client on each request. I’ll expand on how to generate this later.
  • `rt` stands for response type. It’s always `c`.
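As a rough sketch of how these pieces fit together, the Python function below assembles the path and query string from the values listed above. The function name and parameters are my own, not anything Google ships; the `soc-` values are left out since they appear to be optional, and passing `user_index=None` drops the `/u/N/` segment for single-account sessions.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_batchexecute_url(host: str, app: str, rpc_ids: Sequence[str],
                           f_sid: str, bl: str, reqid: int,
                           user_index: Optional[int] = None,
                           hl: str = "en") -> str:
    """Assemble a batchexecute URL (hypothetical helper, not a Google API)."""
    # /u/N/ only appears when more than one account is signed in
    prefix = f"/u/{user_index}" if user_index is not None else ""
    path = f"{prefix}/_/{app}/data/batchexecute"
    query = urlencode({
        "rpcids": ",".join(rpc_ids),  # comma-separated RPC IDs
        "f.sid": f_sid,
        "bl": bl,
        "hl": hl,
        "_reqid": reqid,
        "rt": "c",                    # response type is always "c"
    }, safe=",")                      # leave the commas in rpcids unencoded
    return f"https://{host}{path}?{query}"


url = build_batchexecute_url(
    "contacts.google.com", "ContactsUi", ["rptSGc"],
    "-6483512770624070754", "boq_contactsuiserver_20201018.13_p0",
    502219, user_index=1,
)
print(url)
```

Called with the values from our captured request, this reproduces its URL minus the optional `soc-` parameters.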

The headers:

The only headers you’ll ever need are , , and .

The request body:

The request body is always a form with `Content-Type: application/x-www-form-urlencoded`. I’ve taken the liberty of decoding it for those of us who haven’t memorized our percent-encoding tables. :’)

f.req=[[["rptSGc","[[\"c8351307351755208604\"]]",null,"generic"]]]
&at=ACHfmxprW1XKRktXjCXD06UGAxSR:1603611417347
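If you’d rather not decode by hand, Python’s standard library can do it. This just percent-decodes the form body from the captured request; nothing here is specific to Google.

```python
from urllib.parse import parse_qs

# The raw form body from the captured request, split across lines for readability
raw_body = (
    "f.req=%5B%5B%5B%22rptSGc%22%2C%22%5B%5B%5C%22c8351307351755208604"
    "%5C%22%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D"
    "&at=ACHfmxprW1XKRktXjCXD06UGAxSR%3A1603611417347"
)

form = parse_qs(raw_body)  # percent-decodes every key and value
f_req = form["f.req"][0]
at = form["at"][0]

print(f_req)  # [[["rptSGc","[[\"c8351307351755208604\"]]",null,"generic"]]]
print(at)     # ACHfmxprW1XKRktXjCXD06UGAxSR:1603611417347
```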

The form values:

  • `f.req` is an array of “envelopes”, one for each payload in the batch. I’ll dive deeper into the structure of this later.
  • `at` is another XSRF mitigation parameter, this time tied to the user’s Google account and paired with a UNIX timestamp in milliseconds.

Oh god, where do I get all of these values?

The answer: Wiz.

There are so many values to keep track of here, all of which are pretty opaque and strange. Luckily, finding them is not too difficult: Google sends all of them in a JavaScript object called `WIZ_global_data`.

When you load a page that uses batchexecute, there’s a `<script>` tag in the document’s `<head>` that has everything you need to build a request. It looks like this:

<script data-id="_gd" nonce>
window.WIZ_global_data = {...};
</script>

The object defined here is usually long, and, yes, it is also entirely obfuscated (argh!), but I’m going to make it easy for you and just provide the names of all the keys and which values they correspond to in the request. They’re the same across all Google apps!

  • — The UI name in the path. Here, it’s `ContactsUi`.
  • — The value. Here, it’s .
  • — The value. Here, it’s .
  • — The value. Here, it’s .

`WIZ_global_data` also has a ton of other variables the app needs to function. You might find some fun stuff in here that can enable new features or tip you off to a vulnerability.
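Since the key names are obfuscated, one practical approach is to parse `WIZ_global_data` and search it for values you already saw in a captured request. The sketch below assumes the object literal happens to be valid JSON (quoted keys, no trailing commas) and ends with `};` right before `</script>`; neither is guaranteed on every page. The regex and helper names are my own.

```python
import json
import re

# Captures the object assigned to window.WIZ_global_data inside its <script> tag.
# Assumption: the literal is valid JSON and is followed by "};" and </script>.
_WIZ_RE = re.compile(
    r"window\.WIZ_global_data\s*=\s*(\{.*?\});\s*</script>", re.DOTALL
)


def extract_wiz_global_data(html: str) -> dict:
    match = _WIZ_RE.search(html)
    if match is None:
        raise ValueError("WIZ_global_data not found in page")
    return json.loads(match.group(1))  # raises if the literal is not valid JSON


def find_keys_for_values(wiz: dict, wanted: dict) -> dict:
    """For each label -> value seen in a captured request, list the keys holding it."""
    return {
        label: [key for key, val in wiz.items() if str(val) == str(value)]
        for label, value in wanted.items()
    }
```

For example, `find_keys_for_values(wiz, {"f.sid": "-6483512770624070754"})` tells you which obfuscated key carries `f.sid` on that page.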

But wait, what about `_reqid`?

Oh, right. Here’s everything you need to know about `_reqid`:

The first ID your client sends should be a random four-digit number, with 100,000 added to it for each subsequent request. For example, a first ID of 1234 would be followed by 101234, 201234, and so on. This is likely for spam and abuse detection, or to track the order of requests.

In my experience, the server really doesn’t care about this value. You can give it a random six-digit number each time and it will behave normally. If you want to be a little stealthier, though, it wouldn’t hurt to use this pattern.
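Here’s a small generator that follows that pattern. Seeding it with 2219 yields 502219, the `_reqid` from our captured request, on the sixth call, which at least fits the pattern; whether Google’s client does exactly this is a guess on my part.

```python
import random
from itertools import count
from typing import Iterator, Optional


def reqid_sequence(start: Optional[int] = None) -> Iterator[int]:
    """Yield _reqid values: a random four-digit start, then +100,000 per request."""
    if start is None:
        start = random.randint(1000, 9999)  # random four-digit number
    for n in count():
        yield start + n * 100_000


ids = reqid_sequence()
first_reqid = next(ids)   # use for the first request
second_reqid = next(ids)  # first_reqid + 100,000
```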

But wait, what about `rpcids`?

Okay, here’s the rundown on `rpcids`:

As I said earlier, this is a comma-separated list of identifiers that tells the server which functions the request should call. In our case, we just want one, `rptSGc`, which corresponds to the server-side function that handles getting contact information. To call more than one function in a single request (a batch of functions, if you will), it would look like this: `rpcids=rptSGc,xxXxX`, where `xxXxX` stands in for a second RPC ID.

These are obfuscated on purpose: Google doesn’t want to make this easy for us. Luckily, they are also constant, so they persist across requests and backend updates. This means we can hardcode them and forget about them! In fact, that’s exactly what Google does in its JavaScript, but you’ll have a hard time distinguishing them from any other identifier, because RPC IDs and the IDs representing HTML elements or JavaScript actions are all obfuscated the same way.

For that reason, the only way to get hold of them is to trigger the request in your browser and lift the RPC ID from there.

The magical `f.req`

We saw an example of an `f.req` value earlier. Let’s look at it pretty-printed:

[ // First array
  [ // Second array
    [ // Third array
      "rptSGc",
      "[[\"c8351307351755208604\"]]",
      null,
      "generic"
    ]
  ]
]

Yes, that’s three nested arrays! Let’s break them down one at a time.

  • The first/outermost array simply holds the entire request. This array will always have exactly one item, which is the second array.
  • The second array contains each request in the batch. We’re only sending one request with one payload, so this array only has one item.
  • The third array is like an envelope for our payload, describing when and where it should be sent. Index 0 is our RPC ID, index 1 is the actual data being sent, and index 3 describes in what order the payloads should be processed. Because we only have one, its value is `"generic"`, but if there were multiple it would start at `"1"` and go upwards. The value at index 2 is always `null`.

Here’s the same example but with two envelopes, annotated for niceness:

[ // The whole thing
  [ // All of the requests
    [ // Envelope 1
      "rptSGc", // RPC ID!
      "[[\"c8351307351755208604\"]]", // Payload 1
      null, // ALWAYS null
      "1" // Order of payloads
    ],
    [ // Envelope 2
      "xxXxX",
      "[[\"payload_data_goes_here\"]]", // Payload 2
      null,
      "2"
    ]
  ]
]

You should now have a decent understanding of the structure. But what about the actual data, at index 1? How is that formatted?

Well, it varies. Everything at index 1 is defined entirely by the application and the function you’re calling. What I can tell you is that it’s always JSON, serialized into a string and escaped inside the outer JSON (meaning newlines are replaced with `\n`, and other tricky characters like `"` are preceded by a backslash).
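The double encoding is easiest to see in code. This sketch (the function name is my own) JSON-encodes each payload into a string first, then wraps it in an envelope and encodes the whole structure again, using `"generic"` for a single call and `"1"`, `"2"`, … for a batch. The compact separators reproduce the captured request byte for byte, though I don’t know whether the server cares about whitespace.

```python
import json
from typing import Any, List, Tuple

COMPACT = (",", ":")  # no spaces, matching the captured request


def build_f_req(calls: List[Tuple[str, Any]]) -> str:
    """calls: (rpc_id, payload) pairs; payload is any JSON-serializable value."""
    envelopes = []
    for i, (rpc_id, payload) in enumerate(calls, start=1):
        # "generic" for a lone call, otherwise "1", "2", ... in processing order
        order = "generic" if len(calls) == 1 else str(i)
        # Encode the payload on its own so it becomes an escaped string
        inner = json.dumps(payload, separators=COMPACT)
        envelopes.append([rpc_id, inner, None, order])
    # Outermost array holds exactly one item: the array of envelopes
    return json.dumps([envelopes], separators=COMPACT)


print(build_f_req([("rptSGc", [["c8351307351755208604"]])]))
# [[["rptSGc","[[\"c8351307351755208604\"]]",null,"generic"]]]
```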

In our case, the data is a double-nested array containing a single string: the identifier for the contact we want to get information on. To get this ID, we have to look at the HTML element of the contact we want in the list. It’s the attribute seen here:

Our contact ID can be seen on the right, highlighted in purple. It always starts with ‘c’.

Why is the data like this?

I don’t know for sure, but I have a hunch.

The default IDL of gRPC, Google’s RPC framework, is Protobuf: a data serialization format also created by Google that makes data transmission quick, efficient, and accessible from many languages. You can read about it in the official Protocol Buffers documentation. This is what an example protobuf message looks like, decoded from its usual binary form into printable text:

2 {
  1: "Field 1"
  3: "Field 3"
  4 {
    1: 0
    2: 0
    3: 1
    4: 1
    5: 0
  }
}

There are no field names in raw protobuf, just field numbers. With batchexecute, I think Google is mapping protobuf messages to JSON in a special way. Google does document a standard JSON mapping for protobuf, but it uses field names as object keys, which doesn’t match what we see here. This is how I think the above message would be mapped to JSON in batchexecute:

[ // The outer array is the root message itself
  null, // Field 1 is missing, so null
  [ // Field 2
    "Field 1",
    null, // Field 2 is missing, so null
    "Field 3",
    [ // Field 4
      0,
      0,
      1,
      1,
      0
    ]
  ]
]

Each message is converted to an order-sensitive array, so the index of each item is its field number minus one. For example, getting a value at [2][3] in the protobuf message would be the same as getting the item at index [1][2] in the JSON. Fields that are missing from a message become `null` in the JSON mapping.

In short, the data structure used in batchexecute is just protobuf disguised as JSON.
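If that hunch is right, reading a field by number is just a matter of subtracting one at each level. The helper below is hypothetical and simply encodes the field-number-minus-one idea; treating an array that ends early as a missing field is an extra assumption on my part.

```python
from typing import Any, Sequence


def get_field(message: Any, field_path: Sequence[int]) -> Any:
    """Follow protobuf field numbers through the array form: field N is index N - 1.

    Returns None for a missing field, whether it is an explicit null or
    (assumption) the array simply ends before that index.
    """
    node = message
    for field_number in field_path:
        if field_number < 1:
            raise ValueError("protobuf field numbers start at 1")
        index = field_number - 1
        if not isinstance(node, list) or index >= len(node):
            return None
        node = node[index]
    return node


# The example message from above, mapped to arrays
example = [None, ["Field 1", None, "Field 3", [0, 0, 1, 1, 0]]]
print(get_field(example, [2, 3]))  # Field 3
```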

Sending the request

So now we know all the values we need and what structure to send the data in. Good job getting this far! You’ve crafted a proper request when it includes all of these:

In the query string: `rpcids`, `f.sid`, `bl`, `hl`, `_reqid`, and `rt`.

In the request body (the form): `f.req` and `at`.

In `f.req`:

  • An array encompassing the entire request.
  • An array inside that one containing each “envelope” we talked about earlier.
  • One or more envelopes containing: the RPC ID, the data, `null`, and `"generic"` or a number.

NOTE: There may be a scenario in which you want to send a request unauthenticated. To do this, simply remove the `Cookie` header and the `at` value from your request.
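In Python, a request like this could be prepared with the standard library as below. The helper name is hypothetical, and the `f.req` string is assumed to be built as described above; leaving out `at` and `cookie` gives the unauthenticated variant from the note. This only prepares the request; `urllib.request.urlopen(req)` would actually send it.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url: str, f_req: str, at: Optional[str] = None,
                  cookie: Optional[str] = None) -> Request:
    """Prepare (but don't send) a batchexecute POST."""
    form = {"f.req": f_req}
    if at is not None:
        form["at"] = at                  # omit for unauthenticated requests
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if cookie is not None:
        headers["Cookie"] = cookie       # omit for unauthenticated requests
    body = urlencode(form).encode("ascii")  # percent-encoded, so ASCII-safe
    return Request(url, data=body, headers=headers, method="POST")
```

With the `f.req` and `at` values from our example, the encoded body comes out identical to the captured one, all 160 bytes of it.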

With all of our values now sorted, we send the request, and…

)]}'

2593
[["wrb.fr","rptSGc","[[[\"c8351307351755208604\", ... \n]\n]\n]\n",null,null,null,"generic"]
]
57
[["di",79]
,["af.httprm",79,"246063832929204055",128]
]
27
[["e",4,null,null,2691]
]

Yikes.

Home stretch: picking apart the response

Okay, so the server’s response has come back, and if you scroll around in it you can probably spot some contact info like first and last name. This is actually less complicated than you’d think. We’re almost done!

Right away, we can discard the first six characters: `)]}'` (a prefix Google adds to guard against JSON hijacking) and the two newlines that follow. Now we have this:

2593
[["wrb.fr","rptSGc","[[[\"c8351307351755208604\", ... \n]\n]\n]\n",null,null,null,"generic"]
]
57
[["di",79]
,["af.httprm",79,"246063832929204055",128]
]
27
[["e",4,null,null,2691]
]

Just like the request, the response has envelopes too! There’s one for each RPC call, and then two for additional metadata we won’t use. Here we can see three lines that are just integers: these denote the length, in bytes, of each envelope. Think of it like an object where each key is the length and each value is the envelope:

{
  "2593": [["wrb.fr","rptSGc","...",null,null,null,"generic"]],
  "57": [["di",79],["af.httprm",79,"246063832929204055",128]],
  "27": [["e",4,null,null,2691]]
}

For each additional request in the batch, a response envelope will be added.
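Here’s one way to split a response like this into its envelopes. Rather than relying on the exact length values, this sketch strips the `)]}'` prefix, treats any line made only of digits as the start of a new chunk, and parses the lines in between as JSON. That’s a heuristic that works for responses shaped like the one above, not a spec; the function name is my own.

```python
import json
from typing import Any, List

XSSI_PREFIX = ")]}'"


def split_envelopes(text: str) -> List[Any]:
    """Split a batchexecute response body into its parsed JSON chunks."""
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    chunks: List[Any] = []
    buffer: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.isdigit():
            # A bare integer is a length marker: close out the previous chunk
            if buffer:
                chunks.append(json.loads("\n".join(buffer)))
                buffer = []
        elif stripped:
            buffer.append(line)
    if buffer:
        chunks.append(json.loads("\n".join(buffer)))
    return chunks
```

Escaped newlines inside the data string at index 2 are literal `\n` sequences rather than real line breaks, so they don’t confuse the line splitting.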

Since we’re just focusing on the first payload today, let’s drop all of the length integers and the last two metadata responses (and pretty-print to make things easier):

[
  [
    "wrb.fr",
    "rptSGc",
    "[[[\"c8351307351755208604\", ... \n]\n]\n]\n",
    null,
    null,
    null,
    "generic"
  ]
]

Sweet, so now we’ve isolated the response we want from the server. It looks very similar to the envelopes from our request. Here are the important indexes to keep in mind:

  • Index 1 is the RPC ID of the function that’s returning the data. These match the ones we sent in our request.
  • Index 2 is the actual data, escaped just like in our request. If the request fails for some reason, this will be `null`.
  • Index 6 is the order in which the payload was processed.

The data at index 2, once unescaped, can be processed by any JSON parser. Our data is very long, so I won’t go into it here, but if the outermost array isn’t empty, the request was successful. You can find the indexes of the values you need on your own.
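To round things off, here’s a sketch that pulls the useful parts out of each `wrb.fr` envelope: the RPC ID at index 1, the escaped JSON at index 2 (parsed, or left as `None` if the call failed), and the order value at index 6. It expects a list of envelopes like the pretty-printed array above; the function name and result shape are my own.

```python
import json
from typing import Any, Dict, List


def extract_rpc_results(envelopes: List[list]) -> Dict[str, Dict[str, Any]]:
    """Map each RPC ID to its decoded data and processing order."""
    results: Dict[str, Dict[str, Any]] = {}
    for env in envelopes:
        if not env or env[0] != "wrb.fr":
            continue  # skip metadata envelopes such as "di", "af.httprm", "e"
        rpc_id, raw = env[1], env[2]
        data = json.loads(raw) if raw is not None else None  # None: call failed
        results[rpc_id] = {
            "order": env[6] if len(env) > 6 else None,
            "data": data,
            "ok": bool(data),  # a non-empty outer array means success
        }
    return results
```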

That’s it!

I wrote this purely because it was such a pain to figure out. I hope it finds researchers who are struggling, and makes pentesting Google apps more accessible to everyone.

If you know more, or think I missed something, please let me know. I’m still learning too.

Happy bug hunting!


Written by

I'm a web security researcher participating in the Google VRP in my free time.
