Deciphering Google’s mysterious ‘batchexecute’ system
Anyone trying to find or exploit vulnerabilities on the web has likely needed to pose as a client before. In order to find flaws in a web service, you need at least a basic understanding of how the client talks to the server and vice versa, so that you can later send your own crafted requests. But modern protocols and data structures aren’t always easy on the middle man.
For most of its major web apps, Google uses a batch-style RPC system that can be spotted by its common slug: batchexecute. At first glance, a request to this special API can seem hostile to anyone wanting an inside look:
POST /u/1/_/ContactsUi/data/batchexecute?rpcids=rptSGc&f.sid=-6483512770624070754&bl=boq_contactsuiserver_20201018.13_p0&hl=en&soc-app=527&soc-platform=1&soc-device=1&_reqid=502219&rt=c HTTP/1.1
Host: contacts.google.com
User-Agent: [REDACTED]
Origin: https://contacts.google.com
X-Client-Data: [REDACTED]
Referer: https://contacts.google.com/
Cookie: [REDACTED]
Content-Length: 160
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive

f.req=%5B%5B%5B%22rptSGc%22%2C%22%5B%5B%5C%22c8351307351755208604%5C%22%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D&at=ACHfmxprW1XKRktXjCXD06UGAxSR%3A1603611417347
But fear not! There is method to this madness. I’m going to walk through replicating and modifying the above request to Google Contacts, but you can take these concepts and apply them to all sites that use this system. Our request will be querying the server for information on a single contact.
Summarizing all of the nonsense
First, let’s take a look at the URL:
/u/1/_/ContactsUi/data/batchexecute
?rpcids=rptSGc
&f.sid=-6483512770624070754
&bl=boq_contactsuiserver_20201018.13_p0
&hl=en
&soc-app=527
&soc-platform=1
&soc-device=1
&_reqid=502219
&rt=c
The path:
- /u/1 denotes which user is calling the API. This won’t be present on clients with only one Google account signed in, but in my case, I’m calling from the second user account in my browser.
- ContactsUi is the UI name of the web app. Every app that uses this system has one.
The query string:
- rpcids is a comma-separated list of IDs that defines which functions will be called on the server. This is the most important part of the request. I will explain how to find them later!
- f.sid is a signed 64-bit integer that updates on every page load. It is likely an XSRF deterrent alongside at, which we’ll learn about later.
- bl is the name and version of the backend software handling the requests.
- hl is the language the response should be in. This is a two-character ISO 639-1 code, which you’ve probably seen before. en means English!
- All values starting with soc- are optional, and seem to be for analytics purposes.
- _reqid is a semi-random ID number generated by the client on each request. I’ll expand on how to generate this later.
- rt stands for response type. It’s always c.
EDIT June 2, 2021: I lied! The rt value can be c, which will get you the traditional response format I detail in this article, but it can also be b for Protobuf, or you can omit it for a much easier JSON-formatted message. (Why didn’t I think of that?)
The headers:
The only headers you’ll ever need are Cookie, Content-Type, and Content-Length.
The request body:
The request body is always a form with Content-Type: application/x-www-form-urlencoded. I’ve taken the liberty of decoding it for those of us who haven’t memorized our percent tables. :’)
f.req=[[["rptSGc","[[\"c8351307351755208604\"]]",null,"generic"]]]
&at=ACHfmxprW1XKRktXjCXD06UGAxSR:1603611417347
The form values:
- f.req is an array of “envelopes” for each payload in the batch. I’ll dive deeper into the structure of this later.
- at is another XSRF mitigation parameter, this time tied to the user’s Google account and paired with a UNIX timestamp in milliseconds.
EDIT June 2, 2021: I discovered recently that these values can also be placed in the query string, if you need to do that — values between the query string and the form seem to be interchangeable. If an identical key is present in both places, the query string value takes priority and the form value is ignored. The request still needs to be a POST request, however, even if you’re sending an empty form.
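If you’d rather not decode the form by hand, here is a minimal Python sketch using only the standard library. The helper name decode_form is my own invention, and it assumes each key appears exactly once in the body, as it does in the request above.

```python
from urllib.parse import parse_qs

def decode_form(body):
    """Percent-decode a form body into a plain dict of key -> value."""
    # parse_qs returns a list per key; each key appears once in this request.
    return {key: values[0] for key, values in parse_qs(body).items()}

# The raw body from the captured request above.
body = (
    "f.req=%5B%5B%5B%22rptSGc%22%2C%22%5B%5B%5C%22c8351307351755208604"
    "%5C%22%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D"
    "&at=ACHfmxprW1XKRktXjCXD06UGAxSR%3A1603611417347"
)

form = decode_form(body)
print(form["f.req"])
print(form["at"])
```

The two printed values should match the decoded f.req and at lines shown above.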
Oh god, where do I get all of these values?
The answer: Wiz.
There are so many values to keep track of here, all of which are pretty opaque and strange. Luckily, finding them is not too difficult: Google sends all of them in a JavaScript object called WIZ.
When you load a page that uses batchexecute, there’s a <script> tag in the document header that has everything you need to build a request. It looks like this:
<script data-id="_gd" nonce>
window.WIZ_global_data = {...};
</script>
The object defined here is usually long, and — yes — it is also entirely obfuscated (arg!), but I’m going to make it easy for you and just provide the names of all the keys and which values they correspond to in the request. They’re the same across all Google apps!
- qwAQke: The UI name in the path. Here, it’s ContactsUi.
- cfb2h: The bl value. Here, it’s boq_contactsuiserver_20201018.13_p0.
- FdrFJe: The f.sid value. Here, it’s -6483512770624070754.
- SNlM0e: The at value. Here, it’s ACHfmxprW1XKRktXjCXD06UGAxSR:1603611417347.
WIZ_global_data also has a ton of other variables the app needs to function. You might find some fun stuff in here that can enable new features or tip you off to a vulnerability.
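Pulling those four values out of a page can be sketched in a few lines of Python. This is a rough sketch, not a robust parser: it assumes each value appears in the page source as a plain double-quoted string with no escaped quotes inside it, and the sample HTML below is a hypothetical page assembled from the values in this article. The function and dictionary names are mine.

```python
import re

# Obfuscated WIZ_global_data key -> the request parameter it feeds.
WIZ_KEYS = {
    "qwAQke": "ui_name",
    "cfb2h": "bl",
    "FdrFJe": "f.sid",
    "SNlM0e": "at",
}

def extract_wiz_values(html):
    """Return whichever of the four request values can be found in the page."""
    values = {}
    for key, name in WIZ_KEYS.items():
        # Assumption: the value is a simple "key":"value" string pair.
        match = re.search(rf'"{re.escape(key)}":"([^"]*)"', html)
        if match:
            values[name] = match.group(1)
    return values

# Hypothetical page source built from the values used in this article.
sample_html = (
    '<script data-id="_gd" nonce>window.WIZ_global_data = {'
    '"FdrFJe":"-6483512770624070754",'
    '"SNlM0e":"ACHfmxprW1XKRktXjCXD06UGAxSR:1603611417347",'
    '"cfb2h":"boq_contactsuiserver_20201018.13_p0",'
    '"qwAQke":"ContactsUi"};</script>'
)

wiz = extract_wiz_values(sample_html)
print(wiz)
```

If a key is missing from the page (for example, at on an unauthenticated page), it is simply left out of the result.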
But wait, what about _reqid?
Oh, right. Here’s everything you need to know about _reqid:
The first ID your client sends should be a random four-digit number, with 100,000 added onto it with each subsequent request. For example: 1234, 101234, 201234, and so on. This is likely for spam and abuse detection, or to track the order of requests.
In my experience, the server really doesn’t care about this value. You can give it a random six-digit number each time and it will behave normally. If you want to be a little more stealthy, though, it wouldn’t hurt to use this pattern.
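The pattern described above is easy to reproduce. Here is a small sketch as a Python generator; the name reqid_sequence is my own.

```python
import itertools
import random

def reqid_sequence():
    """Yield _reqid values: a random four-digit start, then +100,000 per request."""
    start = random.randint(1000, 9999)
    for n in itertools.count():
        yield start + n * 100_000

reqids = reqid_sequence()
first_three = [next(reqids) for _ in range(3)]
print(first_three)  # three values, each 100,000 apart
```

Keep one generator per simulated page load, and call next() on it for each request you send.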
But wait, what about rpcids?
Okay, here’s the run-down on rpcids:
As I said earlier, this is a comma-separated list of identifiers that tells the server which functions the request should call. In our case, we just want one, rptSGc, which corresponds to the server-side function that handles getting contact information. To call more than one function in a single request (a batch of functions, if you will), it would look like this: rptSGc,xxXxX,XXxxX...
These are obfuscated on purpose — they don’t want to make this easy for us. Luckily, they are also constant, so they persist across requests and backend updates. This means we can hardcode them and forget! In fact, that’s exactly what Google does in their JavaScript, but you’ll have a hard time distinguishing them from literally any other identifier because RPC IDs and IDs representing HTML elements or JavaScript actions are all obfuscated in the same way.
For that reason, the only way to get a hold of them is to execute a request on your browser and lift the RPC ID from there.
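Once you’ve copied a request URL out of your browser’s network tab, lifting the RPC IDs from it is a one-liner. A small sketch, with a helper name of my own choosing, run against the URL from the request at the top of this article:

```python
from urllib.parse import parse_qs, urlsplit

def rpcids_from_url(url):
    """Return the list of RPC IDs named in a captured batchexecute URL."""
    query = parse_qs(urlsplit(url).query)
    raw = query.get("rpcids", [""])[0]
    # rpcids is comma-separated; drop empty entries if the parameter is absent.
    return [rpc_id for rpc_id in raw.split(",") if rpc_id]

captured = (
    "https://contacts.google.com/u/1/_/ContactsUi/data/batchexecute"
    "?rpcids=rptSGc&f.sid=-6483512770624070754"
    "&bl=boq_contactsuiserver_20201018.13_p0&hl=en"
    "&soc-app=527&soc-platform=1&soc-device=1&_reqid=502219&rt=c"
)

print(rpcids_from_url(captured))  # ['rptSGc']
```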
The magical f.req
We saw an example of an f.req value earlier. Let’s look at it pretty-printed:
[ // First array
  [ // Second array
    [ // Third array
      "rptSGc",
      "[[\"c8351307351755208604\"]]",
      null,
      "generic"
    ]
  ]
]
Yes, that’s three nested arrays! Let’s break them down one at a time.
- The first/outermost array simply holds the entire request. This array will always have exactly one item, which is the second array.
- The second array contains each request in the batch. We’re only sending one request with one payload, so this array only has one item.
- The third array is like an envelope for our payload, describing when and where it should be sent. Index 0 is our RPC ID, index 1 is the actual data being sent, and index 3 describes in what order the payloads should be processed. Because we only have one, its value is "generic", but if there were multiple it would start at "1" and go upwards. The value at index 2 is always null.
Here’s the same example but with two envelopes, annotated for niceness:
[ // The whole thing
  [ // All of the requests
    [ // Envelope 1
      "rptSGc", // RPC ID!
      "[[\"c8351307351755208604\"]]", // Payload 1
      null, // ALWAYS null
      "1" // Order of payloads
    ],
    [ // Envelope 2
      "xxXxX",
      "[[\"payload_data_goes_here\"]]", // Payload 2
      null,
      "2"
    ]
  ]
]
You should have a somewhat-good understanding of the structure now. But what about the actual data, at index 1? How is that formatted?
Well, it varies. Everything at index 1 is defined entirely by the application and the function you’re calling. What I can tell you is that it’s always a JSON.stringify’d data structure.
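To make the nesting concrete, here is a minimal Python sketch that builds an f.req value from a list of (RPC ID, payload) pairs. The function name build_f_req is mine; the "generic" versus "1", "2", ... ordering follows the behavior described above, and the compact separators are only there so the output matches what the browser sends.

```python
import json

def build_f_req(calls):
    """Build an f.req value from a list of (rpc_id, payload) pairs."""
    envelopes = []
    for position, (rpc_id, payload) in enumerate(calls, start=1):
        order = "generic" if len(calls) == 1 else str(position)
        # The payload is serialized on its own first, so it ends up as a
        # JSON string sitting at index 1 of the envelope.
        inner = json.dumps(payload, separators=(",", ":"))
        envelopes.append([rpc_id, inner, None, order])
    # One outer array wraps the array of envelopes.
    return json.dumps([envelopes], separators=(",", ":"))

f_req = build_f_req([("rptSGc", [["c8351307351755208604"]])])
print(f_req)
# [[["rptSGc","[[\"c8351307351755208604\"]]",null,"generic"]]]
```

The important detail is the double serialization: the payload is stringified once on its own, then again as part of the envelope.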
In our case, the data is a double-nested array containing a single string: the identifier for the contact we want to get information on. To get this ID, we have to look at the HTML element of the contact we want in the list. The ID is the value of that element’s data-id attribute.
Why is the data like this?
I don’t know for sure, but I have a hunch.
The default IDL of gRPC, Google’s RPC framework, is Protobuf: a data serialization method also created by Google that makes data transmission quick, efficient, and accessible to any language. You can read about it in the official Protobuf documentation. This is what an example protobuf message looks like, parsed into printable characters from its usual binary form:
2 {
  1: "Field 1"
  3: "Field 3"
  4 {
    1: 0
    2: 0
    3: 1
    4: 1
    5: 0
  }
}
There are no key-value pairs in raw protobuf, just values assigned to field numbers. With batchexecute, I think Google is mapping protobuf messages to JSON in a special way. There is documentation on this, but it doesn’t quite match up to what we see here. This is how I think the above message would be mapped to JSON in batchexecute:
[ // The root message is written out as an explicit array here,
  // unlike in the protobuf text above.
  null, // Field 1 is missing, so null
  [
    "Field 1",
    null, // Field 2 is missing, so null
    "Field 3",
    [
      0,
      0,
      1,
      1,
      0
    ]
  ]
]
Each message is converted to an order-sensitive array, so the index of each item is its field number minus one. For example, field 3 inside field 2 of the protobuf message (the string "Field 3") sits at index [1][2] in the JSON above. Fields that are missing from a message become null in the JSON mapping.
In short, the data structure used in batchexecute is just protobuf disguised as JSON.
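Here is that guessed mapping as a short Python sketch. To be clear, this illustrates my hunch and is not Google’s actual serializer: a message is modeled as a dict keyed by field number, and the function name to_positional is made up.

```python
def to_positional(message):
    """Convert a {field_number: value} dict into a positional array."""
    if not isinstance(message, dict):
        return message  # scalars are kept as they are
    size = max(message) if message else 0
    out = [None] * size  # missing fields stay None (null in JSON)
    for field_number, value in message.items():
        # Field numbers start at 1, array indexes start at 0.
        out[field_number - 1] = to_positional(value)
    return out

# The example protobuf message from above, keyed by field number.
message = {
    2: {
        1: "Field 1",
        3: "Field 3",
        4: {1: 0, 2: 0, 3: 1, 4: 1, 5: 0},
    }
}

mapped = to_positional(message)
print(mapped)
# [None, ['Field 1', None, 'Field 3', [0, 0, 1, 1, 0]]]
```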
Sending the request
So now we know all the values we need and what structure to send the data in. Good job getting this far! You’ve crafted a proper request when it includes all of these:
In the query string:
rpcids
f.sid
bl
hl
_reqid
rt
In the request body (the form):
f.req
at
In f.req:
- An array encompassing the entire request.
- An array inside that one containing each “envelope” we talked about earlier.
- One or more envelopes containing: RPC ID, the data, null, and "generic" or a number.
NOTE: There may be a scenario in which you want to send a request unauthenticated. To do this, simply remove the Cookie header and the at value from your request.
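Putting the checklist together, here is a hedged Python sketch that assembles (but does not send) the request with the standard library. The function name and its parameters are my own, the path assumes a single signed-in account (so no /u/1 prefix), and the cookie value is a placeholder you’d replace with your own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_request(ui_name, rpc_id, payload, sid, bl, at, reqid, cookie,
                  host="contacts.google.com"):
    """Assemble a single-call batchexecute POST request without sending it."""
    query = urlencode({
        "rpcids": rpc_id,
        "f.sid": sid,
        "bl": bl,
        "hl": "en",
        "_reqid": reqid,
        "rt": "c",
    })
    envelope = [rpc_id, json.dumps(payload, separators=(",", ":")), None, "generic"]
    body = urlencode({
        "f.req": json.dumps([[envelope]], separators=(",", ":")),
        "at": at,
    }).encode("ascii")
    url = f"https://{host}/_/{ui_name}/data/batchexecute?{query}"
    headers = {
        "Cookie": cookie,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # urllib adds Content-Length by itself when the request is actually sent.
    return Request(url, data=body, headers=headers, method="POST")

req = build_request(
    ui_name="ContactsUi",
    rpc_id="rptSGc",
    payload=[["c8351307351755208604"]],
    sid="-6483512770624070754",
    bl="boq_contactsuiserver_20201018.13_p0",
    at="ACHfmxprW1XKRktXjCXD06UGAxSR:1603611417347",
    reqid=502219,
    cookie="[REDACTED]",  # placeholder
)
print(req.full_url)
print(req.data.decode("ascii"))
```

Passing the result to urllib.request.urlopen would send it. The at token and f.sid here are the stale example values from this article, so expect the server to reject them.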
With all of our values now sorted, we send the request, and…
)]}'

2593
[["wrb.fr","rptSGc","[[[\"c8351307351755208604\", ... \n]\n]\n]\n",null,null,null,"generic"]
]
57
[["di",79]
,["af.httprm",79,"246063832929204055",128]
]
27
[["e",4,null,null,2691]
]
Yikes.
Home stretch: picking apart the response
Okay, so the server’s response has come back, and if you scroll around in it you can probably spot some contact info like first and last name. This is actually less complicated than you’d think. We’re almost done!
Right away, we can discard the first six characters. That’s )]}' and the two newlines that follow. Now we have this:
2593
[["wrb.fr","rptSGc","[[[\"c8351307351755208604\", ... \n]\n]\n]\n",null,null,null,"generic"]
]
57
[["di",79]
,["af.httprm",79,"246063832929204055",128]
]
27
[["e",4,null,null,2691]
]
EDIT June 2, 2021: I recently found that this response format can be simplified by omitting the rt query parameter, which will cause the server to return a JSON array of all the envelopes instead of splitting them by length integer. (You can also change it to b for a Protobuf response… if you want that.) I’ll keep explaining the c response type below, but the JSON format is much easier to work with.
Just like we sent in the request, the response has envelopes too! There’s one for each RPC call, and then two for additional metadata we won’t use. Here we can see there are three lines that are just integers — these denote the length, in bytes, of each envelope. Think of it like an object where each key is the length and each value is the envelope:
{
"2593": [["wrb.fr","rptSGc","...",null,null,null,"generic"]],
"57": [["di",79],["af.httprm",79,"246063832929204055",128]],
"27": [["e",4,null,null,2691]]
}
For each additional request in the batch, a response envelope will be added.
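Splitting an rt=c response into envelopes can be sketched in Python without trusting the length integers at all: let the JSON decoder walk the text, and treat any bare integer it finds between chunks as a length line to skip. This is a shortcut of mine, not how Google’s client does it. The sample uses the two metadata chunks from above plus a shortened, hypothetical wrb.fr chunk whose length value is a placeholder.

```python
import json

def parse_batchexecute_response(text):
    """Return a flat list of envelopes from an rt=c response body."""
    if text.startswith(")]}'"):
        text = text[4:]  # drop the prefix; the newlines are skipped below
    decoder = json.JSONDecoder()
    envelopes = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            envelopes.extend(value)  # each chunk is an array of envelopes
        # Anything else is a length integer, which this sketch ignores.
    return envelopes

sample = (
    ")]}'\n\n"
    "0\n"  # placeholder length, ignored by this sketch
    '[["wrb.fr","rptSGc","[[\\"c8351307351755208604\\"]]\\n",null,null,null,"generic"]\n]\n'
    "57\n"
    '[["di",79]\n,["af.httprm",79,"246063832929204055",128]\n]\n'
    "27\n"
    '[["e",4,null,null,2691]\n]\n'
)

envelopes = parse_batchexecute_response(sample)
for envelope in envelopes:
    print(envelope[0])
```

If you need to process chunks as they stream in, you would have to honor the length integers instead. This version assumes the whole body is already in memory.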
Since we’re just focusing on the first payload today, let’s drop all of the length integers and the last two metadata responses (and pretty-print to make things easier):
[
  [
    "wrb.fr",
    "rptSGc",
    "[[[\"c8351307351755208604\", ... \n]\n]\n]\n",
    null,
    null,
    null,
    "generic"
  ]
]
Sweet, so now we’ve isolated the response we want from the server. It looks very similar to the envelopes from our request. Here are the important indexes to keep in mind:
- Index 1 is the RPC ID of the function that’s returning the data. These match all the ones we sent in our request.
- Index 2 is the actual data, JSON.stringify’d like in our request. If the request fails for some reason, this will be "[]\n".
- Index 6 is the order in which the payload was processed.
The data at index 2 can be processed by any JSON interpreter. Our response is very long, so I won’t be going into it here, but if the outermost array isn’t empty it usually means the request was successful. You can find the indexes of the values you need on your own and trust that they won’t change (thanks, Protobuf!).
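As a rough Python sketch, pulling the data out of a list of already-split envelopes could look like this. It assumes that envelopes starting with "wrb.fr" are the RPC responses (that is what the example above shows) and treats an empty outer array as a failure. The function name is mine, and the sample payload is a shortened stand-in for the real response.

```python
import json

def extract_payloads(envelopes):
    """Map each RPC ID to its decoded data, or None if nothing usable came back."""
    results = {}
    for envelope in envelopes:
        if envelope[0] != "wrb.fr":
            continue  # metadata such as "di", "af.httprm" and "e"
        rpc_id, raw = envelope[1], envelope[2]
        data = json.loads(raw) if raw else []
        # An empty outer array is treated as a failed call.
        results[rpc_id] = data if data else None
    return results

envelopes = [
    ["wrb.fr", "rptSGc", '[[["c8351307351755208604"]]]\n', None, None, None, "generic"],
    ["wrb.fr", "xxXxX", "[]\n", None, None, None, "generic"],
    ["di", 79],
    ["e", 4, None, None, 2691],
]

payloads = extract_payloads(envelopes)
print(payloads["rptSGc"][0][0][0])  # c8351307351755208604
print(payloads["xxXxX"])            # None
```

If the same RPC ID is called more than once in a batch, key the results on index 6 instead, since this version keeps only the last response per ID.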
That’s it!
I wrote this purely because it was such a pain to figure out. I hope it finds researchers who are struggling, and makes pentesting Google apps more accessible to everyone.
If you know more, or think I missed something, please let me know. I’m still learning too.
Happy bug hunting!