Limitations of the GET method in HTTP (dropbox.com)
127 points by varenc on March 2, 2015 | 53 comments



>GET requests don’t have a request body, so all parameters must appear in the URL or in a header.

GETs can have bodies; it's just that some people who write software forget this.

A simple solution is to put a hash, or maybe even just the body's content length, in the query string or a header so the server can make sure the body made it. If the body doesn't make it, the server can inform the client to retry with another, less ideal HTTP method.
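
A minimal sketch of that idea in Python's http.client (the endpoint, header name, and error signal are all made up for illustration):

    import hashlib
    import http.client
    import json

    payload = json.dumps({"cursor": "abc123"}).encode()
    digest = hashlib.sha256(payload).hexdigest()

    conn = http.client.HTTPConnection("api.example.com")
    conn.request("GET", "/delta", body=payload, headers={
        "Content-Type": "application/json",
        "X-Body-SHA256": digest,  # hypothetical header; server recomputes and compares
    })
    resp = conn.getresponse()
    if resp.status == 400:  # hypothetical "body didn't arrive intact" signal
        pass  # retry with POST or another, less ideal method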


There's another article currently on the first page of HN discussing this issue. That post has a pointer to this Stack Overflow answer [1] with the following quote from Roy Fielding:

    Yes. In other words, any HTTP request message is allowed to contain a message body,
    and thus must parse messages with that in mind. Server semantics for GET, however,
    are restricted such that a body, if any, has no semantic meaning to the request.
    The requirements on parsing are separate from the requirements on method
    semantics.

    So, yes, you can send a body with GET, and no, it is never useful to do so.

    This is part of the layered design of HTTP/1.1 that will become clear again once
    the spec is partitioned (work in progress).

    ....Roy
[1] -- http://stackoverflow.com/questions/978061/http-get-with-requ...


I should point out that if you use my approach above and vary on the checksum/hash header, you don't break the semantics here either...


Does that actually work in practice? Will the body make it through various opinionated routers, WAFs, etc.? (I have no idea, but if it doesn't, the spec's answer doesn't matter.)


It often does, and if not, the above approach lets you inform your client of the failure so it can switch to POST or whatever else is best.

Edit: As I mentioned in another follow-up, you may need to vary on your checksum header to play nice with proxy caches, etc. Or mark your responses as uncacheable.
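
If anyone wants it spelled out, here is a minimal server-side sketch of that Vary idea, assuming Flask and the hypothetical X-Body-SHA256 header from the sketch above:

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/delta", methods=["GET"])
    def delta():
        resp = jsonify({"entries": []})  # ... computed from the request body ...
        # Key cache entries on the checksum header, since the URL alone
        # no longer identifies the response.
        resp.headers["Vary"] = "X-Body-SHA256"
        return resp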


Given the apparent desire to have GET-like requests with bodies that have meaning to the server, would there be any downside to introducing a new method for that purpose, let's say... QUERY?

So for example: QUERY example.com/api/thing ('{"thing_id": 00335}'::json)
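
For what it's worth, most client libraries will already send an unregistered verb. A sketch with Python's requests (the endpoint and payload are hypothetical, and every proxy in between would also have to tolerate the unknown method):

    import requests

    resp = requests.request(
        "QUERY",                          # not a registered HTTP method
        "https://example.com/api/thing",
        json={"thing_id": 335},           # hypothetical payload per the example above
    )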


By "some people", I assume that you mean many of the most popular http-based servers and frameworks.


They could have used a GET method with a request body.

It's debatable whether that's a better solution (semantically, a request body should not influence the result of an HTTP GET response), but saying that "GET requests don’t have a request body" is false.


Yup, Elasticsearch ran into the same problem that Dropbox has, so they use request bodies to get around it.

From: https://github.com/elasticsearch/elasticsearch-definitive-gu... :

> A GET request with a body? The HTTP libraries of certain languages (notably JavaScript) don’t allow GET requests to have a request body. In fact, some users are surprised that GET requests are ever allowed to have a body.

> The truth is that RFC 7231 — the RFC which deals with HTTP semantics and content — does not define what should happen to a GET request with a body! As a result, some HTTP servers allow it, and some — especially caching proxies — don’t.

> The authors of Elasticsearch prefer using GET for a search request because they feel that it describes the action — retrieving information — better than the POST verb. However, because GET with a request body is not universally supported, the search API also accepts POST requests:
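
A sketch of that fallback with Python's requests (the host and index names are placeholders):

    import requests

    query = {"query": {"match": {"title": "http"}}}
    url = "http://localhost:9200/my-index/_search"

    # Elasticsearch accepts the same body on GET or POST for _search.
    resp = requests.request("GET", url, json=query)
    if not resp.ok:
        # if a client or intermediary dropped the GET body, fall back to POST
        resp = requests.post(url, json=query)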


The significant difference here is that people using Elasticsearch typically control the machines and network involved, while Dropbox does not. Therefore it is OK for Elasticsearch to say, "This is a reasonable way to work, and it is up to you to use compatible software, etc." But a similar decision on Dropbox's part would mean that a random fraction of their customers would have a broken experience.


If the request body affects the behavior, they are breaking the semantics of GET and breaking anything (e.g., caching, including local caching) built around the defined semantics.

There should be a GET-like method that takes a (significant) request body but is otherwise semantically like GET (e.g., safe) -- essentially, it would represent asking for a representation of the result of applying a pure function to the resource identified by the URI rather than asking for a representation of the resource itself.

But no such method currently exists in HTTP/1.1 or any general-purpose extension (there are highly specific things like the SEARCH method, but that's not general-purpose the way the base HTTP/1.1 methods and, say, PATCH are).


What if the header included a hash of the content of the request body?


There's a REPORT method that can be used with any content-type.


If you mean the WebDAV one in RFC 3253, its spec (as is usual for WebDAV methods) has some WebDAV specific requirements in it that aren't really appropriate for general use; content-type isn't the only place where problems with generality can be found.

Something like that completely decoupled from WebDAV is basically what I'd like to see.


I had no idea that was an option so I researched it a bit and found this stackoverflow question: http://stackoverflow.com/questions/978061/http-get-with-requ...

Note this part of the accepted answer:

> So, yes, you can send a body with GET, and no, it is never useful to do so.


Specifically, the intent was to say that it would be a violation of the HTTP/1.1 spec to change the response based on the contents of the message body; however, if /delta used a hash of the message body's contents in the URL, then it should still be semantically correct.


Ooh, that's clever. But yeah, it would be a pain for developers trying to actually work with the API to have to create a hash just to make HTTP happy.


I ran across this when writing extension methods for HttpClient (.NET). HttpWebRequest did not like content-body being set for GET requests and threw this exception: System.Net.ProtocolViolationException: Cannot send a content-body with this verb-type.


I wonder how many lines of code contribute toward throwing that error? ;)


Another option is to use PUT/POST to create a query resource that you then refer to in your GET request. It can be useful if you want to refer to it later, and the server can use the information, e.g. to cache the complex query.
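
A minimal sketch of the pattern in Python, with hypothetical endpoints:

    import requests

    # Create the query once as a resource...
    r = requests.post("https://api.example.com/queries",
                      json={"filter": {"status": "active"}})
    query_url = r.headers["Location"]   # e.g. .../queries/42

    # ...then GET its (cacheable, shareable) results.
    results = requests.get(query_url + "/results")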


That’s technically true, but performance-wise I’m not realistically going to submit two separate requests for what is semantically one action. On the other hand, it makes the query resource reusable for saved searches, and for showing search history.


It is not much different from a prepared statement or a stored query; indeed, it can even be parameterized. With keepalive, two requests is not so bad.


And then you need to store the queries in a database, and make sure to flush them out 'some time'. What happens when a client tries to use a flushed query? Something that previously worked now no longer does.

Or, let's just use POST.


It would also require a long poll, but it's definitely useful for certain situations where a synchronous operation would take too long.


Microservices largely break the standard HTTP patterns anyway. E.g. you are supposed to redirect after a POST request so that users can't double-submit a form, but if you're hosting your frontend on a CDN, this isn't possible, because the CORS spec prevents browsers from honoring redirect responses except after GET and HEAD requests.

Incidentally, you are supposed to be allowed to redirect in response to GET requests, but if you use JWT auth then you can't even do this because the non-standard header forces a preflight request, and CORS requests with preflight aren't allowed to redirect at all.

I don't even understand how it's supposedly secure to pass back JSON with a URL for your promise to trigger a redirect, but it's insecure to pass your browser a 3XX response to do the same thing automatically.


To be fair, CORS is not a part of HTTP. If you're talking to your microservice with a browser, you've misunderstood the point of microservices.


> Microservices largely break the standard HTTP patterns anyway.

There's a difference between violating common patterns and breaking the specified semantics of the standard.

> E.g. you are supposed to redirect after a POST request so that users can't double submit a form

That's a common practice, but not a requirement of the HTTP standard the way that GET methods not having semantically-significant request bodies is.


(I feel like a broken record when this topic comes up, but...)

RFC 5323 is still a thing, and the HTTP verb mechanism is still extensible. Use the SEARCH verb and submit your complex query in the request body. It's up to you to define the query structure appropriate for your use case, but MongoDB's query language is a good starting place.

http://tools.ietf.org/search/rfc5323
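
A client-side sketch in Python; the endpoint and query format are made up (MongoDB-style, per the suggestion above), and note that RFC 5323 itself actually expects a DAV-flavored XML body:

    import http.client
    import json

    # http.client sends nonstandard verbs without complaint; whether
    # proxies and servers accept SEARCH is another matter.
    conn = http.client.HTTPConnection("example.com")
    conn.request("SEARCH", "/api/things",
                 body=json.dumps({"name": {"$regex": "^http"}}),
                 headers={"Content-Type": "application/json"})
    resp = conn.getresponse()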


Like most things connected to WebDAV, RFC 5323's SEARCH method makes sense within WebDAV when used for its specific intended role, but it is specified with a whole bunch of requirements that aren't all that suitable for general use outside of WebDAV -- e.g., the requirement that the server accept XML requests, the specific definition of how it handles XML requests, and the requirement to use the 207 Multistatus response code on success. (Of course you can dump all that, but then you aren't using RFC 5323; you are just inventing your own thing that shares the name of the method defined there.)

REPORT (also from WebDAV) is somewhat closer to a general GET-with-body, but still has some WebDAV baggage.


> As a rule, HTTP GET requests should not modify server state. This rule is useful because it lets intermediaries infer something about the request just by looking at the HTTP method.

As long as we're picking nits, it doesn't really matter if a GET request modifies server state so long as the state after N requests is the same as the state after 1 request (idempotence).


Actually there are two guarantees for GET:

1) It is a "safe" method.

2) It is an idempotent method.

A safe method (GET/HEAD) should not modify server state.

"In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. "

Whereas PUT & DELETE are also idempotent but not "safe".

http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html


Also from that RFC:

> Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

I guess my point is that idempotence is the aspect that matters to clients - you can have side-effects if you really want, so long as the request can be safely repeated.
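
To illustrate why idempotence is the property clients care about, here is a minimal retry-wrapper sketch in Python (the method set follows the RFC definitions cited above):

    import requests

    # Idempotent methods (safe methods plus PUT and DELETE); repeating
    # one of these cannot change the outcome, so a blind retry is OK.
    IDEMPOTENT = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}

    def request_with_retry(method, url, retries=2, **kwargs):
        for attempt in range(retries + 1):
            try:
                return requests.request(method, url, **kwargs)
            except requests.ConnectionError:
                if method.upper() not in IDEMPOTENT or attempt == retries:
                    raise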


...and in fact, there's a popular use-case called "using memcached" that does have side-effects. That is, doing a GET on something absolutely changes the server "state" (in the sense that memcached might have a key added to it) but that should be invisible to the user.

Of course, it's not actually-invisible: their next request might be a little bit faster.
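
A toy version of that in Flask, with a dict standing in for memcached and a hypothetical load_article helper:

    from flask import Flask, jsonify

    app = Flask(__name__)
    cache = {}  # stand-in for memcached

    # A "safe" GET that still mutates server state: the first request
    # populates the cache, which the client only notices as latency.
    @app.route("/articles/<article_id>")
    def get_article(article_id):
        if article_id not in cache:
            cache[article_id] = load_article(article_id)  # hypothetical loader
        return jsonify(cache[article_id])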


It's probably better to think of GET as not affecting the state of the resources within the API, rather than as not affecting the state of the server.


Safe implies idempotent.

"Same effect from 0 or more actions" implies "Same effect from 1 or more actions".


All safe methods are idempotent, but not all idempotent methods are safe (e.g. PUT or DELETE.)


Safe means you can call it in any way you want. Like from a bot, or an accelerator plugin.


One reason not to use GET to modify server state is that bots and crawlers may hit your API. They almost universally use GET, not POST.


If your API is modifying server state without HMAC, the API is probably broken.


HMAC won't save you if the application is public (and therefore available to scrapers). If a search engine or other bot can get a link to the page from another page in the app just as any interactive user would, it can also get a link with a correct HMAC token just as any interactive user would.


That isn't how HMAC works. http://en.wikipedia.org/wiki/Hash-based_message_authenticati...

The API user's private key is used to compute an HMAC over the full URL of the service. The user id is sent in a header of the request. The server looks up the user id, gets the user's private key, uses that key to compute its own HMAC over the URL, and compares it to the one passed.

There is no way for a scraper or bot to get the HMAC token.
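
A sketch of that scheme with Python's standard hmac module (the header names and key are made up):

    import hashlib
    import hmac
    import requests

    SECRET = b"per-user private key"   # known only to this user and the server
    url = "https://api.example.com/things/42/delete"

    sig = hmac.new(SECRET, url.encode(), hashlib.sha256).hexdigest()
    requests.post(url, headers={"X-User-Id": "1001", "X-Signature": sig})
    # The server looks up user 1001's key, recomputes the HMAC over the
    # URL, and rejects the request if the signatures don't match.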


For an authenticated user, yes, as you have somewhere to hang a known-by-the-server-but-unknown-elsewhere key from.

For a public interface you don't necessarily have that. Though once you are dealing with a service that can be modified by unauthenticated users you have larger problems from malicious interactive users than from accidents by scrapers, so I'm probably arguing an irrelevant point...


This doesn't really sound like a problem with HTTP, but more a problem with common implementations on clients and servers.

What they are doing with their delta API does seem like a GET is the correct option, and according to the spec a GET with a really long URI should be allowed.

It's good to raise awareness of this so that HTTP client and server software stops assuming that the URI is something small (historically entered by a human, or at least copied and pasted around).


Right, it is not a problem with HTTP. It is usually a problem with the buffer sizes of (proxy) server implementations, resources (memory), or practical security concerns (DoS prevention). For example, if a proxy server uses a single buffer for evaluating the headers, you end up with a size that is too large for the majority of requests, which raises resource and security concerns.


I believe they're right in using POST, and here's why:

- Using GET with a request body:

Per RFC 7231, Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content, section 4.3.1 [1]: "A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request."

Although you can argue whether or not you should be able to define the semantics of such a request body within the context of the resource, it quickly becomes moot in the face of reality: there's a good chance it won't work due to whatever client or intermediary is involved.

- Using non-HTTP verbs (e.g. REPORT, SEARCH):

Again, one can argue over their validity, but since most clients or intermediaries will not support them, there's little practical use. In this particular case, I'd recommend against them since most developers will not be familiar with them, making your API less user friendly.

- Using POST:

Per RFC 7231, Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content, section 4.3.3 [2]: "The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics."

In other words, do whatever you want – such as executing a search query contained within the request body. This is semantically valid, widely supported and common practice. The only downside would be that caching of POST requests is not widely supported, although still possible. The spec even suggests an alternative solution: "Responses to POST requests are only cacheable when they include explicit freshness information (see Section 4.2.1 of RFC7234). However, POST caching is not widely implemented. For cases where an origin server wishes the client to be able to cache the result of a POST in a way that can be reused by a later GET, the origin server MAY send a 200 (OK) response containing the result and a Content-Location header field that has the same value as the POST's effective request URI (Section 3.1.4.2)."

In other words, you can implement a caching mechanism for your query resource that exposes itself through a URL structure, available for GET requests (a rough sketch follows below).

[1] https://tools.ietf.org/html/rfc7231#section-4.3.1

[2] https://tools.ietf.org/html/rfc7231#section-4.3.3
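
A minimal sketch of that Content-Location suggestion, assuming a Flask handler and a hypothetical run_query helper:

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/search", methods=["POST"])
    def search():
        resp = jsonify(run_query(request.get_json()))   # hypothetical query runner
        # Explicit freshness info plus Content-Location lets a cache reuse
        # this response for a later GET of the same URI.
        resp.headers["Content-Location"] = request.url
        resp.headers["Cache-Control"] = "max-age=60"
        return resp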


> Using non-HTTP verbs (e.g. REPORT, SEARCH):

These are HTTP extensions, and are as much HTTP verbs as PATCH is. The big problem is that REPORT and SEARCH are WebDAV extensions to HTTP, their specs carry WebDAV-related baggage, and it's at least as much "bad REST" to redefine the semantics of methods with existing standards as it is to use POST to get around the technical/practical problems with GET.


Let me clarify: by non-HTTP I mean "not defined in the HTTP/1.1: Semantics and Content RFC" (RFC 7231) and therefore not likely to be commonly implemented.

Could you elaborate on what you consider to be "bad REST" with regards to using POST in this case? The RFC clearly leaves the definition of the semantics of POST up to the implementation of that particular resource.

The only argument I could come up with, is that POST is not guaranteed to be idempotent or safe, whereas a search query would be. But a lack of guarantee doesn't exclude use cases where it would be.


> Could you elaborate on what you consider to be "bad REST" with regards to using POST in this case?

I'm not saying POST is bad REST, I am acknowledging the claim by some that it is and, without debating whether or not it is, saying that using REPORT or SEARCH, as currently defined, is at least as bad as using POST from a REST point-of-view.

(I actually think that, in terms of using HTTP in a RESTful way, POST is the least-bad choice among the existing verbs, though it is not an exceptionally good fit -- it's basically a case where the POST-works-for-anything-for-which-no-other-method-seems-appropriate logic applies.)


Ah, my bad, I misunderstood. Thanks for clarifying!


A POST does not mean that a state change must happen as the result of the request. The server may just process the data of the body and return a result, without changing state.

If you happen to run into buffer size problems with GET headers, a POST can be an inconvenient answer (no caching). Or you create a new verb and customize your proxy's buffer settings for that special thing. This will make the pain explicit in your infrastructure.

But generally, I assume a complex operation behind a large URI. So a POST would do fine if the costs of the request are somewhat transparent to the user. And if you cannot do HTTP caching, an application cache would be the next thing to look for.


A bit of a tangent, but...

> For forms that use HTTP POST, it may not be safe to retry so the browser asks the user for confirmation first.

This always makes me a bit mad. The user has absolutely no idea what the server is doing, and most users are baffled by this message and can't understand why they would want to do anything other than resubmit the form, in the vast majority of cases.

If it really is potentially dangerous (it can be), browsers should apologise, not offer the user a gun to shoot his feet with.


We use HTTP GET bodies to submit searches to the servers. It feels somehow cleaner than doing a complex search in an encoded URL.

The client can decide whether to use the query string or the body. Most Ruby web servers handle this transparently, so the app doesn't know whether the params came from the query string or from the body.
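
A rough Python analog of that merging (Flask here, since the parent's servers are Ruby): accept parameters from the query string or from a body, whichever the client sent:

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route("/search", methods=["GET", "POST"])
    def search():
        params = request.args.to_dict()
        body = request.get_json(force=True, silent=True)  # works for GET bodies too
        if body:
            params.update(body)
        return jsonify({"params": params})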


I've hit this a few times. One way is to try to reduce the shape of the parameters, though there is obviously a balance between minifying the URL and keeping it readable.

In one of the cases I ended up using POST as well. It was easy as that resource didn't already have a POST method (like in this case).





