[–]harlows_monkeys 9 points 2 years ago
I'm amazed at how poorly written some of the fundamental web specs are. Here's the heart of the spec for how to encode application/x-www-form-urlencoded data:
Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in RFC1738, section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
First problem: it says to do space to '+', then do the %HH encoding. So given the input "a b+c", the first step would give "a+b+c" as the input to the second step.
Second problem: '+' is not a reserved character in RFC1738, so after %HH encoding the reserved characters in "a+b+c", we have "a+b+c". Unfortunately, we'd get the same result if the initial input had been "a+b c". Collision!
The second problem is resolved by noting, upon a second reading, that it says reserved characters are escaped as described in RFC1738. That could be interpreted to mean that the escaping is as described in the RFC--not that we are supposed to be using the same set of reserved characters as the RFC uses. The reserved characters for the form encoding are the non-alphanumeric characters.
That's a little better--it will give us "a%2Bb%2Bc". Still a collision though.
To fix that, we've got to read
Space characters are replaced by '+', and then reserved characters are escaped...
as meaning space characters are replaced by '+', and then any reserved characters OTHER THAN the '+'s that resulted from the space replacement are escaped.
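To make the three readings concrete, here is a rough Python sketch. The function names are mine, not anything from the spec, and I'm assuming UTF-8 for anything outside ASCII (the spec text only talks about ASCII codes).

```python
from urllib.parse import quote


def escape_non_alnum(text: str) -> str:
    """Replace every byte that is not an ASCII letter or digit with %HH."""
    return "".join(
        chr(b) if b < 128 and chr(b).isalnum() else f"%{b:02X}"
        for b in text.encode("utf-8")  # UTF-8 is an assumption, not from the spec
    )


def literal_reading(value: str) -> str:
    # Reading 1: space -> '+', then escape while treating '+' as not
    # reserved, so existing '+' characters pass through untouched.
    return quote(value.replace(" ", "+"), safe="+")


def second_reading(value: str) -> str:
    # Reading 2: space -> '+', then escape every non-alphanumeric
    # character, including the '+' characters we just introduced.
    return escape_non_alnum(value.replace(" ", "+"))


def third_reading(value: str) -> str:
    # Reading 3: escape everything except the '+' characters that came
    # from spaces. Splitting on spaces first keeps the two kinds apart.
    return "+".join(escape_non_alnum(part) for part in value.split(" "))


for encode in (literal_reading, second_reading, third_reading):
    print(encode.__name__, encode("a b+c"), encode("a+b c"))
```

Only the third reading gives different outputs for "a b+c" and "a+b c". As far as I can tell it also matches what Python's own urllib.parse.quote_plus produces for these inputs.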
[–][deleted] 1 point 2 years ago
[–]SoPoOneO 1 point 2 years ago
Why are spaces in the URL treated differently than spaces in the query string? According to this article spaces become &20 in the first case and + in the second.
[–]KayEss 3 points 2 years ago
%20 rather than &20, but to answer the actual question, I think the best way of looking at it is that the space -> + transform is a bug.
Back in the day when this was being put together nobody was using file specifications with spaces in them -- after all, half of the software you'd want to use did all sorts of horrid random things if you had spaces in file names (this is mostly sorted out these days, but there is still some weirdness in dark corners).
So although spaces in filenames were rare, spaces in GET submissions were not, and it was felt that %20 was too ugly for something so common, so the + was picked as an easier-to-read alternative. The confusion that this might cause wasn't anticipated though, or if it was then it wasn't given enough weight.
These days you can safely use spaces pretty much everywhere, but you cannot use + signs in URLs because too many systems are broken in their handling of them. Even if you correctly encode them as %2B some systems will go so far as to decode that to a + and then replace that with a space which it will re-encode as a %20 before requesting the URL back from your server. Ouch!
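A minimal Python sketch of that failure mode, for anyone who wants to see it happen. The function names are made up for illustration, and the "broken" version is only my guess at what such systems do internally: decode once, then decode again with form rules.

```python
from urllib.parse import quote, unquote, unquote_plus


def broken_roundtrip(encoded: str) -> str:
    """Mimic a system that decodes twice before re-encoding (hypothetical)."""
    once = unquote(encoded)      # "%2B" becomes a literal "+"
    twice = unquote_plus(once)   # the literal "+" is now misread as a space
    return quote(twice)          # the space is re-encoded as "%20"


def correct_roundtrip(encoded: str) -> str:
    """Decode exactly once with form rules, then re-encode."""
    return quote(unquote_plus(encoded))


print(broken_roundtrip("a%2Bb"))   # the plus sign is lost
print(correct_roundtrip("a%2Bb"))  # the plus sign survives
```

The point is that "+" and "%2B" only stay distinct if every hop decodes exactly once.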
[–]Porges 1 point 2 years ago
Because the query string is x-www-form-urlencoded and then stuck into the URL. This is what the article is about...
Nothing more to say ;-).