Javascript replace plus sign with space (but not %20)

Javascript replace plus sign with space (but not %20) - javascript

I am trying to use the League of Legends API and request data on a certain user. I use the line
var user = getUrlVars()["username"].replace("+", " ");
to store the username. However, when I do the XMLHttpRequest with that username, it'll put %20 instead of a space.
y.open("GET", "https://na.api.pvp.net/api/lol/na/v1.4/summoner/by-name/"+user, false);
Edit: When I run this code with a user that has no space in their name it works, however when they have a space in their name it says the user is undefined.
For example, if I was looking for the user "the man", it would do a get at
https://na.api.pvp.net/api/lol/na/v1.4/summoner/by-name/the%20man
But the correct request URL is
https://na.api.pvp.net/api/lol/na/v1.4/summoner/by-name/the man

When you're creating a URL, you should use encodeURIComponent to encode all the special characters properly:
y.open("GET", "https://na.api.pvp.net/api/lol/na/v1.4/summoner/by-name/"+encodeURIComponent(user), false);

Actually there are no "spaces" in the summoner names on Riot's side. So:
https://na.api.pvp.net/api/lol/na/v1.4/summoner/by-name/the man
Becomes:
https://na.api.pvp.net/api/lol/na/v1.4/summoner/by-name/theman
Have a look at this: https://developer.riotgames.com/discussion/community-discussion/show/jomoRum7
I am unsure how + are handled (in fact I don't think you're able to have a + in your name). All you have to do is remove the spaces.
For "funny" characters, just request them with the funny character in them, and Riot returned it fine.
https://euw.api.pvp.net/api/lol/euw/v1.4/summoner/by-name/Trøyer?api_key=<insert your own>
will auto correct to
https://euw.api.pvp.net/api/lol/euw/v1.4/summoner/by-name/Tr%C3%B8yer?api_key=<insert your own>
and you generally don't even have to decode it. (I used JS as my language to fetch it, if you use something else your results may require the decoded value)

What you're experiencing is correct behaviour and is called URL encoding. HTTP requests have to conform to certain standards. The first line is always made up of three parts delimited by a space:
Method (GET, POST, etc.)
Path (i.e. /api/lol/na/v1.4/summoner/by-name/the%20man)
HTTP version (HTTP/1.1, HTTP/1.0, etc.)
This is usually followed by HTTP headers which I'll leave out for the time being since it is beyond the scope of your question (if interested, read this https://www.rfc-editor.org/rfc/rfc7230). So a normal request looks like this:
GET /api/lol/na/v1.4/summoner/by-name/the%20man HTTP/1.1
Host: na.api.pvp.net
User-Agent: Mozilla
...
With regards to your original question, the reason the library is URL encoding the space to %20 is because you cannot have a space character in the request line. Otherwise, you would throw off most HTTP message parsers because the man would replace the HTTP version line like so:
GET /api/lol/na/v1.4/summoner/by-name/the man HTTP/1.1
Host: na.api.pvp.net
User-Agent: Mozilla
...
In most cases, servers will return a 400 bad request response because they wouldn't understand what HTTP version man refers to. However, nothing to fear hear, most server-side applications/frameworks automatically decode the %20 or + to space prior to processing the data in the HTTP request. So even though your URL looks unusual, the server side will process it as the man.
Finally, one last thing to note. You shouldn't be using the String.replace() to URL decode your messages. Instead, you should be using decodeURI() and encodeURI() for decoding and encoding strings, respectively. For example:
var user = getUrlVars()["username"].replace("+", " ");
becomes
var user = decodeURI(getUrlVars()["username"]);
This ensures that usernames containing special characters (like / which would be URL encoded as %2f) are also probably decoded. Hope this helps!

Related

What does colon ':' in API request mean?

i am using firebase for my project ,the documentation gives me the endpoint for signing in users as:
https://identitytoolkit.googleapis.com/v1/accounts:signInWithPassword?key=[API_KEY]
i want to know what does the colon : mean, for example the word key after the question mark shows its a parameter likewise what does the notion accounts:signInWithPassword mean.The reason:I have an axios instance with config:
axios.create(
{
baseURL:"https://identitytoolkit.googleapis.com/v1",
params:{
apiKey:"somekey"
}
})
now since the baseUrl shown above remains same for firebase signing in with email and password or signing up with email and password. I want to dynamically embed accounts:signInWithPassword and accounts:signUp for respective requests and i am not sure if specifying accounts:respectiveUsecase in params object would work.

A colon doesn't have any special meaning in an URL path. It's just a convention those APIs tend to use in their paths.
There are a handful of metacharacters that do:
question marks (?) and hashes (#) delimit the query or search parts
% is used for escaping characters (e.g. %0A)
+ is sometimes an encoding for a space instead of %20.
& generally separates query parameters (e.g. foo=bar&baz), though this is not a part of the standard. Some server software could expect e.g. semicolon-separated parameters.
As #deceze pointed out, colons do have a special meaning in the host part, e.g. https://user:pass#host/path:where:colons:do:not:matter.

It is a dynamic value (like a parameter where you in pass in a value directly)
:nounId: The colon (:) before the word indicates that we don't mean the literal string "nounId" as part of the endpoint, but rather that we are expecting some dynamic data to be inside there. From the above example of /ski/:skiId, one actual endpoint might be something like /ski/1234 (where 1234 is the unique ID number of one of the skis in our database.
source: https://coursework.vschool.io/rest-api-design/#:~:text=%3AnounId%20%3A%20The%20colon%20(%3A)%20before,data%20to%20be%20inside%20there.

What is the right way to deal with special characters in resource URIs? How to properly escape?

Our application implements APIs to access entity details via AJAX GET requests - it once used POST but we decided to change our standards.
Unfortunately we hit a design flaw recently.
Suppose our API is http://localhost/app/module/entity/detail/{id} where {id} is a Spring #PathVariable that matches that entity's primary key.
If the primary key is a numeric surrogate key (auto-increment) there is no problem then. But we thought that this all worked with String primary keys.
It happened that we found some valid production data contain slashes, backslashes and semicolons for primary key. Our tables cannot use auto-increment surrogate keys because of their gargantuar size.
More in general, we discovered that we were unprepared to handle non-alphanumeric characters.
What the defective code was originally
Here is how our application used to reach to display entity data in a single page form the entity list:
User navigates table.jsp where an AJAX list is retrieved via POST
An Angular expression constructs the link to the detail page of that entity by means of detail?entityId={{::row:idAttribute}}
A valid link is generated by Angular
Examples:
detail?entityId=5903475934 //numeric case, we have several details page with same navigation pattern
detail?entityID=AAABBBCCC
detail?entityID=DO0000099101\test
The user clicks on the link and the browser points to the corresponding address
http://localhost/blablabla/detail?entityId=DO0000099101\test //Could be escaped.... read more later
The page needs the ID code from the parameters to issue the correct AJAX call
Need to retrieve the Entity ID from the query string. The Angular controller is in a separate file that doesn't see the query string (and is included dynamically, having the same page name)
<script type="text/javascript">
var ____entityId = '${param.entityId}';
</script>
Which gets translated to
<script type="text/javascript">
var ____entityId = 'DO0000099101\test'; //Yes, I know it's incorrect because now we have a TAB
</script>
Angular fetches the Entity ID into the page scope
In the Angular controller, we do the following
$scope.entityId = ___entityId;
$http.get($const.urlController+'detail/'+$scope.entityId)
Ajax call is issued
http://localhost/blablablabla/entity/detail/DO0000099101est //URL is bad
Spring MVC decodes the #PathVariable
Unfortunately, the parameter is treated as DO0000099101est
What we tried to fix
We have tried to fix the mistake by escaping the \t as it is clear that its presence in the HTML content constitutes a bug. Javascript interprets it as a tab.
Trying to URL-escape the ID
We tried to manually navigate to http://localhost/blablabla/detail?entityId=DO0000099101%5Ctest by escaping the backslash to its URL entity. The purpose is that if this worked we could modify Angular code in the table page
The result is that the \t appeared again as it is in the Javascript fragment. From that point, the sequence is the same
<script type="text/javascript">
var ____entityId = 'DO0000099101\test';
</script>
Trying to Java-escape + URL-Escape the ID in the URL
What if the Angular table page escaped the URL by this way?
http://localhost/blablabla/detail?entityId=DO0000099101%5C%5Ctest
<script type="text/javascript">
var ____entityId = 'DO0000099101\\test'; //Looks great
</script>
Unfortunately Angular now reverses the backslash into a forward slash when performing the Ajax request
http://localhost/blablabla/detail/DO0000099101/test
Trying to perform the Ajax call manually with escaped URL
So know that the REST url is http://localhost/app/module/entity/detail/{id}, let's try to see how Spring expects the backslash to be escaped
http://localhost/app/module/entity/detail/DO0000099101%5Ctest //results in 400 error
http://localhost/app/module/entity/detail/DO0000099101\\test //Chrome reversed the backslash into a forward slash and the result is 404 as expected for http://localhost/app/module/entity/detail/DO0000099101//test
Using encodeURIComponent
We already tried this (but I didn't mention in the original post). To #Thomas comment, we tried again to encode the primary key by Javascript escaping in the controller itself.
$http.get($const.urlController+'detail/'+encodeURIComponent($scope.entityId))
The problem with this approach is that the backslash-T sequence in the URL is encoded to %09, which results in a tab decoded on the server side
Using both encodeURIComponent and a Java escape
Now we tried to use both approach #4 and fixing the hardcoded Java-escaped (i.e. \\t) into the Javascript for testing purposes.
<script type="text/javascript">
var ____entityId = 'DO0000099101\\test';
</script>
$http.get($const.urlController+'detail/'+encodeURIComponent($scope.entityId))
This time the Ajax call resulted in 400 error as case #3
http://localhost/app/module/entity/detail/DO0000099101%5Ctest //results in 400 error
Question time
I want to ask this question in two ways, one specific and one more general.
Specific: how can I properly format a #PathVariable so that it can preserve special characters?
General: in the REST world, what care is to be done when dealing with resource IDs that may contain special characters?
For the second, let me be more clear. If you control your REST application and generation of IDs, you can choose to allow only alphanumeric identifier (user-generated or random or whatever) and sleep happy.
But in our case we need to browse entities that come supplied from external systems that have broader restrictions on character sets allowed for entity primary identifiers.
And again, we cannot use auto-increment surrogate keys.

XMLHttpRequest stalls on send() if I attempt to pass an encoded URL containing an ampersand

I'm making an Ajax call that sends a URL to a PHP script. Oftentimes these URLs are complex and contain special characters. If I don't encode the URL before sending, the PHP script works, but the URL's values are interpreted as my values (which is expected).
If I do encode a URL that doesn't contain special characters, like 'http://google.com' for example, the XMLHttpRequest sends my parameters to my PHP script as expected and everything works fine.
The issue is when I encode a URL that does contain special characters, like 'http://google.com?this=that&that=this' -- send() doesn't ever reach my PHP script. It just hangs indefinitely. No errors or anything. I'm stumped.
Here's what I'm working with:
function getContent(content, sumbittedUrl){
var postContent = encodeURIComponent(content);
var postUrl = encodeURIComponent(submittedUrl);
var postString = 'content=' + postContent + '&url_get=' + postUrl;
var httpRequest;
//create XMLHttpRequest
.
.
.
httpRequest.onreadystatechange = alertContents;
httpRequest.open('POST', 'php/my_php_script.php', true);
httpRequest.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');
httpRequest.send(postString);
}
I'm a JavaScript novice so it's possible (likely, even) that I overlooked something that someone with more experience would spot quickly. Any help would be appreciated. Thanks.

I found a somewhat anti-climatic solution. I'm still unsure why encodeURIComponent() doesn't work as expected, but the less destructive encodeURI() is working fine. encodeURI() doesn't touch these characters: ; , / ? : # & = + $ so the URL's form stays intact. Potentially dangerous characters (e.g. < >) are still encoded.
The downside is that since ampersands are among the characters that don't get encoded, it means more work server-side to sort out incoming values.
EDIT:
It turns out it was a server-side issue after all. I was retrieving the values fine the whole time. The issue was with a cURL request using the aforementioned values. Don't trust PHP errors/warnings (or lack of them).

Handling unicode in the http response xml

I'm writing a Google Chrome extension that builds upon myanimelist.net REST api. Sometimes the XMLHttpRequest response text contains unicode.
For example:
<title>Onegai My Melody Sukkiriâ�ª</title>
If I create a HTML node from the text it looks like this:
Onegai My Melody Sukkiriâ�ª
The actual title, however, is this:
Onegai My Melody Sukkiri♪
Why is my text not correctly rendered and how can I fix it?
Update
Code: background.html
I think these are the crucial parts:
function htmlDecode(input){
var e = document.createElement('div');
e.innerHTML = input;
return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}
function xmlDecode(input){
var result = input;
result = result.replace(/</g, "<");
result = result.replace(/>/g, ">");
result = result.replace(/\n/g, "
");
return htmlDecode(result);
}
Further:
var parser = new DOMParser();
var xmlText = response.value;
var doc = parser.parseFromString(xmlDecode(xmlText), "text/xml");

<title>Onegai My Melody Sukkiriâ�ª</title>
Oh dear! Not only is that the wrong text, it's not even well-formed XML. acirc and ordf are HTML entities which are not predefined in XML, and then there's an invalid UTF-8 sequence (one high byte, presumably originally 0x99) between them.
The problem is that myanimelist are generating their output ‘XML’ (but “if it ain't well-formed, it ain't XML”) using the PHP function htmlentities(). This tries to HTML-escape not only the potentially-sensitive-in-HTML characters <&"', but also all non-ASCII characters.
This generates the wrong characters because PHP defaults to treating the input to htmlentities() as ISO-8859-1 instead of UTF-8 which is the encoding they're actually using. But it was the wrong thing to begin with because the HTML entity set doesn't exist in XML. What they really wanted to use was htmlspecialchars(), which leaves the non-ASCII characters alone, only escaping the really sensitive ones. Because those are the same ones that are sensitive in XML, htmlspecialchars() works just as well for XML as HTML.
htmlentities() is almost always the Wrong Thing; htmlspecialchars() should typically be used instead. The one place you might want to encode non-ASCII bytes to entity references would be when you're targeting pure ASCII output. But even then htmlentities() fails because it doesn't make character references (&#...;) for the characters that don't have a predefined entity names. Pretty useless.
Anyway, you can't really recover the mangled data from this. The � represents a byte sequence that was UTF-8-undecodable to the XMLHttpRequest, so that information is irretrievably lost. You will have to persuade myanimelist to fix their broken XML output as per the above couple of paragraphs before you can go any further.
Also they should be returning it as Content-Type: text/xml not text/html as at the moment. Then you could pick up the responseXML directly from the XMLHttpRequest object instead of messing about with DOMParsers.

So, I've come across something similar to what's going on here at work, and I did a bit more research to confirm my hypothesis.
If you take a look at the returned value you posted above, you'll notice the tell-tell entity "â". 99% of the time when you see this entity, if means you have a character encoding issue (typically UTF-8 characters are being encoded as ISO-8859-1).
The first thing I would test for is to force a character encoding in the API return. (It's a long shot, but you could look)
Second, I'd try to force a character encoding onto the data returned (I know there's a .htaccess override, but I don't know what's allowed in Chrome extensions so you'll have to research that).
What I believe is going on, is when you crate the node with the data, you don't have a character encoding set on the document, and browsers (typically, in my experience) default to ISO-8859-1. So, check to make sure it's not your document that's the problem.
Finally, if you can't find the source (or can't prevent it) of the character encoding, you'll have to write a conversation table to replace the malformed values you're getting with the ones you want { JS' "replace" should be fine (http://www.w3schools.com/jsref/jsref_replace.asp) }.

You can't just use a simple search and replace to fix encoding issue since they are unicode, not characters typed on a keyboard.
Your data must be stored on the server in UTF-8 format if you are planning on retrieving it via AJAX. This problem is probably due to someone pasting in characters from MS-Word which use a completely different encoding scheme (ISO-8859).
If you can't fix the data, you're kinda screwed.
For more details, see: UTF-8 vs. Unicode

Why use encodeURIComponent() when writing json to a cookie

In particular, when saving a JSON to the cookie is it safe to just save the raw value?
The reason I dopn't want to encode is because the json has small values and keys but a complex structure, so encoding, replacing all the ", : and {}, greatly increases the string length

if your values contain "JSON characters" (e.g. comma, quotes, [] etc) then you should probably use encodeURIComponent so these get escaped and don't break your code when reading the values back.

You can convert your JSON object to a string using the JSON.stringify() method then save it in a cookie.
Note that cookies have a 4000 character limit.
If your Json string is valid there should be no need to encode it.
e.g.
JSON.stringify({a:'foo"bar"',bar:69});
=> '{"a":"foo\"bar\"","bar":69}' valid json stings are escaped.

This is documented very well on MDN
To avoid unexpected requests to the server, you should call encodeURIComponent on any user-entered parameters that will be passed as part of a URI. For example, a user could type "Thyme &time=again" for a variable comment. Not using encodeURIComponent on this variable will give comment=Thyme%20&time=again. Note that the ampersand and the equal sign mark a new key and value pair. So instead of having a POST comment key equal to "Thyme &time=again", you have two POST keys, one equal to "Thyme " and another (time) equal to again.

If you can't be certain that your JSON will not include reserved characters such as ; then you will want to perform escaping on any strings being stored as a cookie. RFC 6265 covers special characters that are not allowed in the cookie-name or cookie-value.
If you are encoding static content you control, then this escaping may be unnecessary. If you are encoding dynamic content such as encoding user generated content, you probably need escaping.
MDN recommends using encodeURIComponent to escape any disallowed characters.
You can pull in a library such as cookie to handle this for you, but if your server is written in another language you will need to ensure it uses a library or language utilities to encodeURIComponent when setting cookies and to decodeURIComponent when reading cookies.
JSON.stringify is not sufficient as illustrated by this trivial example:
const bio = JSON.stringify({ "description": "foo; bar; baz" });
document.cookie = `bio=${stringified}`;
// Notice that the content after the first `;` is dropped.
// Attempting to JSON.parse this later will fail.
console.log(document.cookie) // bio={\"description\":\"foo;

Cookie: name=value; name2=value2
Spaces are part of the cookie separation in the HTTP Cookie header. Raw spaces in cookie values could thus confuse the server.

We Keep Coding

JavaScript is the programming language of the Web.