Safely adding a customizable "tracking code" field to php web application

Safely adding a customizable "tracking code" field to php web application - javascript

I have a small web application written in vanilla PHP with MySQL database, on which registered users are able to create custom profile pages.
Id like to add a textarea form field in the control panel, for users to add their custom tracking code (namely Facebook Pixel or Google Analytics) for their tracking purposes.
My question is, what is the correct way to add such functionality? I'm afraid letting my users to "inject" custom code would lead to security issues for my website. As far as i know the aforementioned tracking codes use regular JS/HTML for their tracking. If that is the case how to allow JS while restricting server side code, like PHP, from being executed?
Any help would be greatly appreciated.

The tracking code is always the same, apart from one or two elements. For instance, in a Google Tracking code the only variable element is a tracking ID in the format: UA-XXXXX-Y.
You could let the user select which type of tracking they want and let them only enter the tracking ID, and perhaps one more variable. You can easily check whether these have been entered in the correct format. From that you can generate the Javascript code yourself.
Alternatively, now that you know the code is always the same you could also use that to check what has been entered. Strip out any non-linguistical elements like space, and check it against your template: Only the ID's should be variable and you know their format.
However, I think this last method is less user-friendly, because it relies on the users to provide you with a rather exact copy of the code. That might be too difficult, and that will be your problem.

Related

How could I create a one click hyperlink for websites like https://www.integral-calculator.com/ that automatically pastes into the Javascript box?

My web app generates Calculus I problems in Latex and non-Latex form.
I use these two websites to check my answers:
https://www.derivative-calculator.net/
https://www.integral-calculator.com/
E.g., I copy/paste the non-Latex form of the math problem log(v)/v**4 and put it into the Integral calculator website to understand the gaps in my knowledge that is stopping me from solving the problem independently on my own by hand writing/hand solving.
I am writing this post because I want to improve my web app and reduce the steps it takes for the user to check their answers to one of these calculators. Right now, the user has to copy the equation, click the website link, then paste the equation into the website Javascript box, and finally click "Go" to be able to see suggested steps to solve the problem.
I want to take some string the represents my equation (e.g. log(v)/v**4) and turn it into a one click hyperlink like https://www.integral-calculator.com/log(v)/v**4
A link such as this does not work, presumably because the website is using a JavaScript or MathJax feature.

You are correct in that, as these websites do not use URL parameters, you will not be able to generate a URL from within your app that auto-populates the required fields. Additionally, these websites do not appear to use any sort of publicly accessible API (they take form data and process the results on the server side).
My suggestion would be to use an existing public API to check results within your app. I believe Wolfram Alpha allows up to 2000 API calls per month for free.

Write a auto-fill and submit web form program

I am trying to write a program which can automatically fill in and submit a form in a web in particular time slot.
But i have no idea how and where to start. i searched this in google, but only resulting very general answer like using JavaScript, python. Can anyone tell me which languages should i learn first?

Despite the fact that the generic advice on this thread is quite good, it's pretty broad. I have tackled this problem myself, and despite the fact that I posted a fully functional example, it was deleted by a moderator, despite "theoretically answering the questions".
So, for anyone else looking to solve this problem, you will want to do the following:
Use Selenium and openpyxl, these are two modules that are relatively straight forward and will perform this task perfectly.
You will use selenium to open your web page, and retrieve the relevant html elements that you wish to populate. I suggest finding elements by xPath if you aren't well versed in HTML. The Xpath finder google chrome addon will make this very easy to do.
The driver.get() and driver.find_element_by_xpath() will be the functions that you need.
We will use openpyxl to work with our excel sheet. 'load_workbook()' will load a workbook. We will then use the 'sheet = workbook.active' function to access a sheet from within the workbook.
We now have the functionality to open our website and select an excel sheet.
Now we need to assign cell values to variables, so that we can then populate the HTML form. We assign a variable to each COLUMN in the workbook. So if column A contained first_names we could assign that to by a variable by writing 'FNAME = sheet['A']. Now that we have a way of referring to cells within columns we can begin feeding the data into our HTML form.
We populate our form by using the .send_keys() function in Selenium.
first_name.send_keys(FNAME.value)
the .value makes sure that the correct value is displayed, as sometimes selenium will print the cell function rather than value, we do not want this to happen.
Now that we can print values to our HTML forms from our excel sheet we will need to iterate through each row. We do this with a simply while loop:
i = 1
x = 1
while x <= 50:
first_name.send_keys(FNAME[i].value)
i+=1
x+=1
driver.quit
Once the loop occurs 50 times, the driver will quit, closing the browser and stopping the script.
Some other misc stuff you may find useful when trying to automate this:
driver.back()
time.sleep()
If you would like to see an actual working example feel free to PM me, as apparently posting it here doesn't contribute to the discussion.

The answers you found, as general as they are, are correct. I'll try to explain it to you.
As you are trying to automatize some activity, you have to use a script language to basically
Get stuff references (like getting indexes, forms/fields IDs, etc)
Insert/change stuff (like filling a field, following the field references)
Save stuff (prepare a file, with field references and it's content, that can be injected to the specific system or website)
One example of script language is Python.
So you already have your script. Now you need to inject it on the page. The programming language that can do it for you is Javascript, or JS. It allows you to talk with the webpage, GETting data/references or POSTing data.

I suggest you to write a Python script using a library called Selenium Webdriver.
It will be easy to implement what you have in mind.

Prevent user-entered scripts from running in webpage

In my application, there is a comment box. If someone enters a comment like
<script>alert("hello")</script>
then an alert appears when I load that page.
Is there anyway to prevent this?

There are several ways to address this, but since you haven't mentioned which back-end technology you are using, it is hard to give anything but rough answers.
Also, you haven't mentioned if you want to allow, or deny, the ability to enter regular HTML in the box.
Method 1:
Sanitize inputs on the way in. When you accept something at the server, look for the script tags and remove them.
This is actually far more difficult to get right then might be expected.
Method 2:
Escape the data on the way back down to the server. In PHP there is a function called
htmlentities which will turn all HTML into which renders as literally what was typed.
The words <script>alert("hello")</script> would appear on your page.
Method 3
White-list
This is far beyond the answer of a single post and really required knowing your back-end system, but it is possible to allow some HTML characters with disallowing others.
This is insanely difficult to get right and you really are best using a library package that has been very well tested.

You should treat user input as plain text rather than HTML. By correctly escaping HTML entities, you can render what looks like valid HTML text without having the browser try to execute it. This is good practice in general, for your client-side code as well as any user provided values passed to your back-end. Issues arising from this are broadly referred to as script injection or cross-site scripting.
Practically on the client-side this is pretty easy since you're using jQuery. When updating the DOM based on user input, rely on the text method in place of the html method. You can see a simple example of the difference in this jsFiddle.

The best way is replace <script> with other string.For example in C#use:
str.replace("<script>","O_o");
Other options has a lot of disadvantage.
1.Block javascript: It cause some validation disabled too.those validation that done in frontend.Also after retrive from database it works again.I mean attacker can inject script as input in forms and it saved in database.after you return records from database in another page it render as script!!!!
2.render as text. In some technologies it needs third-party packages that it is risk in itself.Maybe these packages has backdoor!!!

convert value into string ,it solved in my case
example
var anything

Can an attacker insert malicious code in html5 data attributes

So I learned earlier that using the data attribute in html5 you could insert values to be handled in a javascript file. e.g
Hey
the handling javascript file will have a line to handle that link tag which might do this
var value=$('.check').data('name');
window.location.href="http://www.example.com/'+value+'";
Now I was wondering, can a malicious coder exploit this? Do you need to sanitize the value before using it for a redirect?

It really depends.
An attacker can modify anything he wants in his browser, so it doesn't matter how much sanitization you put in the front-end, an attacker can work his way around all your javascript functions and the like to circumvent your front-end code.
I'm not saying that you shouldn't sanitize your input in the front-end because it will always help in terms of usability and experience for a legitimate user.
If the address that you're redirecting your user to uses that data attribute to do something with the server, then yes by all means sanitize it in both places: front and back end. Otherwise, you shouldn't worry, the worst case scenario is that a malicious user (or a knowledgable one) will end up in a 404 page.
** EDIT **
After reading your comment in this answer, here's my updated answer:
The dangers reside in how you're using that piece of information. Take as an example google analytics script:
Google provides with you a script that will help you track your visitors actions and behaviors through google analytics interface.
If you change any value in google's script, google analytics won't work, and there's no way you can hack google through the analytics script.
How does google achieve this? They put all their security in the backend, and they sanitize modifiable user input that will be rendered in a website, stored in a database or somehow interacts with the server.
Back to your case:
If you're going to use that data attribute to do a document.write(), an eval, do a database lookup or any sensitive operation (delete, update, retrieve data) then yes by all means: sanitize it.
How are you going to sanitize it? That's problem specific and more than likely you should ask a new question.

If the HTML is taken from user input or generated from user input, yes, you should definitely perform sanitation. However, if you're asking if data attributes are somehow vulnerable in a way other attributes aren't, the answer is no.

A user with access to the browser (e.g. via XSS) can insert anything into a data attribute. But (s)he can just redirect anywhere at anytime, so this trivial case is irrelevant.
If the value is set by a user via some other means, then the link could be set somewhere other than intended within the same domain. That might be annoying but it shouldn't be a security risk.
If you're doing something else, like including a javascript string for eval in the attribute and that comes from a user (e.g. via a database value), then you will create an XSS vulnerability. But you should never, ever, ever, trust user supplied values anyway. Nothing special about html data attributes there.

Do you need to sanitize the value before using it for a redirect?
No need to sanitize before, but you need to sanitize after.
In your example, if you are not sanitizing data - you can get a victim of classic XSS.
I.e: http://www.example.com/ + value, where value is search?q=<script>alert(1)</script>, and where search page actually outputs raw query to the browser.
p.s.: this is not specific to data-attributes. It will work the same with normal attributes.

What precautions should I take to prevent XSS on user submitted HTML?

I'm planning on making a web app that will allow users to post entire web pages on my website. I'm thinking of using HTML Purifier but I'm not sure because HTML Purifier edits the HTLM and it's important that the HTML is maintained just how it was posted. So I was thinking making some regex to get rid of all script tags and all the javascript attributes like onload, onclick, etc.
I saw a Google video a while ago that had a solution for this. Their solution was to use another website to post javascript in so the original website cannot be accessed by it. But I don't wanna purchase a new domain just for this.

be careful with homebrew regexes for this kind of thing
A regex like
s/(<.*?)onClick=['"].*?['"](.*?>)/$1 $3/
looks like it might get rid of onclick events, but you can circumvent it with
<a onClick<a onClick="malicious()">="malicious()">
running the regex on that will get you something like
<a onClick ="malicious()">
You can fix it by repeatedly running the regex on that string until it doesn't match, but that's just one example of how easy it is to get around simple regex sanitizers.

The most critical error people make when doing this is validating things on input.
Instead, you should validate on display.
The context matters when determing what is XSS and what isn't. Therefore, you can happily accept any input, as long as you pass it through appropriate cleaning functions when displaying it.
Consider that something that constitutes 'XSS' will be different when the input is placed in a '<a href="HERE"> as opposed to <a>here!</a>.
Thus, all you need to do, is make sure that any time you write user data, you consider, very carefully, where you are displaying it, and make sure that it can't escape the context you are writing it to.

If you can find any other way of letting users post content, that does not involve HTML, do that. There are plenty of user-side light markup systems you can use to generate HTML.
So I was thinking making some regex to get rid of all script tags and all the javascript attributes like onload, onclick, etc.
Forget it. You cannot process HTML with regex in any useful way. Let alone when security is involved and attackers might be deliberately throwing malformed markup at you.
If you can convince your users to input XHTML, that's much easier to parse. You still can't do it with regex, but you can throw it into a simple XML parser, and walk over the resulting node tree to check that every element and attribute is known-safe, and delete any that aren't, then re-serialise.
HTML Purifier edits the HTLM and it's important that the HTML is maintained just how it was posted.
Why?
If it's so they can edit it in their original form, then the answer is simply to purify it on the way out to be displayed in the browser, not on the way in at submit-time.
If you must let users input their own free-form HTML — and in general I'd advise against it — then HTML Purifier, with a whitelist approach (ban all elements/attributes that aren't known-safe) is about as good as it gets. It's very very complicated and you may have to keep it up to date when hacks are found, but it's streets ahead of anything you're going to hack up yourself with regexes.
But I don't wanna purchase a new domain just for this.
You can use a subdomain, as long as any authentication tokens (in particular, cookies) can't cross between subdomains. (Which for cookies they can't by default as the domain parameter is set to only the current hostname.)
Do you trust your users with scripting capability? If not don't let them have it, or you'll get attack scripts and iframes to Russian exploit/malware sites all over the place...

Make sure that user content doesn't contain anything that could cause Javascript to be ran on your page.
You can do this by using an HTML stripping function that gets rid of all HTML tags (like strip_tags from PHP), or by using another similar tool. There are actually many reasons besides XSS to do this. If you have user submitted content, you want to make sure that it doesn't break the site layout.
I belive you can simply use a sub-domain of your current domain to host Javascript, and you will get the same security benefits for AJAX. Not cookies however.
In your specific case, filtering out the <script> tag and Javascript actions is probably going to be your best bet.

1) Use clean simple directory based URIs to serve user feed data.
Make sure when you dynamically create URIs to address the user's uploaded data, service account, or anything else off your domain make sure you don't post information as parameters to the URI. That is an extremely easy point of manipulation that could be used to expose flaws in your server security and even possibly inject code onto your server.
2) Patch your server.
Ensure you keep your server up to date on all the latest security patches for all the services running on that server.
3) Take all possible server-side protections against SQL injection.
If somebody can inject code to your SQL database that can execute from services on your box that person will own your box. At that point they can then install malware onto your webserver to be feed back to your users or simple record data from the server and send it out to a malicious party.
4) Force all new uploads into a protected sandboxed area to test for script execution.
No matter how you try to remove script tags from submitted code there will be a way to circumvent your safeguards to execute script. Browsers are sloppy and do all kinds of stupid crap they are not supposed to do. Test your submissions in a safe area before you publish them for public consumption.
5) Check for beacons in submitted code.
This step requires the previous step and can be very complicated, because it can occur in script code that requires a browser plugin to execute, such as Action Script, but is just as much a vulnerability as allowing JavaScript to execute from user submitted code. If a user can submit code that can beacon out to a third party then your users, and possibly your server, is completely exposed to data loss to a malicious third party.

You should filter ALL HTML and whitelist only the tags and attributes that are safe and semantically useful. WordPress is great at this and I assume that you will find the regular expressions used by WordPress if you search their source code.

We Keep Coding

JavaScript is the programming language of the Web.