I am creating a website in PHP. One of the features is that users can edit their own pages by entering custom HTML code. Right now, you can enter code into a textarea and have it displayed in a div. In the future I plan on adding more helpful tools for the user.
My question is how to protect my site from malicious code. I know Facebook has an option to put custom HTML in a page tab, so it can be done safely. Currently, the HTML is displayed by a PHP script that echoes it onto a page, so users can enter javascript in <script> tags as well. I don't know the full limits of javascript and HTML, but I know that custom javascript embedded into the website has the potential to screw things up.
Here are my ideas so far:
1. Remove all javascript from user code
   Pros: Easy
   Cons: Users can't do anything interesting with javascript
2. Limit the javascript to only execute inside the display div
   Pros: Safe custom javascript
   Cons: May be impossible/very difficult
If anyone has ideas about how to do this or how Facebook did this, I would love to know! Thanks in advance.
If you are using PHP, an excellent solution is to use HTMLPurifier. It has many options to filter out bad stuff and, as a side effect, guarantees well-formed HTML output.
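To illustrate the general idea behind this kind of filtering (keep only tags and attributes on an allowlist, drop everything else), here is a minimal sketch in Python using only the standard library. It is not HTMLPurifier's API and is far less thorough than a real sanitizer; the allowlist contents and the names AllowlistSanitizer and sanitize are assumptions made up for this example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; a real policy would be broader.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}                  # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # tags whose inner text is discarded too


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and anything else unlisted
            # Only allow http(s) links, which rejects javascript: URLs.
            if name == "href" and not value.strip().lower().startswith(
                ("http://", "https://")
            ):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(user_html: str) -> str:
    """Return user_html reduced to the allowlisted subset."""
    parser = AllowlistSanitizer()
    parser.feed(user_html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <b>there</b></p><script>alert(1)</script>'))
```

The sketch does not balance unclosed tags or handle every edge case, which is exactly the kind of work a dedicated library does for you.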
I've developed a form using raw HTML code, with the CSS and javascript referenced in separate files. Now I would like to create a custom form in WordPress based on my raw HTML, CSS and javascript files. I am not sure which approach would make the most sense. The form is also supposed to send an email containing the form fields the user filled in. Is there a standard WordPress approach to achieve this? Should I create a custom plugin or use an existing one? I found a few which seem to allow certain custom code (e.g. HTML Forms, Insert Html Snippet, Raw HTML), but do they really? The number of options is overwhelming and I am not convinced by any of them.
I am sure some of you have dealt with something like this before, whether from personal experience or from helping someone in a similar situation. Hopefully you can point me in the right direction.
Can I hide my WordPress blog's page source? I ask because there are many plugins that can disable viewing the page source and right-clicking, but typing "view-source:url" manually in the browser instantly shows the source code of the WordPress site. If anybody knows the answer, please write it down. I think this is very important for all bloggers.
TLDR: No, you can never hide the source code of your page. There is NO way.
For a browser to render a web site you need to send it the HTML, CSS, and javascript code. Even though you can make it "harder" to see the code by disabling right click, anyone with just a little bit of understanding of the web will be able to read it easily (F12 in most browsers).
As a website designer, you need to understand this concept because it matters a great deal when deciding how to design your web site. Things the user should not see need to happen on the server side (where no user can reach them). Only things that don't matter if anyone sees them should be sent to the user.
I think you cannot hide the whole code. Or do you want to hide a specific part of the code? For example, you can hide/encrypt the URL of an iframe in which you can display more sensitive content. However, I also don't really know how to do it in WordPress (I searched for a solution many times) but I heard it's possible.
I'm curious about the answers.
I wouldn't even consider doing this. Not only does it ruin the end user's experience, it can actually stop people with disabilities from copying text from your site or using other right-click menu items.
You can't hide your code, but you can obfuscate some of it.
CSS Obfuscator.
JS Obfuscator.
My website lets people paste html into a textbox, which gets displayed later to other users. It occurred to me that they might want to display math formulas, and to do that in a browser, I would think the simplest method is MathML.
Is there a JavaScript library I can include in my pages that will make MathML render as formulas?
Thanks.
Sorry for the vague question name - didn't know how to phrase it.
I have built a PHP engine to parse web pages and extract phone numbers, addresses etc.
This is going to be used by clients to populate an address book by simply entering a new contact's web address.
The problem I am having is usability:
At the moment the script just adds each item (landline number, fax etc.) to a different list box and the user picks the correct one. From a usability standpoint this is hard work (how do you know which is the correct contact number without looking at the site?).
So my question (finally!):
How would I achieve the functionality of
http://bartaz.github.io/sandbox.js/jquery.highlight.html
on someone else's website? (I have no problem writing this functionality.)
FOR CLARITY
I want to show someone else's site (their contact page, for example) on my site, BUT I want to highlight items I have found (so, for example, add a tag around a phone number my PHP script has found).
I am aware that an iframe would be used to display a website that is not on your domain, but as I need to alter the page content this is useless.
I also contemplated writing a bookmarklet that could be run on that page, but that means rewriting my parsing engine in javascript and exposing some of the tricks I use to make it accurate.
So I am left with pulling the page by cURL and then trying to match up javascript files, css files etc. that have relative URLs
Does anyone know how best to achieve this - and any pitfalls that might befall me.
I have tried using simple html dom parser, but it is tricky to get consistent results, and I also don't know how having two sets of html tags, body tags etc. would affect sites.
If anyone has managed this before and could point me to the tools / general methods they used I would be eternally grateful!
PLEASE NOTE - I am very proficient with google and stack-overflow and have looked there first!
The ideal HTML solution
The easiest way to work around the relative paths for an arbitrary site would be to use the base href tag to specify the default relative location (just use the URL up to the filename, such as <base href="http://www.example.com/path/to/" /> for the URL http://www.example.com/path/to/page). This should go at the top of the head block.
Then you can alter the site simply by finding the relevant parts and wrapping them in your own tag, such as a span. For the formatting of these tags, the easiest way would be to add a style attribute, but you could also try to insert a <style> tag in the <head>.
Of course, you'll also need to account for badly made webpages without <html>, <head> or <body> tags. You could either wrap the source in a new set of these tags, or just put in your base and style tags, hoping that the browser will work out what to do.
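As a rough sketch of the base tag and span wrapping described above, here is a short Python example (your engine is PHP, but the steps translate directly). The phone-number pattern, the yellow highlight style and the function names are assumptions for illustration only; your own parsing engine would supply the real matches. Splitting on tags with a regular expression is crude and will also touch text inside script blocks, so treat this as a starting point.

```python
import re
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical phone-number pattern, standing in for the real parsing engine.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")
HEAD_RE = re.compile(r"<head\b[^>]*>", re.IGNORECASE)
TAG_SPLIT_RE = re.compile(r"(<[^>]*>)")


def base_href_for(page_url: str) -> str:
    """Return the page URL up to and including the last slash of its path."""
    parts = urlsplit(page_url)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def insert_base(html_source: str, page_url: str) -> str:
    """Put a <base> tag right after <head>, or at the very top if there is none."""
    href = escape(base_href_for(page_url), quote=True)
    base_tag = f'<base href="{href}" />'
    match = HEAD_RE.search(html_source)
    if match:
        return html_source[: match.end()] + base_tag + html_source[match.end():]
    return base_tag + html_source


def highlight_numbers(html_source: str) -> str:
    """Wrap matches in a styled span, touching only the text between tags."""
    pieces = TAG_SPLIT_RE.split(html_source)
    # With a capturing group, even indices are text and odd indices are tags.
    for i in range(0, len(pieces), 2):
        pieces[i] = PHONE_RE.sub(
            lambda m: f'<span style="background: yellow">{m.group(0)}</span>',
            pieces[i],
        )
    return "".join(pieces)


if __name__ == "__main__":
    page = "<html><head><title>T</title></head><body><p>Call 020 7946 0000 now</p></body></html>"
    print(highlight_numbers(insert_base(page, "http://www.example.com/path/to/page")))
```

Inserting the base tag before highlighting keeps the two steps independent, and the fallback of prepending the tag covers pages that have no head block at all.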
You probably also want to make this interactive, so you should also wrap them with some kind of link, and ideally you'll insert some javascript to handle their actions by ajax. You should also insert your own header at the top of the page, probably floating at the top, so that they know they're using your tool. Just keep in mind that some advanced pages might then conflict with your alterations (though for those cases you could have a link saying 'is this page not displaying correctly?' to take the user to your original basic listbox page as a backup).
The more robust solution
Clearly there are a lot of potential problems with the above, even though it is ideal. If you want to ensure robustness and avoid any problems with custom javascript and css on the page you're trying to alter, you could instead use a similar algorithm to that used in text based browsers such as lynx to reformat the page consistently. Then you can apply your algorithm to highlight the relevant parts of the page, and you can apply your own formatting as well without risk of it not displaying correctly. This way you can frame it really well and maintain your interface.
The problem with this is that you lose the actual look of the original page, but you should keep the context around the numbers and addresses which is the important thing. You would also then be able to use some dynamic javascript to take the user to each number and address consecutively to improve the user experience. Basically, this is rigorous and gives you complete control over the user experience, but you lose the original look of the website which may or may not confuse your users.
Personally, I'd go for the second option, but I'm not sure if anyone's created such a parser before. If not, the simplest thing you could do would be to strip the tags to get it as plain text. The next simplest would be to convert it into some simple text markup format like markdown, then convert it back into html. That way, you'd keep some basic layout such as headings, italicised and bold text, etc.
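The "strip the tags to get plain text" fallback mentioned above could look roughly like this in Python with the standard library's HTML parser. The set of block-level tags used to decide where line breaks go, and the names TextExtractor and html_to_text, are assumptions chosen for this sketch; it makes no attempt to reproduce what lynx does.

```python
from html.parser import HTMLParser

# Assumed set of tags that should start a new line in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}  # their contents are not page text


class TextExtractor(HTMLParser):
    """Collect visible text, adding line breaks around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def html_to_text(html_source: str) -> str:
    """Return the page as plain text, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    raw_lines = "".join(parser.chunks).splitlines()
    lines = (" ".join(line.split()) for line in raw_lines)  # collapse whitespace
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(html_to_text("<h1>Contact</h1><p>Phone: <b>020 7946 0000</b></p>"))
```

Because the output is plain text, you can then wrap the numbers and addresses you found in your own markup without worrying about the original page's scripts or styles.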
You definitely don't want to have nested body tags. It might work, but it'll probably mess up your formatting and be inconsistent across browsers.
Here's a resource I found after a quick Google search:
https://github.com/nickcernis/html-to-markdown
There are other HTML to markdown scripts, but this was the most robust of the few I found. I'm still not sure whether it can handle badly formatted pages or ones with advanced formatting, so try it out yourself.
There are quite a few markdown to HTML converters, though. In fact, you could probably write a custom converter yourself quite easily to accommodate your own needs.
Is there any way to hide the view source of an ASP.NET page?
If you mean, can you hide your ASP.NET code: it's not visible in View Source.
If you mean can you hide your HTML: you can discourage casual peeking by creating your HTML on the fly via Javascript or AJAX, but a developer will always be able to see what you are doing, using simple tools like Firebug and Fiddler.
Edited to add:
I wasn't thinking of obfuscation (though that also discourages casual peeking), I was thinking of using javascript to pull down HTML. Doing a View Source will only show a bunch of <SCRIPT> tags.
But it appears his question has been revised to go in a different direction anyway, namely "can I keep people from downloading my images?", and the answer to that is a simple no. Making money from small numbers of images is not a viable business model. (If you have thousands of images, that's another story.)
Edited to add:
The conventional way of making a catalog of photographs is to [a] show low-resolution previews, [b] put a watermark on each image (here's an example), or both.
Are you talking about ASP.NET or the result? Since ASP.NET is server-side, it simply returns HTML. Basically, your ASP.NET file is processed by the server, and its variables and functions are converted into HTML. Your users can view the HTML but not the ASP.NET code, as it resides on the server.
No, there is no way to hide the html source of a page. It's just not possible. There are tools that will promise the ability to do this, but don't believe them. Consider that it might not even be a traditional web browser that downloads the html.
What you can do is obfuscate it a bit, but even that is trivial to reverse.
No, you can't hide HTML, and there's no point either. There's nothing of value in the HTML. It would take maybe a couple hours for a skilled developer to replicate the look and feel of a website without even glancing at the HTML. In fact, it would probably be easier for him to do it his way.
The ASP/code-behind, however, already isn't visible. It's processed on the server and outputs HTML. Only the HTML (and CSS etc.) makes it to the client.
Reading the comments, it appears you want to prevent users from downloading your images. You can't really do that either. You can make it a lot more difficult for users to download them by embedding the images in Flash, or a Java applet, or something like that, but a determined thief could still decompile it and nab your image. Easier yet, he could just take a screenshot and save it out.
The best you can do is restrict access to the image to only certain users by making the image source point to a script instead that runs some validation before outputting the image.
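A minimal sketch of that idea, written in Python rather than ASP.NET: the image URL points at a handler that checks the requesting user before returning the bytes. The access list, the file name and the serve_image function are hypothetical, and a real handler would take the user from your session or authentication layer.

```python
from pathlib import Path

# Hypothetical access list: which users may view which images.
ALLOWED_VIEWERS = {"sunset.jpg": {"alice", "bob"}}


def serve_image(user, image_name, image_dir):
    """Validate the request, then return (status, headers, body)."""
    # Unknown image names are refused too, which also blocks path tricks
    # such as "../secret.txt" because only listed names are ever opened.
    if user is None or user not in ALLOWED_VIEWERS.get(image_name, set()):
        return 403, {"Content-Type": "text/plain"}, b"Forbidden"
    path = Path(image_dir) / image_name
    if not path.is_file():
        return 404, {"Content-Type": "text/plain"}, b"Not found"
    return 200, {"Content-Type": "image/jpeg"}, path.read_bytes()
```

This only controls who can request the image. Anyone who is allowed to see it can still save it or take a screenshot.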
This is not true; you can hide source code. One way would be to write a loop that puts 100k \n characters at the top of the source code. That pushes the code so far down with white space that you can't see it :-)
Where there is a problem there is a way.
And for all those who don't like this: Amazon used to hide their code somehow until some time back.