dynamically generated invalid html

dynamically generated invalid html - javascript

I know that invalid html can cause some seo-issues. But that only applies on the html-source right?
what about html that is served validly to the client, but then via some fancy js it gets manipulated into a new structure with markup violations?
Example. I have an unordered list with several li-elements, but want them separated in clusters to be displayed on a row. So once the user performs a certain action the ul includes several divs (class="liCluster") that contains the original list.
I know it's not a really swanky way to do it, but is there actually some serious problem with that, that I might not see yet?
At least it looks fine so far from a client's point of view...

It looks fine because most browsers are smart enough to know what you're trying to do. However, This takes the browser more time do do.
Most search engines are also smart enough to know what you're trying to do. Although search engines don't care much for markup, rather they care about content and how that content relates to different parts of your website. A poorly formatted ul of lis is not likely to make a massive impact in SEO.
That being said, why would you put DIVs in a ul if they're not wrapped in lis? Most browsers will remove the offending markup and you're going to make more trouble for yourself, Just learn how to use CSS and realise that you can then do everything you need to do and you can do it the correct way.

Related

Should I inject style tags into the head dynamically or include include style tags in the body?

I have some html content that gets embedded into a page via a server side call. So, when the page's html is being compiled on the server, a call is made to another server to return some html, which is then embedded within a div somewhere in the body. The problem is, this content contains it's own css. So, I wrote a script to inject style tags into the HEAD on ready, which works fine on desktop browsers. However, on mobile devices there's a fairly significant flash of unstyled content. I know that you're technically not supposed to include style tags in the body, but in this case would it yield better results to just include them in the body instead of injecting them into the head?

In this case, it sounds like the right solution is to fix up your architecture so that the server-side compiler can include CSS for the remote page in the page head. This probably involves separating the CSS of the remote page(s) out of the markup there and then grabbing it as a separate file to be included in the page head during compilation.
Since the right solution is not always feasible given a myriad reasons, compromise is often required. Leaving the CSS in the remote markup, if it produces the result you desire, could be the best solution for you. Or perhaps some other hack to get the CSS into the head server-side could be appropriate. You need to decide if it is worth the effort to do any of these things, if they are possible for you to accomplish given your constraints.

Some discussion here. In my experience a lot of enterprise content does it. Does that mean it's the RIGHT thing to do? I dont know. But it's certainly not frowned upon in my experience.
Source: https://www.w3.org/wiki/The_web_standards_model_-_HTML_CSS_and_JavaScript
Why separate?
Efficiency of code: The larger your files are, the longer they will take to download, and the more they will cost some people to view (some people still pay for downloads by the megabyte.) You therefore don’t want to waste your bandwidth on large pages cluttered up with styling and layout information in every HTML file. A much better alternative is to make the HTML files stripped down and neat, and include the styling and layout information just once in a separate CSS file. To see an actual case of this in action, check out the A List Apart Slashdot rewrite article where the author took a very popular web site and re-wrote it in XHTML/CSS.
Ease of maintenance: Following on from the last point, if your styling and layout information is only specified in one place, it means you only have to make updates in one place if you want to change your site’s appearance. Would you prefer to update this information on every page of your site? I didn’t think so.
Accessibility: Web users who are visually impaired can use a piece of software known as a “screen reader” to access the information through sound rather than sight — it literally reads the page out to them, and it can do a much better job of helping people to find their way around your web page if it has a proper semantic structure, such as headings and paragraphs. In addition keyboard controls on web pages (important for those with mobility impairments that can't use a mouse) work much better if they are built using best practices. As a final example, screen readers can’t access text locked away in images, and find some uses of JavaScript confusing. Make sure that your critical content is available to everyone.
Device compatibility: Because your HTML/XHTML page is just plain markup, with no style information, it can be reformatted for different devices with vastly differing attributes (eg screen size) by simply applying an alternative style sheet — you can do this in a few different ways (look at the [mobile articles on dev.opera.com] for resources on this). CSS also natively allows you to specify different style sheets for different presentation methods/media types (eg viewing on the screen, printing out, viewing on a mobile device.)
Web crawlers/search engines: Chances are you will want your pages to be easy to find by searching on Google, or other search engines. A search engine uses a “crawler”, which is a specialized piece of software, to read through your pages. If that crawler has trouble finding the content of your pages, or mis-interprets what’s important because you haven’t defined headings as headings and so on, then your rankings in relevant search results will probably suffer.
It’s just good practice: This is a bit of a “because I said so” reason, but talk to any professional standards-aware web developer or designer, and they’ll tell you that separating content, style, and behaviour is the best way to develop a web application.
Additional stackoverflow articles:
Using <style> tags in the <body> with other HTML
Will it be a wrong idea to have <style> in <body>?

Altering a page from another site

Sorry for the vague question name - didn't know how to phrase it.
I have built a PHP engine to parse web pages and extract phone numbers, addresses etc.
This is going to be used by clients to populate an address book by simply entering a new contacts web address.
The problem I am having is useability:
At the moment the script just adds each item (landline number, fax etc) to a different list box and the user picks the correct one - from a useability standpoint this is hard work (how do you know which is the correct contact number without looking at the site)
so my question (finally!)
How would achieve the functionality of
http://bartaz.github.io/sandbox.js/jquery.highlight.html
On someone else website (I have no problem writing this functionality).
FOR CLARITY**
I want to show someone elses site (their contact page for example) on my site BUT I want to highlight items I have found (so for example add a tag around a phone number my php script has found)
I am aware that to display a website not on your domain an iFrame would be used - but as I need to alter the page content this is useless.
I also contemplated writing a bookmarklet that could be run on that page - but that means re-writing my parsing engine in javascript and exposing some of my tricks to make it accurate.
So I am left with pulling the page by cURL and then trying to match up javascript files, css files etc. that have relative URLs
Does anyone know how best to achieve this - and any pitfalls that might befall me.
I have tried using simple html dom parser - but it is tricky to get consistency and I also dont know how having two sets of tags, body tags etc. would affect sites.
If anyone has managed this before and could point me to the tools / general methods they used I would be eternally grateful!
PLEASE NOTE - I am very proficient with google and stack-overflow and have looked there first!

The ideal HTML solution
The easiest way to work around the relative paths for an arbitrary site would be to use the base href tag to specify the default relative location (just use the url up to the filename, such as <base href="http://www.example.com/path/to/" /> for the URL http://www.example.com/path/to/page. This should go at the top of the head block.
Then you can alter the site simply by finding the relative parts and wrapping them in your own tag, such as a span. For the formatting of these tags, the easiest way would be to add a style attribute, but you could also try to insert a <style> tag in the <head>.
Of course, you'll also need to account for badly made webpages without <html>, <head> or <body> tags. You could either wrap the source in a new set of these tags, or just put in your base and style tags, hoping that the browser will work out what to do.
You probably also want to make this interactive, so you should also wrap them with some kind of link, and ideally you'll insert some javascript to handle their actions by ajax. You should also insert your own header at the top of the page, probably floating at the top, so that they know they're using your tool. Just keep in mind that some advanced pages might then conflict with your alterations (though for those cases you could have a link saying 'is this page not displaying correctly?' to take the user to your original basic listbox page as a backup).
The more robust solution
Clearly there are a lot of potential problems with the above, even though it is ideal. If you want to ensure robustness and avoid any problems with custom javascript and css on the page you're trying to alter, you could instead use a similar algorithm to that used in text based browsers such as lynx to reformat the page consistently. Then you can apply your algorithm to highlight the relevant parts of the page, and you can apply your own formatting as well without risk of it not displaying correctly. This way you can frame it really well and maintain your interface.
The problem with this is that you lose the actual look of the original page, but you should keep the context around the numbers and addresses which is the important thing. You would also then be able to use some dynamic javascript to take the user to each number and address consecutively to improve the user experience. Basically, this is rigorous and gives you complete control over the user experience, but you lose the original look of the website which may or may not confuse your users.
Personally, I'd go for the second option, but I'm not sure if anyone's created such a parser before. If not, the simplest thing you could do would be to strip the tags to get it as plain text. The next simplest would be to convert it into some simple text markup format like markdown, then convert it back into html. That way, you'd keep some basic layout such as headings, italicised and bold text, etc.
You definitely don't want to have nested body tags. It might work, but it'll probably mess up your formatting and be inconsistent across browsers.
Here's a resource I found after a quick Google search:
https://github.com/nickcernis/html-to-markdown
There are other html to markdown scripts, but this was the more robust from the few I found. I'm still not sure though whether it can handle badly formatted pages or ones with advanced formatting, try it out yourself.
There are quite a few markdown to html converters though, in fact you could probably make a custom converter yourself quite easily to accommodate your personal needs.

Display DOM node in multiple places w/o cloning/copying

Disclaimer:
I've blathered on kind-of excessively here in an attempt to provide enough context to pre-empt all questions you folks might have of me. Don't be scared by the length of this question: much of what I've written is very skim-able (especially the potential solutions I've come up with).
Goal:
The effect I'm hoping to achieve is displaying the same element (and all descendants) in multiple places on the same page. My current solution (see below for more detail) involves having to clone/copy and then append in all the other places I want it to appear in the DOM. What I'm asking for here is a better (more efficient) solution. I have a few ideas for potentially more efficient solutions (see below). Please judge/criticize/dismiss/augment those, or add your own more-brilliant-er solution!
"Why?" you ask?
Well, the element (and it's descendants) that I'm wanting to display more than once potentially has lots of attributes and contents - so cloning it, and appending it someplace else (sometimes more than one other place) can get to be quite a resource-hogging DOM manipulation operation.
Some context:
I can't describe the situation exactly (damn NDA's!) but essentially what I've got is a WYSIWYG html document editor. When a person is editing the DOM, I'm actually saving the "original" node and the "changed" node by wrapping them both in a div, hiding the "original" and letting the user modify the new ("changed") node to their heart's content. This way, the user can easily review the changes they've made before saving them.
Before, I'd just been letting the user navigate through the "diff divs" and temporarily unhiding the "original" node, to show the changes "inline". What I'm trying to do now is let the user see the whole "original" document, and their edited ("changed") document in a side-by-side view. And, potentially, I'd like to save the changes through multiple edit sessions, and show 'N' number of versions side-by-side simultaneously.
Current Solution:
My current solution to achieve this effect is the following:
Wrap the whole dang dom (well, except the "toolbars" and stuff that they aren't actually editing) in a div (that I'll call "pane1"), and create a new div (that I'll call "pane2"). Then deep-clone pane1's contents into pane2, and in pane1 only show the "original" nodes, and in pane2 only show the "changed" nodes (in the diff regions - everything outside of that would be displayed/hidden by a toggle switch in a toolbar). Then, repeat this for panes 3-through-N.
Problem with Current Solution:
If the document the user is editing gets super long, or contains pictures/videos (with different src attributes) or contains lots of fancy styling things (columns, tables and the like) then the DOM can potentially get very large/complex, and trying to clone and manipulate it can make the browser slow to a crawl or die (depending on the DOM's size/complexity and how many clones need to be made as well as the efficiency of the browser/the machine it's running on). If size is the issue I can certainly do things like actually remove the hidden nodes from the DOM, but that's yet more DOM manipulation operations hogging resources.
Potential Solutions:
1. Find a way to make the DOM more simple/lightweight
so that the cloning/manipulating that I'm currently doing is more efficient. (of course, I'm trying to do this as much as I can anyway, but perhaps it's all I can really do).
2. Create static representations of the versions with Canvas elements or something.
I've heard there's a trick where you can wrap HTML in an SVG element, then use that as an image source and draw it onto a canvas. I'd think that those static canvasses (canvi?) would have a much smaller memory footprint than cloned DOM nodes. And manipulating the DOM (hiding/showing the appropriate nodes), then drawing an image (rinse & repeat) should be quicker & more efficient than cloning a node and manipulating the clones. (maybe I'm wrong about that? Please tell me!)
I've tried this in a limited capacity, but wrapping my HTML in SVG messes with the way it's rendered in a couple of weird cases - perhaps I just need to message the elements a bit to get them to display properly.
3. Find some magic element
that just refers to another node and looks/acts like it without being a real clone (and therefore being somehow magically much more lightweight). Even if this meant that I couldn't manipulate this magic element separately from the node it's "referencing" (or its fake children) - in that case I could still use this for the unchanged parts, and hopefully shave off some memory usage/DOM Manipulation operations.
4. Perform some of the steps on the server side.
I do have the ability to execute server side code, so maybe it's a lot more efficient (some of my users might be on mobile or old devices) to get all ajax-y and send the relevant part of the DOM (could be the "root" of the document the user is editing, or just particularly heavy "diff divs") to the server to be cloned/manipulated, then request the server-manipulated "clones" and stick 'em in their appropriate panes/places.
5. Fake it to make it "feel" more efficient
Rather than doing these operations all in one go and making the browser wait till the operations are done to re-draw the UI, I could do the operations in "chunks" and let the browser re-render and catch a breather before doing the next chunk. This probably actually would result in more time spent, but to the casual user it might "feel" quicker (haha, silly fools...). In the end, I suppose, it is user experience that is what's most important.
Footnote:
Again, I'm NDA'd which prevents me from posting the actual code here, as much as I'd like to. I think I've thoroughly explained the situation (perhaps too thoroughly - if such a thing exists) so it shouldn't be necessary for you to see code to give me a general answer. If need be, I suppose I could write up some example code that differs enough from my company's IP and post it here. Let me know if you'd like me to do that, and I'll be happy to oblige (well, not really, but I'll do it anyway).

Take a look at CSS background elements. They allow you to display DOM nodes elsewhere/repeatedly. They are of course they are read-only, but should update live.
You may still have to come up with a lot of magic around them, but it is a similar solution to:
Create static representations of the versions with Canvas elements or something.
CSS background elements are also very experimental, so you may not get very far with them if you have to support a range of browsers.

To be honest, after reading the question I almost left thinking it belongs in the "too-hard-basket", but after some thought perhaps I have some ideas.
This is a really difficult problem and the more I think about it the more I realise that there is no real way to escape needing to clone. You're right in that you can create an SVG or Canvas but it won't look the same, though with a fair amount of effort I'm sure you can get quite close but not sure how efficient it will be. You could render the HTML server-side, take a snapshot and send the image to the client but that's definitely not scalable.
The only suggestions I can think of are as follows, sorry if they are long-winded:
How are you doing this clone? If you're going through each element and as you go through each you are creating a clone and copying the attributes one by one then this is heaavvvyy. I would strongly suggest using jQuery clone as my guess is that it's more efficient than your solution. Also, when you are making structural changes it might be useful to take advantage of jQuery's detach/remove (native JS: removeChild()) methods as this will take the element out of the DOM so you can alter it before reinserting.
I'm not sure how you'v got your WYSIWYG, but avoid using inputs as they are heavy. If you must then I'm assuming they don't look like inputs so just swap them out with another element and style (CSS) to match. Make sure you do these swaps before you reinsert the clone in to the DOM.
Don't literally put video at the time of showing the user comparisions. The last thing we want to do is inject 3rd party objects in to the page. Use an image, you only have to do it while comparing. Once again, do the swap before inserting the clone in to the DOM.
I'm assuming the cloned elements won't have javascript attached to them (if there is then remove it, less moving parts is more efficiency). However, the "changed" elements will probably have some JS events attached so perhaps remove them for the period of comparision.
Use Chrome/FF repaint/reflow tools to see how your page is working when you restructure the DOM. This is important because you could be doing some "awesome" animations that are costing you intense resources. See http://paulirish.com/2011/viewing-chromes-paint-cycle/
Use CSS over inline styling where possible as modern browsers are optimised to handle CSS documents
Can you make it so your users use a fast modern browser like Chrome? If it's internal then might be worth it.
Can you do these things in Silverlight or Adobe Air? These objects get special resource privileges, so this will most likely solve your problem (according to what I'm imagining the depth of the problem is)
This one is a bit left-field but could you open in another window? Modern browsers like Chrome will run the other window in its own process which may help.
No doubt you've probably looked in to these things more than I but good luck with it. Would be curious how you solved it.

You may also try: http://html2canvas.hertzen.com/
If it works for you canvas has way better support.

In order to get your "side-by-side" original doc and modified doc.. rather than cloning all pane1 into pane2.. could you just load the original document in an iframe next to the content you are editing? A lot less bulky?
You could tweak how the document is displayed when it's in an iframe (e.g. hide stuff outside editable content).
And maybe when you 'save' a change, write changes to the file (or a temp) and open it up in a new iframe? That might accomplish your "multiple edit sessions"... having multiple iframes displaying the document in various states.
Just thinking out loud...
(Sorry in advance if I'm missing/misunderstanding any of your goals/requirements)

I don't know if it's already the case for you but you should consider using jQuery library as it allows to perform different kinds of DOM elements manipulation such as create content and insert it into several elements at once or select an element on the page and insert it into another.
Have a look on .appendTo(), .html(), .text(), .addClass(), .css(), .attr(), .clone()
http://api.jquery.com/category/manipulation/
Sorry if I'm just pointing out something you already know or even work with but your NDA is in the way of a more accurate answer.

Maintain height of the website

I have a client who wants to do a website with specific height for the content part.
The Question:
Is there any way that when the text is long / reach the maximum height of the content part, then a new page is created for the next text.
Within my knowledge, somehow I know this can't be done.
Thanks for helping guys!

You will probably want to look into something like jQuery paging with tabs
http://code.google.com/p/jquery-ui-tabs-paging/
Unfortunately you would need to figure out the maximum number of characters you want to allow in the content pane and anything after that would need to be put into another tab. You can hide the tab and use just a link instead.

Without more knowledge on what you're development is, this is a difficult question to answer. Are you looking to create a different page entirely, or just different sections on a page?
The former can be done using server-side code (e.g. Rails), and dynamically serving out pages (e.g. Google results are split across many page).
The latter can be done with Javascript and/or CSS. A simple example is:
<div id="the_content" style="overflow:hidden;width:200px;height:100px">
Some really long text...
</div>
This would create a "scroll" bar and just not disrupt the flow of the page. In Javascript (e.g. JQuery), you'll be able to split the content into "tabs".
Does this help?

(Almost) everything is possible, but your intuitions are right in that this can't be done easily or in a way that makes any sense.
If I were in your position, I would go up to the client and present advantages and disadvantages to breaking it up. Advantages include the fact that you'd be able to avoid long pages and that with some solutions to this problem, the page will load faster. Disadvantages include the increased effort (i.e., billable hours) it would take to accomplish this, the lack of precedent for it resulting in users being confused, and losses to SEO (you're splitting keywords amongst n pages).
This way, you're not shooting down the client's idea, and in the likely case the client retreats from his position, he will go away thinking that he's just made a smart choice by himself and everyone goes away happy.
If you're intent on splitting it up into pages, you can do it on the backend by either literally structuring your content into pages or applying some rule (e.g., cut a page off at the first whole paragraph after 1000 characters) to paginate the results. On the frontend, you could use hashtags to allow Javascript to paginate the results. You could even write an extensible library that "paginates" any text node. In fact, I wouldn't be surprised if one didn't exist already.

Generating/selecting non-standard HTML tags with jQuery, a good idea?

I've noticed that jQuery can create, and access non-existent/non-standard HTML tags. For example,
$('body').append('<fake></fake>').html('blah');
var foo = $('fake').html(); // foo === 'blah'
Will this break in some kind of validation? Is it a bad idea, or are there times this is useful? The main question is, although it can be done, should it be done?
Thanks in advance!

You can use non-standard HTML tags and most of the browsers should work fine, that's why you can use HTML5 tags in browsers that don't recognize them and all you need to do is tell them how to style them (particularly which tags are display: block). But I wouldn't recommend doing it for two reasons: first it breaks validation, and second you may use some tag that will later get added to HTML and suddenly your page stops working in newer browsers.

The biggest issue I see with this is that if you create a tag that's useful to you, who's to say it won't someday become standard? If that happens it may end up playing a role or get styles that you don't anticipate, breaking your code.

The rules of HTML do say that if manipulated through script the result should be valid both before and after the manipulation.
Validation is a means to an end, so if it works for you in some way, then I wouldn't worry too much about it. That said, I wouldn't do it to "sneak" past validation while using something like facebook's <fb:fan /> element - I'd just suck it up and admit the code wasn't valid.

HTML as such allows you to use any markup you like. Browsers may react differently to unknown tags (and don't they to known ones, too?), but the general bottom line is that they ignore unknown tags and try to render their contents instead.
So technically, nothing is stopping you from using <fake> elements (compare what IE7 would do with an HTML5 page and the new tags defined there). HTML standardization has always been an after-the-fact process. Browser vendors invented tags and at some point the line was drawn and it was called HTMLx.
The real question is, if you positively must do it. And if you care whether the W3C validator likes your document or not. Or if you care whether your fellow programmers like your document or not.
If you can do the same and stay within the standard, it's not worth the hassle.

There's really no reason to do something like this. The better way is to use classes like
<p class = "my_class">
And then do something like
$('p.my_class').html('bah');
Edit:
The main reason that it's bad to use fake tags is because it makes your HTML invalid and could screw up the rendering of your page on certain browsers since they don't know how to treat the tag you've created (though most would treat it as some kind of DIV).
That's the main reason this isn't good, it just breaks standards and leads to confusing code that is difficult to maintain because you have to explain what your custom tags are for.
If you were really determined to use custom tags, you could make your web page a valid XML file and then use XSLT to transform the XML into valid HTML. But in this case, I'd just stick with classes.

We Keep Coding

JavaScript is the programming language of the Web.