I'm trying to get the full HTML generated by an SPA built with AngularJS.
// d.response.path comes from an earlier AJAX response (not shown)
$.get('http://localhost:3388/' + d.response.path, function (html) {
    $('#templateContainer').html(html);
});
But it's returning only the basic HTML structure, not the dynamic HTML which, in an SPA, is generated by AJAX (I'm wondering if this is why SPAs are not good for SEO).
I believe some technique/trick must exist to solve this problem. Chrome, for example, is able to show all the AJAX-rendered HTML when you inspect elements.
Maybe I'm not using the right keywords on Google. What have people been doing to work around this problem?
UPDATE:
Just to be clear about my case: I'm trying to get the full HTML from this SPA in order to show the user a template preview.
I have many different SPAs with different templates. I want to display these live templates via AJAX instead of an IFRAME. It works with an IFRAME, but it isn't great.
You can generate a full page in an SPA, but that's not the goal; you should not use an SPA in that case.
The goal of an SPA is to fetch only some pieces of the page and load them when necessary. If you care about SEO crawlers, you should try a middleware like prerender.io: you can create your own server with it or use their services; it's open source.
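For example, with a Node/Express app the open-source prerender-node middleware can be wired in roughly like this (a minimal sketch; the service URL and port are placeholders, and you would point them at your own Prerender server or their hosted service):
// Minimal Express setup with prerender-node (sketch).
// The service URL below is a placeholder for your own Prerender server.
const express = require('express');
const prerender = require('prerender-node');

const app = express();

// Requests from known crawler user agents are proxied to the
// Prerender service, which returns the fully rendered HTML.
app.use(prerender.set('prerenderServiceUrl', 'http://localhost:3000/'));

app.use(express.static('public'));
app.listen(8080);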
But generating a full page in an SPA kills all the reasons for using an SPA in the first place. Best regards :)
You'll get the raw HTML because the server doesn't render your app; this is the expected behaviour, there is no bug.
SPAs are usually client-side applications, which means the browser has to render the Angular app at runtime!
Of course the browser renders nothing here, because your code is injected asynchronously into the DOM; you need to bootstrap AngularJS manually instead of through the ng-app directive.
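Something like this (a rough sketch; 'myApp' and '#templateContainer' are placeholders for your module and container):
// Inject the fetched template, then bootstrap Angular manually
$.get('http://localhost:3388/' + d.response.path, function (html) {
    var container = $('#templateContainer').html(html);
    // angular.bootstrap compiles the injected DOM against your module,
    // which ng-app would otherwise only do on the initial page load
    angular.bootstrap(container[0], ['myApp']);
});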
By the way, there are many ways to get server-side rendering of your app; have a look at https://prerender.io/...
If your goal is good indexing... loading the app via jQuery is a bad solution, because search-engine crawlers aren't reliably able to process JavaScript. This is not an Angular-specific issue; every app built in JavaScript has this problem.
The best solution is to have a full Angular app and, only for crawlers, implement server-side prerendering (using prerender.io).
Hope this helps!
Related
I'm an Android developer of about two years, and recently I've been thinking about building web applications. So I started researching Spring Boot, and everything is great. Then I came across this thing called template engines (Thymeleaf), which by definition separate your code from presentation.
What is confusing me is: how can a backend server have HTML? Shouldn't the presentation be handled by HTML, CSS, and JavaScript on the front end? I even saw tutorials where they actually type HTML code into their controllers as return values.
My understanding is that the backend server exposes APIs for the frontend to call via AJAX, and the frontend manipulates this data and presents the information on the screen. Why would a backend provide HTML code?
Thank you
the frontend will manipulate this data
What front end? You mean the JavaScript code in the HTML page? Where did that come from? Oh yeah, the server.
It is the server that serves the HTML pages to the client, as well as any .js and .css files.
The server could provide static pages, and anything dynamic is handled by JavaScript. Or, the server could dynamically build the HTML page, using ... you guessed it ... a template engine.
You generally don't want JavaScript to build the page initially; you want to use JavaScript just for handling any dynamic behavior. Some pages don't even need any dynamic behavior.
Unless of course you're thinking about single-page applications (SPA), where there is only one root HTML page, and everything else is built client-side with JavaScript and AJAX calls, but most web applications are not SPAs.
Thymeleaf replaces JSP by providing HTML pages through a template engine. The controller returns the name of an HTML template, and Spring Boot renders that template using the Model provided.
Thymeleaf is great because it allows you to rebuild templates on the fly. Let's say, for example, you're showing a user's points to them on the front end, but the points may increase or decrease.
What you can do is build the template in the background using a Model. The Model reference is automatically made available to the template, and the engine renders it.
@RequestMapping(...)
public String request(Model model) {
    // Attributes added to the Model become variables in the template
    model.addAttribute("points", 5);
    // Resolved to templates/my-template.html by the view resolver
    return "my-template";
}
Then use the Thymeleaf syntax in the HTML file to reference that attribute; the engine processes it at runtime.
<html ...>
<head>...</head>
<body>
    <h1 th:text="${points}"></h1>
</body>
</html>
Spring Boot's template engine will build this in the background and present it to the user, and it will show the actual points to the end user! Hope this helps a tiny bit.
I know this question has been answered pretty effectively so far, but I want to add my two cents in as I work with Thymeleaf often.
The easiest way to think about a template engine is that it allows some dynamic generation of HTML based on information passed to it by the controller method. This lets you include logic that plain HTML couldn't express: say, displaying a certain section only if a user is logged in as an admin.
If web pages were houses, HTML is the frame, CSS is the walls, JavaScript is the lights and electricity, and a template engine would pretty much be the architect designing the plans on the fly, before the frame is built, based on the desires of the buyer of the house (user inputs).
OK, newer apps and websites may just load/boot/open once and then pull or push data via AJAX requests. This is good: it saves traffic and it is fast.
But it has not always been like this, and some frameworks still don't build everything on small requests. Spring in Java and Symfony in PHP are MVC frameworks and use template engines to build pages. This may sound a little outdated, but there are still a lot of websites using this approach.
And if you build a web app for customers with slow PCs or other devices, and the page contents are performance-heavy, you might want to do as much work as possible on the server so that the user does not have to wait long. You can also cache the rendered pages. There is even server-side rendering of React pages to, e.g., make the initial page load faster...
With Java and Spring I just used JSP; I don't know Thymeleaf. Just use what you like, and maybe what is best supported/documented.
Building websites like this does not mean you cannot use AJAX, but if you use templates you need to think about what makes sense.
What is confusing me is how can a backend server have html?
The "back end" must have HTML, because that's what's delivered to, and rendered by, the client.
It may be that the back end "server" is just a CDN delivering, say, an HTML/JS SPA, but there's still something delivering content to the browser.
That said: server-side rendering is still a thing, and has had a resurgence lately. A React app may have its initial rendering done on the server, so again the client gets a page rendered with both HTML and associated data, and then starts acting like a normal SPA.
My understanding is that the backend server exposes APIs for the frontend to use by using AJAX and the frontend will manipulate this data and present the information on the screen, why would a backend provide html code?
Because something needs to run the JS to access those APIs.
Some history:
Browsers used to suck. JS used to be a neat add-on, sites were relatively static, and essentially all rendering was done on the server. The back end would get data from wherever it got data from and generate complete HTML pages, and there was little happening on the client side other than some form fields, maybe some validation, and that was about the extent of it.
How can I screen-scrape a multi-page application? I want to do this using JavaScript. Here are the approaches I have considered and the problems I have encountered.
Using the Fetch web API in a Node application to get the web pages
Problem: The web pages won't load properly when being fetched. I guess the JavaScript on the page does not run when the page is fetched.
Running JavaScript from the console
This is a very simple way to inject JavaScript straight into the document. But one problem is that opening the web page in a browser and pasting into the console is manual work. Another problem is that while this works for a single-page application, it becomes very cumbersome for multi-page applications.
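For example, in the Chrome DevTools console this one-liner copies the fully rendered DOM to the clipboard (copy() is a DevTools-only helper, not standard JavaScript):
// In the DevTools console: copy the rendered DOM to the clipboard
copy(document.documentElement.outerHTML);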
What better approach exists that solves the problems I have encountered?
Depends on what you are doing. If you just want to get some data from some website, then injecting JS into the page is the way to go.
But as you said it's manual work, from which I deduce you want to scrape the sites and save the data. In this case a server-side script is better suited. To fix the problem with the JavaScript not being run, you can use things like PhantomJS or Horseman.
Take a look at this: https://medium.com/@designman/building-a-performant-web-scraper-in-node-js-5f4449674163
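For a quick idea of the approach, a minimal PhantomJS script that waits for the page's JavaScript to run and then dumps the rendered HTML could look roughly like this (the URL and the fixed two-second delay are placeholder assumptions; a real scraper would wait for a specific selector instead):
// scrape.js: run with `phantomjs scrape.js`
var page = require('webpage').create();

page.open('http://example.com/some-spa', function (status) {
    if (status !== 'success') {
        console.error('Failed to load page');
        phantom.exit(1);
        return;
    }
    // Give the page's JavaScript time to fetch and render content
    window.setTimeout(function () {
        var html = page.evaluate(function () {
            return document.documentElement.outerHTML;
        });
        console.log(html);
        phantom.exit();
    }, 2000);
});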
If you want to save website content (HTML, JS, CSS files, images) to the file system, you can take a look at the website-scraper package for Node.js: https://www.npmjs.com/package/website-scraper
It also has a plugin for PhantomJS which allows it to handle single-page applications.
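Basic usage looks roughly like this (a sketch; the URL and output directory are placeholders):
const scrape = require('website-scraper');

scrape({
    urls: ['http://example.com/'],
    directory: './saved-site' // must not exist yet; the scraper creates it
}).then(function (resources) {
    console.log('Saved ' + resources.length + ' resources');
});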
I have a news section created with ReactJS; each news post acts as an individual page.
Unfortunately Google is not indexing these pages because of ReactJS. I tried using babel-polyfill with webpack, but it's still not working. Also, I'm making my AJAX call BEFORE rendering the DOM.
Any ideas for another workaround on this?
The Google crawler won't wait for async requests to resolve, and because your pages are rendered on the user's client, they will appear to be empty pages.
You have two options. Either modify your app to render on the server (this is often called a universal app, or an isomorphic app); there are many tutorials for creating these. The other option is to pre-render static HTML from your code so that the crawler can see what should be there. There are numerous libraries you can use for this.
The first option is more extensible and probably preferable, but a bit more complex. So make the choice about what's more appropriate for you.
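As a rough sketch of the first (server-rendered) option, using Express and ReactDOMServer (the NewsPost component, routes, and bundle path are placeholders, not your actual code):
const express = require('express');
const React = require('react');
const ReactDOMServer = require('react-dom/server');
const NewsPost = require('./components/NewsPost'); // placeholder component

const app = express();

app.get('/news/:id', function (req, res) {
    // Render the component to a string so the crawler sees real markup
    const html = ReactDOMServer.renderToString(
        React.createElement(NewsPost, { id: req.params.id })
    );
    res.send('<!DOCTYPE html><html><body><div id="root">' + html +
        '</div><script src="/bundle.js"></script></body></html>');
});

app.listen(3000);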
It is not indexing them because they are bundled so that they are rendered inside your client's browser. What you ought to do is server-side rendering.
You can find more about it here: https://medium.freecodecamp.org/server-side-rendering-your-react-app-in-three-simple-steps-7a82b95db82e
We have a web app whose content is generated by JavaScript. Can Google index those pages?
When we investigate this issue, we always find solutions on old pages about using "#!" in links.
In our app the links are like this:
domain.com/paris
domain.com/london
When we use these kinds of links, JavaScript populates the content.
Is it wise to use HTML snapshots, or do you have any other suggestions?
Short answer
Yes, they can crawl JavaScript-generated content, as long as you are using pushState.
Detailed answer
It depends on your setup. Google and Bing CAN crawl JavaScript and AJAX-based content if you are using pushState. If you do, they will handle content coming from AJAX calls, updates to the page title or meta tags using JavaScript, and in general any such things.
Most frontend frameworks like Angular, Ember or Backbone already work with pushState, so in these cases you don't need to do anything. Check whatever system you are using to see how they do things. If you are not using pushState, you will need to implement it on your own or use the whole _escaped_fragment_ HTML snapshot deal.
So if you use pushState, then yes, search engines can crawl your page just fine. If you don't, then no, you will need to implement pushState or do HTML snapshots.
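In plain JavaScript the pattern looks something like this (a sketch; the /api/content endpoint and the #content container are placeholders):
function renderCity(city) {
    return fetch('/api/content/' + city)
        .then(function (res) { return res.text(); })
        .then(function (html) {
            document.querySelector('#content').innerHTML = html;
        });
}

function navigateTo(city) {
    // A real, crawlable URL (domain.com/paris) instead of a #! fragment
    history.pushState({ city: city }, '', '/' + city);
    renderCity(city);
}

// Re-render on back/forward without pushing a new history entry
window.addEventListener('popstate', function (e) {
    if (e.state && e.state.city) renderCity(e.state.city);
});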
Bonus info: unfortunately Facebook does not handle pushState, so the Facebook crawler needs either non-dynamic og-tags or HTML snapshots.
"Generated by JavaScript" is ambiguous. That could mean that you are running a JS script on the server or it could mean that you are making an AJAX call with a JS API. The difference appears to matter as far as Googlebot is concerned. But you don't have to take my word for it, as there is empirical proof of what Googlebot will and won't currently cache as far as JavaScript content in the form of live experiments using both the XMLHTTPRequest API and the Fetch API. So, as you can see, server-side rendering is still going to be the best way to go for SEO.
I have two HTML files: One acts as a template, supplying the navigation, sidebars, etc., and the other has the main content. I'm trying to figure out how best to insert the content into the template. I need persistent URLs, so my plan was to have the content page essentially replace itself with the template, plugging the text back into the resulting page. I'm really new to front-end programming and I'm suspicious that this may be an anti-pattern, so my first question is about whether I'm barking up the right tree. The problem seems universal, and I'm sure there must be a best practice, though I haven't yet seen it discussed. If this is an acceptable way to proceed, then what JavaScript function would allow me to access the HTML of two different pages at the same time?
[EDIT: It's a small page on GitHub]
Do not do this. As currently implemented, HTML is not designed to be a template engine. You can use HTML imports, but they do not have full browser support (compatibility table).
Usually this problem is solved with one of the following:
Use a frontend framework. Libraries like angular.js or polymer.js (and so on) usually have support for importing HTML documents in different forms.
Build your application HTML. Task runners like grunt.js usually have a plugin that includes HTML.
Use server-side technologies to extend your HTML from base layouts.
If your application has to be composed from different HTML files, I recommend you try Polymer. It is a polyfill for web components and is designed to work that way by default.
UPD:
About the edit to your question: it seems like you just need a template engine for HTML. You can google for one. I use nunjucks, a JavaScript port of Python's template engine jinja2. It is lightweight, simple, and can be compiled right in the browser.
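In the browser it can be as simple as something like this (the template string and data are made up for illustration):
// Render a template string directly in the browser
nunjucks.configure({ autoescape: true });

var html = nunjucks.renderString(
    '<nav>{% for item in nav %}<a href="{{ item.url }}">{{ item.label }}</a>{% endfor %}</nav>',
    { nav: [{ url: '/', label: 'Home' }, { url: '/about', label: 'About' }] }
);

document.querySelector('#content').innerHTML = html;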
Another way is to use special tools for building static web pages. You have mentioned that your page is a blog built from simple HTML pages. Try pelican, a static website (blog) generator. It is simple and fast. You can build your HTML on your local machine and just push the generated pages to GitHub.