I am an amateur programmer and I want to make a multilingual website. My questions are: (For my purposes, let the English website be website no. 1 and the Polish one website no. 2.)
Should it be en.example.com and pl.example.com or maybe example.com/en and example.com/pl?
Should I make the full website in one language and then translate it?
How should I translate it? Using XML or what? Should website 1 and website 2 be different HTML files, or is there a way to translate an HTML file and then show the translation using XML or something similar?
If you need any code or anything else, tell me. Thank you in advance :)
1) I don't think it makes much difference. The most important thing is to ensure that Google can crawl both languages, so don't rely on JavaScript to switch between languages; have everything done server-side so both languages can be crawled and ranked in Google.
2) You can do one translation and then the other; you just have to ensure that the layout expands to accommodate more or less text without breaking. Maybe use lorem ipsum while designing.
3) I would put the translations into a database and then fetch the particular translation depending on whether it is EN or PL in the domain name. Ensure that the web page and the database both use UTF-8 encoding, otherwise you will find that 'funny' characters get displayed.
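As a rough sketch of that idea in server-side JavaScript (the translations object here stands in for your UTF-8 database table, and the key names are made up for illustration):

// Minimal sketch: pick the language from the subdomain, then look up a translation.
const translations = {
  en: { greeting: "Welcome" },
  pl: { greeting: "Witamy" },
};

function languageFromHost(host) {
  // en.example.com -> "en", pl.example.com -> "pl", anything else -> "en"
  const sub = host.split(".")[0];
  return translations[sub] ? sub : "en";
}

function t(host, key) {
  return translations[languageFromHost(host)][key];
}

console.log(t("pl.example.com", "greeting")); // "Witamy"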
My advice is to use a framework.
For instance, if you use CakePHP then you write
__('My name is')
and in the translation file
msgid "My name is"
msgstr "Nazywam się"
Then you can easily translate into any other language, and it's pretty easy to implement.
Also, if you do not want to use a framework, you can check this link to see an example of how it works:
http://tympanus.net/codrops/2009/12/30/easy-php-site-translation/
While this question probably is not a good SO question due to its broad nature, it might be relevant to many users.
My approach would be templates.
Your suggestion of having two HTML files is bad for the obvious reason of duplication: say you need to change something in your site, you would always need to change two HTML files.
Having one html file and then parsing it and translating it sounds like a massive headache.
Some templating framework could help you massively. I have been using Smarty, but that's a personal choice and there are many options here.
The idea is that you make a template file for your HTML and use labels instead of actual content. Then in your PHP code you include the correct language depending on cookies, user settings, or session data.
Storing labels is another issue here. Storing them in a database is a good option; however, remember you do not want to make hundreds of queries against the database to fetch each label. What you can do is store them in a database and then generate a language file from it (an array of label->translation pairs) for faster access, regenerating these files whenever you add or update labels; see the sketch below.
Or you can skip the database altogether and just store them in files; however, as these grow they might not be as easy to maintain.
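A minimal sketch of that generation step (Node.js here; the rows array stands in for the result of your database query, and the file name is made up):

// Minimal sketch: turn label rows from the database into a static language file,
// so pages read one file instead of querying once per label.
const fs = require("fs");

// Pretend these rows came from a database query.
const rows = [
  { label: "greeting", translation: "Witamy" },
  { label: "farewell", translation: "Do widzenia" },
];

const labels = {};
for (const row of rows) {
  labels[row.label] = row.translation;
}

// Regenerate this file whenever labels are added or updated.
fs.writeFileSync("lang_pl.json", JSON.stringify(labels, null, 2), "utf8");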
I think the easiest mistake for an "amateur programmer" to make in this area is to allow the two (or more) language versions of the site to diverge. (I've seen so-called professionals make that mistake too...) You need to design it so everything that's common between the two versions is shared, so that when the time comes to make changes, you only need to make the changes in one place. The way you do this will depend on your choice of tools, and I'm not going to start advising on that, because it depends on many factors, for example the extent to which your content is database-oriented.
Closely related to this are questions of who is doing the translation, how the technical developers and the translators work together, and how to keep track of which content needs to be re-translated when it changes. These are essentially process questions rather than coding questions, so not a good fit for SO.
Don't expect that a site for one language can be translated without any technical impact; you will find you have made all sorts of assumptions about the length of strings, the order of fields, about character encodings and fonts, and about cultural things like postcodes, that turn out to be invalid when you try to convert the site to a different language.
You could make two different language files and use PHP constants to define the text you want to translate. For example:
lang_pl.php:
define("_TEST", "polish translation");
lang_en.php:
define("_TEST", "English translation");
Now you can let the user choose between the Polish and English translations and, based on that choice, include the corresponding language file. So wherever you have a text field, you set its value to _TEST (between PHP tags), and it will show the translation for the chosen language.
The place I worked did it like this: they didn't have too much text on their site, so they kept it in a database. As your tags include php, I assume you know how to use databases. They had a MySQL table called languages, with a language id (in your case 1 for EN and 2 for PL) and the texts in columns, so the column names were main_heading, intro_text, about_us, and so on. When a user arrives, they select the language and a PHP request gets the page content in that language. This is easy because your content is static and can be translated before the site goes online; if your content is dynamic (users add content), you may need machine translation, because you cannot expect your users to write their entries in all languages.
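A rough sketch of that lookup (the db.query helper and column names here are placeholders for whatever database client and schema you actually use):

// Minimal sketch: fetch all texts for the selected language in one query.
async function loadPageTexts(db, languageId) {
  const sql =
    "SELECT main_heading, intro_text, about_us FROM languages WHERE language_id = ?";
  const rows = await db.query(sql, [languageId]);
  return rows[0]; // one row per language: { main_heading: ..., intro_text: ..., ... }
}
// loadPageTexts(db, 2) would return the Polish texts.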
I have multiple datasets here that I took from Kaggle. There are multiple CSV files, and each CSV file is made specifically for one activity: sitting, standing, walking, running, etc. The data is taken from sensors like accelerometers and gyroscopes. The values in the datasets are per-axis readings: x, y, and z.
Sample Data
Here is a sample dataset of jogging. Now I need to make classifiers in my program so that it can detect by itself whether the data is jogging, sitting, standing, etc. I want to merge all the datasets into a single CSV file, upload it to my webpage, and then have the JavaScript code start detecting whether a particular row is sitting, standing, jogging, and so on. I don't want any code help; I just need a little explanation or a way to start coding it. How can I get started making such a classifier? I know it is kind of a broad question, but I think I have tried to explain myself in the best way possible. Once my program has labelled every row with a specific activity, it will count all the activities separately and then show them in a table on the webpage.
To answer your question properly, it would be very helpful to know your level of understanding of and experience with machine learning.
If you are a beginner, I would suggest trying to run and understand a couple of tutorials, which can easily be found on the web.
If you need an idea of the "standard" approach to machine-learning development, I will try to give you a general idea of the process.
You can summarize the process in these main steps:
Data pre-processing -> Data splitting -> Feature selection -> Model training -> Validation -> Deployment
Data pre-processing is meant to clean and format the data: removing NA values, decisions about categorical variables, outlier analysis, and so on. This is a complex step that depends on the application. In your case I would start by checking that the data in the different datasets are homogeneous, i.e. the features have the same meaning across CSV files and corresponding features follow the same distribution. While the meaning of each feature should be explained in the description of your CSV, the distributions can easily be checked by plotting box-plots for each feature and CSV file. If the distributions of the same feature across different CSV files don't overlap, you should investigate the issue further.
An important step in the design of a good model is the splitting of the data. You should split your data into a training set and a validation set (training/validation/test for a more comprehensive approach). This allows you to train your model on the training set and test it on the validation set, computing an unbiased estimate of your model's performance. I suggest becoming familiar with concepts such as cross-validation, stratified cross-validation, nested cross-validation for hyper-parameter tuning, overfitting, and bias. The validation of the model will give you an idea of the performance to expect on unseen data. If you are considering more than one model, you can use the validation results to choose the "best" one; I suggest a comparison using confidence intervals or, if possible, a significance test (e.g. t-test, ANOVA). Before deployment, the model is trained on all the available data.
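Since you want to work in JavaScript, here is a minimal sketch of a random train/validation split (no stratification, just to illustrate the idea):

// Minimal sketch: shuffle the rows, then hold out a fraction for validation.
function trainValidationSplit(rows, validationFraction = 0.2) {
  const shuffled = rows.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const cut = Math.floor(shuffled.length * (1 - validationFraction));
  return { train: shuffled.slice(0, cut), validation: shuffled.slice(cut) };
}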
The choice of model depends on the data that you are using: the number of samples, the number of features, the types of variables (numerical, categorical), and so on.
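As one concrete starting point (my suggestion, not the only option), k-nearest-neighbours is simple enough to sketch in JavaScript, assuming each sample looks like { features: [x, y, z], activity: "jogging" }:

// Naive k-nearest-neighbours sketch: classify a row by majority vote
// among the k closest training samples (Euclidean distance).
function classifyKnn(trainSet, features, k = 5) {
  const neighbours = trainSet
    .map((sample) => ({
      activity: sample.activity,
      dist: Math.hypot(...sample.features.map((v, i) => v - features[i])),
    }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  const votes = {};
  for (const n of neighbours) {
    votes[n.activity] = (votes[n.activity] || 0) + 1;
  }
  return Object.keys(votes).reduce((a, b) => (votes[a] >= votes[b] ? a : b));
}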
I'm not an expert in JavaScript, but I believe (just a feeling) that Python and R are more common choices for developing machine-learning applications. Both have libraries developed specifically for the task, and you can find a lot of material and tutorials around.
With a bit more context, I could be more specific.
I hope this helps.
Is there a good js library or function for creating natural phrases/sentences?
specifically, I have random things like "bird", "pencil", "football player", etc.
and I want to be able to construct a sentence that would fit with the noun ... "johnny has 7 pencils"
"johnny has seen 3 birds"
"suzy knows 10 more football players"
the goal is to randomly generate sentences that can be modeled with algebraic expressions. It's easy enough except for generating a natural verb and getting the correct tense.
I looked briefly into natural language processing, but (at least on the surface) it mostly looks like it goes the other way.
can you suggest a library, or if not, perhaps suggest an outline for an algorithm I could create?
Thank you!
It is very complicated, some even say impossible, and I think that is the reason your question got down-voted. But your specific problem has, to some extent, a solution if you are OK with the restriction to not-overly-complicated sentences.
The basic English grammar is quite simple: subject, predicate, and object. So with a list of nouns and verbs you are already able to build grammatically correct sentences. The most difficult thing is to get the lists or build them yourself (e.g. from some dictionary like the public-domain 1913 version of Webster's), but the Internet offers several such lists.
If you have more lists (adjectives, adverbs, etc.) you can build more complex sentences. There are also lists of irregular verbs, uncommon plurals, and the like.
To make things simpler I would not look for arbitrary sentences but build a bunch of templates myself and let some random generator fill in the words from the correct list. With the simplest form, SPO:
simpleSentence = randNoun() + " " + randVerb() + " " + randObject();
to some more complex form:
notSoSimpleSentence = randNoun() + " " + randVerb() + " " + randAdjective() + " " + randObject();
Build more templates that way and, if you have enough, start filling them, check the output, and be disappointed: it does not work very well. You need to implement more rules, e.g. he has but they have, and rest assured that it needs many such rules even for the simplest sentences.
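Here is a small runnable sketch of the template idea with one such rule included: naive pluralization for counts, which covers sentences like "johnny has 7 pencils" (the word lists and the rule are illustrative only):

// Minimal sketch: fill sentence templates from word lists, with one
// naive agreement rule (add "s" unless the count is 1).
const names = ["Johnny", "Suzy"];
const verbs = ["has", "has seen", "knows"];
const nouns = ["bird", "pencil", "football player"];

function pick(list) {
  return list[Math.floor(Math.random() * list.length)];
}

function pluralize(noun, count) {
  return count === 1 ? noun : noun + "s"; // fails for "fish", "mouse", ...
}

function randomSentence() {
  const count = 1 + Math.floor(Math.random() * 10);
  return `${pick(names)} ${pick(verbs)} ${count} ${pluralize(pick(nouns), count)}.`;
}

console.log(randomSentence()); // e.g. "Suzy knows 3 football players."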
There are several scripts out there that can "write" scientific papers for you; the first hit from a Google search is SCIgen, although it is written in Perl. These programs are known as "paper generators" and, lo and behold, they have a wiki page. If you follow that page one step higher you'll find the category Natural Language Generation with some more information. The sentences in this paragraph were… my… really hard to construct!
If you still want to do it: make lists with n-grams, or use Google's n-gram lists (a huge page with a lot of links to n-gram lists auto-generated by Google, although of high quality), but be careful, these lists are huge. No, really, they are huge! Which means that you cannot just wrap them in an array and use them directly. A megabyte or two is probably acceptable today (text files compress well), but more than 100 gigabytes? So you have to dust off your gold-washing pan and extract the nuggets you need.
And after all of that hassle: how do you teach these sentences to make sense? How do you avoid putting the bald man's nicely combed maroon hair up in rollers?
Nope, this problem had an overdose of a phosphodiesterase type 5 inhibitor (a non-psychoactive piperazine), sorry. An injection of methylene blue directly into the problem had been considered but assumed to only leave a monochromatic mess.
But seriously: anything above some very simple sentences filled from some short lists with a handful of rules is out of reach for a little ECMAScript program written over a weekend or two.
I'm trying to show a list of lunch venues around the office with their menus for today. But the problem is that the websites offering the lunch menus don't always offer the same kind of content.
For instance, some of the websites offer a nice JSON output. Look at this one: it offers the English/Finnish course names separately, and everything I need is available. There are a couple of others like this.
But others don't always have a nice output. Like this one. The content is laid out in plain HTML, and the English and Finnish food names are not exactly ordered. Also, food properties like (L, VL, VS, G, etc.) are just normal text, like the food name.
What, in your opinion, is the best way to scrape all this available data in different formats and turn it into usable data? I tried to make a scraper with Node.js (& PhantomJS, etc.), but it only works with one website, and it's not that accurate with the food names.
Thanks in advance.
You may use something like kimonolabs.com; they are much easier to use and they give you APIs to update your site.
Remember that they are best for tabular content.
There may be simple algorithmic solutions to the problem. If there is a list of all available food names, that can be really helpful: you just look for occurrences of the food names inside a document (for today).
If there is no such food list, you may use TF-IDF. TF-IDF lets you calculate the score of a word inside a document, relative to the current document and also the other documents. But this solution needs enough data to work; see the sketch below.
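A minimal sketch of that computation in JavaScript (this uses one common smoothed variant of the IDF formula; documents are plain strings here):

// Minimal TF-IDF sketch: score a term in one document against a collection.
// tf = term count / document length; idf = log(N / (1 + docs containing the term)).
function tfidf(term, doc, allDocs) {
  const tokenize = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);
  const words = tokenize(doc);
  const tf = words.filter((w) => w === term).length / words.length;
  const docsWithTerm = allDocs.filter((d) => tokenize(d).includes(term)).length;
  const idf = Math.log(allDocs.length / (1 + docsWithTerm)); // +1 avoids division by zero
  return tf * idf;
}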
I think the best solution is something like this (see the sketch after this list):
Creating a list of all available websites that should be scraped.
Writing a driver class for each website's data.
Each driver has the duty of creating the general domain entity from its site's document.
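A minimal sketch of the driver idea in JavaScript (the site names and the shape of the returned entity are made up for illustration):

// Minimal sketch: one driver per site, each producing the same domain entity:
// [{ venue, courseFi, courseEn, properties }].
function parseMessyHtml(raw) {
  // Site-specific heuristics would go here; stubbed for the sketch.
  return [];
}

const drivers = {
  "nice-json-site": (raw) =>
    JSON.parse(raw).courses.map((c) => ({
      venue: "nice-json-site",
      courseFi: c.fi,
      courseEn: c.en,
      properties: c.properties,
    })),
  "messy-html-site": (raw) => parseMessyHtml(raw),
};

function scrape(site, rawDocument) {
  return drivers[site](rawDocument);
}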
If you can use PHP, Simple HTML DOM Parser along with Guzzle would be a great choice. These two provide a jQuery-like selector engine and a nice wrapper around HTTP.
You are touching on a really difficult problem. Unfortunately there are no easy solutions.
Actually there are two different parts to solve:
data scraping from different sources
data integration
Let's start with the first problem: data scraping from different sources. In my projects I usually process data in several steps. I have dedicated scrapers for all the specific sites I want, and process them in the following order:
fetch raw page (unstructured data)
extract data from page (unstructured data)
extract, convert and map data into page-specific model (fully structured data)
map data from fully structured model to common/normalized model
Steps 1-2 are scraping oriented and steps 3-4 are strictly data-extraction / data-integration oriented.
While you can implement steps 1-2 relatively easily using your own web scrapers or by utilizing existing web services, data integration is the most difficult part in your case. You will probably require some machine-learning techniques (shallow, domain-specific Natural Language Processing) along with custom heuristics.
In the case of messy input like that second site, I would process the lines separately and use a dictionary to get rid of the Finnish/English words, then analyse what is left. But in this case it will never be 100% accurate, due to the possibility of human-input errors.
I am also worried that your stack is not very well suited to such tasks. For such processing I use Java/Groovy along with integration frameworks (Mule ESB / Spring Integration) in order to coordinate the data processing.
In summary: it is a really difficult and complex problem. I would rather accept less input-data coverage than aim to be 100% accurate (unless it is really worth it).
In my application, there is a comment box. If someone enters a comment like
<script>alert("hello")</script>
then an alert appears when I load that page.
Is there any way to prevent this?
There are several ways to address this, but since you haven't mentioned which back-end technology you are using, it is hard to give anything but rough answers.
Also, you haven't mentioned if you want to allow, or deny, the ability to enter regular HTML in the box.
Method 1:
Sanitize inputs on the way in. When you accept something at the server, look for the script tags and remove them.
This is actually far more difficult to get right than might be expected.
Method 2:
Escape the data on the way back down to the browser. In PHP there is a function called
htmlentities, which will turn all HTML special characters into entities so that the markup renders as literally what was typed.
The words <script>alert("hello")</script> would then appear on your page as plain text.
Method 3
White-list
This is far beyond the answer of a single post and really requires knowing your back-end system, but it is possible to allow some HTML tags while disallowing others.
This is insanely difficult to get right, and you really are best off using a library package that has been very well tested.
You should treat user input as plain text rather than HTML. By correctly escaping HTML entities, you can render what looks like valid HTML text without having the browser try to execute it. This is good practice in general, for your client-side code as well as any user provided values passed to your back-end. Issues arising from this are broadly referred to as script injection or cross-site scripting.
Practically on the client-side this is pretty easy since you're using jQuery. When updating the DOM based on user input, rely on the text method in place of the html method. You can see a simple example of the difference in this jsFiddle.
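For example (the jQuery calls, with plain-DOM equivalents in the comments):

// Unsafe: the input is parsed as HTML, so injected markup can execute scripts.
$("#comments").html(userInput);   // element.innerHTML = userInput;

// Safe: the input is inserted as literal text and never parsed as HTML.
$("#comments").text(userInput);   // element.textContent = userInput;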
The best way is to replace <script> with another string. For example, in C# use:
str = str.Replace("<script>", "O_o");
The other options have a lot of disadvantages.
1. Blocking JavaScript: this disables some validation too, namely the validation done in the front end. Also, the script works again after being retrieved from the database: an attacker can inject a script as form input and it gets saved in the database; when you return the records from the database on another page, it is rendered as script!
2. Rendering as text: in some technologies this needs third-party packages, which is a risk in itself. Maybe these packages have backdoors!
Converting the value into a string solved it in my case. Example:
var anything
My project should support different languages for the GUI (Spanish, English, German, etc.). I'm using CodeIgniter, so I have no problem with PHP views using the Language class: I can load some previously defined variables in PHP to create views in different languages.
The problem comes here:
Some features (a lot of them, actually) use JavaScript: a personalized context menu for some items, different DIVs created dynamically, etc. Most of these features are created dynamically, so they can't know the selected language (I could create a lot of duplicated code, one copy for each language, but this is too redundant).
I need to find a way to translate this content into the new language the user previously selected.
For example:
The user right-clicks, and the context menu has the following options (created dynamically using JavaScript):
Hi
Goodbye
When the user changes the page language to 'Spanish', the context menu should show:
Hola
Adios
Is there any way to save some variables with all the names in different languages, and then load them to create the menus?
Sorry for the big post. I hope someone can help me or give me a little tip.
There are several systems to use when you want i18n (short for "internationalization") to work in a server-side language as well as in client-side JavaScript.
This is an example I found for PHP and JavaScript: http://doc.silverstripe.org/sapphire/en/topics/i18n
I've done this in PHP, Ruby, and Python, and in general there is one way it is done: you create a library of text, the library has pointers to specific pieces of that text, and in your code you use a pointer to refer to its piece of text. What the example above provides is a way to create such a library in PHP that can be compiled to a JavaScript equivalent you can call in JavaScript. Basically, it completely separates copywriting and other text from the code.
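A minimal sketch of the JavaScript side of that idea (the label names are made up; your server would emit the map for the currently selected language):

// Minimal sketch: the server emits one label map for the current language;
// dynamically created widgets look labels up instead of hard-coding strings.
const LANG = {
  menu_hi: "Hola",
  menu_goodbye: "Adios",
};

function buildContextMenu() {
  return [LANG.menu_hi, LANG.menu_goodbye]; // entries in the user's language
}

console.log(buildContextMenu()); // ["Hola", "Adios"]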
I hope this helps you on your way.
"Is there any way to save some variables with all the names in different languages, and then load them to create the menus?"
I assume you're asking if you can save these user preferences?
If so, store it as a cookie on the user's computer.
If you don't mean this, and you want to store all of the language variants, then you could store them in an array that can be loaded by JavaScript; see the sketch below.
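A minimal sketch combining both ideas (the cookie name and the translations array are made up for illustration):

// Minimal sketch: remember the choice in a cookie, then pick the matching labels.
const translations = {
  en: ["Hi", "Goodbye"],
  es: ["Hola", "Adios"],
};

function setLanguage(lang) {
  document.cookie = "lang=" + lang + "; path=/";
}

function currentLanguage() {
  const match = document.cookie.match(/(?:^|;\s*)lang=(\w+)/);
  return match && translations[match[1]] ? match[1] : "en";
}

const menuLabels = translations[currentLanguage()]; // e.g. ["Hola", "Adios"]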