Read data from another website - javascript

If I have a website, how can I display data from a different website?
For example, if I have www.example.com and I want to display the sentence "I have X apples", where X is populated from www.AppleNumber.com, whose format I know (the X I want will always be in a div named AppleNum, formatted as "number: X").
How can I go about this?
The actual problem I want to solve is reading a Chrome extension's web page to see how many installations it has, but I'm certainly okay with answers to the simplified version.
I'm only adding the tags I'm thinking of; don't limit your answers based on them.

I think this might solve your problem:
import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://www.oracle.com/");
        URLConnection yc = oracle.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
From: https://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html
With this you can get the output of a website, like reading a text file.
I don't know if this will work with a Chrome extension's website.
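To pull the number out of the HTML read above, a regular expression could be applied to the collected output. This is only a sketch: the exact markup of the AppleNum div is assumed from the question, so the pattern would need adjusting to the real page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppleNumExtractor {
    public static void main(String[] args) {
        // in practice this string would come from the URLConnection above
        String html = "<html><body><div id='AppleNum'>number: 42</div></body></html>";

        // assumes the div holds the text "number: X" as described in the question
        Pattern p = Pattern.compile("<div id='AppleNum'>number: (\\d+)</div>");
        Matcher m = p.matcher(html);
        if (m.find()) {
            System.out.println("I have " + m.group(1) + " apples"); // prints: I have 42 apples
        }
    }
}
```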

Try
$apples = file_get_contents("http://www.AppleNumber.com/?AppleID=3");
preg_match("/<div id='AppleNum'>(\d+)<\/div>/", $apples, $Matches);
var_dump($Matches);
Regex Demo: https://regex101.com/r/uK6oR4/1

Related

How to get "Publish Date" dynamic value from page using HtmlUnit in Java?

As a simple coding exercise, I am working on a small project that compares the current system date to the dates present in a few web pages (to see if there is a new update). For most of them, everything works just fine, but there is one that is causing me some problems.
Page: https://access.redhat.com/security/security-updates/#/security-advisories
Value I am trying to get: Publish Date
Question: How can I do it in Java?
I tried using a simple BufferedReader and tried saving the whole page to a file, to no avail. I did some research and it seems like I need to use HtmlUnit, but I feel like I need advice to understand how it works.
public static void main(String[] args) {
    Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
    System.setProperty("org.apache.commons.logging.Log",
            "org.apache.commons.logging.impl.NoOpLog");
    String START_URL = "https://access.redhat.com/security/security-updates/#/security-advisories";
    try {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setCssEnabled(true);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.waitForBackgroundJavaScript(5000);
        HtmlPage page = webClient.getPage(START_URL);
        String pageContent = page.asText(); // this will NOT include the dates
        System.out.println(pageContent);
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
I would like to get the content of the first "Publish Date" box from the https://access.redhat.com/security/security-updates/#/security-advisories page; however, no matter what approach I try, the dynamic value is never visible and cannot be stored or checked.
You could use:
public static void main(String[] args) {
    try {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setJavaScriptEnabled(true);
        HtmlPage page = webClient.getPage("https://access.redhat.com/security/security-updates/#/security-advisories");
        webClient.waitForBackgroundJavaScript(15000);
        HtmlTable table = (HtmlTable) page.getElementById("DataTables_Table_0");
        for (HtmlTableRow row : table.getRows()) {
            List<HtmlElement> timeElements = row.getElementsByTagName("time");
            if (timeElements.size() > 0) {
                // get the time from the cell
                HtmlElement timeElement = timeElements.get(0);
                String time = timeElement.getAttribute("datetime"); // format "2019-05-08T17:34:20Z"
                System.out.println("TIME: " + time);
            }
            // rows without a <time> element are skipped
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
This is untested; maybe there is something to change in the path to the correct nodes, but it should give you a start :)
UPDATE:
I have now tested it, and it prints the desired time to the console.
This is ONE way of doing it. There are many other ways in HtmlUnit to get the elements of the DOM you need.
I suggest reading the getting started document -> "Finding a specific Element".
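Once the datetime attribute is extracted, the standard java.time API can parse it for comparison with the current system date (the sample value below is the format noted in the code comment above):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class ParseAdvisoryDate {
    public static void main(String[] args) {
        // a datetime attribute value in the format shown above
        String time = "2019-05-08T17:34:20Z";

        // Instant.parse handles ISO-8601 timestamps with a trailing Z directly
        Instant published = Instant.parse(time);

        // convert to a calendar date (UTC) to compare against LocalDate.now()
        LocalDate publishDate = published.atZone(ZoneOffset.UTC).toLocalDate();
        System.out.println(publishDate); // prints 2019-05-08
    }
}
```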

awesomium web scraping certain parts

I asked this earlier but I wanted to rephrase the question. I am trying to make a scraper for my project. I would like to have it display a certain part of a link. The only part of the link that changes is the number. This number is what I would like to scrape. The link looks like this:
<a href="/link/player.jsp?user=966354" target="_parent" "="">
As mentioned, I am trying to scrape only the 966354 part of the link. I have tried several ways to do this but can't figure it out. When I add
<a href="/link/player.jsp?user="
to the code below, it breaks:
List<string> player = new List<string>();
string html = webControl2.ExecuteJavascriptWithResult("document.getElementsByTagName('a')[0].innerHTML");
MatchCollection m1 = Regex.Matches(html, "<a href=\\s*(.+?)\\s*</a>", RegexOptions.Singleline);
foreach (Match m in m1)
{
    string players = m.Groups[1].Value;
    player.Add(players);
}
listBox.DataSource = player;
So I removed it; it shows no errors until I run the program, and then I get this error:
"An unhandled exception of type 'System.InvalidOperationException' occurred in Awesomium.Windows.Forms.dll"
So I tried this, and it somewhat works:
string html = webControl2.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
This code scrapes, but not the way I would like. Could someone lend a helping hand, please?
I would use HtmlAgilityPack (install it via NuGet) and XPath queries to parse HTML.
Something like this:
string html = webControl2.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");

var htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);

var playerIds = new List<string>();

var playerNodes = htmlDoc.DocumentNode.SelectNodes("//a[contains(@href, '/link/player.jsp?user=')]");
if (playerNodes != null)
{
    foreach (var playerNode in playerNodes)
    {
        string href = playerNode.Attributes["href"].Value;
        var parts = href.Split(new char[] { '=' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length > 1)
        {
            playerIds.Add(parts[1]);
        }
    }
    listBox.DataSource = playerIds;
}
Also, you may find these two simple helper classes useful: https://gist.github.com/AlexP11223/8286153
The first one contains extension methods for WebView/WebControl, and the second has some static methods to generate JS code for retrieving elements (JSObject) by XPath and getting the coordinates of a JSObject.
Using a sample HTML file such as the one below, I was unable to duplicate the exception.
<html>
<a href="/link/player.jsp?user=966354">test</a>
</html>
However, the javascript
document.getElementsByTagName('a')[0].innerHTML
will return "test" in my example. What you probably want is
document.getElementsByTagName('a')[0].href
which will return the href portion.
The 'innerHTML' property returns everything between the start and end tags (such as <html> </html>). This is probably why you have better success when getting the 'html' element: you end up parsing the entire <a> </a> link.
FYI, as a test, you can use your browser's console to check the JavaScript output.

How to filter javascript from specific urls in HtmlUnit

HtmlUnit takes a lot of time to execute JavaScript. I would like to know if it is possible to make HtmlUnit not load JavaScript from URLs matching regex filters.
Not exactly; you can only disable JavaScript as a whole (probably you already know it):
final WebClient webClient = new WebClient();
webClient.getOptions().setJavaScriptEnabled(false);
but you can use a ScriptPreProcessor to inspect the JavaScript and erase what you don't want:
webClient.setScriptPreProcessor(new ScriptPreProcessor() {
    @Override
    public String preProcess(HtmlPage htmlPage, String sourceCode, String sourceName,
            int lineNumber, HtmlElement htmlElement) {
        // sourceName is the URL the script was loaded from;
        // the regex here is a placeholder for your own filter
        if (sourceName != null && sourceName.matches(".*unwanted.*")) {
            return "";
        }
        return sourceCode; // run everything else unchanged
    }
});

Web View not returning a correct value

With version 2.3.3, the WebView returns the result string as 1364311909.
But with 4.0 or above, I get 1.36431e+09, a string value in a different format.
The value is passed from JavaScript to a WebView using the Android WebView framework.
The WebView is used with JavaScript enabled:
myWebView.getSettings().setPluginState(PluginState.ON);
System.out.println("default encoding state is...." + myWebView.getSettings().getDefaultTextEncodingName());
myWebView.getSettings().setJavaScriptEnabled(true);
myWebView.loadUrl("the Url");
The Java method (invoked from the page's JavaScript) that retrieves the value is:
public void changeIntent(final String updateId) {
    showToast(updateId);
    //* addToUpdatesId(updateId);
    Runnable runnable = new Runnable() {
        public void run() {
            // your code here
            Intent intent = new Intent(mContext, XXX.class);
            String allId = updateId;
            Global.setUpdateId(updateId);
            intent.putExtra(UPDATESID, allId);
            mContext.startActivity(intent);
        }
    };
    runOnUiThread(runnable);
}
How can I solve this issue? Please let me know.
We have solved it.
JavaScript always uses a high-precision value and appends an exponential power at the end of the string.
To fix this, append an empty string to the original string that you are passing, and return the result, like:
String s = originalString + "";
return s;
Cheers!
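If the JavaScript side cannot be changed, the exponential form can also be normalized on the Java side. This is a sketch, assuming the full-precision value survives the conversion (the truncated "1.36431e+09" shown above has already lost digits); BigDecimal understands scientific notation:

```java
import java.math.BigDecimal;

public class WebViewValueFix {
    public static void main(String[] args) {
        // value as it might arrive from the WebView on Android 4.0+
        String raw = "1.364311909e+09";

        // BigDecimal parses scientific notation; toBigInteger drops the exponent form
        String plain = new BigDecimal(raw).toBigInteger().toString();
        System.out.println(plain); // prints 1364311909
    }
}
```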

Take a screenshot of an iFrame

I am looking for a simple way to take a screenshot of an iFrame on my ASP page. I just couldn't achieve it with C#, and I lack knowledge of JavaScript! Does anyone out there know a simple, good way to achieve this?
What I am trying to do: I am building a website where students can log in to the e-government website in my country and prove that they are continuing students with a single click, so that they can get a discount on our service.
Edit: The puzzle should be solved locally.
This piece of code worked for me. I hope it does the same for others.
private void saveURLToImage(string url)
{
    if (!string.IsNullOrEmpty(url))
    {
        string content = "";
        System.Net.WebRequest webRequest = WebRequest.Create(url);
        System.Net.WebResponse webResponse = webRequest.GetResponse();
        System.IO.StreamReader sr = new StreamReader(webResponse.GetResponseStream(), System.Text.Encoding.GetEncoding("UTF-8"));
        content = sr.ReadToEnd();
        // save to file
        byte[] b = Convert.FromBase64String(content);
        System.IO.MemoryStream ms = new System.IO.MemoryStream(b);
        System.Drawing.Image img = System.Drawing.Image.FromStream(ms);
        img.Save(@"c:\pic.jpg", System.Drawing.Imaging.ImageFormat.Jpeg);
        img.Dispose();
        ms.Close();
    }
}
Unless I'm misunderstanding you, this is impossible.
You cannot instruct the user's browser to take a screenshot (this would be a security risk … and has few use cases anyway).
You cannot load the page you want a screenshot of yourself (with server side code) because you don't have the credentials needed to access it.
For a server-side approach, see the question "Take a screenshot of a webpage with JavaScript?".
For a client-side JavaScript approach, html2canvas can render a page into a canvas: http://html2canvas.hertzen.com/
