I have a set of tables on BigQuery containing some data, and I want to process that data with a JavaScript function I defined. The JS function maps the old data to a new schema, which must be the schema implemented by the new tables.
My tables share a common prefix, and I want to migrate all of them together to the new schema by creating tables with a different prefix while keeping each table's suffix.
Example: I have 100 tables called raw_data_SUFFIX, and I want to migrate them to 100 new-schema tables called parsed_data_SUFFIX, keeping each suffix.
This is the simple query for migrating the data
SELECT some_attribute, parse(another_attribute) as parsed
FROM `<my-project>.<dataset>.data_*`
Is there a way to do it via the BigQuery UI?
In order to achieve what you aim for, you would have to use a DDL CREATE TABLE statement, as follows:
CREATE TABLE `project_id.dataset.table` AS SELECT * FROM `project_id.dataset.table_source`
However, it is not possible to reference multiple destinations with wildcards. As stated in the documentation, there are some limitations when using wildcards, among them:
"Queries that contain DML statements cannot use a wildcard table as the target of the query. For example, a wildcard table can be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation."
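In practice this means a single statement cannot write to a wildcard destination, so for the question's example you would need one CREATE TABLE ... AS SELECT per concrete suffix, roughly like the sketch below (the parse() temp function is assumed to be defined in the same script, as in the template further down):
-- One statement per concrete suffix; a wildcard such as parsed_data_* cannot be the destination
CREATE TABLE `<my-project>.<dataset>.parsed_data_SUFFIX` AS
SELECT some_attribute, parse(another_attribute) AS parsed
FROM `<my-project>.<dataset>.raw_data_SUFFIX`;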
Nonetheless, you can use the Python API to make the requests to BigQuery and save each query result to a new table, giving each table the new prefix and the old suffix. You can do it as below:
from google.cloud import bigquery

client = bigquery.Client()
dataset_id = 'your_dataset_id'

# List all the tables as objects; each object has table.project, table.dataset_id and table.table_id
tables = client.list_tables(dataset_id)

# Loop through all the tables in your dataset
for table in tables:
    # Process only the tables whose name starts with the old prefix
    if table.table_id.startswith("your_table_prefix"):
        # Extract the suffix, which will be reused in the new table's name
        suffix = table.table_id[len("your_table_prefix"):].lstrip("_")
        # Reference to the source table
        table_reference = ".".join([table.project, table.dataset_id, table.table_id])
        # Destination table with the new prefix and the old suffix
        job_config = bigquery.QueryJobConfig()
        table_ref = client.dataset(dataset_id).table("_".join(['new_table_prefix', suffix]))
        job_config.destination = table_ref
        sql = '''
        CREATE TEMP FUNCTION
          function_name ( <input> )
          RETURNS <type>
          LANGUAGE js AS """
          return <type>;
        """;
        SELECT function_name(<columns>) FROM `{0}`'''.format(table_reference)
        query_job = client.query(
            sql,
            # Location must match that of the dataset(s) referenced in the query
            # and of the destination table.
            location='US',
            job_config=job_config)
        query_job.result()  # Waits for the query to finish
        print('Query results loaded to table {}'.format(table_ref.path))
Notice that in the sql string, ''' and """ are used to delimit the query and the JS temp function body, respectively.
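For the question's example, a filled-in version of that template might look like the sketch below; the input type and the JS body are assumptions, since the real mapping logic of parse() was not shown:
CREATE TEMP FUNCTION parse(another_attribute STRING)
RETURNS STRING
LANGUAGE js AS """
  // Placeholder mapping logic - replace with your real transformation
  return another_attribute.toUpperCase();
""";
SELECT some_attribute, parse(another_attribute) AS parsed
FROM `<my-project>.<dataset>.raw_data_SUFFIX`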
I would like to point out that you have to make sure your environment has the appropriate packages to use the Python API for BigQuery. You can install the BigQuery client library using: pip install --upgrade google-cloud-bigquery.
I have a CSV file which has input in the below format (it has no header row):
India,QA,1200,
India,QA1,1201,
India,QA2,1202,
USA,Dev1,5580,
USA,Dev2,5580,
AUS,Dev3,3300,
AUS,Dev4,3301,
I have configured the CSV Data Set Config component with the respective path and variable name details.
From the command line I invoke JMeter with jmeter.bat -t C:\Users\Dev\Desktop\JI\testscript.jmx -JCountry=India, which defines a JMeter property Country=India.
Now I have to use this value (India) to search the CSV file's first column and send to the script only those rows whose country matches the name given on the command line.
I thought of using an If Controller, but how can I check the CSV file's first column and, when there is a match, send those details to the script?
The easiest option would be to dynamically generate a CSV file containing only the India lines:
Add setUp Thread Group to your Test Plan
Add Test Action sampler to the setUp Thread Group (this way you won't have an extra result in .jtl file)
Add JSR223 PreProcessor as a child of the Test Action Sampler
Put the following code into "Script" area:
String country = props.get('Country')
def originalCsvFile = new File('C:/Users/Dev/Desktop/JI/JVMDetails/Details.txt')
def countryCsvFile = new File('C:/Users/Dev/Desktop/JI/JVMDetails/Country.txt')
countryCsvFile.delete()
originalCsvFile.eachLine { line ->
    if (line.startsWith(country)) {
        countryCsvFile << line
        countryCsvFile << System.getProperty('line.separator')
    }
}
Configure your CSV Data Set Config to use C:\Users\Dev\Desktop\JI\JVMDetails\Country.txt, as this file will contain only the lines which start with whatever you have defined in the Country property.
More information:
Apache Groovy - Why and How You Should Use It
Groovy Goodness: Working with Files
You need to loop through the CSV (for example with a While Controller); see the linked examples.
As in the second example, use the variable from the CSV in the While condition: ${Country}.
Inside the loop add an If Controller with a condition that compares the country variable against the country property:
${__jexl3("${__P(Country)}" == "${Country}")}
Checking "Interpret Condition as Variable Expression?" and using the __jexl3 or __groovy function in the condition is advised for performance.
During my load test, I would like to fetch values from an SQL database. How can I achieve this in the LoadRunner TruClient protocol using JavaScript?
This would be a great help.
Important: This will only work in TruClient (IE) and not in TruClient (Firefox).
Enter a new "Evaluate JavaScript" step, and edit the JavaScript like so:
// Open an ADO connection to the database
var connection = new ActiveXObject("ADODB.Connection");
var connectionstring = "Data Source=<server>;Initial Catalog=<catalog>;User ID=<user>;Password=<password>;Provider=SQLOLEDB";
connection.Open(connectionstring);

// Open a recordset with the query results
var rs = new ActiveXObject("ADODB.Recordset");
rs.Open("SELECT * FROM table", connection);

rs.MoveFirst();
while (!rs.EOF) {
    // Value of the first column in the current row
    var value = rs.Fields(0).Value;
    rs.MoveNext();
}

rs.Close();
connection.Close();
There are several options.
I'll list them in order of their complexity:
Option 1:
Use a parameter file to hold all of your data. If you need to modify it periodically, consider placing it in a shared location accessible to all load generators (LGs).
Option 2:
Use the Virtual Table Server (VTS) provided with LoadRunner. It is dedicated to sharing test data between virtual users, and queries are easy with its built-in API.
Option 3:
Write a custom C function that uses the LoadRunner DB API to query the database, and call it from your script with an Eval C step.
Note this can only be done in VuGen.
I am creating an enterprise search engine using Solr, with SolrJ and JSP for the front end. I want to apply pagination to my search results. My code for getting the results from the Solr core is as follows.
while (iter.hasNext()) {
    SolrDocument doc1 = iter.next();
    String dc = iter.next().toString();
    out.println(doc1.getFieldValue("id"));
    out.println(doc1.getFieldValue("title"));
    out.println("<BR>");
    out.println("content_type :");
    out.println(doc1.getFieldValue("content_type"));
    out.println("<BR>");
    out.println("doc_type :");
    out.println(doc1.getFieldValue("doc_type"));
} %>
There are 600 records in my search engine. If I search for a specific keyword, all the records related to it come back on a single page. Can anybody suggest logic for pagination using JavaScript? I want to use client-side pagination. Please help.
While creating the Solr query you can set start and rows, e.g.:
SolrQuery query = new SolrQuery(searchTerm);
query.setStart((pageNum - 1) * numItemsPerPage);
query.setRows(numItemsPerPage);
// execute the query on the server and get the results
QueryResponse res = solrServer.query(query);
Also, each time you call iter.next() you read the next element of the Iterator. Hence, by doing
SolrDocument doc1 =iter.next();
String dc =iter.next().toString();
you are skipping one element on every iteration; call iter.next() only once per loop and reuse doc1 for both values.
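If you still want purely client-side paging in the rendered JSP page, a rough sketch in plain JavaScript could look like the following; it assumes the result rows have already been rendered into elements with a hypothetical result-row class and uses an arbitrary page size of 10:
// Client-side paging sketch: shows one "page" of already-rendered rows and hides the rest.
// The .result-row class and the page size are assumptions, not part of the original code.
var PAGE_SIZE = 10;

function showPage(pageNum) {
  var rows = document.querySelectorAll('.result-row');
  var start = (pageNum - 1) * PAGE_SIZE;
  var end = start + PAGE_SIZE;
  for (var i = 0; i < rows.length; i++) {
    rows[i].style.display = (i >= start && i < end) ? '' : 'none';
  }
}

showPage(1); // show the first page on load
That said, with 600 documents the start/rows approach above avoids sending every record to the browser, so it usually scales better.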
I am currently trying to use the Node module oracledb to import a CSV into my database. The database is connected correctly, as I can SELECT * FROM MYDB, but I cannot find a way to easily import a CSV into this database.
Note: I am currently using jsontoCSV and then fswrite to create the CSV, so if it is easier to write a SQL query that imports JSON into Oracle using oracledb, that would also work.
Thank you.
I only have a rough, management-level overview of Oracle techniques. I would guess you could use SQL*Loader or Data Pump (abbreviated dp) to read the CSV, perhaps additionally using (Oracle) external tables, which can be accessed via the file system. It depends on your infrastructure.
The internet is full of examples. One main resource for me is "Ask Tom" (use your favourite search engine with the words oracle ask tom kyte).
Use SQL*Loader as mentioned.
Create the table:
drop table mytab;
create table mytab (id number, data varchar2(40));
Create mycsv.csv:
"ab", "first row"
"cd", "secondrow"
Create control file myctl.ctl:
LOAD DATA
INFILE 'mycsv.csv'
INSERT INTO TABLE mytab
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS (id, data)
Run:
sqlldr cj/cj@localhost/orcl control=myctl.ctl
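If you would rather stay inside Node.js with the oracledb driver instead of shelling out to SQL*Loader, a rough sketch using executeMany could look like the following; it assumes the mytab table above and that the CSV has already been parsed into an array of row objects (the connection details are placeholders):
// Sketch: bulk-insert already-parsed CSV rows into mytab with node-oracledb.
// Connection details and the shape of "rows" are assumptions.
const oracledb = require('oracledb');

async function importRows(rows) {
  // rows, e.g. [{ id: 1, data: 'first row' }, { id: 2, data: 'second row' }]
  const connection = await oracledb.getConnection({
    user: 'cj',
    password: 'cj',
    connectString: 'localhost/orcl'
  });
  try {
    await connection.executeMany(
      'INSERT INTO mytab (id, data) VALUES (:id, :data)',
      rows,
      {
        autoCommit: true,
        bindDefs: {
          id: { type: oracledb.NUMBER },
          data: { type: oracledb.STRING, maxSize: 40 }
        }
      }
    );
  } finally {
    await connection.close();
  }
}
For the JSON-first route mentioned in the question, you could skip writing the intermediate CSV entirely and pass the parsed objects straight to executeMany.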
I have a program that scrapes values from https://web.apps.markit.com/WMXAXLP?YYY2220_zJkhPN/sWPxwhzYw8K4DcqW07HfIQykbYMaXf8fTzWT6WKnuivTcM0W584u1QRwj
My current code is:
require 'nokogiri'
require 'open-uri'
require 'date'

doc = Nokogiri::HTML(open(source_url))
puts doc.css('span.indexDate').text
date = doc.css('span.indexDate').text
date = Date.parse(date)
puts date
values = doc.css('table#CdsIndexTable td.col2 span')
puts values
This scrapes the date and the values of the second column from the "CDS Indexes" table correctly, which is fine. Now I want to scrape the same kind of values from the "Bond Indexes" table, and that is where I am facing the problem.
I can see that a JavaScript function switches the table without reloading the page and without changing its URL. The difference between the two tables is that their IDs are different, which is exactly as it should be. But unfortunately, when I try:
values = doc.css('table#BondIndexTable')
puts values
I get nothing from the Bond Indexes table, but I do get values from the CDS Indexes table if I use:
values = doc.css('table#CdsIndexTable')
puts values
How can I get the values from both tables?
You can use Capybara with the Poltergeist driver to execute the JavaScript and render the page. Poltergeist is a wrapper for the PhantomJS headless browser. Here's an example of how you can do it:
require 'rubygems'
require 'nokogiri'
require 'capybara'
require 'capybara/dsl'
require 'capybara/poltergeist'

Capybara.default_driver = :poltergeist
Capybara.run_server = false

module GetPrice
  class WebScraper
    include Capybara::DSL

    def get_page_data(url)
      visit(url)
      doc = Nokogiri::HTML(page.html)
      doc.css('td.col2 span')
    end
  end
end

scraper = GetPrice::WebScraper.new
puts scraper.get_page_data('https://web.apps.markit.com/WMXAXLP?YYY2220_zJkhPN/sWPxwhzYw8K4DcqW07HfIQykbYMaXf8fTzWT6WKnuivTcM0W584u1QRwj').map(&:text).inspect
Visit here for a complete example using Amazon.com:
https://github.com/wakproductions/amazon_get_price/blob/master/getprice.rb
If you don't want to use PhantomJS, you can also use the network panel in the Firefox or Chrome developer tools, where you will see that the HTML table data is returned by a JavaScript POST request to the server.
Then, rather than opening the original page URL with Nokogiri, you run that request from your Ruby script and parse and interpret the response. It looks like it's just JSON data with HTML embedded in it, so you can extract the HTML and feed it to Nokogiri.
It requires a bit of extra detective work and some digging into the inner workings of the page and its network traffic, but I've used this method many times for scraping JavaScript-driven pages, and it works fine for most simple tasks.
Here's an example of the JSON data from the Javascript POST request:
Bonds:
https://web.apps.markit.com/AppsApi/GetIndexData?indexOrBond=bond&ClientCode=WSJ
CDS:
https://web.apps.markit.com/AppsApi/GetIndexData?indexOrBond=cds&ClientCode=WSJ
Here's the quick and dirty solution just so you get an idea. This will grab the cookie from the initial page and use it in the request to get the JSON data, then parse the JSON data and feed the extracted HTML to Nokogiri:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'json'
# Open the initial page to grab the cookie from it
p1 = open('https://web.apps.markit.com/WMXAXLP?YYY2220_zJkhPN/sWPxwhzYw8K4DcqW07HfIQykbYMaXf8fTzWT6WKnuivTcM0W584u1QRwj')
# Save the cookie
cookie = p1.meta['set-cookie'].split('; ',2)[0]
# Open the JSON data page using our cookie we just obtained
p2 = open('https://web.apps.markit.com/AppsApi/GetIndexData?indexOrBond=bond&ClientCode=WSJ',
'Cookie' => cookie)
# Get the raw JSON
json = p2.read
# Parse it
data = JSON.parse(json)
# Feed the html portion to Nokogiri
doc = Nokogiri.parse(data['html'])
# Extract the values
values = doc.css('td.col2 span')
puts values.map(&:text).inspect
=> ["0.02%", "0.02%", "n.a.", "-0.03%", "0.02%", "0.04%",
"0.01%", "0.02%", "0.08%", "-0.01%", "0.03%", "0.01%", "0.05%", "0.04%"]
PhantomJS is a headless browser with a JavaScript API. Since you need to run the scripts on the page you are scraping, a browser will do that for you, and PhantomJS lets you manipulate and scrape the page after the scripts have executed.