Wixful Thinking: Mining Hidden Wix Web Pages

Written by Justin, October 29th, 2015
Wix is a popular free website hosting platform that has 73 million users across 180
countries (stats from Wix themselves). Recently, while working on an investigation, I
was hunting around a website hosted on Wix and discovered an interesting anomaly.
Wix has a WYSIWYG editor that allows you to build your entire site, and through
that editor you can create new pages, hide them or delete them. What I discovered
is that although you can hide content from a passing user, the content is still
accessible to the general public by analyzing the underlying code on the hosted web
page. This post will walk you through how I discovered this, and how we can write
code to automatically extract the data that is hidden from view. Do note that this
isn't really a vulnerability per se, but an implementation flaw in how Wix renders
pages in the browser.
The Wix Editor

To test out my theory, I simply created a new Wix account and used a default
template. If you click on the navigation bar of the site, you'll get a little menu.
Here you can click the Navigate button, which opens a sub dialog where you can
add a new page. Let's do that, and set the page to hidden.
Now if you click the Publish button on the top right of the site, your page should be
live and you can go and view it. You will notice that the test page that you added
and then hid is not viewable on the site. The interesting thing is that, depending on
your target site, Google might have had the opportunity to index these pages
before they were hidden, and sometimes it won't have. Let's see how we can extract
the hidden page.
Wix Client Side Code

If you do a view source on the page you will see some interesting things right away,
starting with an HTML comment in the source.
Wix is telling you something here that is really important (this is why you should
spend time reading a site's source code as well). Wix relies heavily on client-side
processing and dynamic content loading, much like Twitter or Facebook. In this
HTML comment they are giving you an alternate URL scheme to access a
search engine friendly version of the page using the following URL format:
http://yoursite.com/?_escaped_fragment_=page_name/page_id
We will keep this little nugget of information in our back pocket for when we want
to extract content. Keep scrolling through the source code and eventually you
will see a JavaScript variable with a big blob of code after it (I have cut the output
for brevity):
var publicModel =
{"domain":"wix.com","externalBaseUrl":"http:\/\/ts686680.wix.com\/boutique-recruitment","unicodeExternalBaseUrl":"http:\/\/ts686680.wix.com\/boutique-recruitment","pageList":{"masterPage":
["https:\/\/static.wixstatic.com\/sites\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json.z?v=3",
"https:\/\/staticorigin.wixstatic.com\/sites\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pages-webapp\/page\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json"],
"pages":[{"pageId":"cce3","title":"Contact","urls":
["https:\/\/static.wixstatic.com\/sites\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json.z?v=3",
"https:\/\/staticorigin.wixstatic.com\/sites\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pages-webapp\/page\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json"]},
{"pageId":"rz5e9","title":"Test","urls":
["https:\/\/static.wixstatic.com\/sites\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z?v=3",
"https:\/\/staticorigin.wixstatic.com\/sites\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pages-webapp\/page\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json"]}]
Beautiful! You can see that all of the pages have entries in this publicModel variable,
including our test page that was set to hidden. This is the exact anomaly that I
was referring to in the opening paragraph of this blog post. By taking a closer look
at the JSON we can see a list of URLs that point to some JSON endpoints. If we visit
the first URL in the list:
https://static.wixstatic.com/sites/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z
You will see a full JSON document that has a bunch of information about the page,
including some of the text content. By scrolling to the very end of the document you
should see a key in the JSON called pageUriSEO. This key contains the search engine
friendly title of the page. Looking inside the structure key, we also see another key
named id. By combining these pieces of information, we can construct the
search engine friendly URL that will allow us to download and store the content of
the site as it is seen by search engines. Using the above example on my test site,
we can check whether we can access hidden content in our browser:
http://ts686680.wix.com/boutique-recruitment?_escaped_fragment_=test/rz5e9
Now that we can do this, let's create a script that will mine all pages from a Wix site
automatically for us, including hidden content.
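As a minimal sketch of the URL construction just described, the helper below joins a
site's base URL with the pageUriSEO value and the page id. The function name
build_seo_url is hypothetical (it is not part of the script that follows), and the
values passed in are the ones from the test site above.

```python
def build_seo_url(base_url, page_uri_seo, page_id):
    """Build the search engine friendly URL for a single Wix page.

    base_url     -- address of the site (assumed to include the scheme)
    page_uri_seo -- the pageUriSEO value from the page's JSON document
    page_id      -- the page id listed in publicModel
    """
    # Drop any trailing slash so the query string attaches cleanly.
    base_url = base_url.rstrip("/")
    return "%s?_escaped_fragment_=%s/%s" % (base_url, page_uri_seo, page_id)


# Values taken from the test site used in this post.
url = build_seo_url("http://ts686680.wix.com/boutique-recruitment", "test", "rz5e9")
print(url)
```

Pasting the printed URL into a browser should behave the same as the hand-built
URL above.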
An Extra Tidbit
While reviewing all of this code, I also discovered a key called timeSincePublish, which
is a timestamp value of when the owner of the site last clicked the Publish button.
This is NOT the original publish date of the site itself, however. This timestamp can
be useful from an investigation perspective if you are building a timeline of events
on a target or if you are looking for a date correlation to other events.
Coding It Up
This is going to be pretty straightforward: we are just going to retrieve the target
domain's Wix page, extract the JavaScript code and walk through each URL that we
discover.
 1  import requests
 2  import json
 3  import os
 4  import argparse
 5  import sys
 6
 7  ap = argparse.ArgumentParser()
 8  ap.add_argument("-d","--domain", required=True,help="The domain to target ie. cnn.com")
 9  args = vars(ap.parse_args())
10
11  domain = args['domain']
12
13  if not os.path.exists(domain):
14      os.mkdir(domain)
Pretty straightforward code: we are just adding the necessary imports, putting
some argument parsing in place and creating a directory to store our results. Now
let's add the first set of requests:
16  # send off first request
17  response = requests.get("http://%s" % domain)
18
19  print("[*] Trying domain: %s" % domain)
20
21  public_model = response.text.find("publicModel")
22
23  if public_model == -1:
24      print("[!] Could not locate Javascript code. Is this a Wix domain?")
25      sys.exit(0)
26
27  end_model = response.text[public_model:].find(";")
28
29  json_blob = response.text[public_model:public_model+end_model]
30  json_blob = json_blob.split("=",1)[1]
31
32  model = json.loads(json_blob)
Let's take a closer look at this code:

Line 17: send off the first request to retrieve the home page that will contain
the JavaScript code.
Lines 21-25: attempt to find the publicModel JavaScript variable in the content
of the page, and if we can't, we bail out.
Lines 27-32: we find the end of the JavaScript blob (27), then extract all of
the data (29-30) and parse it as JSON to convert it to a Python dictionary (32).
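Searching for the first semicolon works as long as no semicolon appears inside the
JSON itself (in a page title, for example). A slightly more defensive sketch, assuming
the variable is still declared in the form var publicModel = {...}, lets Python's JSON
decoder find the end of the object instead. The function name extract_public_model
and the sample HTML are made up for illustration.

```python
import json


def extract_public_model(html):
    """Return the publicModel object embedded in a page, or None if absent."""
    marker = html.find("publicModel")
    if marker == -1:
        return None

    # The JSON object starts at the first "{" after the variable name.
    start = html.find("{", marker)
    if start == -1:
        return None

    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever follows it, so semicolons inside strings are harmless.
    model, _end = json.JSONDecoder().raw_decode(html, start)
    return model


# Made-up sample page; note the semicolon inside the title.
sample_html = (
    '<script>var publicModel = {"pageList": {"pages": '
    '[{"pageId": "rz5e9", "title": "Test; hidden"}]}};</script>'
)

model = extract_public_model(sample_html)
print(model["pageList"]["pages"][0]["title"])
```

The returned dictionary has the same shape as the model variable in the script, so
the rest of the code can use it unchanged.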
Now that we have retrieved the main page JSON, we can iterate over the information
and begin retrieving all of the published pages for this Wix site. Let's implement the
code to do that.
35  for url in model["pageList"]["pages"]:
36
37      # grab the first JSON url
38      json_response = requests.get(url["urls"][0])
39
40      page = json.loads(json_response.content)
41
42      # grab the SEO friendly version of the page
43      # http://yoursite.com/?_escaped_fragment_=page_name/page_id
44      print("[*] Retrieving page: %s" % page['pageUriSEO'])
45      seo_url = "http://%s/?_escaped_fragment_=%s/%s" % (domain,page['pageUriSEO'],url['pageId'])
46
47      response = requests.get(seo_url)
48
49      # store the HTML page
50      with open("%s/%s.html" % (domain,page['pageUriSEO']),"wb") as fd:
51          fd.write(response.content)
52
53      # store the raw JSON
54      with open("%s/%s.json" % (domain,page['pageUriSEO']),"wb") as fd:
55          fd.write(json_response.content)
This is the final bit of code to make this bad boy work. Let's take a look at it:
Line 35: we are extracting the list of pages from the pageList key in
the publicModel JavaScript code.
Lines 38-40: we retrieve the first JSON URL from the page list (38) and parse the
response (40).
Line 45: here we are building the search engine friendly version of the page
so that we can store it as plain HTML. We are using the pageUriSEO key as
well as the pageId to build this URL.
Lines 47-55: we retrieve the HTML (47), then store the page (50) and
store the raw JSON for the page as well (54).
That is it! Find yourself a Wix site to target, and give it a run. I also encourage you to
test out hiding content on your own test site and see how it works.
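One thing to keep in mind when running this against sites you do not control:
pageUriSEO comes from the remote site and goes straight into a file path. A minimal
sketch of a sanitizing helper follows. The name safe_filename and the replacement
rule are assumptions for illustration, not something Wix defines.

```python
import re


def safe_filename(name, default="page"):
    """Reduce an untrusted page name to characters that are safe in a filename."""
    # Replace anything other than letters, digits, dot, underscore or dash.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Strip leading dots so the result can never be "." or "..".
    cleaned = cleaned.lstrip(".")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or default


print(safe_filename("test"))
print(safe_filename("../../etc/passwd"))
```

Wrapping page['pageUriSEO'] in a helper like this before building the .html and
.json paths keeps every output file inside the domain directory.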
Finding Wix Sites

Remember my post that dealt with mining Google Analytics codes? If you are
looking to find Wix pages to test, just spin up a test Wix account, and point the
Google Analytics mining script at your test page. All Wix sites appear to share a
common Google Analytics tracking code, so you'll have no shortage of sites to look
at when it returns its results. Be patient, there is a pile of results!
