Pages
Written by Justin, October 29th, 2015
Wix is a popular free website hosting platform that has 73 million users across 180
countries (stats from Wix themselves). Recently, while working on an investigation, I
was spending time hunting around a website hosted on Wix and discovered
an interesting anomaly. Wix has a WYSIWYG editor that allows you to build your
entire site, and through that editor you can create new pages, hide them
or delete them. What I discovered is that although you can hide a page from
a passing user, its content is still accessible to the general public by analyzing the
underlying code of the hosted web page. This post will walk you through how I
discovered this, and how we can write code to automatically extract the data that is
hidden from view. Do note that this isn't really a vulnerability per se, but rather an
implementation flaw in how Wix renders pages in the browser.
Here you can click the Navigate button, which will open a sub-dialog where you can
add a new page. Let's do that, and set the page to hidden.
Now if you click the Publish button on the top right of the site, your page should be
live and you can go and view it. You will notice that the test page you added
and then hid is not viewable on the site. The interesting thing is that, depending on
your target site, Google might have had the opportunity to index these pages
before they were hidden, and sometimes it won't have. Let's see how we can extract the
hidden page.
Wix is telling you something here that is really important (this is why you should
spend time reading a site's source code as well). Wix relies heavily on client-side
processing and dynamic content loading, much like Twitter or Facebook. In the
HTML comment above they are giving you an alternate URL scheme to access a
search engine friendly version of the page using the following URL format:
http://yoursite.com/?_escaped_fragment_=page_name/page_id
We will keep this little nugget of information in our back pocket for when we want
to extract content. Keep scrolling through the source code and eventually you
will see a JavaScript variable with a big blob of code after it (I have cut the output for
brevity):
var publicModel =
{"domain":"wix.com","externalBaseUrl":"http:\/\/ts686680.wix.com\/boutiquerecruitment","unicodeExternalBaseUrl":"http:\/\/ts686680.wix.com\/boutiquerecruitment","pageList":{"masterPage":
["https:\/\/static.wixstatic.com\/sites\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json.z?v=3",
"https:\/\/staticorigin.wixstatic.com\/sites\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pageswebapp\/page\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json"],
"pages":[{"pageId":"cce3","title":"Contact","urls":
["https:\/\/static.wixstatic.com\/sites\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json.z?v=3",
"https:\/\/staticorigin.wixstatic.com\/sites\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pageswebapp\/page\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json"]},
{"pageId":"rz5e9","title":"Test","urls":
["https:\/\/static.wixstatic.com\/sites\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z?v=3",
"https:\/\/staticorigin.wixstatic.com\/sites\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pageswebapp\/page\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json"]}]
Beautiful! You can see that all of the pages have entries in this publicModel variable,
including our test page that was set to hidden. This was the exact anomaly that I
was referring to in the opening paragraph of this blog post. By taking a closer look
at the JSON we can see a list of URLs that point to some JSON endpoints. If we visit
the first URL in the list:
https://static.wixstatic.com/sites/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z
You will see a full JSON document that has a bunch of information about the page,
including some of the text content. By scrolling to the very end of the page you
should see a key in the JSON called pageUriSEO. This key contains the search engine
friendly title of the page. As well, inside the structure key we see another key
named id. By combining these pieces of information, we can then construct our
search engine friendly URL that will allow us to download and store the content of
the site as it is seen by search engines. Using the above example on my test site,
we can check whether we can access hidden content in our browser:
http://ts686680.wix.com/boutique-recruitment?_escaped_fragment_=test/rz5e9
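The URL construction described above can be sketched in a few lines of Python 3. This is only an illustration: the function name build_seo_url is hypothetical, and percent-encoding the two parts is an assumption on my side to keep odd page titles from breaking the query string.

```python
from urllib.parse import quote


def build_seo_url(base_url, page_uri_seo, page_id):
    """Return the _escaped_fragment_ URL for one page of a Wix site.

    base_url     -- the public address of the site
    page_uri_seo -- value of the pageUriSEO key in the page's JSON
    page_id      -- value of the id key under structure (the pageId)
    """
    # Assumption: percent-encode both parts in case a page title contains
    # spaces or other characters that are not safe in a query string.
    fragment = "{}/{}".format(quote(page_uri_seo, safe=""),
                              quote(page_id, safe=""))
    return "{}?_escaped_fragment_={}".format(base_url.rstrip("/"), fragment)


print(build_seo_url("http://ts686680.wix.com/boutique-recruitment",
                    "test", "rz5e9"))
```

For the test site this prints the same URL as shown above.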
Now that we can do this, let's create a script that will mine all pages from a Wix site
automatically for us, including hidden content.
An Extra Tidbit
While reviewing all of this code, I also discovered a key called timeSincePublish which
is a timestamp value of when the owner of the site last clicked the Publish button.
This is NOT the original publish date of the site itself however. This timestamp can
Coding It Up
This is going to be pretty straightforward: we are just going to retrieve the target
domain's Wix page, extract the JavaScript code and walk through each URL that we
discover.
import requests
import json
import os
import argparse
import sys

ap = argparse.ArgumentParser()
# The rest of this call was cut off in the source; required=True is an assumption.
ap.add_argument("-d", "--domain", required=True)
args = vars(ap.parse_args())

domain = args['domain']

if not os.path.exists(domain):
    os.mkdir(domain)
Pretty straightforward code: we are just adding the necessary imports, putting
some argument parsing in place and creating a directory to store our results. Now
let's add the first set of requests:
[...]
end_model = response.content[public_model:].find(";")
[...]
json_blob = response.content[public_model:public_model+end_model]
json_blob = json_blob.split("=",1)[1]
model     = json.loads(json_blob)
[...]
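The extraction step above finds the end of the publicModel blob by searching for the first semicolon, which would cut the JSON short if any string value happened to contain one. Here is a minimal Python 3 sketch of an alternative, not the original script: it lets the standard library's JSON decoder decide where the object ends. The function name extract_public_model and the sample HTML (including its note key) are made up for illustration. If you run this against a live page with requests on Python 3, response.text is probably the string you want to pass in, since response.content is bytes there.

```python
import json


def extract_public_model(html):
    """Return the publicModel object embedded in a Wix page's HTML source."""
    start = html.find("var publicModel")
    if start == -1:
        raise ValueError("publicModel not found in the page source")
    # Jump to the opening brace of the JSON object after the "=" sign.
    brace = html.find("{", start)
    if brace == -1:
        raise ValueError("publicModel has no JSON object after it")
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it, so a ";" inside a string is harmless.
    model, _end = json.JSONDecoder().raw_decode(html, brace)
    return model


# Hypothetical page source, trimmed down to the part we care about.
sample_html = (
    '<script>var publicModel = {"domain":"wix.com","note":"a;b",'
    '"pageList":{"pages":[{"pageId":"rz5e9","title":"Test"}]}};'
    '\nvar somethingElse = 1;</script>'
)
model = extract_public_model(sample_html)
for page in model["pageList"]["pages"]:
    print(page["pageId"], page["title"])
```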
This is the final bit of code to make this bad boy work. Let's take a look at it:
Line 35: we are extracting the list of pages from the pageList key in
the publicModel JavaScript code.
Lines 38-40: we grab the first JSON URL from the page list (38) and retrieve it
(40).
Line 45: here we are building the search engine friendly version of the page
so that we can store it as plain HTML. We are using the pageUriSEO key as
well as the pageId to build this URL.
Lines 47-55: we retrieve the HTML (47), store the page (50) and
store the raw JSON for the page as well (54).
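The steps listed above can be sketched roughly as follows in Python 3. Treat it as a sketch under assumptions rather than the original script: mine_pages, find_key and the fetch parameter are hypothetical names, the output file naming is my own choice, and because I only know that pageUriSEO sits somewhere near the end of the page JSON, the sketch searches for it recursively instead of hard-coding a path. Passing in fetch as a callable keeps the example testable without touching the network.

```python
import json
import os
from urllib.parse import quote


def find_key(node, key):
    """Depth-first search for the first value stored under key."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def mine_pages(model, base_url, out_dir, fetch):
    """Store the HTML and raw JSON of every page listed in publicModel.

    fetch is any callable that takes a URL and returns the body as text.
    """
    os.makedirs(out_dir, exist_ok=True)
    seo_urls = []
    # Extract the list of pages from the pageList key.
    for page in model["pageList"]["pages"]:
        # Grab the first JSON URL for the page and retrieve it.
        raw_json = fetch(page["urls"][0])
        page_uri = find_key(json.loads(raw_json), "pageUriSEO")
        if page_uri is None:
            continue  # nothing to build the URL from, so skip this page
        # Build the search engine friendly URL from pageUriSEO and pageId.
        seo_url = "{}?_escaped_fragment_={}/{}".format(
            base_url.rstrip("/"), quote(page_uri), quote(page["pageId"]))
        html = fetch(seo_url)
        # Store the HTML and the raw JSON side by side (naming is assumed).
        stem = os.path.join(out_dir, page["pageId"])
        with open(stem + ".html", "w", encoding="utf-8") as fd:
            fd.write(html)
        with open(stem + ".json", "w", encoding="utf-8") as fd:
            fd.write(raw_json)
        seo_urls.append(seo_url)
    return seo_urls
```

With requests installed, something like lambda url: requests.get(url, timeout=30).text could be passed in as fetch.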
That is it! Find yourself a Wix site to target, and give it a run. I also encourage you to
test out hiding content on your own test site and see how it works.