1. If you can live without URL sessions, you should disable them immediately
This is how: set session.enable.url.with.session.id=false in portal-ext.properties
Most of us don't need them, since most of us either require cookies for our sites or don't depend on cookies or URL sessions at all. I would actually recommend everyone do this. Here's an article that explains further why they suck and also shows you an alternative way to remove them if you aren't lucky enough to use Liferay.
Why? Because Googlebot and all the other crawlers don't support cookies, so the URLs to your site all end up with ;jsessionid=12356yourseoisscrewed5950 appended, and you end up with duplicate URLs because ;jsessionid=12356yourseoisscrewed5950 != ;jsessionid=6079messedupseo459056.
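To see why the crawlers treat these as different pages, here's a minimal Python sketch. The example.com URLs and the strip_jsessionid helper are made up for illustration; they aren't part of Liferay.

```python
import re

# Matches a ";jsessionid=<token>" path parameter, up to the next "?", "#", ";" or "/".
JSESSIONID = re.compile(r";jsessionid=[^?#;/]*", re.IGNORECASE)

def strip_jsessionid(url: str) -> str:
    """Return the URL with any ;jsessionid=... path parameter removed."""
    return JSESSIONID.sub("", url)

# Hypothetical URLs a crawler might record on two separate visits.
first = "http://example.com/web/guest/home;jsessionid=12356yourseoisscrewed5950"
second = "http://example.com/web/guest/home;jsessionid=6079messedupseo459056"

print(first == second)                                      # False
print(strip_jsessionid(first) == strip_jsessionid(second))  # True
```

Same page, two different strings, so the crawler indexes it twice. Once the session id is gone the two URLs compare equal.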
2. If you need them, you need to set up mod_rewrites for them immediately
I can't give a full answer for this one since I've never done it. But I did read a lot of stuff, so here are some suggestions...
Make a RewriteCond for Googlebot, MSNBot and all the other ones you can think of.
Then a RewriteRule that removes the jsessionid...
I'm not sure what the answer is really... if you know, leave a comment.
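For what it's worth, the two suggestions above boil down to the following logic. This is a Python sketch of the idea, not Apache configuration, and I haven't run it against a real server. The crawler token list and the function names are my own assumptions.

```python
import re

# Assumed user-agent substrings for crawlers; extend with whichever bots you care about.
CRAWLER_TOKENS = ("googlebot", "msnbot")

# Matches a ";jsessionid=<token>" path parameter.
JSESSIONID = re.compile(r";jsessionid=[^?#;/]*", re.IGNORECASE)

def is_crawler(user_agent: str) -> bool:
    """The 'condition' step: does the user agent look like a known crawler?"""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def url_for_client(user_agent: str, url: str) -> str:
    """The 'rule' step: strip the jsessionid for crawlers, leave it for everyone else."""
    if is_crawler(user_agent):
        return JSESSIONID.sub("", url)
    return url
```

Regular visitors without cookies keep their URL session, while crawlers only ever see the clean URL.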
3. If you've messed up and are trying to recover from your mistake (same boat as me)
First read this article. Then do what they say =). It took me a while to figure it out since I didn't know Apache that well and was trying to do everything in httpd.conf. I had to put the rules in my virtual hosts conf file before it worked, because my server is also running another site (maybe I can explain another day what I learned from that site).
You are basically writing a mod_rewrite rule that tells the crawlers that all URLs with a jsessionid appended have moved, so they will take them off the list of URLs for your site. This is good because you won't have duplicate URLs, which make the crawlers think you have duplicate content.
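Here's the "this URL has moved" idea as a small Python sketch. It assumes "moved" means an HTTP 301 (permanent) redirect to the same URL without the session id. The redirect_for name is made up, and this is only the logic, not the actual mod_rewrite rule from the article.

```python
import re
from typing import Optional, Tuple

# Matches a ";jsessionid=<token>" path parameter.
JSESSIONID = re.compile(r";jsessionid=[^?#;/]*", re.IGNORECASE)

def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, clean_url) if the URL carries a jsessionid, else None."""
    clean = JSESSIONID.sub("", url)
    if clean == url:
        return None      # nothing to strip, serve the page normally
    return (301, clean)  # permanent redirect to the session-free URL

print(redirect_for("/web/guest/home;jsessionid=ABC123?p=1"))  # (301, '/web/guest/home?p=1')
print(redirect_for("/web/guest/home"))                         # None
```

Every jsessionid variant of a page points at the one clean URL, so the crawler can drop the duplicates.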
So there you go...
If someone can tell me how to use robots.txt for virtual hosts, or even how to set it up for just one server (since all my sites will use the same robots.txt), please do. And I did read the wiki... I just need more explanation =)
---Liferay's wiki