TNG Community
Rob Severijns

Security / Privacy Question

Rob Severijns

Hi all,

My genealogy website is private and for now I want to keep it that way.

I'm trying to prevent robots/crawlers etc from indexing, following and using my data.

In order to achieve that I was thinking of modifying my .htaccess file by adding the following:

<FilesMatch "\.(docx|pdf|png|jpeg|jpg|gif|mp4)$">
    Header set X-Robots-Tag "noindex, nofollow, noimageindex, noarchive, nocache, notranslate, nosnippet, noyaca"
</FilesMatch>
  1. Is this the correct way to do it?
  2. Do I still need to adjust my robots.txt / tngrobots.txt file too?
  3. I suppose that if robots.txt / tngrobots.txt are set to allow following and indexing, this is still blocked by the .htaccess settings, or am I wrong?
  4. Is it still useful to adjust my genlib.php with:
<meta name="robots" content="noindex,nofollow" />

or a slightly different code?
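
A FilesMatch pattern like the one above is a regular expression, so a dot only means a literal "." when it is escaped. The sketch below uses Python's re module as a stand-in for Apache's regex engine (an assumption that should hold for a pattern this simple) to compare the escaped and unescaped forms. The filenames are made up for illustration.

```python
import re

# Same alternation as the FilesMatch pattern, with the dot escaped.
ESCAPED = re.compile(r"\.(docx|pdf|png|jpeg|jpg|gif|mp4)$")
# Unescaped variant: here "." matches any single character.
UNESCAPED = re.compile(r".(docx|pdf|png|jpeg|jpg|gif|mp4)$")


def matches(pattern, filename):
    """Return True if the pattern matches somewhere in the filename."""
    return pattern.search(filename) is not None


# Hypothetical filenames, not taken from any real site.
for name in ["tree.pdf", "photo.JPG", "notes.txt", "mypdf"]:
    print(name, matches(ESCAPED, name), matches(UNESCAPED, name))
```

Both forms are case-sensitive, so an upper-case extension such as .JPG is not covered by either pattern as written.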

I'm also using rel="nofollow" in my external hyperlinks

<a rel="nofollow" href="https://www.example.nl/">Example</a>

in cases where I need to link, but don’t want to be associated with the link target.

I have read the articles in Security too and that's why I was looking into this.

I know these are a lot of questions but hopefully relevant to other users too.

Any comments and/or advice are welcome here.

Rob

 

Rob Severijns

I asked my ISP the same questions and await their answers.

Nonetheless, any reaction from forum members is welcome.

Rob

Ken Roy

Rob,

Currently your site requires a login, so robots and crawlers cannot index your site. 

Are you looking at removing the login requirement?  What makes you think the robots and crawlers are indexing your site?

Rob Severijns

Hi Ken,

Thank you for replying.

Since I'm not an expert on security & privacy I'm asking the above questions.

As far as I know, websites are always being indexed, followed and subject to directory listing by search engines like Google, Facebook and other web crawlers unless we specifically tell them not to.

Isn't that why the robots.txt file has the 

User-agent: *
Disallow: /

options in it and isn't that why Darrin advises to put 

<meta name="robots" content="noindex,nofollow" />

in our genlib.php if we don't want to be indexed or followed?
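
As a rough illustration of what those two robots.txt lines mean to a crawler that honours the file, here is a minimal sketch using Python's standard urllib.robotparser. The bot name "ExampleBot", the site URL and the page path are placeholders, not anything from TNG. Note that robots.txt is only a request: well-behaved crawlers follow it, but nothing forces a crawler to.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="ExampleBot"):
    """Ask whether a compliant crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rules quoted in the post: every path is disallowed for every agent.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# For contrast: an empty Disallow value means nothing is disallowed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

SITE = "https://www.example.nl"  # placeholder domain

print(can_crawl(BLOCK_ALL, SITE + "/"))
print(can_crawl(BLOCK_ALL, SITE + "/some/page.php"))
print(can_crawl(ALLOW_ALL, SITE + "/some/page.php"))
```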

Isn't it also true that TNG has search engine optimizations integrated in its code?

If the TNG login and the bot-trap mod are enough to stop the site being indexed, followed etc., that would be great, but I'm not sure of that.

I'm not considering removing the login atm.

All in all I wanted to know more about security and privacy protection and verify if I'm on the right track.

This info would benefit not just me but other forum members too.

Rob

 

Ken Roy

Rob,

I am not a web system administrator; I worked on the IBM mainframe z/OS operating system before I retired.

As far as I know, web bots and crawlers are not any smarter than I am in trying to access your web site. If I am prompted for a login, I think that stops the bots as well, since they cannot go from your login screen to anywhere else on your site.

 

theKiwi
34 minutes ago, Rob Severijns said:

Since I'm not an expert on security & privacy I'm asking the above questions.

As far as I know, websites are always being indexed, followed and subject to directory listing by search engines like Google, Facebook and other web crawlers unless we specifically tell them not to.

Isn't that why the robots.txt file has the 

User-agent: *
Disallow: /

options in it and isn't that why Darrin advises to put 


<meta name="robots" content="noindex,nofollow" />

in our genlib.php if we don't want to be indexed or followed?

 

All of this is secondary to the requirement that your site can only be visited by people who log in to the site - the robots and crawlers cannot get into a site that requires a password.

You can find out what is indexed by opening a new browser window and typing in

site:your_site_url

and Google will tell you how many pages are indexed - like this for my site

[Screenshot: image.png]

which actually shows a depressingly small number at the moment...

Roger

tngrlkrz
2 hours ago, theKiwi said:

depressingly small number at the moment...

Yeah, my result is similar: 

[Screenshot: image.png]

Rob Severijns

Thank you all for the feedback.

Glad to see my results 😁 (No results at all)

[Screenshot: image.png]

Guess I'm on the right track with my privacy settings.

By the way, the results are the same whether or not my .htaccess contains the code

<FilesMatch "\.(docx|pdf|png|jpeg|jpg|gif|mp4)$">
Header set X-Robots-Tag "noindex, nofollow, noimageindex, noarchive, nocache, nosnippet, noyaca"
</FilesMatch>

in both cases with a robots.txt containing

User-agent: *
Disallow: /

Just to be sure, I will also put the code

<meta name="robots" content="noindex, nofollow" />

in my genlib.php.
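
One way to confirm that the tag actually ends up in the served pages is to parse a page's HTML and list the robots directives it carries. The following is a minimal sketch using Python's standard html.parser. The class and function names are made up for this example, and the sample HTML is a stand-in for whatever a page really outputs.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_directives(html):
    """Return the list of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Stand-in page source, not real TNG output.
sample = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
print(robots_directives(sample))
```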

Thx again,

Rob

