Class: UrlPrivacy

Inherits:
Object
Defined in:
lib/url_privacy.rb

Overview

Usage:

UrlPrivacy.clean(url)

Constant Summary

TRACKING_PARAMS =

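Query parameters to remove from URLs. Taken from Neat URL and CleanURLs, plus some others found manually. An entry is a plain parameter name (gclid), a glob (utm_*), or a param@host pair that applies only to hostnames matching the host pattern.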

%w[pf_rd_*@imdb.com [email protected] gclid ref
terminal_id igshid tracking_id action_object_map action_type_map
action_ref_map spm@*.aliexpress.com scm@*.aliexpress.com
aff_platform aff_trace_key algo_expid@*.aliexpress.*
algo_pvid@*.aliexpress.* btsid ws_ab_test pd_rd_*@amazon.*
_encoding@amazon.* psc@amazon.* tag@amazon.* ref_@amazon.*
pf_rd_*@amazon.* pf@amazon.* qid@amazon.* sr@amazon.*
srs@amazon.* __mk_*@amazon.* spIA@amazon.* ms3_c@amazon.*
ie*@amazon.* refRID@amazon.* colid@amazon.* coliid@amazon.*
*adId@amazon.* qualifier@amazon.* _encoding@amazon.*
smid@amazon.* field-lbr_brands_browse-bin@amazon.* ved@google.*
bi*@google.* gfe_*@google.* ei@google.* source@google.*
gs_*@google.* site@google.* oq@google.* esrc@google.*
uact@google.* cd@google.* cad@google.* gws_*@google.*
atyp@google.* vet@google.* zx@google.* _u@google.* je@google.*
dcr@google.* ie@google.* sei@google.* sa@google.* dpr@google.*
hl@google.* btn*@google.* sa@google.* usg@google.* cd@google.*
cad@google.* uact@google.* [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
[email protected]* ath*@walmart.com* utm_* ga_source ga_medium
ga_term ga_content ga_campaign ga_place yclid _openstat
fb_action_ids fb_action_types fb_source fb_ref fbclid
action_object_map action_type_map action_ref_map gs_l mkt_tok
hmb_campaign hmb_medium hmb_source ref ref_ ref_*@twitter.com
[email protected] trackId@netflix.* tctx@netflix.* jb*@netflix.*
[email protected] [email protected] [email protected]
[email protected] guce_referrer_*@techcrunch.com
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected]
tt_medium@twitch.* tt_content@twitch.* [email protected]
[email protected] [email protected] [email protected]
*[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] _trkparms@ebay.*
_trksid@ebay.* _from@ebay.* [email protected] [email protected]
[email protected] [email protected] [email protected] [email protected] [email protected]
[email protected] [email protected] mkt_tok trk trkCampaign ga_* gclid
gclsrc hmb_campaign hmb_medium hmb_source spReportId spJobID
spUserID spMailingID itm_* s_cid elqTrackId elqTrack assetType
assetId recipientId campaignId siteId mc_cid mc_eid pk_*
sc_campaign sc_channel sc_content sc_medium sc_outcome sc_geo
sc_country utm_* nr_email_referer vero_conv vero_id yclid
_openstat mbid cmpid cid c_id campaign_id Campaign hash@ebay.*
fb_action_ids fb_action_types fb_ref fb_source fbclid
[email protected] [email protected] gs_l gs_lcp@google.*
ved@google.* ei@google.* sei@google.* gws_rd@google.*
gs_gbg@google.* gs_mss@google.* gs_rn@google.* _hsenc _hsmi
__hssc __hstc hsCtaTracking [email protected]
[email protected] tt_medium tt_content lr@yandex.*
redircnt@yandex.* [email protected] [email protected] wt_zmc
source@google.* iflsig@google.* sclient@google.*
[email protected] [email protected] [email protected]
[email protected] hc_*@facebook.com *ref*@facebook.com
[email protected] [email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected]].uniq.freeze
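
The entries above mix global patterns with host-scoped ones. As a rough sketch of how such a list could be split and matched with File.fnmatch: the helper names `split_patterns` and `tracking_param?` and the four-entry sample list are assumptions made for illustration, not the library's own helpers.

```ruby
# Hypothetical subset of TRACKING_PARAMS, for illustration only.
SAMPLE_PATTERNS = %w[gclid utm_* tag@amazon.* pf_rd_*@imdb.com].freeze

# Split entries into global patterns and host-scoped ones.
# Returns [global_patterns, { host_pattern => [param_patterns] }].
def split_patterns(patterns)
  scoped, global = patterns.partition { |entry| entry.include?('@') }
  by_host = scoped.each_with_object(Hash.new { |h, k| h[k] = [] }) do |entry, acc|
    param, host = entry.split('@', 2)
    acc[host] << param
  end
  [global, by_host]
end

# True if `param` matches a global pattern, or a pattern scoped to `hostname`.
def tracking_param?(param, hostname, patterns = SAMPLE_PATTERNS)
  global, by_host = split_patterns(patterns)
  return true if global.any? { |pattern| File.fnmatch(pattern, param) }

  by_host.any? do |host_pattern, param_patterns|
    File.fnmatch(host_pattern, hostname) &&
      param_patterns.any? { |pattern| File.fnmatch(pattern, param) }
  end
end
```

With this sample list, `tag` counts as a tracking parameter on amazon.co.uk but not on example.com, while `utm_source` is matched on every host.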

Class Method Summary

  • .clean(url) ⇒ String
    Clean the given URL.

Class Method Details

.clean(url) ⇒ String

Clean the given URL. If the URL can’t be parsed, returns the URL unmodified.

Results are cached per URL, so repeated calls with the same URL return the stored result instead of parsing it again.

Parameters:

  • url (String)

Returns:

  • (String)
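
The caching and the unparseable-URL fallback described above can be sketched in isolation. The class `NormalizingCache` below is a hypothetical stand-in, not part of UrlPrivacy, and it only normalizes the URL through URI rather than stripping parameters.

```ruby
require 'uri'

# Hypothetical stand-in for the cache behaviour; not part of UrlPrivacy.
class NormalizingCache
  def initialize
    @cache = {}
  end

  # Returns the parsed-and-reserialized URL, or the input unchanged
  # when URI cannot parse it. Either result is stored under the input.
  def fetch(url)
    @cache[url] ||= URI(url).to_s
  rescue URI::Error
    @cache[url] ||= url
  end

  # Number of distinct URLs seen so far.
  def size
    @cache.size
  end
end
```

A URL containing a space, such as `http://exa mple.com`, raises URI::InvalidURIError (a subclass of URI::Error), so `fetch` returns it untouched and caches that outcome as well.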


# File 'lib/url_privacy.rb', line 81

def clean(url)
  @cleaned_urls ||= {}
  @cleaned_urls[url] ||= begin
    uri = URI(url)

    if uri.query && uri.hostname
      hostname = uri.hostname.sub(/\Awww\./, '')
      params = URI.decode_www_form(uri.query).to_h

      # Remove params by name first
      params.reject! do |param, _|
        TRACKING_PARAMS.include? param
      end

      # Remove params with globs
      params.reject! do |param, _|
        simple_tracking_params.any? do |pattern_param|
          File.fnmatch(pattern_param, param)
        end
      end

      # Remove params matching by hostname and then param
      params.reject! do |param, _|
        complex_tracking_params.any? do |pattern_hostname, pattern_params|
          next false unless File.fnmatch(pattern_hostname, hostname)

          pattern_params.any? do |pattern_param|
            File.fnmatch(pattern_param, param)
          end
        end
      end

      uri.query = URI.encode_www_form(params)
    end

    uri.to_s
  end
rescue URI::Error
  @cleaned_urls[url] ||= url
end
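
One consequence of the `.to_h` step in the listing is that a repeated query key keeps only its last value. The sketch below shows that effect next to an alternative that filters the pair list directly. Both method names and the single `glob` argument are illustrative assumptions, not the library's code.

```ruby
require 'uri'

# Parse into a Hash, as the listing does: a repeated key keeps only its last value.
def query_as_hash(query)
  URI.decode_www_form(query).to_h
end

# Alternative: filter the pair list directly so repeated keys survive.
# `glob` is a single illustrative pattern, not the library's full list.
def strip_pairs(query, glob)
  kept = URI.decode_www_form(query).reject { |name, _| File.fnmatch(glob, name) }
  URI.encode_www_form(kept)
end
```

For the query `id=1&id=2&utm_source=news`, the hash form holds a single `id` entry with value `2`, whereas `strip_pairs` with the glob `utm_*` yields `id=1&id=2`.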