Class: UrlPrivacy
- Inherits: Object
- Defined in: lib/url_privacy.rb
Overview
Usage:
UrlPrivacy.clean(url)
Constant Summary
- TRACKING_PARAMS =
Parameters to remove from URLs. Taken from Neat URL and CleanURLs, plus some others found manually.
%w[pf_rd_*@imdb.com [email protected] gclid ref terminal_id igshid tracking_id action_object_map action_type_map action_ref_map spm@*.aliexpress.com scm@*.aliexpress.com aff_platform aff_trace_key algo_expid@*.aliexpress.* algo_pvid@*.aliexpress.* btsid ws_ab_test pd_rd_*@amazon.* _encoding@amazon.* psc@amazon.* tag@amazon.* ref_@amazon.* pf_rd_*@amazon.* pf@amazon.* qid@amazon.* sr@amazon.* srs@amazon.* __mk_*@amazon.* spIA@amazon.* ms3_c@amazon.* ie*@amazon.* refRID@amazon.* colid@amazon.* coliid@amazon.* *adId@amazon.* qualifier@amazon.* _encoding@amazon.* smid@amazon.* field-lbr_brands_browse-bin@amazon.* ved@google.* bi*@google.* gfe_*@google.* ei@google.* source@google.* gs_*@google.* site@google.* oq@google.* esrc@google.* uact@google.* cd@google.* cad@google.* gws_*@google.* atyp@google.* vet@google.* zx@google.* _u@google.* je@google.* dcr@google.* ie@google.* sei@google.* sa@google.* dpr@google.* hl@google.* btn*@google.* sa@google.* usg@google.* cd@google.* cad@google.* uact@google.* [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected]* ath*@walmart.com* utm_* ga_source ga_medium ga_term ga_content ga_campaign ga_place yclid _openstat fb_action_ids fb_action_types fb_source fb_ref fbclid action_object_map action_type_map action_ref_map gs_l mkt_tok hmb_campaign hmb_medium hmb_source ref ref_ ref_*@twitter.com [email protected] trackId@netflix.* tctx@netflix.* jb*@netflix.* [email protected] [email protected] [email protected] [email protected] guce_referrer_*@techcrunch.com [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] tt_medium@twitch.* tt_content@twitch.* [email protected] [email protected] [email protected] [email protected] *[email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] _trkparms@ebay.* _trksid@ebay.* _from@ebay.* [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] mkt_tok trk trkCampaign ga_* gclid gclsrc hmb_campaign hmb_medium hmb_source spReportId spJobID spUserID spMailingID itm_* s_cid elqTrackId elqTrack assetType assetId recipientId campaignId siteId mc_cid mc_eid pk_* sc_campaign sc_channel sc_content sc_medium sc_outcome sc_geo sc_country utm_* nr_email_referer vero_conv vero_id yclid _openstat mbid cmpid cid c_id campaign_id Campaign hash@ebay.* fb_action_ids fb_action_types fb_ref fb_source fbclid [email protected] [email protected] gs_l gs_lcp@google.* ved@google.* ei@google.* sei@google.* gws_rd@google.* gs_gbg@google.* gs_mss@google.* gs_rn@google.* _hsenc _hsmi __hssc __hstc hsCtaTracking [email protected] [email protected] tt_medium tt_content lr@yandex.* redircnt@yandex.* [email protected] [email protected] wt_zmc source@google.* iflsig@google.* sclient@google.* [email protected] [email protected] [email protected] [email protected] hc_*@facebook.com *ref*@facebook.com [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected]].uniq.freeze
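The `.clean` source on this page calls two helpers, `simple_tracking_params` and `complex_tracking_params`, whose definitions are not shown. Judging from how they are used, entries in TRACKING_PARAMS appear to follow a `param@hostglob` convention in which both halves are `File.fnmatch` globs. The following is a minimal sketch, assuming that convention, of how the list might be split into global param globs and a hostname-to-globs hash; `split_tracking_params` is a hypothetical name, not part of UrlPrivacy.

```ruby
# Hypothetical helper (not part of UrlPrivacy): splits entries written as
# "param" or "param@hostglob" into
#   simple  - param globs that apply on every host (entries containing '*')
#   complex - a Hash mapping each host glob to the param globs scoped to it
# Plain names with neither '*' nor '@' are skipped here, on the assumption
# that an exact-name check (like the first pass in .clean) covers them.
def split_tracking_params(entries)
  simple = []
  complex = {}

  entries.each do |entry|
    param, host = entry.split('@', 2)
    if host
      (complex[host] ||= []) << param
    elsif param.include?('*')
      simple << param
    end
  end

  [simple, complex]
end

simple, complex = split_tracking_params(%w[gclid utm_* tag@amazon.* pd_rd_*@amazon.* ved@google.*])
p simple # prints ["utm_*"]
# complex maps "amazon.*" to ["tag", "pd_rd_*"] and "google.*" to ["ved"]
```

Returning a Hash keyed by host glob lets a caller skip every param glob for a host that doesn't match, which mirrors the `next false unless File.fnmatch(pattern_hostname, hostname)` short-circuit in `.clean`.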
Class Method Summary
- .clean(url) ⇒ String
Clean the given URL.
Class Method Details
.clean(url) ⇒ String
Clean the given URL. If the URL can't be parsed, it is returned unmodified.
Results are cached per URL, so duplicate URLs are only processed once.
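The caching described above is plain memoization in an instance-level hash: the first call for a given URL stores the result, later calls return the stored value, and a URL that fails to parse is stored as itself. A minimal sketch of that contract follows; `MemoizedCleaner` and its `calls` counter are hypothetical, and the actual parameter filtering is replaced by a round-trip through `URI` so that only the caching and fallback behavior is shown.

```ruby
require 'uri'

# Hypothetical class illustrating the caching contract of .clean:
# each distinct input is processed once, and unparseable input is
# cached and returned unmodified. Not part of UrlPrivacy.
class MemoizedCleaner
  attr_reader :calls

  def initialize
    @cache = {}
    @calls = 0 # counts how many times the (stubbed) work actually runs
  end

  def clean(url)
    @cache[url] ||= begin
      @calls += 1
      URI(url).to_s # stand-in for the real parameter filtering
    end
  rescue URI::Error
    # Parsing failed: remember the original string so later calls skip the work
    @cache[url] ||= url
  end
end

cleaner = MemoizedCleaner.new
cleaner.clean('https://example.com/?a=1')
cleaner.clean('https://example.com/?a=1') # served from the cache
cleaner.clean('http://exa mple.com')      # invalid URI, returned as-is
cleaner.clean('http://exa mple.com')      # also served from the cache
puts cleaner.calls                        # prints 2
```

Note that the cache is never evicted, so a long-lived process cleaning many distinct URLs will keep all of them in memory.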
# File 'lib/url_privacy.rb', line 81

def clean(url)
  @cleaned_urls ||= {}
  @cleaned_urls[url] ||= begin
    uri = URI(url)
    if uri.query && uri.hostname
      hostname = uri.hostname.sub(/\Awww\./, '')
      params = URI.decode_www_form(uri.query).to_h

      # Remove params by name first
      params.reject! do |param, _|
        TRACKING_PARAMS.include? param
      end

      # Remove params with globs
      params.reject! do |param, _|
        simple_tracking_params.any? do |pattern_param|
          File.fnmatch(pattern_param, param)
        end
      end

      # Remove params matching by hostname and then param
      params.reject! do |param, _|
        complex_tracking_params.any? do |pattern_hostname, pattern_params|
          next false unless File.fnmatch(pattern_hostname, hostname)

          pattern_params.any? do |pattern_param|
            File.fnmatch(pattern_param, param)
          end
        end
      end

      # Drop the query entirely when no params are left; assigning an
      # empty string would leave a dangling "?" at the end of the URL
      uri.query = params.empty? ? nil : URI.encode_www_form(params)
    end
    uri.to_s
  end
rescue URI::Error
  # Unparseable URL: cache it and return it unmodified
  @cleaned_urls[url] ||= url
end
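The three `reject!` passes can be exercised in isolation. This is a sketch under stated assumptions: it uses a tiny hand-picked pattern set (`EXACT`, `GLOBS` and `HOST_GLOBS` are hypothetical names) instead of the full TRACKING_PARAMS list, folds the three passes into one predicate, and sets the query to `nil` when every parameter has been removed.

```ruby
require 'uri'

# Hypothetical, hand-picked patterns standing in for TRACKING_PARAMS
EXACT = %w[gclid fbclid].freeze                          # exact param names
GLOBS = %w[utm_*].freeze                                 # globs on every host
HOST_GLOBS = { 'amazon.*' => %w[tag pd_rd_*] }.freeze    # host glob => param globs

# True if the param should be stripped, checking the same three stages as .clean
def tracking_param?(name, hostname)
  return true if EXACT.include?(name)                           # pass 1: exact name
  return true if GLOBS.any? { |glob| File.fnmatch(glob, name) } # pass 2: global glob

  HOST_GLOBS.any? do |host_glob, name_globs|                    # pass 3: host-scoped
    File.fnmatch(host_glob, hostname) &&
      name_globs.any? { |glob| File.fnmatch(glob, name) }
  end
end

def clean_sketch(url)
  uri = URI(url)
  return uri.to_s unless uri.query && uri.hostname

  hostname = uri.hostname.sub(/\Awww\./, '')
  params = URI.decode_www_form(uri.query).to_h
  params.reject! { |name, _| tracking_param?(name, hostname) }

  # nil (not "") so that no trailing "?" is emitted
  uri.query = params.empty? ? nil : URI.encode_www_form(params)
  uri.to_s
rescue URI::Error
  url # unparseable input is returned unmodified
end

puts clean_sketch('https://www.amazon.com/dp/B000?tag=abc&th=1&utm_source=x')
# prints https://www.amazon.com/dp/B000?th=1
```

In `File.fnmatch`, `*` matches any run of characters including dots (only a leading dot is treated specially), so a host glob such as `amazon.*` covers both `amazon.com` and `amazon.co.uk`.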