Feed Into for Ruby

Merge multiple data streams into a custom structure grouped by category. A custom module system makes it easy to extend.


Examples

Merge multiple Streams

require 'feed_into'

channels_settings = {
    name: :blockchain,
    sym: :web,
    options: {},
    regexs: [ [ /https:\/\/your*website.com/ ] ],
    download: :general,
    mining: :rss_one,
    pre: [],
    transform: nil,
    post: [ :pre_titles ]
}

feeds = FeedInto::Group.new( 
    single: { channels: [ channels_settings ] } 
)

urls = [
    'https://your*website.com/1.xml',
    'https://your*website.com/2.xml'
]

feeds
    .analyse( items: urls )
    .merge
    .to_rss( key: :unknown )


Create .rss Categories from multiple Streams

require 'feed_into'

channels_settings = {
    name: :blockchain,
    sym: :web,
    options: {},
    regexs: [ [ /https:\/\/your*website.com/ ] ],
    download: :general,
    mining: :rss_one,
    pre: [],
    transform: nil,
    post: [ :pre_titles ]
}

feeds = FeedInto::Group.new( 
    single: { channels: [ channels_settings ] } 
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feeds
    .analyse( items: cmds )
    .merge
    .to_rss_all



Table of Contents

  1. Examples
  2. Quickstart
  3. Setup
  4. Input Types
  5. Methods
  6. Structure
  7. Options
  8. Channels
  9. Contributing
  10. Limitations
  11. Credits
  12. License
  13. Code of Conduct
  14. Support my Work



Quickstart

require 'feed_into'

channels = [
    {
        name: :blockchain,
        sym: :web,
        options: {},
        regexs: [ [ /https:\/\/your*website.com/ ] ],
        download: :general,
        mining: :rss_one,
        pre: [],
        transform: nil,
        post: [ :pre_titles ]
    }
]

feed = FeedInto::Group.new( 
    single: { channels: channels } 
)

urls = [ 'https://your*website.com/1.xml' ]
feed
    .analyse( items: urls )
    .status



Setup

Add this line to your application's Gemfile:

gem 'feed_into'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install feed_into

On Rubygems:



Input Types

A valid URL string is required. If you use ::Group, wrap your strings in an array. Consider using a Hash structure for best results.

FeedInto::Single

Two input types are allowed: String and Hash.

  • A String must be a valid URL.
  • A Hash needs at least a url: key with a valid URL string. name: and category: are optional.


A.1. String URL

Input

cmd = 'https://your*website.com/1.xml'
feed.analyse( item: cmd )

The URL must be a String and a valid URL.

Internal Transformation to:

{
    name: 'Unknown',
    url: 'https://your*website.com/1.xml',
    category: :unknown
}

| Name | Default | Description |
|------:|:------|:------|
| name: | 'Unknown' | Set the name of the feed. If empty or not provided, the name is set to 'Unknown'. |
| category: | :unknown | Set the category of the feed. If empty or not provided, the category is set to :unknown. |

The keys name: and category: are required internally. If the user does not set them, both are added with the default values 'Unknown' and :unknown. See A.2. for more information.
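The internal transformation described above can be sketched in plain Ruby. This is only an illustration of the documented defaults: `normalize_cmd` and `DEFAULTS` are hypothetical names and not part of the gem.

```ruby
# Hypothetical helper (not part of feed_into): mirrors the documented defaults.
DEFAULTS = { name: 'Unknown', category: :unknown }.freeze

def normalize_cmd( item )
  unless item.is_a?( String ) || item.is_a?( Hash )
    raise ArgumentError, 'item must be a String or a Hash'
  end

  # A bare String is treated as the url: of a new Hash.
  cmd = item.is_a?( String ) ? { url: item } : item.dup
  raise ArgumentError, 'url: is required' unless cmd[ :url ].is_a?( String )

  # Fill in missing or empty values with the documented defaults.
  DEFAULTS.each do | key, default |
    value = cmd[ key ]
    cmd[ key ] = default if value.nil? || value.to_s.empty?
  end

  cmd
end

p normalize_cmd( 'https://your*website.com/1.xml' )
```

A Hash that already carries name: and category: passes through unchanged, so the same helper would cover both input types.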


A.2. Hash Structure (cmd)

Struct

{
    name: String,
    url: String,
    category: Symbol
}

Example

cmd = {
    name: 'Channel 1',
    url: 'https://your*website.com/1.xml',
    category: :nft
}

feed.analyse( item: cmd )

Validation

| Name | Type / Regex | Required | Default | Description |
|------:|:------|:------|:------|:------|
| name: | String | No | "Unknown" | Set the name of the feed. If empty or not provided, the name is set to "Unknown". |
| url: | String and valid URL | Yes | | Set the URL of the feed. |
| category: | Symbol | No | :unknown | Set the category of the feed. If empty or not provided, the category is set to :unknown. |

FeedInto::Group

Two types of arrays are allowed: Array of String or Array of Hash.

  • An Array of String must contain valid URL strings.
  • An Array of Hash needs at least a url: key with a valid URL string per Hash.


B.1. Array of String

Example

cmds = [
    'https://your*website.com/1.xml',
    'https://your*website.com/2.xml'
]

feeds.analyse( items: cmds )

For validation details, see A.1.


B.2. Array of Hash (cmds)

Example

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feeds.analyse( items: cmds )

For validation details, see A.2.
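The input rules for ::Group can be checked before calling .analyse. The following is a minimal sketch using Ruby's standard `uri` library. `valid_url?` and `valid_items?` are hypothetical helpers and may be stricter or looser than the gem's own validation; accepting a mix of Strings and Hashes in one array is an assumption of this sketch.

```ruby
require 'uri'

# Hypothetical validator (not the gem's implementation).
def valid_url?( str )
  return false unless str.is_a?( String )

  uri = URI.parse( str )
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass.
  uri.is_a?( URI::HTTP ) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

def valid_items?( items )
  return false unless items.is_a?( Array ) && !items.empty?

  items.all? do | item |
    case item
    when String then valid_url?( item )
    when Hash   then valid_url?( item[ :url ] )
    else false
    end
  end
end

puts valid_items?( [ 'https://example.com/1.xml' ] )
puts valid_items?( [ { name: 'Channel 1' } ] )
```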


Methods

The methods are split into 2 classes: "Single" and "Group". Single processes only one URL. Group inherits from Single and has all methods for bulk/group processing. For more details, see Structure.

FeedInto::Single

.new( modules: , options: )

Create a new Single Object to interact with.

require 'feed_into'

feed = FeedInto::Single.new( 
    modules: './a/b/c/', 
    options: {}
)

Input

| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| modules | String | No | nil | modules: './a/b/c/' | Set the module folder path. |
| options | Hash | No | {} | see Options | Set options. |


.analyse( item: )

Start process of downloading, mining, modification and transforming based on your module setups.

require 'feed_into'

feed = FeedInto::Single.new( 
    modules: './a/b/c/', 
    options: {}
)

cmd = {
    name: 'Channel 1',
    url: 'https://your*website.com/1.xml',
    category: :crypto
}

feed.analyse( item: cmd )

# feed.analyse( item: 'https://your*website.com/1.xml' )

Input

| Name | Type | Required | Example | Description |
|------:|:------|:------|:------|:------|
| item | String or Hash Structure (see Input A.2.) | Yes | item: 'https://your*website.com/1.xml' | Pass the URL as a String or as a Hash Structure. |

FeedInto::Group

.new( modules:, group:, single: )

Create a new Group Object to interact with.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

Input

| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| modules | String | No | nil | modules: './a/b/c/' | Set the module folder path. |
| group | Hash | No | {} | see Options | Set group options. |
| single | Hash | No | {} | see Options | Set single options. |

Return
Hash

.analyse( items: [], silent: false )

Start process of bulk execution.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed.analyse( items: cmds )

Input

| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| items | Array of String or Array of Hash | Yes | | See Input B.1. and B.2. for examples and more details. | Set the input URLs. |
| silent | Boolean | No | false | silent: false | Print status messages |

Return
Self

To return the result, use .to_h


.merge

Re-arrange items by category and simplify data for rss output.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .merge

Return
Self

To return the result, use .to_h
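Conceptually, re-arranging items by category is a grouping step. The sketch below shows the idea in plain Ruby. The shape of the analysed data (an array of results, each with :category and :items) and the newest-first ordering are assumptions made for illustration; `merge_by_category` is a hypothetical name and the gem's internal structure may differ.

```ruby
# Hypothetical sketch: group analysed results by category and flatten their items.
def merge_by_category( results )
  results
    .group_by { | result | result[ :category ] }
    .transform_values { | group |
      group
        .flat_map { | result | result[ :items ] }
        # Assumption: newest items first, based on [:time][:stamp].
        .sort_by { | item | -item[ :time ][ :stamp ] }
    }
end

results = [
  { category: :nft,    items: [ { title: 'A', url: 'https://example.com/a', time: { stamp: 10 } } ] },
  { category: :crypto, items: [ { title: 'B', url: 'https://example.com/b', time: { stamp: 20 } } ] },
  { category: :nft,    items: [ { title: 'C', url: 'https://example.com/c', time: { stamp: 30 } } ] }
]

p merge_by_category( results ).keys
```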


.to_h( type: )

Output data as a Hash.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .merge
    .to_h( type: :analyse ) 

Input

| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| type | Symbol | No | nil | :analyse or :merge | Define explicitly which hash should be returned. If not set, .to_h returns :merge if it is not nil, otherwise :analyse. |

Return
Hash
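The fallback rule for the type: argument can be written out as a small decision function. This is a sketch of the documented behaviour only; `select_result` and its keyword arguments are hypothetical and do not exist in the gem.

```ruby
# Hypothetical sketch of the documented selection rule for .to_h( type: ).
def select_result( analyse:, merge:, type: nil )
  case type
  when :analyse then analyse
  when :merge   then merge
  when nil      then merge.nil? ? analyse : merge # prefer :merge when available
  else raise ArgumentError, "unknown type: #{type.inspect}"
  end
end

analysed = { items: [ 1, 2 ] }
merged   = { nft: [ 1 ], crypto: [ 2 ] }

p select_result( analyse: analysed, merge: nil )
p select_result( analyse: analysed, merge: merged )
```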

.to_rss( key:, silent: )

Output a .merge() category to a valid rss feed.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .merge
    .to_rss( key: :nft )

Input

| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| key | Symbol | Yes | nil | :nft | Only a single category will be transformed to rss. Define the category here. |
| silent | Boolean | No | false | | Print status messages |

Return
Hash
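To illustrate what turning one merged category into an rss feed involves, here is a dependency-free sketch that builds a minimal RSS 2.0 document from items carrying :title, :url and [:time][:stamp]. `items_to_rss` and `xml_escape` are hypothetical helpers; the gem's real output format and return value may differ.

```ruby
# Hypothetical sketch: render items of one category as a minimal RSS 2.0 string.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def xml_escape( text )
  text.to_s.gsub( /[&<>"]/, XML_ESCAPES )
end

def items_to_rss( title:, link:, items: )
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    '<channel>',
    "<title>#{xml_escape( title )}</title>",
    "<link>#{xml_escape( link )}</link>",
    "<description>#{xml_escape( title )}</description>"
  ]

  items.each do | item |
    # RSS expects RFC 822 style dates; the stamp is treated as Unix seconds.
    pub_date = Time.at( item[ :time ][ :stamp ] ).utc.strftime( '%a, %d %b %Y %H:%M:%S GMT' )
    lines.push(
      '<item>',
      "<title>#{xml_escape( item[ :title ] )}</title>",
      "<link>#{xml_escape( item[ :url ] )}</link>",
      "<pubDate>#{pub_date}</pubDate>",
      '</item>'
    )
  end

  lines.push( '</channel>', '</rss>' )
  lines.join( "\n" )
end

nft_items = [
  { title: 'A & B', url: 'https://example.com/a', time: { stamp: 1632702548 } }
]

puts items_to_rss( title: 'nft', link: 'https://example.com', items: nft_items )
```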

.to_rss_all( silent: )

Output all .merge() categories as valid rss feeds.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .merge
    .to_rss_all 

Input

| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| silent | Boolean | No | false | | Print status messages |

Return
Hash

.status

Outputs useful information about the .analyse() pipeline.

require 'feed_into'

feed = FeedInto::Group.new( 
    modules: './a/b/c/', 
    group: {},
    single: {}
)

cmds = [
    {
        name: 'Channel 1',
        url: 'https://your*website.com/1.xml',
        category: :nft
    },
    {
        name: 'Channel 2',
        url: 'https://your*website.com/2.xml',
        category: :crypto
    }
]

feed
    .analyse( items: cmds )
    .status

Input

| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| silent | Boolean | No | false | | Print status messages |

Return
Hash


Structure

Class Overview

FeedInto::Single
FeedInto::Group

--> CLASS: Group
    ---------------------------------------
    |  - new( modules:, sgl:{}, grp:{} )  |
    |  - analyse( items:, silent: false ) |
    |  - merge                            |
    |  - to_h( type: nil )                |
    |  - to_rss( key: Symbol )            |
    |  - to_rss_all( silent: false )      |
    |                                     |
------> CLASS: Single                     |
    |   --------------------------------  |
    |   |  - new( modules:, opts:{} ) <---- MODULE FOLDER
    |   |  - analyse( item: )          |  |
    |   |                              |  |
    |   |   FUNCTIONS: General         |  |
    |   |   -------------------------  |  |
    |   |   |  - crl_general        |  |  |
    |   |   |   :download           |  |  |
    |   |   |   :pre_titles         |  |  |
    |   |   |   :mining_rss_one     |  |  |
    |   |   |   :mining_rss_two     |  |  |
    |   |   |   :format_url_s3      |  |  |
    |   |   |   :format_html_remove |  |  |
    |   |   -------------------------  |  |
    |   --------------------------------  |  
    ---------------------------------------

Custom Modules


    MODULE FOLDER "./a/b/c/"
    -----------------------------------------------
    |                                             |
    |   MODULE: #{Module_Name}                    |
    |    FILE:  #{module_name}.rb                 |
    |   -------------------------------------     |
    |   |  Required:                        |     |
    |   |  - crl_#{module_name}             |---  |
    |   |  - crl_#{module_name}_settings    |  |  | 
    |   |                                   |  |  | 
    |   |  Custom:                          |  |  |
    |   |  - crl_#{module_name}_custom_name |  |  |
    |   -------------------------------------  |  |
    |      |                                   |  |
    |      -------------------------------------  |
    |                                             |
    -----------------------------------------------

See Channels for more details.


Options

Options are split into 2 sections: Single and Group.

  • In ::Single use .new( ... options: ) to set options.
  • In ::Group use .new( ... single:, group: ) to set options.

Example

options = {
    single: {
        format__title__symbol__video: "🐨",
        format__title__symbol__custom: "👽"
    },
    group: {
        sleep__scores__user__value: 5,
        sleep__scores__server__value: 10
    }
}

# Single
feed = FeedInto::Single.new( 
    modules: './a/b/c/',
    options: options[:single]
)

# Group
feeds = FeedInto::Group.new( 
    modules: './a/b/c/',
    single: options[:single],
    group: options[:group]
)

FeedInto::Single

| Nr | Name | Key | Default | Type | Description |
|------:|:------|:------|:------|:------|:------|
| 1. | Title Symbol Video | :format__title__symbol__video | "👾" | String | Set Symbol for Video, used in :pre_titles |
| 2. | Title Symbol Custom | :format__title__symbol__custom | "⚙️ " | String | Set Symbol for Custom, used in :pre_titles |
| 3. | Title Symbol Web | :format__title__symbol__web | "🤖" | String | Set Symbol for Web, used in :pre_titles |
| 4. | Title Separator | :format__title__separator | `"\ "` | String | |
| 5. | Title More | :format__title__more | "..." | String | Used in :pre_titles |
| 6. | Title Length | :format__title__length | 100 | Integer | Set a maximum length, used in :pre_titles |
| 7. | Title Str | :format__title__str | "{{sym}} {{cmd_name__upcase}} ({{channel_name__upcase}}) {{separator}} {{title_item__titleize}}" | String | Set Title Structure, used in :pre_titles |
| 8. | Download Agent | :format__download__agent | "" | String | Set an agent for the request header. Use version to generate a random version. |
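The Title Str option reads like a template with {{placeholder}} slots. The sketch below shows one way such a template could be filled and then shortened using the Title Length and Title More values. `render_title` is hypothetical, the "|" separator is an assumed value, and whether the gem truncates the whole title or only the item title is not stated in this document.

```ruby
# Hypothetical sketch: fill {{placeholders}} and truncate to a maximum length.
def render_title( template, values, length: 100, more: '...' )
  title = template.gsub( /\{\{(\w+)\}\}/ ) do
    # Unknown placeholders are replaced with an empty string.
    values.fetch( Regexp.last_match( 1 ).to_sym, '' ).to_s
  end

  title.length > length ? title[ 0, length ] + more : title
end

template = '{{sym}} {{cmd_name__upcase}} ({{channel_name__upcase}}) {{separator}} {{title_item__titleize}}'

values = {
  sym: '🤖',
  cmd_name__upcase: 'CHANNEL 1',
  channel_name__upcase: 'BLOCKCHAIN',
  separator: '|',                     # assumed separator value
  title_item__titleize: 'Hello World'
}

puts render_title( template, values )
```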

FeedInto::Group

| Nr | Name | Key | Default | Type | Description |
|------:|:------|:------|:------|:------|:------|
| 1. | Range | :sleep__range | 15 | Integer | Set how many items are relevant to calculate the score for the sleeping time. |
| 2. | Varieties | :sleep__varieties | [{:variety=>1, :sleep=>2}, {:variety=>2, :sleep=>1}, {:variety=>3, :sleep=>0.5}, {:variety=>4, :sleep=>0.25}, {:variety=>5, :sleep=>0.15}, {:variety=>6, :sleep=>0.1}] | Array | Set different sleep times for different variety levels. |
| 3. | Scores Ok Value | :sleep__scores__ok__value | 0 | Integer | Sleeping Time for :ok download. |
| 4. | Scores User Value | :sleep__scores__user__value | 1 | Integer | Sleeping Time for :user download errors. |
| 5. | Scores Server Value | :sleep__scores__server__value | 3 | Integer | Sleeping Time for :server download errors. |
| 6. | Scores Other Value | :sleep__scores__other__value | 0 | Integer | Sleeping Time for :other download errors. |
| 7. | Stages | :sleep__stages | [{:name=>"Default", :range=>[0, 2], :skip=>false, :sleep=>0}, {:name=>"Low", :range=>[3, 5], :skip=>false, :sleep=>2}, {:name=>"High", :range=>[6, 8], :skip=>false, :sleep=>5}, {:name=>"Stop", :range=>[9, 999], :skip=>true}] | Array | Set the sleep range for different scores. |
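One possible reading of the Range, Scores and Stages defaults is sketched below: the most recent download results are summed into a score, and the score selects a stage that decides how long to sleep or whether to skip. This interpretation is an assumption, not documented behaviour; `score_for` and `stage_for` are hypothetical names.

```ruby
# Default values copied from the options table above.
STAGES = [
  { name: 'Default', range: [ 0, 2 ],   skip: false, sleep: 0 },
  { name: 'Low',     range: [ 3, 5 ],   skip: false, sleep: 2 },
  { name: 'High',    range: [ 6, 8 ],   skip: false, sleep: 5 },
  { name: 'Stop',    range: [ 9, 999 ], skip: true }
].freeze

SCORES = { ok: 0, user: 1, server: 3, other: 0 }.freeze

# Assumption: only the last `range` download results count towards the score.
def score_for( statuses, range: 15 )
  statuses.last( range ).sum { | status | SCORES.fetch( status, 0 ) }
end

# Find the stage whose inclusive range contains the score (nil if none).
def stage_for( score )
  STAGES.find { | stage | score.between?( *stage[ :range ] ) }
end

score = score_for( [ :ok, :server, :user ] )
p stage_for( score )
```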


Channels

To recognize a URL, a "channel" must be created. A channel requires a Hash that defines the pipeline for URLs matching the given regexes. You don't need to write your own module if you use the standard components. To extend the functionality, write your own module and load it by pointing to your module folder.

Settings Structure

Every channel needs a Settings Structure to be recognized.

{
    name: Symbol,
    sym: Symbol,
    options: Hash,
    regexs: Nested Array,
    download: Symbol,
    mining: Symbol,
    pre: Array of Symbols,
    transform: Symbol,
    post: Array of Symbols
}

| Name | Type | Required | Example | Description |
|------:|:------|:------|:------|:------|
| name | Symbol | Yes | :module_name | Set your unique channel name as a Symbol. |
| sym | Symbol | Yes | :web | Assign a category sym to your channel. See Options for more details. |
| options | Hash | Yes | { length: 23 } | Set channel-specific variables here. |
| regexs | Nested Array | Yes | [ [ /https:\/\/module_name/ ] ] | To assign a given URL to your channel, put one or more regexes in an Array and wrap it in an outer Array. A URL is assigned when all regexes of one inner Array match. |
| download | Symbol | Yes | :general | Select which 'download' method you prefer. |
| mining | Symbol | Yes | :rss_one | Select which 'mining' method you prefer. |
| pre | Array | Yes | [] | Select which 'pre' methods you prefer. |
| transform | Symbol | | nil | Select which 'transform' method you prefer. |
| post | Array | Yes | [ :pre_titles ] | Select which 'post' methods you prefer. |
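The matching rule for the nested regexs array (any inner array, all of its regexes) can be expressed in a few lines. `channel_match?` is a hypothetical helper that illustrates the rule as described in the table; the example URLs and patterns are made up.

```ruby
# Hypothetical sketch: a URL belongs to a channel if, for at least one
# inner array, every regex in that array matches the URL.
def channel_match?( url, regexs )
  regexs.any? do | group |
    group.all? { | regex | regex.match?( url ) }
  end
end

regexs = [
  [ /https:\/\/example\.com/, /\.xml\z/ ], # both must match
  [ /feeds\.example\.org/ ]                # or this one alone
]

puts channel_match?( 'https://example.com/1.xml', regexs )
puts channel_match?( 'https://example.com/1.html', regexs )
```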

Standard Components

Pass a settings structure that uses only standard components as shown below. You can find more information about the available components in Structure.

require 'feed_into'

channels_settings = {
    name: :blockchain,
    sym: :web,
    options: {},
    regexs: [ [ /https:\/\/your*website.com/ ] ],
    download: :general,
    mining: :rss_one,
    pre: [],
    transform: nil,
    post: [ :pre_titles ]
}

feeds = FeedInto::Group.new( 
    single: { channels: [ channels_settings ] } 
)

feeds.analyse( items: [ 'https://your*website.com/1.xml' ] )

# feed = FeedInto::Single.new( 
#     options: { channels: [ channels_settings ] } 
# )
# feed.analyse( item: 'https://your*website.com/1.xml' )

Custom Components

For custom functionality, you need to define a module. Use the following boilerplate for a quick start. Please note:

  • Every function name starts with the prefix 'crl_'.
  • The channel is initialized automatically by searching for 'crl_module_name_settings'.
  • Every pipeline contains five stages: download, mining, pre, transform, post.
  • The interaction with your module happens only through the function crl_module_name. Delegate the calls with a case statement.
  • For later tasks, you should return at least :title, :url and [:time][:stamp].

Step 1: Create Module

./path/module_name.rb

module ModuleName
  def crl_module_name( sym, cmd, channel, response, data, obj )
    messages = []

    case sym
    when :settings
      data = crl_module_name_settings()
    when :transform
      data = crl_module_name_transform( data, obj, cmd, channel )
    else
      messages.push( "module_name: #{sym} not found." )
    end

    return data, messages
  end


  private


  def crl_module_name_settings()
    {
      name: :module_name,
      sym: :video,
      options: {},
      regexs: [ [ /www.module_name.com/, /www.module_name.com/ ] ],
      download: :general,
      mining: :rss_two,
      pre: [],
      transform: :self,
      post: [ :pre_titles ]
    }
  end


  def crl_module_name_transform( data, obj, cmd, channel )
    # Replace every mined item with the fields that later stages rely on.
    data[:items] = data[:items].map do | item |
      {
        title: '',
        time: { stamp: 1632702548 },
        url: 'https://....'
      }
    end
    return data
  end
end

Step 2: Initialize Module

require 'feed_into'

feeds = FeedInto::Group.new( 
    modules: './path/'
)

feeds
    .analyse( items: [ 'https://www.module_name.com/rss' ] )
    .merge
    .to_rss_all
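The note that a channel is found by searching for 'crl_module_name_settings' suggests a naming-convention lookup. The sketch below shows how such a lookup could work in plain Ruby; it is not the gem's implementation. `Host` and `channel_settings` are hypothetical, and the settings Hash is shortened for brevity.

```ruby
# A module following the crl_ naming convention from the boilerplate above.
module ModuleName
  private

  def crl_module_name_settings
    { name: :module_name, sym: :video, download: :general, mining: :rss_two }
  end
end

# Hypothetical host class: collects settings from every included module
# by looking for private methods named crl_<module_name>_settings.
class Host
  include ModuleName

  def channel_settings
    private_methods
      .grep( /\Acrl_\w+_settings\z/ )
      .map { | method_name | send( method_name ) }
  end
end

p Host.new.channel_settings
```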


Contributing

Bug reports and pull requests are welcome on GitHub at https://raw.githubusercontent.com/feed-into-for-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.


Limitations

  • Proof of Concept, not battle-tested.

Credits

This gem uses the following gems:


License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the feed-into-for-ruby project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

Star Us

Please ⭐️ star this Project, every ⭐️ star makes us very happy!