Merge multiple data streams into a custom structure based on categories. A custom module system makes it easy to extend.
Merge multiple Streams
require 'feed_into'
channels_settings = {
name: :blockchain,
sym: :web,
options: {},
regexs: [ [ /https:\/\/your*website.com/ ] ],
download: :general,
mining: :rss_one,
pre: [],
transform: nil,
post: [ :pre_titles ]
}
feeds = FeedInto::Group.new(
single: { channels: [ channels_settings ] }
)
urls = [
'https://your*website.com/1.xml',
'https://your*website.com/2.xml'
]
feeds
.analyse( items: urls )
.merge
.to_rss( key: :unknown )
Create .rss Categories from multiple Streams
require 'feed_into'
channels_settings = {
name: :blockchain,
sym: :web,
options: {},
regexs: [ [ /https:\/\/your*website.com/ ] ],
download: :general,
mining: :rss_one,
pre: [],
transform: nil,
post: [ :pre_titles ]
}
feeds = FeedInto::Group.new(
single: { channels: [ channels_settings ] }
)
item = [
{
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
},
{
name: 'Channel 2',
url: 'https://your*website.com/2.xml',
category: :crypto
}
]
feeds
.analyse( items: item )
.merge
.to_rss_all
- Examples
- Quickstart
- Setup
- Input Types
- Methods
- Structure
- Options
- Channels
- Contributing
- Limitations
- Credits
- License
- Code of Conduct
- Support my Work
require 'feed_into'
channels = [
{
name: :blockchain,
sym: :web,
options: {},
regexs: [ [ /https:\/\/your*website.com/ ] ],
download: :general,
mining: :rss_one,
pre: [],
transform: nil,
post: [ :pre_titles ]
}
]
feed = FeedInto::Group.new(
single: { channels: channels }
)
urls = [ 'https://your*website.com/1.xml' ]
feed
.analyse( items: urls )
.status
Add this line to your application's Gemfile:
gem 'feed_into'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install feed_into
On Rubygems:
A valid url string is required. If you use ::Group, wrap your strings in an array. Consider using a Hash Structure for best results.
FeedInto::Single
Two types of input are allowed: String and Hash.
- String must be a valid url.
- Hash needs at least a url: key with a valid url string. name: and category: are optional.
A.1. String URL
Input
cmd = 'https://your*website.com/1.xml'
feed.analyse( item: cmd )
The url must be of type String and a valid url.
Internal Transformation to:
{
name: 'Unknown',
url: 'https://your*website.com/1.xml',
category: :unknown
}
Name | Default | Description |
---|---|---|
name: | 'Unknown' | Name of the feed. If empty or not provided, the name is set to 'Unknown'. |
category: | :unknown | Category of the feed. If empty or not provided, the category is set to :unknown. |
The keys name: and category: are required internally. If the user does not set them, both are added with the default values 'Unknown' and :unknown. See A.2. for more information.
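The internal transformation described above can be pictured with a small sketch. The helper name `normalize_item` and the constant `ITEM_DEFAULTS` are hypothetical and only illustrate the documented defaults; they are not part of the gem's API.

```ruby
# Hypothetical helper for illustration; not part of the feed_into API.
ITEM_DEFAULTS = { name: 'Unknown', category: :unknown }.freeze

# Accepts a url String or a Hash and returns a Hash with
# name:, url: and category: filled in.
def normalize_item( item )
  hash = item.is_a?( String ) ? { url: item } : item.dup
  # Treat nil and empty values as "not provided".
  given = hash.reject do | _key, value |
    value.nil? || ( value.respond_to?( :empty? ) && value.empty? )
  end
  ITEM_DEFAULTS.merge( given )
end

p normalize_item( 'https://example.com/1.xml' )
```

A String input and a Hash with an empty name: both end up with the default name 'Unknown', which mirrors the table above.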
A.2. Hash Structure
(cmd)
Struct
{
name: String,
url: String,
category: Symbol
}
Example
cmd = {
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
}
feed.analyse( item: cmd )
Validation
| Name | Type / Regex | Required | Default | Description |
|------:|:------|:------|:------|:------|
| name: | String | No | "Unknown" | Name of the feed. If empty or not provided, the name is set to "Unknown". |
| url: | String and valid url | Yes | | Url of the feed. |
| category: | Symbol | No | :unknown | Category of the feed. If empty or not provided, the category is set to :unknown. |
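The validation rules in the table can be checked before calling .analyse. The following is a minimal sketch using only Ruby's standard `uri` library; the helper names `valid_url?` and `validate_item` are hypothetical, and the gem's own validation may be stricter or looser than this.

```ruby
require 'uri'

# Hypothetical check: an http(s) url with a host counts as valid here.
def valid_url?( value )
  return false unless value.is_a?( String )
  uri = URI.parse( value )
  uri.is_a?( URI::HTTP ) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

# Returns a list of error messages; an empty list means the Hash is valid.
def validate_item( item )
  errors = []
  errors << 'url: must be a valid url String' unless valid_url?( item[ :url ] )
  errors << 'name: must be a String' unless item[ :name ].nil? || item[ :name ].is_a?( String )
  errors << 'category: must be a Symbol' unless item[ :category ].nil? || item[ :category ].is_a?( Symbol )
  errors
end

p validate_item( { url: 'https://example.com/1.xml', category: 'nft' } )
```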
FeedInto::Group
Two types of Arrays are allowed: Array of String or Array of Hash.
- Array of String must contain valid url strings.
- Array of Hash needs at least a url: key with a valid url string per Hash.
B.1. Array of String
Example
cmds = [
'https://your*website.com/1.xml',
'https://your*website.com/2.xml'
]
feeds.analyse( items: cmds )
For validation info, see A.1.
B.2. Array of Hash
(cmds)
Example
cmds = [
{
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
},
{
name: 'Channel 2',
url: 'https://your*website.com/2.xml',
category: :crypto
}
]
feeds.analyse( items: cmds )
For validation info, see A.2.
The methods are split into 2 classes, "Single" and "Group". Single processes only one url. Group inherits from Single and adds all methods for bulk/group processing. For more details, see Structure.
FeedInto::Single
.new( modules: , options: )
Create a new Single Object to interact with.
require 'feed_into'
feed = FeedInto::Single.new(
modules: './a/b/c/',
options: {}
)
Input
| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| modules | String | No | nil | modules: './a/b/c/' | Set the module folder path. |
| options | Hash | No | {} | see Options | Set options. |
.analyse( item: )
Start process of downloading, mining, modification and transforming based on your module setups.
require 'feed_into'
feed = FeedInto::Single.new(
modules: './a/b/c/',
options: {}
)
cmd = {
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :crypto
}
feed.analyse( item: cmd )
# feed.analyse( item: 'https://your*website.com/1.xml' )
Input
| Name | Type | Required | Example | Description |
|------:|:------|:------|:------|:------|
| item | String or Hash Structure (see Input A.2.) | Yes | item: 'https://your*website.com/1.xml' | Pass the url as a String or a Hash Structure. |
FeedInto::Group
.new( modules:, group:, single: )
Create a new Group Object to interact with.
require 'feed_into'
feed = FeedInto::Group.new(
modules: './a/b/c/',
group: {},
single: {}
)
Input
| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| modules | String | No | nil | modules: './a/b/c/' | Set the module folder path. |
| group | Hash | No | {} | see Options | Set group options. |
| single | Hash | No | {} | see Options | Set single options. |
Return
Hash
.analyse( items: [], silent: false )
Start process of bulk execution.
require 'feed_into'
feed = FeedInto::Group.new(
modules: './a/b/c/',
group: {},
single: {}
)
cmds = [
{
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
},
{
name: 'Channel 2',
url: 'https://your*website.com/2.xml',
category: :crypto
}
]
feed.analyse( items: cmds )
Input
| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| items | Array of String or Array of Hash | Yes | | See Input B.1. and B.2. for examples and more details. | Set input urls. |
| silent | Boolean | No | false | silent: false | Print status messages |
Return
Self
To return the result, use .to_h.
.merge
Re-arrange items by category and simplify data for rss output.
require 'feed_into'
feed = FeedInto::Group.new(
modules: './a/b/c/',
group: {},
single: {}
)
cmds = [
{
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
},
{
name: 'Channel 2',
url: 'https://your*website.com/2.xml',
category: :crypto
}
]
feed
.analyse( items: cmds )
.merge
Return
Self
To return the result, use .to_h.
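Conceptually, re-arranging items by category is a group-by over all analysed feeds. The sketch below assumes a simplified, hypothetical result shape (one Hash per feed with category: and items:) and a hypothetical helper `merge_by_category`; the gem's internal structure and sorting may differ.

```ruby
# Hypothetical sketch of grouping feed items by category.
# Each result is assumed to look like:
#   { category: Symbol, items: [ { title:, url:, time: { stamp: } } ] }
def merge_by_category( results )
  results
    .flat_map { | result | result[ :items ].map { | item | item.merge( category: result[ :category ] ) } }
    .group_by { | item | item[ :category ] }
    # Newest items first within each category (an assumption for this sketch).
    .transform_values { | items | items.sort_by { | item | -item[ :time ][ :stamp ] } }
end

results = [
  { category: :nft,    items: [ { title: 'a', url: 'https://example.com/a', time: { stamp: 1 } } ] },
  { category: :crypto, items: [ { title: 'b', url: 'https://example.com/b', time: { stamp: 2 } } ] }
]
p merge_by_category( results ).keys
```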
.to_h( type: )
Output data as a Hash.
require 'feed_into'
feed = FeedInto::Group.new(
modules: './a/b/c/',
group: {},
single: {}
)
cmds = [
{
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
},
{
name: 'Channel 2',
url: 'https://your*website.com/2.xml',
category: :crypto
}
]
feed
.analyse( items: cmds )
.merge
.to_h( type: :analyse )
Input
| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| type | Symbol | No | nil | :analyse or :merge | Define explicitly which hash is returned. If not set, .to_h returns the :merge result if it is not nil, otherwise the :analyse result. |
Return
Hash
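The fallback rule for type: can be written as a small selection function. This is only a sketch of the documented behaviour; `pick_result` and its keyword arguments are hypothetical names and not the gem's internals.

```ruby
# Hypothetical sketch of how .to_h( type: ) chooses between results.
def pick_result( analyse:, merge:, type: nil )
  case type
  when :analyse then analyse
  when :merge   then merge
  when nil      then merge.nil? ? analyse : merge # prefer :merge when available
  else raise ArgumentError, "unknown type: #{type.inspect}"
  end
end

p pick_result( analyse: { raw: true }, merge: nil )
```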
.to_rss( key:, silent: )
Output a single .merge() category as a valid rss feed.
require 'feed_into'
feed = FeedInto::Group.new(
modules: './a/b/c/',
group: {},
single: {}
)
cmds = [
{
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
},
{
name: 'Channel 2',
url: 'https://your*website.com/2.xml',
category: :crypto
}
]
feed
.analyse( items: cmds )
.merge
.to_rss( key: :nft )
Input
| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| key | Symbol | Yes | nil | :nft | Only a single category is transformed to rss. Define the category here. |
| silent | Boolean | No | false | | Print status messages |
Return
Hash
.to_rss_all( silent: )
Output all .merge() categories as valid rss feeds.
require 'feed_into'
feed = FeedInto::Group.new(
modules: './a/b/c/',
group: {},
single: {}
)
cmds = [
{
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
},
{
name: 'Channel 2',
url: 'https://your*website.com/2.xml',
category: :crypto
}
]
feed
.analyse( items: cmds )
.merge
.to_rss_all
Input
| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| silent | Boolean | No | false | | Print status messages |
Return
Hash
.status
Outputs useful information about the .analyse() pipeline.
require 'feed_into'
feed = FeedInto::Group.new(
modules: './a/b/c/',
group: {},
single: {}
)
cmds = [
{
name: 'Channel 1',
url: 'https://your*website.com/1.xml',
category: :nft
},
{
name: 'Channel 2',
url: 'https://your*website.com/2.xml',
category: :crypto
}
]
feed
.analyse( items: cmds )
.status
Input
| Name | Type | Required | Default | Example | Description |
|------:|:------|:------|:------|:------|:------|
| silent | Boolean
| No | false
| | Print status messages |
Return
Hash
Class Overview
FeedInto::Single
FeedInto::Group
--> CLASS: Group
---------------------------------------
| - new( modules:, sgl:{}, grp:{} ) |
| - analyse( items:, silent: false ) |
| - merge |
| - to_h( type: nil ) |
| - to_rss( key: Symbol ) |
| - to_rss_all( silent: false ) |
| |
------> CLASS: Single |
| -------------------------------- |
| | - new( modules:, opts:{} ) <---- MODULE FOLDER
| | - analyse( item: ) | |
| | | |
| | FUNCTIONS: General | |
| | ------------------------- | |
| | | - crl_general | | |
| | | :download | | |
| | | :pre_titles | | |
| | | :mining_rss_one | | |
| | | :mining_rss_two | | |
| | | :format_url_s3 | | |
| | | :format_html_remove | | |
| | ------------------------- | |
| -------------------------------- |
---------------------------------------
Custom Modules
MODULE FOLDER "./a/b/c/"
-----------------------------------------------
| |
| MODULE: #{Module_Name} |
| FILE: #{module_name}.rb |
| ------------------------------------- |
| | Required: | |
| | - crl_#{module_name} |--- |
| | - crl_#{module_name}_settings | | |
| | | | |
| | Custom: | | |
| | - crl_#{module_name}_custom_name | | |
| ------------------------------------- | |
| | | |
| ------------------------------------- |
| |
-----------------------------------------------
See Channels for more details.
Options are split into 2 sections: Single and Group.
- In ::Single use .new( ... options: ) to set options.
- In ::Group use .new( ... single:, group: ) to set options.
Example
# The variable name `options` is assumed; the original name was lost.
options = {
single: {
format__title__symbol__video: "🐨",
format__title__symbol__custom: "👽"
},
group: {
sleep__scores__user__value: 5,
sleep__scores__server__value: 10
}
}
# Single
feed = FeedInto::Single.new(
modules: './a/b/c/',
options: options[:single]
)
# Group
feeds = FeedInto::Group.new(
modules: './a/b/c/',
single: options[:single],
group: options[:group]
)
FeedInto::Single
Nr | Name | Key | Default | Type | Description |
---|---|---|---|---|---|
1. | Title Symbol Video | :format__title__symbol__video | "👾" | String | Set symbol for video, used in :pre_titles |
2. | Title Symbol Custom | :format__title__symbol__custom | "⚙️ " | String | Set symbol for custom, used in :pre_titles |
3. | Title Symbol Web | :format__title__symbol__web | "🤖" | String | Set symbol for web, used in :pre_titles |
4. | Title Separator | :format__title__separator | `"\|"` | String | |
5. | Title More | :format__title__more | "..." | String | Used in :pre_titles |
6. | Title Length | :format__title__length | 100 | Integer | Set a maximum length, used in :pre_titles |
7. | Title Str | :format__title__str | "{{sym}} {{cmd_name__upcase}} ({{channel_name__upcase}}) {{separator}} {{title_item__titleize}}" | String | Set title structure, used in :pre_titles |
8. | Download Agent | :format__download__agent | "" | String | Set an agent for the request header. Use version to generate a random version. |
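To make the interplay of the title options more concrete, here is a rough sketch of how a symbol, a separator, a maximum length and a "more" marker could combine into a title. It assumes the double-underscore key pattern from the Options example; `build_title` and `TITLE_DEFAULTS` are hypothetical and do not reproduce the gem's :pre_titles component or its {{...}} template.

```ruby
# Hypothetical defaults, taken from the table above.
TITLE_DEFAULTS = {
  format__title__symbol__web: '🤖',
  format__title__separator: '|',
  format__title__more: '...',
  format__title__length: 100
}.freeze

# Builds "<symbol> <NAME> <separator> <title>" and shortens it when it
# exceeds the configured maximum length.
def build_title( item_title, cmd_name, options = {} )
  opts = TITLE_DEFAULTS.merge( options )
  title = [
    opts[ :format__title__symbol__web ],
    cmd_name.upcase,
    opts[ :format__title__separator ],
    item_title
  ].join( ' ' )
  return title if title.length <= opts[ :format__title__length ]
  title[ 0, opts[ :format__title__length ] ] + opts[ :format__title__more ]
end

puts build_title( 'Hello', 'Channel 1' )
```

User options are merged over the defaults, so passing only format__title__length: 10 shortens the title while keeping every other default.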
FeedInto::Group
Nr | Name | Key | Default | Type | Description |
---|---|---|---|---|---|
1. | Range | :sleep__range | 15 | Integer | Set how many items are relevant for calculating the sleep-time score. |
2. | Varieties | :sleep__varieties | [{:variety=>1, :sleep=>2}, {:variety=>2, :sleep=>1}, {:variety=>3, :sleep=>0.5}, {:variety=>4, :sleep=>0.25}, {:variety=>5, :sleep=>0.15}, {:variety=>6, :sleep=>0.1}] | Array | Set different sleep times for different variety levels. |
3. | Scores Ok Value | :sleep__scores__ok__value | 0 | Integer | Sleeping time for :ok downloads. |
4. | Scores User Value | :sleep__scores__user__value | 1 | Integer | Sleeping time for :user download errors. |
5. | Scores Server Value | :sleep__scores__server__value | 3 | Integer | Sleeping time for :server download errors. |
6. | Scores Other Value | :sleep__scores__other__value | 0 | Integer | Sleeping time for :other download errors. |
7. | Stages | :sleep__stages | [{:name=>"Default", :range=>[0, 2], :skip=>false, :sleep=>0}, {:name=>"Low", :range=>[3, 5], :skip=>false, :sleep=>2}, {:name=>"High", :range=>[6, 8], :skip=>false, :sleep=>5}, {:name=>"Stop", :range=>[9, 999], :skip=>true}] | Array | Set sleep ranges for different scores. |
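One possible reading of how the score values, the range and the stages fit together: sum the scores of the most recent download results and look up the stage whose range contains that sum. The sketch below is an assumption about the mechanism, not the gem's implementation; `stage_for`, `SLEEP_SCORES` and `SLEEP_STAGES` are hypothetical names that reuse the defaults from the table.

```ruby
# Default values copied from the table above.
SLEEP_SCORES = { ok: 0, user: 1, server: 3, other: 0 }.freeze
SLEEP_STAGES = [
  { name: 'Default', range: [ 0, 2 ],   skip: false, sleep: 0 },
  { name: 'Low',     range: [ 3, 5 ],   skip: false, sleep: 2 },
  { name: 'High',    range: [ 6, 8 ],   skip: false, sleep: 5 },
  { name: 'Stop',    range: [ 9, 999 ], skip: true }
].freeze

# statuses: download results in order, e.g. [ :ok, :server, :user ].
# Only the last `range` entries count towards the score (assumption).
def stage_for( statuses, range: 15 )
  score = statuses.last( range ).sum { | status | SLEEP_SCORES.fetch( status, 0 ) }
  SLEEP_STAGES.find { | stage | score.between?( stage[ :range ][ 0 ], stage[ :range ][ 1 ] ) }
end

p stage_for( [ :ok, :server, :user ] )
```

Under this reading, three recent :server errors add up to 9 and reach the "Stop" stage, which sets skip: true.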
To recognize a url, a "channel" must be created. A channel requires a Hash which defines the pipeline for urls matching the given regexes. You don't need to write your own module if you use the standard components. To extend the functionality, you can write your own module and load it by referring to your module folder.
Settings Structure
Every channel needs a Settings Structure to be recognized.
{
name: Symbol,
sym: Symbol,
options: Hash,
regexs: Nested Array,
download: Symbol,
mining: Symbol,
pre: Array of Symbols,
transform: Symbol,
post: Array of Symbols
}
Name | Type | Required | Example | Description |
---|---|---|---|---|
name | Symbol | Yes | :module_name | Set your unique channel name as a Symbol. |
sym | Symbol | Yes | :web | Assign a category sym to your channel. See Options for more details. |
options | Hash | Yes | { length: 23 } | Set channel-specific variables here. |
regexs | Nested Array | Yes | [ [ /https:\/\/module_name/ ] ] | To assign a given url to your channel, put one or more regexes in an Array and wrap it in an outer Array. All regexes within one inner Array must match. |
download | Symbol | Yes | :general | Select which 'download' method you prefer. |
mining | Symbol | Yes | :rss_one | Select which 'mining' method you prefer. |
pre | Array | Yes | [] | Select which 'pre' methods you prefer. |
transform | Symbol | | nil | Select which 'transform' method you prefer. |
post | Array | Yes | [ :pre_titles ] | Select which 'post' methods you prefer. |
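The nested regexs: array can be read as "any inner Array whose regexes all match". A minimal sketch of that matching rule follows; `channel_for` is a hypothetical helper and the example channel is made up for illustration.

```ruby
# Returns the first channel with at least one inner regex group
# in which every regex matches the url; nil if no channel matches.
def channel_for( url, channels )
  channels.find do | channel |
    channel[ :regexs ].any? do | group |
      group.all? { | regex | regex.match?( url ) }
    end
  end
end

channels = [
  {
    name: :blockchain,
    regexs: [
      [ /https:\/\/example\.com/, /\.xml\z/ ], # both must match
      [ /feeds\.example\.org/ ]                # or this one alone
    ]
  }
]
p channel_for( 'https://example.com/1.xml', channels )
```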
Standard Components
Inject a struct that uses only standard components in this way. You can find more information about the available components in Structure.
require 'feed_into'
channels_settings = {
name: :blockchain,
sym: :web,
options: {},
regexs: [ [ /https:\/\/your*website.com/ ] ],
download: :general,
mining: :rss_one,
pre: [],
transform: nil,
post: [ :pre_titles ]
}
feeds = FeedInto::Group.new(
single: { channels: [ channels_settings ] }
)
feeds.analyse( items: [ 'https://your*website.com/1.xml' ] )
# feed = FeedInto::Single.new(
# options: { channels: [ channels_settings ] }
# )
# feed.analyse( item: 'https://your*website.com/1.xml' )
Custom Components
For custom functionality you need to define a module. Use the following boilerplate for a quick start. Please note:
- Every function name starts with the prefix 'crl_'.
- The channel is initialized automatically by searching for 'crl_module_name_settings'.
- Every pipeline contains five stages: download, mining, pre, transform, post.
- Interaction with your module happens only through the function crl_module_name. Delegate the traffic with a case statement.
- For later tasks you should return at least :title, :url and [:time][:stamp].
Step 1: Create Module
./path/module_name.rb
module ModuleName
def crl_module_name( sym, cmd, channel, response, data, obj )
# The variable name `messages` is assumed; the original name was lost.
messages = []
case sym
when :settings
data = crl_module_name_settings()
when :transform
data = crl_module_name_transform( data, obj, cmd, channel )
else
messages.push( "module_name: #{sym} not found." )
end
return data, messages
end
private
def crl_module_name_settings()
{
name: :module_name,
sym: :video,
options: {},
regexs: [ [ /www.module_name.com/, /www.module_name.com/ ] ],
download: :general,
mining: :rss_two,
pre: [],
transform: :self,
post: [ :pre_titles ]
}
end
def crl_module_name_transform( data, obj, cmd, channel )
data[:items] = data[:items].map do | item |
item = {
title: '',
time: { stamp: 1632702548 },
url: 'https://....'
}
end
return data
end
end
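The single-entry-point convention above can be exercised without the gem. The sketch below shows a stripped-down module and a stand-in host class that calls the entry function by name; `DemoChannel`, `DemoHost` and `call_module` are hypothetical and only illustrate the dispatch pattern, not how feed_into actually loads modules.

```ruby
# A reduced module following the crl_ naming convention.
module DemoChannel
  def crl_demo_channel( sym, cmd, channel, response, data, obj )
    messages = []
    case sym
    when :settings
      data = { name: :demo_channel, sym: :web, regexs: [ [ /demo\.example\.com/ ] ] }
    else
      messages.push( "demo_channel: #{sym} not found." )
    end
    return data, messages
  end
end

# Stand-in for the object that mixes in the module and talks to it
# only through the single entry function.
class DemoHost
  include DemoChannel

  def call_module( name, sym, data = nil )
    send( "crl_#{name}", sym, nil, nil, nil, data, nil )
  end
end

settings, messages = DemoHost.new.call_module( :demo_channel, :settings )
p settings[ :name ], messages
```

Unknown stage symbols do not raise; they return the data unchanged together with a message, which keeps the pipeline running.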
Step 2: Initialize Module
require 'feed_into'
feeds = FeedInto::Group.new(
modules: './path/'
)
feeds
.analyse( items: [ 'module_name.com/rss' ] )
.merge
.to_rss_all
Bug reports and pull requests are welcome on GitHub at https://raw.githubusercontent.com/feed-into-for-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
- Proof of Concept, not battle-tested.
This gem uses the following gems:
The gem is available as open source under the terms of the MIT License.
Everyone interacting in the feed-into-for-ruby project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
Please ⭐️ star this Project, every ⭐️ star makes us very happy!