Class: Hadupils::Extensions::Hive
- Inherits: Object
- Includes: AuxJarsPath
- Defined in: lib/hadupils/extensions/hive.rb
Overview
Hive-targeted extensions derived from filesystem layout
Concept
There are a few ways to “extend” one’s hive session:
- Adding files, archives, jars to it (ADD …).
- Setting variables and whatnot (SET …).
- Registering your own UDFs.
- Specifying paths to jars to make available within the session’s classpath (HIVE_AUX_JARS_PATH env. var.).
All of these things can be done through the use of initialization files (via hive’s -i option), except for the auxiliary jar libs environment variable (which is… wait for it… in the environment).
This class provides an abstraction to enable the following:
- lay your files out according to its expectations
- wrap that layout with an instance of this class
- it’ll give an interface for accessing initialization files (#hivercs) that make the stuff available in a hive session
- it’ll dynamically assemble the initialization file necessary to ensure appropriate assets are made available in the session
- if you provide your own initialization file in the expected place, it’ll ensure that the dynamic stuff is applied first and the static one second, such that your static one can assume the neighboring assets are already in the session.
- it’ll give you a list of jars to make available as auxiliary_jars in the session based on contents of aux-jars.
You lay it down, the object makes sense of it, nothing other than file organization required.
Filesystem Layout
Suppose you have the following stuff (denoting symlinks with ->):
/etc/foo/
    an.archive.tar.gz
    another.archive.tar.gz
    aux-jars/
        aux-only.jar
        ignored.archive.tar.gz
        ignored.file.txt
        jarry.jar -> ../jarry.jar
    dist-only.jar
    hiverc
    jarry.jar
    textie.txt
    yummy.yaml
Now you create an instance:
ext = Hadupils::Extensions::Hive.new('/etc/foo')
You could get the hive command-line options for using this stuff via:
ext.hivercs
It’ll give you objects for two initialization files:
- A dynamic one that has the appropriate commands for adding an.archive.tar.gz, another.archive.tar.gz, dist-only.jar, jarry.jar, textie.txt, and yummy.yaml to the session.
- The hiverc one that’s in there.
And, the ext.auxiliary_jars accessor will return a list of paths to the jars (only the jars) contained within the aux-jars path; a caller to hive would use this to construct the HIVE_AUX_JARS_PATH variable.
Notice that jarry.jar is common to the distributed usage (it’ll be added to the session and associated distributed cache) and to the auxiliary path. That’s because it appears in the main directory and in the aux-jars subdirectory. There’s nothing magical about the use of a symlink; that just saves disk space. 10 MB ought to be enough for anyone.
If there were no hiverc file, then you would only get the initialization file object for the loading of assets in the main directory. Conversely, if there were no such assets, but there was a hiverc file, you would get only the object for that file. If neither is present, #hivercs returns an empty list.
If there is no aux-jars directory, or that directory has no jars, ext.auxiliary_jars returns an empty list. Only jars will be included in that list; files without a .jar extension will be ignored.
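The classification rules above can be approximated with the standard library alone. The following is a minimal sketch, not the hadupils implementation: the LayoutSketch module and its classify method are hypothetical names introduced for illustration, and the sketch only mirrors the rules stated in this section (everything in the main directory except aux-jars and hiverc is a distributed asset; only .jar files under aux-jars count as auxiliary jars).

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for illustration; not part of hadupils.
module LayoutSketch
  AUX_PATH = 'aux-jars'
  HIVERC_PATH = 'hiverc'

  # Returns a hash describing how the entries under +path+ would be treated.
  def self.classify(path)
    root = File.expand_path(path)
    names = Dir.children(root).sort
    aux_dir = File.join(root, AUX_PATH)
    aux_names = File.directory?(aux_dir) ? Dir.children(aux_dir).sort : []
    # Only entries with a .jar extension count as auxiliary jars.
    jar_names = aux_names.select { |name| File.extname(name) == '.jar' }
    {
      # Everything in the main directory except aux-jars and hiverc.
      dist_assets: names - [AUX_PATH, HIVERC_PATH],
      # Whether a static hiverc file is present.
      static_hiverc: names.include?(HIVERC_PATH),
      aux_jars: jar_names.map { |name| File.join(aux_dir, name) }
    }
  end
end

# Build a small throwaway layout and classify it.
Dir.mktmpdir do |dir|
  FileUtils.mkdir(File.join(dir, LayoutSketch::AUX_PATH))
  FileUtils.touch(%w[hiverc jarry.jar textie.txt].map { |n| File.join(dir, n) })
  FileUtils.touch(%w[aux-only.jar ignored.file.txt].map { |n| File.join(dir, 'aux-jars', n) })
  p LayoutSketch.classify(dir)
end
```

An empty dist_assets list with no hiverc corresponds to the case where #hivercs is empty, and a missing aux-jars directory corresponds to an empty auxiliary_jars list.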
Defined Under Namespace
Modules: AuxJarsPath
Constant Summary
- AUX_PATH = 'aux-jars'
- HIVERC_PATH = 'hiverc'
Instance Attribute Summary
- #auxiliary_jars ⇒ Object (readonly)
  Returns the value of attribute auxiliary_jars.
- #path ⇒ Object (readonly)
  Returns the value of attribute path.
Class Method Summary
- .assemble_dynamic_extension(path) ⇒ Object
- .assemble_static_extension(path) ⇒ Object
- .build_archive(io, dist_assets, aux_jars = nil) ⇒ Object
  Writes a gzipped tar archive to io, the contents of which are structured appropriately for use with this class.
- .find_auxiliary_jars(path) ⇒ Object
Instance Method Summary
- #dynamic_hivercs ⇒ Object
  An array of dynamic, managed hive initialization objects (Hadupils::Extensions::HiveRC::Dynamic) based on the assets found within the #path.
- #hivercs ⇒ Object
  An array of hive initialization objects derived from dynamic and static sets.
- #initialize(path) ⇒ Hive (constructor)
  A new instance of Hive.
- #static_hivercs ⇒ Object
  An array of static hive initialization objects (Hadupils::Extensions::HiveRC::Static) based on the presence of a hiverc file within the #path.
Methods included from AuxJarsPath
Constructor Details
#initialize(path) ⇒ Hive
Returns a new instance of Hive.
# File 'lib/hadupils/extensions/hive.rb', line 112
def initialize(path)
  @path = ::File.expand_path(path)
  @auxiliary_jars = self.class.find_auxiliary_jars(@path)
  @dynamic_ext = self.class.assemble_dynamic_extension(@path)
  @static_ext = self.class.assemble_static_extension(@path)
end
Instance Attribute Details
#auxiliary_jars ⇒ Object (readonly)
Returns the value of attribute auxiliary_jars.
# File 'lib/hadupils/extensions/hive.rb', line 109
def auxiliary_jars
  @auxiliary_jars
end
#path ⇒ Object (readonly)
Returns the value of attribute path.
# File 'lib/hadupils/extensions/hive.rb', line 110
def path
  @path
end
Class Method Details
.assemble_dynamic_extension(path) ⇒ Object
# File 'lib/hadupils/extensions/hive.rb', line 157
def self.assemble_dynamic_extension(path)
  Flat.new(path) do
    assets do |list|
      list.reject {|asset| [AUX_PATH, HIVERC_PATH].include? asset.name }
    end
  end
end
.assemble_static_extension(path) ⇒ Object
# File 'lib/hadupils/extensions/hive.rb', line 165
def self.assemble_static_extension(path)
  Static.new(path)
end
.build_archive(io, dist_assets, aux_jars = nil) ⇒ Object
Writes a gzipped tar archive to io, the contents of which are structured appropriately for use with this class.
Provide the static hiverc and any other distributed cache-bound assets in dist_assets, and any auxiliary jars to include in aux_jars.
This utilizes a system call to tar under the hood, which requires that it be installed and on your PATH.
You can use any file-like writable thing for io (anything that responds to <<), so files, pipes, etc.
See this example:
File.open('foo.tar.gz', 'w') do |f|
  Hadupils::Extensions::Hive.build_archive f,
                                           ['/tmp/here/blah.jar',
                                            '/tmp/there/hiverc'],
                                           ['/tmp/elsewhere/foo.jar']
end
The preceding example would produce an archive named “foo.tar.gz”, the contents of which would be:
aux-jars/foo.jar
blah.jar
hiverc
Note that it collapses things into two distinct directories, such that basename collisions are possible. That’s on you to handle sanely.
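Because files are copied by basename, a quick pre-flight check can flag collisions before calling build_archive. This is a minimal sketch: colliding_basenames is a hypothetical helper, not something hadupils provides, and the third path in dist_assets is invented to force a collision.

```ruby
# Hypothetical pre-flight check; not part of hadupils.
# Returns the sorted basenames that occur more than once in +paths+.
def colliding_basenames(paths)
  paths.group_by { |path| File.basename(path) }
       .select { |_name, group| group.length > 1 }
       .keys
       .sort
end

# Illustrative inputs; the second blah.jar is made up for the demonstration.
dist_assets = ['/tmp/here/blah.jar', '/tmp/there/hiverc', '/tmp/other/blah.jar']
aux_jars    = ['/tmp/elsewhere/foo.jar']

# Check each group separately: dist assets share the archive's top level,
# while auxiliary jars share the aux-jars/ directory.
p colliding_basenames(dist_assets)
p colliding_basenames(aux_jars)
```

The two lists are checked independently because the same basename may legitimately appear once in each of them, as jarry.jar does in the layout example earlier.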
# File 'lib/hadupils/extensions/hive.rb', line 201
def self.build_archive(io, dist_assets, aux_jars=nil)
  dist, aux = [dist_assets, (aux_jars || [])].collect do |files|
    files.collect do |asset|
      path = ::File.expand_path(asset)
      raise "Cannot include directory '#{path}'." if ::File.directory? path
      path
    end
  end

  ::Dir.mktmpdir do |workdir|
    basenames = dist.collect do |src|
      FileUtils.cp src, File.join(workdir, File.basename(src))
      File.basename src
    end

    if aux.length > 0
      basenames << AUX_PATH
      aux_dir = File.join(workdir, AUX_PATH)
      Dir.mkdir aux_dir
      aux.each do |src|
        FileUtils.cp src, File.join(aux_dir, File.basename(src))
      end
    end

    ::Dir.chdir(workdir) do |p|
      Open3.popen2('tar', 'cz', *basenames) do |i, o|
        stdout = o.read
        io << stdout
      end
    end
  end
  true
end
.find_auxiliary_jars(path) ⇒ Object
# File 'lib/hadupils/extensions/hive.rb', line 145
def self.find_auxiliary_jars(path)
  target = ::File.join(path, AUX_PATH)
  if ::File.directory? target
    jars = Hadupils::Assets.assets_in(target).find_all do |asset|
      asset.kind_of? Hadupils::Assets::Jar
    end
    jars.collect {|asset| asset.path}
  else
    []
  end
end
Instance Method Details
#dynamic_hivercs ⇒ Object
An array of dynamic, managed hive initialization objects (Hadupils::Extensions::HiveRC::Dynamic) based on the assets found within the #path. May be an empty list.
# File 'lib/hadupils/extensions/hive.rb', line 130
def dynamic_hivercs
  if @dynamic_ext.assets.length > 0
    @dynamic_ext.hivercs
  else
    []
  end
end
#hivercs ⇒ Object
An array of hive initialization objects derived from dynamic and static sets. May be an empty list. Dynamic are guaranteed to come before static, so a static hiverc can count on the other assets being available.
# File 'lib/hadupils/extensions/hive.rb', line 123
def hivercs
  dynamic_hivercs + static_hivercs
end
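To show why the dynamic-before-static ordering matters to a caller, here is a rough sketch of turning such a list into a hive invocation. Several things here are assumptions rather than documented behavior: HiveRCStub stands in for the real hiverc objects and assumes they expose a path, hive_invocation is a hypothetical helper, the file paths are invented, and joining HIVE_AUX_JARS_PATH entries with commas is assumed.

```ruby
# Stand-in for illustration only. The real objects come from
# Hadupils::Extensions::HiveRC; that they expose #path is an assumption.
HiveRCStub = Struct.new(:path)

# Hypothetical helper: builds the environment and argument list for hive.
def hive_invocation(hivercs, auxiliary_jars)
  # One -i option per initialization file, preserving the given order so
  # that dynamic files are processed before the static hiverc.
  args = hivercs.flat_map { |rc| ['-i', rc.path] }
  env = {}
  # Assumption: multiple jars are joined with commas.
  env['HIVE_AUX_JARS_PATH'] = auxiliary_jars.join(',') unless auxiliary_jars.empty?
  [env, ['hive', *args]]
end

dynamic = [HiveRCStub.new('/tmp/dynamic-hiverc')]  # invented path
static  = [HiveRCStub.new('/etc/foo/hiverc')]
env, cmd = hive_invocation(dynamic + static, ['/etc/foo/aux-jars/aux-only.jar'])
p env
p cmd
# A caller could then run: system(env, *cmd)
```

Keeping the concatenation order of dynamic + static is what lets the static hiverc refer to jars and files that the dynamic file has already added.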
#static_hivercs ⇒ Object
An array of static hive initialization objects (Hadupils::Extensions::HiveRC::Static) based on the presence of a hiverc file within the #path. May be an empty list.
# File 'lib/hadupils/extensions/hive.rb', line 141
def static_hivercs
  @static_ext.hivercs
end