arrays - How to batch an enumerable in Ruby
Question
In my quest to understand Ruby's Enumerable, I have something like the following:
FileReader.read(very_big_file)
  .lazy
  .flat_map { |line| get_array_of_similar_words } # array.size is ~10
  .each_slice(100) # wait for 100 items
  .map { |array| process_100_items }
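Although each flat_map call emits an array of about 10 items, I expected the each_slice call to batch the items in groups of 100, that is, to wait until 100 items are available and only then pass them to the final .map call. That is not what happens.
How can I achieve something similar to the buffer function in reactive programming?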
Answer
To see how lazy affects the computation, let's look at an example. First construct a file:
str =<<~_
Now is the
time for all
good Ruby coders
to come to
the aid of
their bowling
team
_
fname = 't'
File.write(fname, str)
#=> 82
and specify the slice size:
slice_size = 4
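Now I will read the file line by line, split each line into words, remove duplicate words, and append the remaining words to an array. As soon as the array contains at least 4 words, I take the first 4 and map them to the longest of those 4 words. The code to do this follows. To show how the calculation proceeds, I will salt the code with puts statements. Note that IO::foreach without a block returns an enumerator.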
IO.foreach(fname).
  lazy.
  tap { |o| puts "o1 = #{o}" }.
  flat_map { |line|
    puts "line = #{line}"
    puts "line.split.uniq = #{line.split.uniq} "
    line.split.uniq }.
  tap { |o| puts "o2 = #{o}" }.
  each_slice(slice_size).
  tap { |o| puts "o3 = #{o}" }.
  map { |arr|
    puts "arr = #{arr}, arr.max = #{arr.max_by(&:size)}"
    arr.max_by(&:size) }.
  tap { |o| puts "o3 = #{o}" }.
  to_a
#=> ["time", "good", "coders", "bowling", "team"]
The following is displayed:
o1 = #<Enumerator::Lazy:0x00005992b1ab6970>
o2 = #<Enumerator::Lazy:0x00005992b1ab6880>
o3 = #<Enumerator::Lazy:0x00005992b1ab6678>
o3 = #<Enumerator::Lazy:0x00005992b1ab6420>
line = Now is the
line.split.uniq = ["Now", "is", "the"]
line = time for all
line.split.uniq = ["time", "for", "all"]
arr = ["Now", "is", "the", "time"], arr.max = time
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"]
arr = ["for", "all", "good", "Ruby"], arr.max = good
line = to come to
line.split.uniq = ["to", "come"]
line = the aid of
line.split.uniq = ["the", "aid", "of"]
arr = ["coders", "to", "come", "the"], arr.max = coders
line = their bowling
line.split.uniq = ["their", "bowling"]
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
line = team
line.split.uniq = ["team"]
arr = ["team"], arr.max = team
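The trace shows each slice being handed to map as soon as its fourth word arrives, before the next line is read. One consequence, sketched here under some assumptions, is that a lazy chain stops reading the file once enough slices have been produced. The temporary file, the counter lines_read and the variable longest are illustrative names that are not part of the original answer, and File.foreach is used as the same method IO.foreach that File inherits.

```ruby
require 'tempfile'

# Same seven lines as the sample file used in the answer.
text = <<~TEXT
  Now is the
  time for all
  good Ruby coders
  to come to
  the aid of
  their bowling
  team
TEXT

lines_read = 0   # illustrative counter: how many lines flat_map has seen
longest = nil

Tempfile.create('lazy_demo') do |f|
  f.write(text)
  f.flush

  longest = File.foreach(f.path)
                .lazy
                .flat_map { |line| lines_read += 1; line.split.uniq }
                .each_slice(4)
                .map { |arr| arr.max_by(&:size) }
                .first(2)   # ask for only the first two slices
end

puts "longest = #{longest.inspect}"   # prints: longest = ["time", "good"]
puts "lines_read = #{lines_read}"     # prints: lines_read = 3
```

Only three of the seven lines are read, because the second slice is complete once "Ruby" is emitted from the third line.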
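If the line lazy. is removed from the pipeline that ends in to_a, the return value is the same, but the following is displayed (the trailing .to_a is then superfluous):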
o1 = #<Enumerator:0x00005992b1a438f8>
line = Now is the
line.split.uniq = ["Now", "is", "the"]
line = time for all
line.split.uniq = ["time", "for", "all"]
line = good Ruby coders
line.split.uniq = ["good", "Ruby", "coders"]
line = to come to
line.split.uniq = ["to", "come"]
line = the aid of
line.split.uniq = ["the", "aid", "of"]
line = their bowling
line.split.uniq = ["their", "bowling"]
line = team
line.split.uniq = ["team"]
o2 = ["Now", "is", "the", "time", "for", "all", "good", "Ruby",
"coders", "to", "come", "the", "aid", "of", "their",
"bowling", "team"]
o3 = #<Enumerator:0x00005992b1a41a08>
arr = ["Now", "is", "the", "time"], arr.max = time
arr = ["for", "all", "good", "Ruby"], arr.max = good
arr = ["coders", "to", "come", "the"], arr.max = coders
arr = ["aid", "of", "their", "bowling"], arr.max = bowling
arr = ["team"], arr.max = team
o3 = ["time", "good", "coders", "bowling", "team"]
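Applied to the shape of the question, the same chain buffers items across the boundaries of the arrays that flat_map emits. The following is a minimal sketch, not the asker's actual code: similar_words is a hypothetical stand-in for get_array_of_similar_words that always returns 10 items, and an in-memory array of 25 strings stands in for the very big file.

```ruby
# Hypothetical stand-in for the question's get_array_of_similar_words:
# returns exactly 10 items per input line.
def similar_words(line)
  Array.new(10) { |i| "#{line}-#{i}" }
end

# Stand-in for the lines of a very big file (25 lines -> 250 items).
lines = (1..25).map { |n| "line#{n}" }

batch_sizes = lines.lazy
                   .flat_map { |line| similar_words(line) } # ~10 items each
                   .each_slice(100)                         # buffer 100 items
                   .map { |batch| batch.size }              # process a batch
                   .to_a

puts batch_sizes.inspect   # prints: [100, 100, 50]
```

Each full batch is built from ten consecutive flat_map results, and the final batch holds whatever is left over, so the last step has to tolerate a short batch.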