Fl4m3Ph03n1x

Fl4m3Ph03n1x

Ecto multiple streams in 1 transaction

Background

PS: the following situation describes an hypothetical scenario, where I own a company that sells things to customers.

I have an Ecto query that is so big, that my machine cannot handle it. With billions of results returned, there is probably not enough RAM in the world that can handle it.

The solution here (or so my research indicates) is to use streams. Streams were made for potentially infinite sets of results, which would fit my use case.

https://hexdocs.pm/ecto/Ecto.Repo.html#c:stream/2

Problem

So lets imagine that I want to delete All users that bought a given item. Maybe that item was not really legal in their country, and now me, the poor guy in IT, has to fix things so the world doesn’t come down crashing.

Naive way:

item_id = "123asdasd123"

purchase_ids =
      Purchases
      |> where([p], p.item_id == ^item_id)
      |> select([p], p.id)
      |> Repo.all()

Users
    |> where([u], u.purchase_id in ^purchase_ids)
    |> Repo.delete_all()

This is the naive way. I call it naive, because of 2 issues:

  • We have so many purchases, that the machine’s memory will overflow (looking at purchase_ids query)
  • purchase_ids will likely have more than 100K ids, so the second query (where we delete things) will fail as it hits Postgres parameters limit of 32K: https://stackoverflow.com/a/42251312/1337392

What can I say, our product is highly addictive and very well priced!
Our customers simply cant get enough of it. Don’t know why. Nope. No reason comes to mind. None at all.

With these problems in mind, I cannot help my customers and grow my empire, I mean, little home owned business.

I did find this possible solution:

Stream way:

item_id = "123asdasd123"

purchase_ids =
      Purchases
      |> where([p], p.item_id == ^item_id)
      |> select([p], p.id)

stream = Repo.stream(purchase_ids)

Repo.transacion(fn -> 
  ids = Enum.to_list(stream)

  Users
    |> where([u], u.purchase_id in ^ids)
    |> Repo.delete_all()
end)

Questions

However, I am not convinced this will work:

  • I am using Enum.to_list and saving everything into a variable, placing everything into memory again. So I am not gaining any advantage by using Repo.stream.
  • I still have too many ids for my Repo.delete_all to work without blowing up

I guess the one advantage here is that this now a transaction, so either everything goes or nothing goes.

So, the following questions arise:

  • How do I properly make use of streams in this scenario?
  • Can I delete items by streaming parameters (ids) or do I have to manually batch them?
  • Can I stream ids to Repo.delete_all ?

Marked As Solved

Fl4m3Ph03n1x

Fl4m3Ph03n1x

Every question post created here creates an entry in a dedicated thread in the official forum iirc. Nonetheless, I still post my questions in both places. And when I find an answer, I add it to both places as well.

I do this mainly for visibility, both for the community, and for the question itself, although the later one is less impactful due to the mentioned DevChat thread the official forum has.

Solutions

In regards to the question, there are two possible solutions.

One suggested by benwilson:

query = from u in Users,
  join: p in assoc(u, :purchase),
  where: p.item_id == ^item_id

Repo.delete_all(query)

And the other by Aleksei Matiushkin:

Repo.transacion(fn ->
  max_rows = 500

  purchase_ids
  |> Repo.stream(max_rows: max_rows)
  |> Stream.chunk_every(max_rows)
  |> Stream.each(fn ids ->
     Users
     |> where([u], u.purchase_id in ^ids)
     |> Repo.delete_all()
  end)
  |> Stream.run()
end, timeout: :infinity)

My pick

The first solution is great, but it requires the User Schema to have a belongs_to :purchase, Purchase definition in its schema. Unfortunately for me, this was a deal breaker, since changing any schemas in the project where I am working in is either not allowed or would result in a lengthy approval process.

So I went with the second solution that is self contained. It requires no changes to any schemas and it can work with the data as is.

Also Liked

jaeyson

jaeyson

hi @Fl4m3Ph03n1x, this might be spammy but, have you tried to ask this via ElixirForums or slack? so other people can see this.

Where Next?

Popular Backend topics Top

chasekaylee
Hi there everyone! Recently, I have fallen in love with programming with Elixir and have been having so much fun with it. I have been do...
New
TwistingTwists
Hello Folks, I am a novice developer from India. Intending to learn Elixir and web apps (phoenix framework). What are things that I MUS...
New
sampu
I have a use case where a client is invoking a Rest endpoint via a load balancer, which in turn invokes a third party endpoint which is r...
New
jeya
Dear Geeks I am new to pytest. I am following a youtube channel. I am writing the same code. learning to test login functionality of an...
New
AstonJ
If when trying to create (or recreate) your dev db with rails db:create you are getting: PG::ConnectionBad: connection to server on soc...
New
Fl4m3Ph03n1x
Background I have a personal project that is an elixir desktop application for PC Windows. It works pretty well, but now I want to give i...
New
sona11
I studied very basic PHP (I believe). After that, I feel like I’ve gotten a handle on the language. My dream is to work as a web develope...
New
harwind
I received this error for a binary search programme in C, despite the fact that it requested for inputs and produced the right output. Th...
/c
New
harwind
I have a large SQL database with millions of records, and I’ve identified duplicate entries. What’s the most efficient way to find and re...
New
jaeyson
Hi! I have clarifications (please correct me, as I mostly mix/confuse this details) with the following: The term RAG here where it read...
New

Other popular topics Top

ohm
Which, if any, games do you play? On what platform? I just bought (and completed) Minecraft Dungeons for my Nintendo Switch. Other than ...
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
New
AstonJ
You might be thinking we should just ask who’s not using VSCode :joy: however there are some new additions in the space that might give V...
New
PragmaticBookshelf
Rails 7 completely redefines what it means to produce fantastic user experiences and provides a way to achieve all the benefits of single...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
PragmaticBookshelf
Author Spotlight Rebecca Skinner @RebeccaSkinner Welcome to our latest author spotlight, where we sit down with Rebecca Skinner, auth...
New
First poster: bot
Large Language Models like ChatGPT say The Darnedest Things. The Errors They MakeWhy We Need to Document Them, and What We Have Decided ...
New
DevotionGeo
I have always used antique keyboards like Cherry MX 1800 or Cherry MX 8100 and almost always have modified the switches in some way, like...
New
sir.laksmana_wenk
I’m able to do the “artistic” part of game-development; character designing/modeling, music, environment modeling, etc. However, I don’t...
New