IRC log for #dataverse, 2021-05-06

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
07:10 Virgile joined #dataverse
07:31 VJ joined #dataverse
08:08 juancorr joined #dataverse
09:16 dabukalam joined #dataverse
10:06 VJ joined #dataverse
12:30 VJ joined #dataverse
13:59 pdurbin joined #dataverse
14:17 pdurbin poikilotherm: heh, from that same thread from yesterday: "Just a small comment on your fear for community-driven projects to be discontinued in the future. For all I know, CentOS, PostgreSQL, Apache etc. etc. are also a community-driven projects, but to my knowledge a lot of organizations are using these tools despite (or I'd say: because) of that."
14:19 pdurbin Oh, and Don already replied about CentOS. Good. :)
14:21 pdurbin Yeah. Great reply. Here it is: https://groups.google.com/g/dataverse-community/c/EZEQKw3gj-k/m/3gycXpdRAgAJ
14:23 donsizemore joined #dataverse
14:24 donsizemore @pdurbin I'm hoping my reply wasn't too grizzled. I had two nightmares last night, woke up at 0330 and had a headache.
14:25 pdurbin You should be counting sheep instead of grey hairs in your beard. :)
14:26 VJ joined #dataverse
14:27 donsizemore I was being attacked by possessed people with chainsaws
14:28 pdurbin Hmm. Not the best time to count sheep.
16:59 nightowl313 joined #dataverse
17:08 nightowl313 my daily dumb question ... had a depositor try to upload a spreadsheet and got a "zip bomb detected" error (kind of a scary message for users!) ... i know there is a TabularIngestSizeLimit parameter, but i can't seem to formulate the correct api call to find the default global values. And, is there a recommended setting or way to calculate appropriate values? The spreadsheet being attempted is 500KB.
17:14 nightowl313 tried "curl -X GET http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit"
17:16 nightowl313 just went ahead and set the limit to 2G and the file failed with a MIN_INFLATE_RATIO error in the log
17:31 pdurbin nightowl313: I'm not finding anything about a bomb in the code. I'm not sure why you're seeing that.
17:32 pdurbin Not sure about the MIN_INFLATE_RATIO thing either. :/
17:33 pdurbin Looks like both come from Apache POI: https://stackoverflow.com/questions/44897500/using-apache-poi-zip-bomb-detected
17:34 pdurbin nightowl313: is the spreadsheet an Excel file? If so, do you get different behavior if you try a CSV instead?
17:36 nightowl313 yes, it is excel format ... it has multiple tabs and graphs randomly thrown in everywhere ... i think we need to post some guidelines for our users on what will/will not ingest properly
17:38 nightowl313 was looking through threads about that zip bomb message and it doesn't look like something we want to mess with ... really there for protection against bad things
17:39 nightowl313 the limit that is
17:40 nightowl313 here is the actual error in the log if anyone is interested: "Ingest failure (IO Exception): Could not parse Excel/XLSX spreadsheet. Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data. This may indicate that the file is used to inflate memory usage and thus could pose a security risk. You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files wh
17:41 pdurbin nightowl313: you might want to leave a comment here (guidelines for excel): https://github.com/IQSS/dataverse/issues/7452
17:41 nightowl313 oh ha! your article already had it
17:44 nightowl313 yes will do ... i think it might be good to suppress that message from users (it appears on the front end if they hover on the ingest error) .. it sounds kind of like a malware notification
17:45 pdurbin Yeah. Based on what I'm seeing in that StackOverflow post, perhaps we can do something in the code to avoid this. Please feel free to open an issue for this. And if you can provide the Excel file you're using, it would be helpful.
17:46 nightowl313 oh perfect! will do ... thanks as always for the assistance! =)
17:47 pdurbin If you can get a screenshot and put it in an issue, that would be nice. I agree that it's suspicious. However, we do try to give the user something they can act on, hence the details. (And we don't know what the details will be.)
17:48 nightowl313 yes, i have a screenshot of the error on the dataset, and a copy of the spreadsheet .. will put that in after my next meeting! =)
17:54 pdurbin thanks!
18:23 dataverse-user joined #dataverse
20:53 pdurbin left #dataverse
21:02 nightowl313 joined #dataverse
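
Editor's note on the "zip bomb detected" discussion above: an .xlsx file is a zip archive, and the error quoted in the log says the file would exceed the allowed ratio of compressed size to expanded size. The sketch below is one rough way to see which entries of a spreadsheet compress unusually well. It is not Dataverse or Apache POI code. The function names are hypothetical, the 0.01 threshold is an assumption based on POI's documented default minimum inflate ratio, and POI performs its check while streaming each entry, so this only approximates what it measures.

```python
import zipfile

# Assumption: 0.01 mirrors Apache POI's documented default minimum inflate ratio.
ASSUMED_MIN_INFLATE_RATIO = 0.01


def inflate_ratios(xlsx_file):
    """Return (entry name, compressed bytes, expanded bytes, ratio) per zip entry."""
    rows = []
    with zipfile.ZipFile(xlsx_file) as archive:
        for info in archive.infolist():
            if info.file_size == 0:
                continue  # skip empty entries; the ratio is undefined for them
            ratio = info.compress_size / info.file_size
            rows.append((info.filename, info.compress_size, info.file_size, ratio))
    return rows


def suspicious_entries(xlsx_file, min_ratio=ASSUMED_MIN_INFLATE_RATIO):
    """Return the entries whose compressed/expanded ratio falls below min_ratio."""
    return [row for row in inflate_ratios(xlsx_file) if row[3] < min_ratio]
```

Calling suspicious_entries("spreadsheet.xlsx") on a local copy (the filename is a placeholder) lists the entries that fall below the assumed threshold. Highly repetitive sheet XML can do that in a small, harmless file, which may be how a 500KB spreadsheet ends up triggering the message.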
