
IRC log for #dataverse, 2019-09-05

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
02:56 dataverse-user joined #dataverse
05:55 dataverse-user joined #dataverse
06:13 pdurbin hi dataverse-user
06:34 dataverse-user When I uploaded files via the Dataverse web UI, I got an error like "Exceed maximum number of files". I want to check whether I can change a configuration setting to raise that limit.
06:40 pdurbin dataverse-user: are you uploading a zip file? Maybe you can try changing the :ZipUploadFilesLimit database setting. Please see http://guides.dataverse.org/en/4.16/installation/config.html#zipuploadfileslimit
06:45 dataverse-user uploading jpg files.
06:47 pdurbin Hmm, jpg file. Maybe instead you should play with the :MultipleUploadFilesLimit setting. It doesn't appear to be documented. :(
06:54 dataverse-user I got it. Thank you. I will first try uploading zip files.
06:55 pdurbin Ok. it looks like that setting was added in https://github.com/IQSS/dataverse/pull/3459
06:56 pdurbin :MultipleUploadFilesLimit is 1000 files by default. If changing it helps (or even if it doesn't) please feel free to open an issue to document that setting.
06:58 pdurbin One nice thing about using zip files is that you can organize your files into folders: http://guides.dataverse.org/en/4.16/user/dataset-management.html#file-path
07:08 dataverse-user ok. many thanks~~
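
For reference, database settings like :ZipUploadFilesLimit and :MultipleUploadFilesLimit are changed through the Dataverse admin API on the server itself. A minimal sketch, assuming the API is reachable on localhost and using example values only:

    # raise the number of files that may be extracted from an uploaded zip
    curl -X PUT -d 5000 http://localhost:8080/api/admin/settings/:ZipUploadFilesLimit

    # raise the number of files allowed in one multiple-file upload (1000 by default, per the discussion above)
    curl -X PUT -d 5000 http://localhost:8080/api/admin/settings/:MultipleUploadFilesLimit
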
08:24 poikilotherm joined #dataverse
08:52 stefankasberger joined #dataverse
09:46 pdurbin dataverse-user: sure. You're welcome. Also, you are welcome to choose a different "nickname" and list yourself in the "who's who" spreadsheet linked in the topic of this channel.
09:47 pdurbin stefankasberger: do you know what would be interesting? Seeing how much code coverage we get from pointing pyDataverse at my server and then running the pyDataverse test suite.
09:48 pdurbin poikilotherm: I got code coverage of the API test suite working, with help from pameyer.
09:50 pdurbin I find it interesting to see which "commands" are being exercised by the API test suite: http://ec2-3-81-78-209.compute-1.amazonaws.com/target/coverage-it/edu.harvard.iq.dataverse.engine.command.impl/index.html
09:59 poikilotherm Good morning pdurbin :-)
10:01 poikilotherm pdurbin I need to get some stuff done over here for metadata...
10:01 poikilotherm I want to build in support for custom metadata blocks in K8s
10:01 poikilotherm Because we are going to use those... ;-)
10:01 poikilotherm Also 4.16 changed the citation block, so this needs to be addressed
10:02 poikilotherm And this also means handling schema changes for Solr
10:04 poikilotherm I was wondering if we could change the schema.xml a bit
10:05 poikilotherm It would be totally awesome to include the dynamic fields in a separate XML file which is then xi:included in the schema.xml
10:05 poikilotherm That way generating it on the fly would be much easier
10:06 poikilotherm And obviously also easier than having a template for schema.xml, which would need a separate processing step
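
A rough sketch of the xi:include idea poikilotherm describes, with hypothetical include file names; it assumes XInclude is enabled in Solr's XML loading, which is the open question here:

    <schema name="collection1" version="1.6"
            xmlns:xi="http://www.w3.org/2001/XInclude">
        <!-- static types and fields stay in schema.xml itself -->
        <!-- generated <field> definitions for the metadata blocks -->
        <xi:include href="schema_dv_mdb_fields.xml"/>
        <!-- generated <copyField> directives for the metadata blocks -->
        <xi:include href="schema_dv_mdb_copies.xml"/>
    </schema>
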
10:33 pdurbin poikilotherm: I thought you were interested (and even willing to help) with this API test suite code coverage stuff. :) But if Solr is in focus now, you might be interested in the comment I made a few hours ago to pkiraly: https://github.com/IQSS/dataverse/issues/5989#issuecomment-528219284 . It's exactly what you're talking about. :)
10:33 poikilotherm Nope, it isn't ;-)
10:34 poikilotherm Regarding interest: I need to get the new release done
10:34 poikilotherm I'm sure Slava will appreciate it ;-)
10:34 pdurbin It's close. :)
10:35 pdurbin If you could leave a comment for Peter on that issue with your idea, it would be fantastic. On Skype the other day he said he's back from vacation and plans to work on that issue.
10:35 poikilotherm And as it would be progress on things I have to work on and is beneficial for other stuff, this should be in focus ;-)
10:35 poikilotherm Ok, then I'll do that
10:35 pdurbin Thanks!
10:36 pdurbin I can even put it in your column on my board if you want to help Peter make a pull request.
10:37 poikilotherm I don't think so. Peter is looking into Managed Schema and more
10:37 poikilotherm Using Schema API
10:37 poikilotherm My idea is just a small workaround for easier shipping and deploying, not touching the inner workings
10:37 pdurbin Ok, are you thinking you'd make a pull request with that alternate approach?
10:40 poikilotherm Sure, if you would like me to do so
10:40 poikilotherm I wonder if I should create another issue
10:40 poikilotherm And reference things
10:41 poikilotherm Peter's approach seems to be pretty cool, but I don't know how long it will take to get there
10:41 pdurbin If we go with your xi:include approach, we should make some decisions about which fields are factored out. Obvious candidates are the custom fields for Harvard such as customMRA.tsv, customGSD.tsv, etc. in https://github.com/IQSS/dataverse/tree/v4.16/scripts/api/data/metadatablocks . Yes, a new issue would be fantastic.
10:41 poikilotherm Actually I looked at the schema.xml in the code
10:42 poikilotherm And I think that these parts should be moved into an included file:
10:42 poikilotherm https://github.com/IQSS/dataverse/blob/09fe94bdc6f4f5e79c61a203b7df8736692657ea/conf/solr/7.3.1/schema.xml#L223-L450
10:43 poikilotherm https://github.com/IQSS/dataverse/blob/09fe94bdc6f4f5e79c61a203b7df8736692657ea/conf/solr/7.3.1/schema.xml#L508-L735
10:44 poikilotherm Those are coming from the "export" at the Dataverse API endpoint api/admin/index/solr/schema
10:44 pdurbin Yes they are. All in one file?
10:44 poikilotherm We could split it. Also thought about changing the API endpoints to have two: one for fields, one for copyFields
10:45 poikilotherm So this can be more easily piped into a file
10:45 poikilotherm But splitting by the "---" is also fine, just needs more script logic
10:45 pdurbin I'd prefer separate files.
10:45 poikilotherm Me too
10:45 pdurbin Seems cleaner.
10:46 poikilotherm I was going to say "leaner" :-D
10:46 pdurbin Cleaner and leaner.
10:51 poikilotherm All of this can be bundled in a script: fetching the fields from Dataverse, writing them to a file, and executing a Solr core reload
10:52 poikilotherm I can bundle this in a maintenance job
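
A minimal sketch of what such a maintenance script could look like, assuming the existing combined endpoint and splitting its output by element name rather than by a separator; paths, file names, and the core name are examples only:

    #!/bin/sh
    # ask Dataverse which dynamic fields the Solr schema needs
    curl -s http://localhost:8080/api/admin/index/solr/schema -o /tmp/dv-schema-snippet.xml

    # split the snippet into field definitions and copyField directives
    grep '<field ' /tmp/dv-schema-snippet.xml > /tmp/schema_dv_mdb_fields.xml
    grep '<copyField ' /tmp/dv-schema-snippet.xml > /tmp/schema_dv_mdb_copies.xml

    # copy the generated files next to schema.xml (location depends on the install), then reload the core
    # cp /tmp/schema_dv_mdb_*.xml /usr/local/solr/server/solr/collection1/conf/
    curl -s 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1'
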
10:52 poikilotherm Would you like me to add this script to upstream or shall it reside in dataverse-k8s?
10:53 pdurbin Upstream please!
10:53 poikilotherm Right. Then I am going to create an issue.
10:54 pdurbin Thanks! Your script should probably be called from a future version of https://github.com/IQSS/dataverse/blob/v4.16/scripts/api/setup-all.sh
10:54 pdurbin Here's where the "out of the box" metadata blocks are loaded: https://github.com/IQSS/dataverse/blob/v4.16/scripts/api/setup-datasetfields.sh
10:54 poikilotherm Oh BTW. Is there any reason you guys stopped at Solr 7.3.1? Just asking because there is 7.7.1 now and the Docker images for 7.3.1 are not supported anymore
10:55 pdurbin It's not? Woof. It's hard to keep up with Solr releases.
10:55 pdurbin Have you heard that we are also on an old version of Glassfish? ;)
10:56 poikilotherm Nope, not yet.
10:56 poikilotherm Are you?
10:56 poikilotherm %)
10:56 poikilotherm Or better 8)
10:56 pdurbin :)
11:15 poikilotherm https://github.com/IQSS/dataverse/issues/6142
11:16 pdurbin Yes, I see. And https://github.com/IQSS/dataverse-kubernetes/issues/85 . I have a question
11:17 pdurbin What changes are you planning for a future version of https://github.com/IQSS/dataverse/blob/v4.16/scripts/api/setup-datasetfields.sh ?
11:17 poikilotherm Nothing for now
11:18 pdurbin !
11:18 poikilotherm This happens during bootstrapping
11:18 poikilotherm So only once
11:18 pdurbin Are you small chunking me? :)
11:19 poikilotherm Loading other blocks or loading updated upstream blocks needs to happen aside from that script
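
Loading a block outside of the setup scripts is a single call to the datasetfield load endpoint (the same one used later in this log); a minimal sketch using the stock citation block as the example file:

    curl -X POST -H 'Content-type: text/tab-separated-values' \
      --upload-file scripts/api/data/metadatablocks/citation.tsv \
      http://localhost:8080/api/admin/datasetfield/load
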
11:19 pdurbin Ok, but are you planning any changes for a future version of https://github.com/IQSS/dataverse/blob/v4.16/scripts/api/setup-optional-harvard.sh#L50 ?
11:20 poikilotherm Should I?
11:20 pdurbin Yes please!
11:20 poikilotherm LOL
11:20 pdurbin That script is just for reference anyway. You won't break anything.
11:20 poikilotherm Why is that?
11:21 poikilotherm Actually, at some point one will need to load the TSV file
11:21 pdurbin Including a change in that "optional harvard" script would be a good way to communicate the new way of doing things.
11:22 poikilotherm I don't like the TSV approach very much, but it is what it is - changing that is a huge thing.
11:22 poikilotherm But aren't the metadata fields for those already in upstream schema.xml?
11:23 poikilotherm So there is no real necessity to change those, right?
11:23 pdurbin They are. That's what I was trying to say earlier. The custom Harvard stuff should be factored out of schema.xml. Either into harvard.xml or one file for each of Harvard's custom metadata blocks.
11:23 poikilotherm Ah!
11:24 pdurbin Cleaner and leaner.
11:24 poikilotherm Now we are on the same train
11:24 poikilotherm I don't think this is a good idea. Here's why
11:24 poikilotherm When you split up the fields by metadata block, you need to add each of those files to schema.xml.
11:24 poikilotherm This involves templating
11:25 poikilotherm Which is cool, would be clean and lean
11:25 poikilotherm But also need more stuff to be done.
11:25 poikilotherm Like choosing the templating solution, adding it to installers etc
11:25 poikilotherm Of course one could try to do something like an include chain
11:26 poikilotherm include these smaller chunks in the included file of schema.xml
11:26 poikilotherm I don't know if this is supported by the parser
11:26 poikilotherm And most likely it shouldn't be necessary
11:26 pdurbin I already do some "templating" when you call into http://localhost:8080/api/admin/index/solr/schema
11:26 poikilotherm Right
11:26 poikilotherm And that's perfect!
11:27 poikilotherm Just take that output and place it in those files
11:27 poikilotherm You will have all things in place you actually use, nothing else
11:27 poikilotherm Don't use custom schema X? Ok, will not be included in the schema.
11:28 pdurbin But you could make it so you call into http://localhost:8080/api/admin/index/solr/schema/oliverBlock1 and it "templates" just the stuff you need for your custom metadata block.
11:28 poikilotherm Huh is that possible now?
11:28 poikilotherm That API endpoint has no docs in the API docs
11:28 poikilotherm I didn't look at the code yet
11:28 pdurbin You'd have to change that API endpoint to take an additional argument, the name of the custom metadata block.
11:29 poikilotherm Ok
11:29 pdurbin I'm happy to advise on this. I wrote all that nasty code. :)
11:29 pdurbin We can even set up some REST Assured tests for it.
11:29 poikilotherm One could of course generate the complete schema.xml
11:30 poikilotherm And just dump and reload
11:30 poikilotherm Would be even easier
11:30 poikilotherm But I wanted to have a quick solution with minimum impact
11:30 poikilotherm As the approach from Peter should be preferred
11:30 poikilotherm Using Schema API totally makes sense
11:31 poikilotherm Especially when looking in the direction of SolrCloud, multi-instance setups, etc.
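
For contrast, the Schema API route Peter is looking into would add fields by POSTing to Solr instead of shipping an edited schema.xml; a minimal sketch with a made-up field name, and note that it relies on Solr's managed schema rather than the classic schema.xml:

    curl -X POST -H 'Content-type: application/json' \
      --data-binary '{
        "add-field": {
          "name": "myCustomField",
          "type": "text_en",
          "multiValued": true,
          "stored": true,
          "indexed": true
        }
      }' \
      http://localhost:8983/solr/collection1/schema
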
11:31 pdurbin I'm fine with whatever sized chunks provide value. :) I'm just trying to express what I see as a potentially larger chunk than what you may envision. And I can jump on your branch and help move it along. Make pull requests into your branch, I mean.
11:39 poikilotherm :-D
11:40 poikilotherm I think I should commit some of my 64-dev-in-k8s stuff and share it. Might be useful for testing this
11:40 pdurbin Yes, all the tests please.
11:41 pdurbin Don't forget that automated testing is the focus of our current sprint. Still two weeks left.
11:41 poikilotherm Yeah :-/ It's a pity that I need to get other things done, too
11:42 pdurbin There are always tests to write. :)
11:44 poikilotherm There are not many people around this week at IQSS, right?
11:44 poikilotherm Very low notification traffic
11:45 pdurbin Kevin is on vacation so nothing is getting merged.
11:45 poikilotherm :-) :-| :-/ :-( :'-(
11:46 pdurbin First day of school. Stepping out for a bit.
11:50 poikilotherm BTW pdurbin did you listen to Adam's podcast from Monday?
11:50 donsizemore joined #dataverse
11:51 poikilotherm http://airhacks.fm/#episode_52
11:59 poikilotherm Hehe - I just created a branch named 6142-flex-solr-schema. This is kinda funny when you know that "flex" is a common term in German for an angle grinder (Flex is a brand name that still exists today; lots of people used it in the past, so the name stuck). Thinking about slicing schema.xml into parts with a "flex" :-D
12:00 poikilotherm Morning donsizemore! Did you see that security warning about upgrading jenkins?
12:04 donsizemore huh? i just updated the jenkins RPM yesterday
12:04 poikilotherm Oh ok
12:04 poikilotherm I've been seeing them since Monday but had no chance to catch up
12:04 donsizemore (but to answer your question, no, I haven't seen the security announcement)
12:04 poikilotherm It's displayed in the UI
12:05 donsizemore then we're good, because I was in the UI quite a bit yesterday
12:05 donsizemore I get a number of CVE and other security announcements in various ways; I'm surprised I hadn't seen it
12:57 stefankasberger joined #dataverse
13:07 stefankasberger @pdurbin: i don't really get what you mean regarding code coverage testing.
13:45 poikilotherm @pdurbin: I do have working includes for the fields and copyFields
13:45 poikilotherm But I need some syntactic sugar around it
14:36 pdurbin poikilotherm: yes, I listened to it. I've listened to all of them. And I DM'ed him on Twitter to see if he's coming to FOSDEM. He's not, but you and stefankasberger should go hear him speak at https://jax.de/programm/ . What other developers in the Dataverse community speak German? :)
14:39 stefankasberger whom do you mean?
14:39 pdurbin stefankasberger: Adam Bien. These two talks:
14:40 pdurbin - https://jax.de/serverside-enterprise-java/tipps-tricks-und-workarounds-mit-jakarta-ee-microprofile-slideless/
14:40 pdurbin - https://jax.de/web-development-javascript/web-apps-ohne-frameworks-slideless-nomigrations/
14:44 poikilotherm Ok guys, I'm outta here for today. Gotta pick up the kiddos
14:44 poikilotherm CU
14:49 pdurbin stefankasberger: what are you more interested in? Java or Web Components? :)
15:07 pdurbin donsizemore: mornin'. Anything we should try to bang out before 3?
15:08 donsizemore i think i'm good (and have an 11:30) -- the next thing on my plate would be pestering you and pmauduit about JMX stuff for collectd/grafana
15:09 pdurbin Ok, I've been hacking here and there on https://dev2.dataverse.org/grafana and I'd love to get it working. :)
15:10 pdurbin donsizemore: but at the moment I'm a little more focused on getting API test code coverage reports like http://ec2-3-81-78-209.compute-1.amazonaws.com/target/coverage-it/index.html into dataverse-ansible and dataverse-jenkins. Let's discuss more at 3. :)
15:16 donsizemore i like test coverage
15:17 stefankasberger no java please. :) :) :)
15:17 pdurbin donsizemore: so I had to fuss with my server to get the API tests to run. Can I put https://github.com/IQSS/dataverse-ansible/issues/67 in your column? :)
15:17 donsizemore sure
15:18 pdurbin donsizemore: done, thanks
15:18 pdurbin stefankasberger: how about Web Components? :)
15:19 stefankasberger don't know about them to be honest, but they sound interesting. i will do mostly DevOps stuff in the coming months, so am learning right now about jenkins, selenium and so on. :)
15:20 pdurbin stefankasberger: oh! Do you want to help us with https://github.com/IQSS/dataverse-jenkins ? We have a pyDataverse job at https://jenkins.dataverse.org/job/pyDataverse/ (one of the few that's passing) :)
15:25 donsizemore @pdurbin you can test out Leonid's commit in a branch =)
15:54 stefankasberger am learning jenkins myself, so am not really in a position to help others. but maybe in the future. :)
15:58 pdurbin stefankasberger: ok. The idea is that we can learn together. I hadn't installed Jenkins myself until I wrote INSTALL.md in that repo 3 months ago. :)
15:59 pdurbin stefankasberger: here's the "add config for pyDataverse" issue: https://github.com/IQSS/dataverse-jenkins/issues/12 :)
15:59 stefankasberger ohh. i will have a look at it. maybe i can use it to learn.
16:00 pdurbin stefankasberger: yes! Right now that issue is assigned to donsizemore but I can add you and me as assignees as well if you want.
16:28 stefankasberger i have noted it and will have a look at it in a few weeks, when i will focus on jenkins.
16:29 stefankasberger until the end of september the next release of pyDataverse is planned.
17:02 pdurbin stefankasberger: great! Where's the plan? GitHub milestones?
17:25 stefankasberger https://github.com/AUSSDA/pyDataverse/milestone/1
17:27 pdurbin stefankasberger: great! Can we also include https://github.com/AUSSDA/pyDataverse/issues/32 ? :) I'll help! I need it for https://github.com/IQSS/dataverse-sample-data/issues/5 :)
17:40 stefankasberger will check it out next week.
17:41 pdurbin cool, thanks :)
18:55 donsizemore joined #dataverse
18:56 donsizemore @pdurbin i'm standing by for our 3pm but am squatting in an unreserved conference room. if I ghost you two, I'm likely getting kicked out for interloping (or Dorian might take out our power)
18:59 pdurbin puppy power
19:31 joelmarkanderson joined #dataverse
19:37 joelmarkanderson @pdurbin: any activity on custom metadata blocks in solr?
19:40 pdurbin joelmarkanderson: hi! Oliver went home. He's in Germany so he's several hours ahead of us. Did you see the new issue he opened? :)
19:40 joelmarkanderson err...checking irc log now...
19:42 joelmarkanderson 6142?
19:43 pdurbin Yes! That one. But we can fix you up. Can you share your custom metadata block? The tsv file, I mean.
19:45 joelmarkanderson oh yes; how shall i share it?
19:53 joelmarkanderson should i attach a file to the RT email ticket?
20:00 pdurbin joelmarkanderson: sure! Did anyone reply to you yet?
20:01 joelmarkanderson no, i didn't know if your RT system accepted attachments
20:03 pdurbin It does.
20:03 pdurbin I'm expecting a field called "type" based on the Solr error.
20:04 joelmarkanderson it's actually "tag"
20:04 joelmarkanderson i can't find the RT thread in my inbox; i'll start a new one; you can merge them i assume
20:05 joelmarkanderson oh there it is
20:07 pdurbin joelmarkanderson: yes, please feel free to create a fresh one and I can merge it.
20:07 joelmarkanderson i found it and responded to the same ticket
20:07 pdurbin Oh, was it tag? Bad memory. Sorry.
20:09 pdurbin Ok. I have vtti-metadata-block.tsv. Want me to load it up on a test server?
20:11 pdurbin This went just fine: curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file /tmp/vtti-metadata-block.tsv
20:12 pdurbin Ok, so just the one field. Tag, like you said.
20:12 pdurbin optional field
20:14 pdurbin tagging with the bike one
20:14 pdurbin Error – The metadata could not be updated. If you believe this is an error, please contact Root Support for assistance.
20:15 pdurbin Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/collection1: ERROR: [doc=dataset_72_draft] unknown field 'tag'
20:16 pdurbin So to fix this, we need to add the "tag" field to Solr's schema.xml. Twice because I assume you'd like "tag" to be available from basic search.
20:17 pdurbin Since there's only one field the easiest way to get what we need for schema.xml is this: curl http://localhost:8080/api/admin/index/solr/schema | grep tag
20:17 pdurbin Two lines:
20:17 pdurbin <field name="tag" type="text_en" multiValued="true" stored="true" indexed="true"/>
20:17 pdurbin <copyField source="tag" dest="_text_" maxChars="3000"/>
20:18 pdurbin su - solr
20:18 pdurbin (to become the solr user)
20:18 pdurbin backing up and editing /usr/local/solr/server/solr/collection1/conf/schema.xml
20:21 pdurbin ok, I put those lines directly under these lines, respectively:
20:21 pdurbin <!-- Dynamic Dataverse fields from http://localhost:8080/api/admin/index/solr/schema -->
20:21 pdurbin <!--Dynamic Dataverse fields from http://localhost:8080/api/admin/index/solr/schema -->
20:22 pdurbin systemctl restart solr.service
20:23 pdurbin Ok, now the error is gone when I create a dataset. I just published it: http://ec2-3-81-186-127.compute-1.amazonaws.com/dataset.xhtml?persistentId=doi:10.5072/FK2/HD8VQQ
20:23 pdurbin joelmarkanderson: does that make sense?
20:25 joelmarkanderson i have indeed added exactly the same lines (field name and copyField) to schema.xml
20:26 joelmarkanderson i have not written as the solr user; is that of consequence?
20:26 pdurbin I don't think so. On my system the file is owned by the solr user.
20:27 joelmarkanderson no, solr still owns the file here
20:28 joelmarkanderson let me try to `systemctl restart solr.service`
20:28 pdurbin Ok. Something else you can try is this: http://localhost:8983/solr/collection1/schema/fields
20:29 pdurbin Whoops, and grep for "tag" I mean. I see this output: "name":"tag",
20:33 joelmarkanderson you mean curl?
20:33 pdurbin sorry, sorry, yes curl
20:33 pdurbin doing 5 things at once :)
20:34 pdurbin that "fields" endpoint will dump out all of your fields from Solr in JSON format
20:34 joelmarkanderson there is a "name":"tag" in the fields endpoint
20:34 pdurbin Great! But you still see the exception on the Dataverse side? In server.log?
20:36 joelmarkanderson Success! – The metadata for this dataset has been updated.
20:36 pdurbin hooray!
20:37 joelmarkanderson excellent support, thank you!
20:37 pdurbin joelmarkanderson: sure! While you're here, can I ask you about an AWS thing? :)
20:38 joelmarkanderson go ahead; i admit we have let those muscles atrophy, though
20:39 pdurbin Sorry, I meant Terraform. Are you still using it? And can you please comment on https://github.com/IQSS/dataverse-kubernetes/issues/81 ?
20:47 joelmarkanderson on it (but you may be disappointed in my response)
20:48 pdurbin heh, no problem
20:48 pdurbin and when you're done with that, I think I have one more issue to show you, if you have time :)
21:00 joelmarkanderson ok, what else?
21:01 pdurbin joelmarkanderson: this issue: https://github.com/IQSS/dataverse-aws/issues/11
21:01 joelmarkanderson ugh
21:01 pdurbin :)
21:01 pdurbin sorry :)
21:02 pdurbin joelmarkanderson: have you seen http://guides.dataverse.org/en/4.16/developers/deployment.html#deploying-dataverse-to-amazon-web-services-aws ? We're really into AWS now. :)
21:02 pdurbin But I don't use that dataverse-aws repo at all.
21:03 joelmarkanderson we're not exactly a dataverse development shop, but it's high time i put some work into our operations side
21:03 pdurbin :)
21:04 pdurbin well, if we can help at all, please let us know... our latest tricks are with prometheus and grafana, thanks to pmauduit
21:04 joelmarkanderson i have a grafana guy down the hall
21:05 joelmarkanderson at least you've heard of aws, unlike in your 3.* days
21:05 joelmarkanderson ;)
21:05 pdurbin heh, no kidding :)
21:05 pdurbin joelmarkanderson: can you please show this to your grafana guy? https://github.com/IQSS/dataverse-ansible/blob/master/files/grafana-dashboard.json
21:10 joelmarkanderson let me take a look at your new aws tricks and other repos around; we may need to refresh in the next few weeks
21:21 pdurbin sounds good
