IRC log for #dataverse, 2018-11-01

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time S Nick Message
04:30 nanoz joined #dataverse
04:41 jri joined #dataverse
05:52 nanozz joined #dataverse
08:56 nanozz joined #dataverse
10:57 nanoz joined #dataverse
12:21 andrewSC joined #dataverse
12:53 Jim__ joined #dataverse
13:15 donsizemore joined #dataverse
14:19 pameyer joined #dataverse
16:05 pdurbin pameyer: morning. Do you have any idea if the MOC talks were recorded or not?
16:06 pameyer pdurbin: they had cameras around; no ideas if/when/where any video would be public
16:10 pameyer I think there was some talk about having slides online too, but it sounded like that might be a week or so afterwards
16:12 pdurbin Ok, it's on my mind because I'm planning to send out the community news for October.
16:12 pdurbin Jim__: I'm planning on mentioning your new tool. Thanks again.
16:40 donsizemore joined #dataverse
16:44 pameyer pdurbin: relatively sure that nginx doesn't do ajp
16:47 pdurbin pameyer: looks like they found something that hasn't been updated for 3 years: https://github.com/IQSS/dataverse/issues/5261#issuecomment-435104093
16:50 donsizemore @pdurbin he could probably close that with a location directive
16:50 pameyer yeah - I'm assuming that dataverse does API blocking based on the host header in the request
16:50 pameyer haven't looked at nginx/dataverse in a while though
16:52 donsizemore @pameyer location ^~ /api/admin { allow ipv4; deny all; }
16:52 pameyer @donsizemore that looks right to me
16:53 pameyer I think the requestor mentioned that as one of his options
16:58 barryr joined #dataverse
16:58 pdurbin barryr: welcome!
16:58 barryr left #dataverse
16:58 barryr91 joined #dataverse
16:58 barryr91 left #dataverse
16:59 barryr66 joined #dataverse
16:59 barryr66 hi
16:59 pdurbin barryr66: hi! Welcome!
17:00 pameyer hi barryr66
17:00 barryr66 apologies for bouncing.
17:00 pameyer assuming that 5261 is you; proxy_set_header Host $host might do the trick (or donsizemore's suggestion of blocking the location)
17:01 barryr66 there's no bad side effects of blocking /api is there? dataverse doesn't call it back internally?
17:02 donsizemore you probably wouldn't want to block /api entirely — only /api/admin
17:03 pameyer installation and setup scripts would call it; but as far as I know nothing automated hits /api/admin after that
17:03 pdurbin barryr66: bouncing probably isn't your fault. There's a memory leak in the IRC web interface I run (but don't use): https://github.com/IQSS/chat.dataverse.org/issues/3
17:04 barryr66 can someone talk me through the parts of "location ^~ /api/admin { allow ipv4; deny all; }"?
17:04 pdurbin donsizemore: and "builtin-users": http://guides.dataverse.org/en/4.9.4/installation/config.html#blocking-api-endpoints (both should be blocked out of the box)
17:05 donsizemore @barryr66 that's just an example directive you could add to your nginx config. substitute your desired ipv4 range(s) in the allow statement, then the deny would block everybody else
17:06 donsizemore @barryr66 i like @pameyer's host header setting better, it's cleaner
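For readers who want the allow/deny logic spelled out: the sketch below mimics in Python what the "location ^~ /api/admin { allow ipv4; deny all; }" directive asks nginx to do. It is an illustration only, not nginx or Dataverse code. The names ALLOWED_NETWORKS, PROTECTED_PREFIX and is_request_allowed are hypothetical, and 192.0.2.0/24 is a documentation range standing in for whatever range(s) you would put in the allow statement.

```python
import ipaddress

# Assumption: a placeholder allow-list; substitute your own IPv4 range(s).
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# The path prefix the location block applies to.
PROTECTED_PREFIX = "/api/admin"


def is_request_allowed(client_ip: str, path: str) -> bool:
    """Approximate 'location ^~ /api/admin { allow <range>; deny all; }'."""
    if not path.startswith(PROTECTED_PREFIX):
        # The location block only governs paths under the prefix.
        return True
    addr = ipaddress.ip_address(client_ip)
    # Access rules are checked in order: a matching 'allow' wins,
    # otherwise the trailing 'deny all' rejects the request.
    return any(addr in net for net in ALLOWED_NETWORKS)


print(is_request_allowed("192.0.2.10", "/api/admin/settings"))
print(is_request_allowed("203.0.113.5", "/api/admin/settings"))
print(is_request_allowed("203.0.113.5", "/api/info/version"))
```

Only /api/admin is restricted here, which matches the advice above not to block /api entirely.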
17:08 barryr66 okay, will try them out.
17:09 barryr66 but not now, it's home time (17:00 GMT).
17:11 pdurbin good luck and thanks for stopping by
17:12 barryr66 just FYI this is going to be a data repository for a research consortium I'm involved with. 95% of the way there with the setup!
17:12 barryr66 thanks, /bye
17:12 barryr66 left #dataverse
17:13 * pdurbin adds it to "trial installations"
17:15 pameyer good luck barryr66
17:26 pdurbin I love this tweet: https://twitter.com/kiru/status/1058027114715324418
17:44 donsizemore @pdurbin given barry's response on 5261, want me to cobble together a sample nginx.conf? i can tinker with it locally before perpetrating it onto the documentation
17:46 dataverse-user joined #dataverse
18:04 pdurbin donsizemore: sure! Maybe grep -i for nginx in the guides and mention it as a community supported option?
18:07 Jim__ @pdurbin - FWIW: we only used a local nginx on a dev machine and have since removed it (AWS is proxying now and I don't have those configs handy).
18:08 pdurbin Jim__: ah, ok. Thanks.
18:09 donsizemore @pdurbin i only found a passing reference to nginx
18:12 pdurbin donsizemore: sounds right
18:12 pdurbin I just don't want to over emphasize support for anything other than Apache.
18:24 pameyer from what I remember, you need apache if you want shib
18:24 donsizemore we definitely don't want to anger Mr. Shib
18:24 pameyer possibly if you want rapache too.  even though I'm not using either, apache was the recommendation so that was what I went with
18:25 pameyer although if http2 had been enough of a performance boost, I might've reconsidered nginx
18:25 donsizemore @pameyer my biggest performance hit was in enabling rapache. our next production install won't
18:26 pameyer @donsizemore I've been ignoring rapache entirely, and haven't seen any problems.  but I also don't have to worry about tabular data
18:26 pdurbin in my day there was no nginx
18:36 pameyer pdurbin: probably not a surprise, but I can reproduce https://github.com/IQSS/dataverse/issues/5260 on fa38c5c794fe19b4ed28a0eeb8208ce7df612639
18:39 pdurbin pameyer: thanks for confirming
18:43 pameyer no problem
18:47 pdurbin Jim__: do you want to talk out the metadata blocks google doc stuff?
18:54 Jim__ pdurbin - sure. I just wanted to know what action, if any, to take.
18:54 Jim__ Right now there's questionable practice info in google and that doc is deprecated...
18:55 pdurbin Jim__: right, and IQSS isn't even the owner of that doc. Tim created it.
18:56 pdurbin So I can make suggestions to it but I think it has outlived its original purpose.
18:56 pdurbin ... which was to guilt us into writing documentation, I think :)
18:56 pdurbin And I thank Tim for it!
18:57 pdurbin But I think we need something new.
18:57 pdurbin Jim__: any suggestions?
18:57 pameyer there must've been *some* documentation before that google doc
18:57 pdurbin pameyer: nope :)
18:58 pdurbin there was how the code worked and oral tradition :)
18:58 pdurbin I'm re-reading Guns, Germs, and Steel and writing has turned out to be pretty important.
18:59 Jim__ I like the idea of something like a google doc for unofficial info - moving things that you don't want to support/don't want people to try without understanding the issues into official docs seems bad
18:59 Jim__ but whether it's this doc with the parts now moved to guides deleted, or something else, I'm open.
19:00 pdurbin I'm trying to decide what I don't like about a Google doc.
19:00 pdurbin It's hard for me to tell who wrote what.
19:00 Jim__ GG&S - who knew English became popular due to horses vs zebras...
19:01 pdurbin It's hard for me to tell when a line was written.
19:01 Jim__ Does it matter who wrote what?
19:01 pdurbin yes!
19:01 pdurbin people matter :)
19:01 pdurbin I love `git blame`. :)
19:01 Jim__ it takes a village...
19:02 Jim__ but not in sourcecode
19:02 pdurbin When it's in the guides, the project wrote it.
19:02 pdurbin The project can be blamed for typos and for leading people wrong. It's a bug.
19:02 pdurbin I'm sure I'm over thinking it.
19:03 pameyer markdown document in a different repo under iqss?
19:03 pdurbin I can create a new Google doc and link to it, I guess. But who gets to edit it? Anyone in the world? A curated list of people?
19:03 pdurbin pameyer: I do love me some markdown
19:03 pameyer I think the idea of a community tips kind of repo has been kicked around before
19:03 pdurbin yeah?
19:03 pameyer some folks prefer some kind of gui editing though - don't know one way or the other if github's got an interactive markdown editor
19:04 pdurbin pameyer: they do. See https://github.com/IQSS/dataverse-uploader/wiki/DVUploader%2C-a-Command-line-Bulk-Uploader-for-Dataverse/_edit
19:05 Jim__ :-)
19:05 pdurbin The experiment of a wiki at https://github.com/IQSS/dvn/wiki didn't go so well.
19:05 pameyer cool! - does it work for non-wiki stuff too?
19:06 pdurbin not sure, the issues editor isn't as friendly
19:06 Jim__ I thought the google doc was useful when I started - it was pretty clear that it was in progress, people were still editing/commenting, and I'd much rather have that advice than to have anyone not contribute over concern that they might have missed something...
19:06 pameyer @Jim__ good point - using github at all does raise the barrier to entry
19:09 pameyer and that barrier is already high enough I suspect
19:09 Jim__ My minimalist thought would be to just edit Tim's doc to say much of the material here has been moved to guides (and probably delete that info)
19:09 Jim__ and the rest is unofficial/not supported/...
19:09 pdurbin Jim__: I'm a little bothered by the lack of control, not being an "owner" of the doc. So I just made a new one.
19:10 pdurbin If you want edit access, please request it: https://docs.google.com/document/d/1XpblRw0v0SvV-Bq6njlN96WyHJ7tqG0WWejqBdl7hE0/edit?usp=sharing
19:12 Jim__ Got it. Are you making an initial cut/paste? If so, I'll check after you do. Otherwise I can cut/paste my parts of the other doc.
19:12 pdurbin Jim__: nope, I'll let you cut and paste anything you want to preserve
19:12 Jim__ Do you want to keep edit and just have community members 'suggest' ?
19:13 pdurbin I'm find with known people editing directly.
19:13 pdurbin fine*
19:31 pdurbin Jim__ pameyer: ok, I linked to the new doc at https://github.com/IQSS/dataverse/commit/3bbde28
19:31 pdurbin I put it in a new "tips" folder so if you want other docs for other tips, lemme know.
19:31 Jim__ pdurbin - w.r.t PR #5169 - do my comments about updating the tsv make sense?
19:32 pdurbin Jim__: I've been meaning to circle back to that. And it's related to what I'm working on now... documenting "reloading" of metadata blocks, which I'm somewhat new to.
19:35 Jim__ OK  - when we had charset issues and I reloaded with the corrected utf-8 entries, the api wrote new values but left the old. Looking at the code, it looked to me as though, unless the identifier matched an existing entry, it would always create a new value.
19:36 pdurbin The code matches on name first, then it tries to match on identifier. One sec.
19:36 Jim__ OK - so maybe your case where the name is not changing won't hit this...
19:36 pdurbin See parseControlledVocabulary at https://github.com/IQSS/dataverse/blob/v4.9.4/src/main/java/edu/harvard/iq/dataverse/api/DatasetFieldServiceApi.java#L346
19:37 pdurbin sekmiller was saying that the nice thing about adding an identifier for controlled vocabulary values is that in the future they can be used to match on if you want to change the name of a controlled vocabulary value. If that makes sense.
19:38 Jim__ Yep - you're right so this isn't an issue for adding identifiers, and in fact, new tip, if you add identifiers first to bad charset values, you can then update the names in a second reload :-)
19:38 pdurbin For example, "4" is now the identifier for "Funder" (in my branch) and if we want to change "Funder" to, uh... "Primary Funder" we could reload the tsv as long as we keep the identifier as "4".
19:39 Jim__ Right, that's why identifiers are good
19:39 pdurbin The thing is... I think the old strings are still sprinkled throughout the database.
19:39 pdurbin That's basically what tcoupin is asking... how do I find the old broken non-UTF strings and fix them.
19:40 pdurbin old broken strings used in datasets, I mean
19:40 Jim__ Hmm - I thought his issue was ours - once the bad name is loaded, reloading the tsv with a correct value doesn't get rid of the old one, because there is no identifier
19:41 pdurbin Not sure. Did you open your own GitHub issue?
19:41 * pdurbin looks
19:42 pdurbin I don't see it.
19:43 Jim__ No - QDR had the problem and I just edited the db to remove the old values and reloaded the tsv to get the new (correct charset) ones, but that's why I wrote some of that up in Tim's doc
19:43 pdurbin It's hard to keep ~4000 issues in your head. :)
19:43 Jim__ you do an amazing job of that btw
19:44 pdurbin ok, so maybe move that tip over to the new doc and I'll try to remember to leave a comment on it
19:44 Jim__ I think though that if one adds identifiers for one reload and then changes the names (strvalue) in the next, you shouldn't have to edit the db.
19:44 pdurbin Right, exactly. That's what I'm trying to say.
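The matching behaviour discussed above (name first, then identifier) and the two-reload tip can be sketched roughly as follows. This is a simplified in-memory model based only on the description in this conversation, not the actual parseControlledVocabulary Java code. VocabValue, reload_value and the sample names are hypothetical; the identifier "4" is borrowed from the "Funder" example.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional


@dataclass
class VocabValue:
    db_id: int
    name: str                      # corresponds to strvalue in the chat
    identifier: Optional[str] = None


_ids = count(1)  # stand-in for database-generated ids


def reload_value(table: List[VocabValue], name: str,
                 identifier: Optional[str]) -> VocabValue:
    """Assumed rule: match on name first, then on identifier, else create."""
    match = next((v for v in table if v.name == name), None)
    if match is None and identifier:
        match = next((v for v in table if v.identifier == identifier), None)
    if match is None:
        match = VocabValue(next(_ids), name)
        table.append(match)
    match.name = name
    match.identifier = identifier
    return match


BAD_NAME = "Caf\u00c3\u00a9"   # mis-encoded form of the value below
GOOD_NAME = "Caf\u00e9"

# Case 1: reload the corrected name with no identifier; the bad row stays.
naive: List[VocabValue] = []
reload_value(naive, BAD_NAME, None)
reload_value(naive, GOOD_NAME, None)
naive_names = [v.name for v in naive]

# Case 2: first attach an identifier to the bad name, then rename.
fixed: List[VocabValue] = []
original_id = reload_value(fixed, BAD_NAME, None).db_id
reload_value(fixed, BAD_NAME, "4")    # matches on name, gains identifier
reload_value(fixed, GOOD_NAME, "4")   # matches on identifier, gets renamed

print(len(naive_names), len(fixed), fixed[0].db_id == original_id)
```

In the second case the single existing row is updated in place, so no manual database edit is needed under these assumptions.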
19:44 Jim__ but for #5234
19:45 pdurbin feeds into https://github.com/IQSS/dataverse/blob/e39a94df05a86727ce5e01727882cd614b859af9/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst#how-to-determine-if-you-need-to-create-or-update-a-sql-upgrade-script (unmerged)
19:46 pdurbin but let me read your comment again
19:46 Jim__ ???
19:47 pdurbin :)
19:48 pdurbin sorry, I'm losing you
19:48 pdurbin jumping around too much
19:50 pdurbin Jim__: over at https://github.com/IQSS/dataverse/issues/5234#issuecomment-435164271 I just asked you to move tcoupin's query to the new doc or open an issue
19:50 Jim__ For #5234, tcoupin has the problem of bad controlled vocab values in the db and he's found a query to find but not to correct them. From the discussion here, I think he could reload a tsv with the bad value and an identifier, and then reload with that identifier and a new corrected name to fix his db
19:51 Jim__ Ah - now I get that part
19:52 pdurbin yeah, that might work, kinda weird to stay in bad value mode when you want to get out of it as quickly as possible (I would imagine) but should work
19:52 pdurbin I doubt you or he wants to hear a developer say "just drop your database and start over like I do on my laptop all the time" :)
19:52 Jim__ and better than editing in the db. Also would be a fix where those vocab values have been used already...
19:53 Jim__ I'll make a comment on that issue...
19:53 pdurbin would it?
19:53 pdurbin don't the values get scattered all over the database when you use them?
19:53 Jim__ just because it reuses the old db entry and uses of it are connected by the db id I think...
19:54 Jim__ let me check...
19:54 pdurbin I'm checking too.
19:55 pdurbin oh! maybe not, good
19:56 pdurbin or maybe, hmm
19:57 pdurbin looking at http://phoenix.dataverse.org/schemaspy/latest/tables/controlledvocabularyvalue.html
20:00 pdurbin oh, I'm thinking about the stuff the user enters
20:11 Jim__ http://phoenix.dataverse.org/schemaspy/latest/tables/datasetfield_controlledvocabularyvalue.html - just the id of the cvv is in the table when it's used.
20:11 pdurbin yeah
20:12 pdurbin so I think my concern about "Funder" being scattered all over is unfounded
20:14 Jim__ I think so - (reading Gustavo's comments at the same time)
20:23 pdurbin I just dumped my database to a file and "Funder" only appears in one place. Not scattered. Good.
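Why a rename does not scatter can be shown with a small in-memory SQLite sketch: the join table stores only the id of the controlled vocabulary value, so updating the name in one row is visible everywhere it is used. The two tables are heavily simplified stand-ins for the schemaspy pages linked above; the column names and sample ids are assumptions, and the real database is not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the real tables (assumed columns).
CREATE TABLE controlledvocabularyvalue (
    id INTEGER PRIMARY KEY,
    strvalue TEXT,
    identifier TEXT
);
CREATE TABLE datasetfield_controlledvocabularyvalue (
    datasetfield_id INTEGER,
    controlledvocabularyvalues_id INTEGER
        REFERENCES controlledvocabularyvalue(id)
);
INSERT INTO controlledvocabularyvalue VALUES (1, 'Funder', '4');
-- Two hypothetical dataset fields that use the value; only its id is stored.
INSERT INTO datasetfield_controlledvocabularyvalue VALUES (101, 1);
INSERT INTO datasetfield_controlledvocabularyvalue VALUES (102, 1);
""")

# Rename the value once, matching on its identifier.
conn.execute(
    "UPDATE controlledvocabularyvalue SET strvalue = ? WHERE identifier = ?",
    ("Primary Funder", "4"),
)

# Every use picks up the new name through the join.
rows = conn.execute("""
    SELECT dcv.datasetfield_id, cvv.strvalue
    FROM datasetfield_controlledvocabularyvalue AS dcv
    JOIN controlledvocabularyvalue AS cvv
      ON cvv.id = dcv.controlledvocabularyvalues_id
    ORDER BY dcv.datasetfield_id
""").fetchall()

# The name itself lives in exactly one row.
name_count = conn.execute(
    "SELECT COUNT(*) FROM controlledvocabularyvalue"
).fetchone()[0]

print(rows, name_count)
```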
20:25 Jim__ An experimentalist!
20:30 pdurbin :)
21:56 donsizemore joined #dataverse
22:15 pameyer left #dataverse
