<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Dat Project Blog]]></title><description><![CDATA[New developments and ideas from the Dat team.]]></description><link>https://blog.datproject.org/</link><image><url>https://blog.datproject.org/favicon.png</url><title>Dat Project Blog</title><link>https://blog.datproject.org/</link></image><generator>Ghost 1.21</generator><lastBuildDate>Thu, 21 Jun 2018 15:13:26 GMT</lastBuildDate><atom:link href="https://blog.datproject.org/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[May Community Call Recap!]]></title><description><![CDATA[Watch the recording of our second community call with a whopping 7 speakers from our sponsored projects Dat, Stencila, and ScienceFair - as well as speakers from the broader community. 
]]></description><link>https://blog.datproject.org/2018/06/11/may-community-call-recap/</link><guid isPermaLink="false">5b15796ee47f8305fa968734</guid><category><![CDATA[Community]]></category><dc:creator><![CDATA[Danielle Robinson]]></dc:creator><pubDate>Mon, 11 Jun 2018 13:00:00 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/06/CSS-cc-1.jpeg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/06/CSS-cc-1.jpeg" alt="May Community Call Recap!"><p>We hosted our second community call on May 31st with a whopping 7 speakers from our sponsored projects <a href="https://blog.datproject.org/2018/06/11/may-community-call-recap/datproject.org">Dat</a>, <a href="https://blog.datproject.org/2018/06/11/may-community-call-recap/stenci.la">Stencila</a>, and <a href="https://blog.datproject.org/2018/06/11/may-community-call-recap/sciencefair-app.com">ScienceFair</a> - as well as speakers from the broader community.</p>
<p>It's archived on AirMozilla (thanks AirMoz!) so you can watch it <a href="https://air.mozilla.org/mozilla-science-lab-may-2018-bi-monthly-community-call/">here</a> and follow along with the notes on <a href="https://public.etherpad-mozilla.org/p/CSS-community-call-May-31-2018">our Etherpad</a>.</p>
<p>There were over 20 lines of community announcements! Including:</p>
<ul>
<li><a href="https://decentralizedweb.net/">Decentralized Web Summit 2018 Global Visions / Working Code</a>,  July 31 – August 2, 2018 at the Internet Archive, San Francisco, CA <a href="https://decentralizedweb.net/">https://decentralizedweb.net/</a></li>
<li><a href="https://www.opencon2017.org/opencon_2018_announced">OpenCon 2018 is coming to Toronto</a></li>
<li>Fritter accessible on regular browsers via dat-polyfill and dat-gateway, <a href="https://mobile.twitter.com/RangerMauve/status/1001980249243095041">for more</a></li>
<li><a href="http://stenci.la/blog/2018-05-stencila-in-binder/">Stencila in Binder!</a> (From #eLifeSprint)</li>
<li>Check out  <a href="https://github.com/cabal-club/cabal-desktop">Cabal</a> - decentralized private chat</li>
<li><a href="https://prereview.org/">PREreview</a> and Dat will be at <a href="https://www.force11.org/meetings/force2018">FORCE2018 in Montreal</a></li>
</ul>
<p>We had an incredible line up of speakers:</p>
<ul>
<li>
<p><a href="https://github.com/mafintosh">Mathias Buus Madsen / Dat &amp; Beaker</a> joined us from Berlin and spoke about Dat and hyperdb, a distributed, scalable database. This is the next big step for the Dat protocol. Hyperdb sets the stage for enabling multiwriter databases. In other words, allowing multiple people to write to a Dat. This is the culmination of many years of work and will enable all kinds of new use cases, including <a href="https://github.com/jimpick">Jim Pick's</a> <a href="https://blog.datproject.org/2018/05/14/dat-shopping-list/">collaborative shopping list</a>, <a href="http://www.mmoma.ru/exhibitions/gogolevsky10-2/vedutsya_revolyucionnye_raboty/">art</a>, <a href="https://github.com/cabal-club/cabal-desktop">chat</a>, and more!</p>
</li>
<li>
<p><a href="https://www.yoshuawuyts.com/">Yoshua Wuyts</a> talked about working on a Rust implementation of Dat, funded by the German <a href="https://twitter.com/prototypefund/">Prototype Fund</a>. The issue he is addressing is that Dat is written in JavaScript, so it doesn't run everywhere. Yosh's approach is to port bits and pieces to a low-level language. He's 3/4 done implementing hypercore and hasn't hit a roadblock, so the experiment is going quite well! The goal is to build a mobile-friendly Dat.</p>
</li>
<li>
<p><a href="http://lucid00.com/">Hugh Isaacs II</a> joined us from New York. He's made a <a href="https://github.com/HughIsaacs2/DatPart">Dat Chrome extension</a> and he spoke at the last <a href="https://peer-to-peer-web.com/nyc">p2p web event in NYC</a> (we'll post the link to his talk when it's available). Hugh is interested in offline friendly tech. Zombie apocalypse example - limited power, limited networking -  how do we connect in those types of situations? Even in NYC, disasters like Hurricane Sandy show the importance of being ready with offline friendly tools. Hugh is focused on trying to get the word out there and highlight the ability for protocols like Dat to work offline.</p>
</li>
<li>
<p><a href="https://twitter.com/npscience">Naomi Penfold, PhD</a> just ran a productive and inclusive hackathon - <a href="https://elifesciences.org/inside-elife/b4ed92e1/innovation-collaboration-and-creativity-at-the-heart-of-the-elife-innovation-sprint-2018?utm_source=CSS&amp;utm_medium=referral&amp;utm_campaign=sprint-IE">The eLife Innovation Sprint</a> - in Cambridge, UK. Joe and I both went, and we invited Naomi to the call to talk about her experience planning and running the event. She flipped the Q and A by asking our audience questions in the Etherpad (line 243), including &quot;How can we reach more under-represented people in the tech space?&quot; Review her extensive notes on planning and executing the event, and <a href="mailto:n.penfold@elifesciences.org?subject=eLife%20Sprint%20on%20CSS%20call">reach out</a> to her with your comments!</p>
</li>
<li>
<p>Georgia Bullen and Chris Ritzo joined us from <a href="https://www.measurementlab.net/">Measurement Lab</a> to tell us about the Open Internet Measurement Platform, and yes, folks, it's all open source, open data. Measurement Lab's mission is to measure the internet, save the data, and make it easy for people to understand. To do this, they partner with Open Tech Institute, Planet Lab (Princeton), and Google. Since 2009, they've been working to provide an open source platform to host measurement tests. This means you can ask &quot;how fast is my connection?&quot; and get a real answer for your actual house. They collect data from millions and millions of measurements per day. Right now they're working on a project with colleagues at Simmons College &amp; Internet2 to make a better system for libraries. Learn more at <a href="https://docs.google.com/presentation/d/1xuJhB5rnVLTRXE2xP4vX2vRqOUd3y3_PzPaD2tcwhTM/edit#slide=id.g37eeb48ad2_0_97">Chris' slide deck</a>. Reach out to Measurement Lab and join <a href="https://groups.google.com/a/measurementlab.net/forum/#!forum/discuss">their mailing list</a>!</p>
</li>
<li>
<p><a href="https://blog.datproject.org/2018/06/11/may-community-call-recap/stenci.la">Nokome Bentley / Stencila</a> joined us from New Zealand to talk about what's new with Stencila. The team released Stencila Desktop 0.28.0 a few weeks ago for beta testing, so check it out! They've also started a beta testing program; learn more <a href="http://stenci.la/community/beta-testing.html">here</a> and reach out to get involved. Daniel Nüst and Min Ragan-Kelley created nbstencilaproxy for running Stencila inside Binder during the eLife sprint, which was a lot of fun for everyone. Nokome has been simplifying the approach to writing and registering custom functions in Stencila. He hopes to make this part of the next release.</p>
</li>
<li>
<p><a href="https://blog.datproject.org/2018/06/11/may-community-call-recap/sciencefair-app.com">Rik Smith-Unna / ScienceFair</a> joined us from the UK to tell us about &quot;The futuristic, fabulous and free desktop app for working with scientific literature 🦄&quot;. Right now ScienceFair is a desktop app for discovering, collecting, organizing, reading and analyzing scientific papers. The long-term vision is a complete rethink of how scientific literature is produced, distributed, discovered and used. It works p2p and is built on Dat (migrating to hyperdb). Stay tuned for bug-fix releases and more in the coming months! Join the ScienceFair community <a href="https://gitter.im/sciencefair-app/Lobby">on Gitter</a>.</p>
</li>
</ul>
<p>Thank you to all our speakers, AirMoz, Aurelia Moser, and to everyone who tuned in and asked questions. See you again in August. In the meantime, please reach out if you'd like to speak on a future call: find us at @codeforsociety on Twitter or by email at hi@codeforscience.org.</p>
<p>The <a href="https://blog.datproject.org/2018/03/05/css-community-call-03-2018/">last call was super fun too</a>, check it out!</p>
</div>]]></content:encoded></item><item><title><![CDATA[Shared Infrastructure: A Cooperative Preservation Network for Data]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Research and cultural heritage institutions are facing increasing costs to preserve digital objects like scientific data, digital art, and other artifacts. As many institutions move data to cloud services, preservation costs and complexity are quickly becoming concerns. We are announcing a project to prototype shared infrastructure for digital preservation: Cooperative</p></div>]]></description><link>https://blog.datproject.org/2018/06/05/cdl-ia-dat-pilot/</link><guid isPermaLink="false">5b16c794e47f8305fa96873d</guid><category><![CDATA[Decentralization]]></category><category><![CDATA[Science]]></category><dc:creator><![CDATA[Joe Hand]]></dc:creator><pubDate>Tue, 05 Jun 2018 18:20:08 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/06/IMG_1112.JPG" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/06/IMG_1112.JPG" alt="Shared Infrastructure: A Cooperative Preservation Network for Data"><p>Research and cultural heritage institutions are facing increasing costs to preserve digital objects like scientific data, digital art, and other artifacts. As many institutions move data to cloud services, preservation costs and complexity are quickly becoming concerns. We are announcing a project to prototype shared infrastructure for digital preservation: Cooperative Preservation Network.</p>
<p>Together with other leading institutions in decentralization and digital preservation, Code for Science &amp; Society, the Internet Archive, and California Digital Library, we aim to demonstrate how decentralized technology can bolster existing institutional infrastructure. Built on a decentralized network using the Dat Protocol, this project aims to enable public organizations who preserve digital cultural heritage to backup and monitor digital assets.</p>
<p>Dat is already being used by researchers, developers, and artists. This project aims to identify institutional limitations to using decentralized technology, whether technical or social. To test our assumptions, we will prototype a network allowing each participating entity to view and download the collections of other participants. If successful, this project will demonstrate how to reduce preservation costs while increasing preservation assurance, as members of a cooperative, decentralized network mutually support each other to ensure adequate copies of data are maintained.</p>
<p>Open infrastructure allows institutions to use digital preservation tooling without locking them into specific paid services or locking data into a patchwork of data silos. By working with the Dat Protocol, we will build this project to maximize flexibility and interoperability. Our goal is not to replace existing institutional infrastructure but to make it more capable by linking institutions at a foundational level. Building on value-driven open infrastructure, this project aims to identify new opportunities for collaboration between institutions and community engagement in data preservation.</p>
<h2 id="movingforward">Moving Forward</h2>
<p>Despite improvements in data preservation and access, today’s digital preservation solutions rely on storage of objects in centralized servers. This model is built on traditional web infrastructure, which was designed with the values of commercial organizations. It’s time for scholars to ask whether today’s data preservation technologies align with open scholarship’s values of access, preservation, privacy, and transparency.</p>
<p>This project will be a community-driven infrastructure that values openness and bakes access into the code. Want to learn more? Representatives of this project will be at <a href="https://www.force11.org/meetings/force2018">FORCE 2018</a>, <a href="http://www.jcdl.org/">Joint Conference on Digital Libraries</a>, <a href="http://www.or2018.net/">Open Repositories</a>, <a href="https://forum2018.diglib.org/">DLF Forum</a>, and the <a href="https://decentralizedweb.net/">Decentralized Web Summit.</a></p>
<p>More about CSS: <a href="https://codeforscience.org/">Code for Science &amp; Society</a> is a nonprofit organization committed to building public interest technology and low-cost decentralized tools with the <a href="https://datproject.org/">Dat Project</a> to help people share and preserve versioned digital information. Read more about CSS’ <a href="https://blog.datproject.org/tag/science/">Dat in the Lab</a> project, our recent <a href="https://public.etherpad-mozilla.org/p/CSS-community-call-May-31-2018">Community Call</a>, and <a href="https://codeforscience.org/">other activities</a>. <a href="https://codeforscience.org">codeforscience.org</a></p>
<p>More about IA: The <a href="https://archive.org/">Internet Archive</a> is a non-profit digital library with the mission to provide “universal access to all knowledge.” It works with hundreds of national and international partners providing web, data, and preservation services and maintains an online library comprising millions of freely-accessible books, films, audio, television broadcasts, software, and hundreds of billions of archived websites. <a href="https://archive.org/">archive.org</a></p>
<p>More about CDL and UC3: <a href="https://uc3.cdlib.org/">University of California Curation Center</a> (UC3) at <a href="https://www.cdlib.org/">California Digital Library</a> (CDL) provides innovative data curation and digital preservation services to the 10-campus University of California system and the wider scholarly and cultural heritage communities. Learn more about UC3’s collaboration with CSS in our previous <a href="https://blog.datproject.org/tag/science/">Dat in the Lab</a> project. <a href="https://www.cdlib.org/">https://www.cdlib.org/</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[Code for Science & Society Community Call - May 31, 11am PST]]></title><description><![CDATA[Please mark your calendars for the next Code for Science & Society Quarterly Community Call.
May 31, 2018:  11am Pacific / 2pm Eastern / 7pm UK / 8pm Berlin / June 1, 6am NZ! Tune in on AirMozilla.]]></description><link>https://blog.datproject.org/2018/05/24/code-for-science-society-community-call/</link><guid isPermaLink="false">5b072af3e47f8305fa9686f9</guid><category><![CDATA[Community]]></category><dc:creator><![CDATA[Code for Science & Society]]></dc:creator><pubDate>Thu, 24 May 2018 21:42:38 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/05/42281336481_0b2feccc72_z.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/05/42281336481_0b2feccc72_z.jpg" alt="Code for Science & Society Community Call - May 31, 11am PST"><p>We've been <a href="https://twitter.com/jondashkyle/status/990992820994433025">traveling</a> <a href="https://blog.datproject.org/2018/05/22/collaboration-czi/">so</a> <a href="https://elifesciences.org/labs/bdd4c9aa/elife-innovation-sprint-2018-project-roundup">much</a> that we are a little late on the announcement (eek it's next Thursday!) but please mark your calendars for the next Code for Science &amp; Society Quarterly Community Call.</p>
<p>Thursday, May 31, 2018:  11am Pacific / 2pm Eastern / 7pm UK / 8pm Berlin / June 1, 6am NZ</p>
<p>We have an incredible line up of speakers:</p>
<ul>
<li><a href="https://www.yoshuawuyts.com/">Yoshua Wuyts</a></li>
<li><a href="https://twitter.com/npscience">Naomi Penfold, PhD</a></li>
<li><a href="http://lucid00.com/">Hugh Isaacs II</a></li>
<li><a href="https://www.measurementlab.net/">Georgia Bullen / Measurement Lab</a></li>
<li><a href="https://github.com/mafintosh">Mathias Buus Madsen / Dat &amp; Beaker</a></li>
<li><a href="https://blog.datproject.org/2018/05/24/code-for-science-society-community-call/stenci.la">Nokome Bentley / Stencila</a></li>
<li><a href="https://blog.datproject.org/2018/05/24/code-for-science-society-community-call/sciencefair-app.com">Rik Smith-Unna / ScienceFair</a></li>
</ul>
<p>The <a href="https://blog.datproject.org/2018/03/05/css-community-call-03-2018/">last one was super fun</a>, so you won't want to miss this.</p>
<p>How do you watch? Thanks to <a href="http://aureliamoser.com/">Aurelia Moser</a> at Mozilla, you can tune in on <a href="https://air.mozilla.org/mozilla-science-lab-may-2018-bi-monthly-community-call/">Air Mozilla</a> and ask live questions on <a href="https://public.etherpad-mozilla.org/p/CSS-community-call-May-31-2018">our Etherpad</a>.</p>
<p>To recap:</p>
<ul>
<li>May 31, 2018:  11am Pacific / 2pm Eastern / 7pm UK / 8pm Berlin / June 1, 6am NZ</li>
<li>Read along, add your updates, and ask questions here: <a href="https://public.etherpad-mozilla.org/p/CSS-community-call-May-31-2018">https://public.etherpad-mozilla.org/p/CSS-community-call-May-31-2018</a></li>
<li>Watch the call here: <a href="https://air.mozilla.org/mozilla-science-lab-may-2018-bi-monthly-community-call/">https://air.mozilla.org/mozilla-science-lab-may-2018-bi-monthly-community-call/</a></li>
<li>Can we feature your work on an upcoming call? You better believe we have a spreadsheet going, so email us <a href="mailto:hi@codeforscience.org">hi@codeforscience.org</a> to get on a future call.</li>
</ul>
<p>This photo of Joe explaining deeply exciting computer stuff is from the eLife Innovation sprint, taken by <a href="https://twitter.com/OrquideaRealPho">Julieta Sarmiento</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[Collaborative Communities at CZI Human Cell Atlas Meeting]]></title><description><![CDATA[<div class="kg-card-markdown"><p>At the end of April we joined scientists working on <a href="https://www.chanzuckerberg.com/science">Chan Zuckerberg Science's</a> (CZI) <a href="https://www.humancellatlas.org/">Human Cell Atlas</a> (HCA) project to facilitate a session on collaboration for scientists. Improving communities' ability to collaborate is a core focus of the <a href="https://codeforscience.org/">Code for Science &amp; Society</a> mission. We facilitate in-person events, share our</p></div>]]></description><link>https://blog.datproject.org/2018/05/22/collaboration-czi/</link><guid isPermaLink="false">5af9bc0ce47f8305fa9686c1</guid><category><![CDATA[Community]]></category><category><![CDATA[Science]]></category><dc:creator><![CDATA[Danielle Robinson]]></dc:creator><pubDate>Tue, 22 May 2018 13:00:00 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/05/IMG_1121.JPG" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/05/IMG_1121.JPG" alt="Collaborative Communities at CZI Human Cell Atlas Meeting"><p>At the end of April we joined scientists working on <a href="https://www.chanzuckerberg.com/science">Chan Zuckerberg Science's</a> (CZI) <a href="https://www.humancellatlas.org/">Human Cell Atlas</a> (HCA) project to facilitate a session on collaboration for scientists. Improving communities' ability to collaborate is a core focus of the <a href="https://codeforscience.org/">Code for Science &amp; Society</a> mission. We facilitate in-person events, share our processes openly, and join community groups like the <a href="https://osaos.org/">Open Source Alliance for Open Scholarship</a> and <a href="http://jrost.org/">Joint Roadmap for Open Source Tools</a> to advance this mission. 
Our unique perspective on collaboration and open source comes from work with projects and people across domains from science to the arts. What does this facilitation process look like? Read on to find out!</p>
<p>A successful collaboration depends on people and communication. Learning how to leverage tools like Git is part of the process. To develop a sustainable collaborative process, we work through blockers and concerns to create a project-specific method for collaboration. Our process looks a little different each time and is shaped by the needs of the group.</p>
<h2 id="facilitatingcollaborationwithgitforcomputationalbiology">Facilitating Collaboration with Git for Computational Biology</h2>
<p>At the CZI HCA event, we worked with research scientists for two hours with the goal of improving their collaborative processes. The participants represented a mix of computational biologists, bench biologists, students, postdocs, and faculty. Most of the people at the session were familiar with Git but not regularly using it for collaboration. It's often easier to learn Git than it is to develop a Git-based workflow for a group. The basics of Git can be covered in a day. Developing a lab workflow that is sustainable takes communication, iteration, and commitment.</p>
<h3 id="startingwithtermsandconcepts">Starting with terms and concepts</h3>
<p>To ensure a successful session, we make sure everyone is comfortable with the terms and on the same page. We began with a terminology review of Git. This helps to raise questions (e.g. when to make a branch vs when to fork) that we return to later. We also reviewed the role of common files, such as a README, that help set a project's norms and expectations.</p>
<h3 id="motivationsandblockers">Motivations and blockers</h3>
<p>After discussing terminology, we moved on to motivations and blockers. We wanted to surface attendee motivations and blockers in team collaboration. To get at this information, we asked attendees to answer the following questions:</p>
<ul>
<li>What's the easiest part of your daily collaborations?</li>
<li>What do you want to improve?</li>
<li>What is keeping your team from using a Git-based workflow?</li>
</ul>
<h4 id="whatseasy">What's easy?</h4>
<p>To help participants identify and create standardized communication and software development practices for their labs, we started with what's working. We ask the question &quot;What's the easiest part of your daily collaborations?&quot; to start participants off thinking about what works for them now. Later, when we discuss blockers, we can return to &quot;what's easy&quot; and re-frame the blocker in terms of what the participants have identified as easy for them. We saw answers such as:</p>
<ul>
<li>in-person conversations</li>
<li>regular communications they have with their lab</li>
<li>when I know who is responsible for what</li>
<li>we know who needs to make decisions</li>
<li>communication is clear</li>
</ul>
<p>Two categories of answers came up. Conversations with people and situations where expectations are clear are the core of well-done group collaboration.<br>
<em>(One participant who runs a research group brought up SCRUM, see <a href="https://drum.lib.umd.edu/handle/1903/10743">Adapting Scrum to Managing a Research Group</a> for more.)</em></p>
<h4 id="whatneedswork">What needs work?</h4>
<p>By asking &quot;What do you want to improve?&quot; we looked for common themes motivating participants to attend the session. Most of the participants already knew how to use Git. However, a knowledge of Git does not guarantee a smooth collaboration and a successful project.</p>
<p>There were two categories of responses: technical skills and project management. Some participants wanted to improve their working knowledge of Git, for example a person who understands branches but is not sure that they're doing it <em>right</em> and is looking for best practices. Other participants wanted to improve their team's approach to project management and were interested in implementing code review, standups, and other processes to keep the team working together efficiently. Some of the answers were:</p>
<ul>
<li>get a handle on branches, forks, merges</li>
<li>get everyone on team aligned doing things the same way</li>
<li>getting feedback regularly</li>
<li>code review</li>
</ul>
<p>Code for Science and Society has put a lot of effort into understanding and developing systems for diverse teams. We've seen that the best process tends to be the one that can be adopted and sustained. In our discussions of best practices, we surfaced some key points, reviewed open source best practices, shared community resources like <a href="http://openopensource.org/">openopensource.org</a>, and discussed how to use issue templates.</p>
<h4 id="blockers">Blockers</h4>
<p>With a view of what was easy for participants and what they wanted to improve, we then moved on to discuss the specific blockers. We did this by asking, &quot;What is keeping your team from a Git-based workflow?&quot; Interestingly, the main blockers were things we classify as <em>people problems</em> rather than technical problems. Examples included:</p>
<ul>
<li>people in my lab have different levels of understanding of Git</li>
<li>communication — lack of common language, lack of Git familiarity</li>
<li>some people don't like change</li>
<li>students have projects of their own and don't have collaboration needs</li>
</ul>
<p>While most of our participants had a good understanding of Git, not everyone in their work groups is at the same level. This creates situations where there is no common process or understanding. Fortunately, team members don't need a deep understanding of Git to collaborate. People who are not familiar with Git can use the GitHub web interface to submit issues and comments, contribute to project management with GitHub Projects, and write documentation and user guides. This is critical to helping those people develop comfort with GitHub and incorporating them into a Git-based workflow, even if they won't be making commits for a while.</p>
<p>The last two points raised by participants fall into the &quot;we don't need to bother with this&quot; category. Some people don't like change. However, we have seen people who are resistant to change get on board quickly once they see a project progressing. An open source style workflow allows anyone in the group to see project progress, and this tends to have a motivating effect on supervisors and others.</p>
<p>The flip side of &quot;we don't need to bother with this&quot; is a scientist who is used to working independently. In most groups there is an expectation that their code and methods will be reused by future lab members. Even an independent project can benefit from code review. We encourage people to onboard independent workers with a code review process that formalizes and documents the comments that are already happening in the lab. <em>(See <a href="https://gist.github.com/vievehal/c3fa3ff081afb2d7f0d0bf8a024468ab">this Gist</a> on code review and scientific review in labs. See also Mozilla Science's <a href="https://mozillascience.github.io/codeReview/intro.html">Code Review in Labs</a>)</em></p>
<h3 id="initiatingastandardcollaborativeprocess">Initiating a standard collaborative process</h3>
<p>The majority of people who attended this session were looking for advice on best practices that were specific and relevant to their work and processes. There's no magic recipe for a good collaboration. Attendees were able to connect with other groups who collaborate with Git. At an event like the CZI HCA meeting, the opportunity to make connections with other groups working towards better project management on similar projects is valuable. Finally, by creating a space for honest discussion, we set up groups to evaluate their existing project management workflows, surface and discuss blockers to best practices with their labs, and push for improvements.</p>
<p>We're always interested in working with groups to bring the best of open source workflows to projects in new domains. If you're interested in working with us — reach out! <a href="mailto:hi@codeforscience.org">hi@codeforscience.org</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[Demo: A Collaborative Shopping List Built On Dat]]></title><description><![CDATA[We've been busy developing HyperDB, a distributed scalable database for peer-to-peer collaboration. Dat Shopping List is an application that makes grocery shopping fun again! HyperDB will be integrated in Hyperdrive to allow for multi-user collaboration with Dat archives.]]></description><link>https://blog.datproject.org/2018/05/14/dat-shopping-list/</link><guid isPermaLink="false">5af08d02e47f8305fa968698</guid><category><![CDATA[Announcement]]></category><category><![CDATA[Recently]]></category><category><![CDATA[Decentralization]]></category><dc:creator><![CDATA[Jim Pick]]></dc:creator><pubDate>Mon, 14 May 2018 09:54:58 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/05/tartine-manufactory-sf.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/05/tartine-manufactory-sf.jpg" alt="Demo: A Collaborative Shopping List Built On Dat"><p>Over the last year, we've been developing <a href="https://github.com/mafintosh/hyperdb">HyperDB</a>, a distributed scalable database for peer-to-peer collaboration. To demonstrate HyperDB, I built Dat Shopping List — an application that makes grocery shopping fun again!* Along with being a powerful decentralized key-value database, HyperDB will be integrated in Hyperdrive to allow for multi-user collaboration with Dat archives. We are really excited about the potential of this and working on integration over the coming months. Read more to learn about the most decentralized shopping list ever built!*</p>
<p><small>*<em>wild claims may not be accurate</em></small></p>
<p><img src="https://blog.datproject.org/content/images/2018/05/decentralised-dog.png" alt="Demo: A Collaborative Shopping List Built On Dat"> <small>Dat works without centralized servers. (Credit: @mafintosh)</small></p>
<p>Dat is a purpose-built tool to allow sharing of research data in a way that enables effective collaboration and reproducibility. Out-of-the-box, it gives you a distributed network, data history, and security. With the development of HyperDB, we are excited to see Dat growing into the foundation for other community collaboration tools.</p>
<h2 id="datisgrowingandevolving">Dat is growing and evolving</h2>
<p>Since its initial release, Dat has attracted an enthusiastic developer community with ideas and use cases that extend beyond its original mission. Dat has evolved into a whole ecosystem of related open source tools, libraries, and services. Some examples:</p>
<ul>
<li>the <code>dat</code> <a href="https://docs.datproject.org/install#in-the-terminal">command line tool</a> for sharing data (<a href="https://try-dat.com/">online tutorial</a>)</li>
<li>the <a href="https://github.com/dat-land/dat-desktop">Dat Desktop</a> app for sharing files over the internet</li>
<li>a full ecosystem of open source libraries in <a href="https://docs.datproject.org/ecosystem">JavaScript</a> (plus a new <a href="https://datrs.yoshuawuyts.com/">Rust</a> project is growing fast)</li>
<li>commercial services such as <a href="https://hashbase.io/">Hashbase</a> are providing hosted solutions</li>
<li><a href="https://beakerbrowser.com/">Beaker Browser</a> is a powerful vision of a web browser that gives all people the power to publish their work to the web without servers</li>
<li>the community has even organized grassroots <a href="https://peer-to-peer-web.com/">Peer-to-Peer Web</a> conferences in Los Angeles and Berlin, and soon New York</li>
</ul>
<p><img src="https://blog.datproject.org/content/images/2018/05/peer-to-peer-web-la-tara.jpg" alt="Demo: A Collaborative Shopping List Built On Dat"><br>
<small>Tara Vancil giving a talk at Peer-to-Peer Web L.A. (Credit: @jimpick)</small></p>
<p>Dat is designed to evolve. The initial version of Dat was designed to allow a single researcher to publish a set of files to the world, and to be able to update the files over time ... all while preserving history for reproducibility. But Dat quickly caught on for other uses in a variety of communities.</p>
<h2 id="hyperdbdatmultiwriter">HyperDB: Dat Multiwriter</h2>
<p>One of the most desired features for Dat is collaboration between multiple users, known as <em>multiwriter</em> (as opposed to <em>singlewriter</em>, which Dat is currently). We especially heard this need in the <a href="https://blog.datproject.org/2017/10/06/dat-in-the-lab-ucdavis-1/">feedback from the research community</a>. Led by Mathias Buus (<a href="https://github.com/mafintosh">@mafintosh</a>), we have spent more than a year developing HyperDB, a distributed key-value database. Alongside this, we are upgrading the core internal libraries of Dat to use HyperDB and support improved collaboration. HyperDB is working today, but we are continuing to test and integrate into hyperdrive and other Dat tools.</p>
<p>Multiwriter support is a major change. To ensure a smooth rollout, we are working with the <a href="http://github.com/datprotocol/">Dat Protocol Working Group</a> to define backwards compatibility requirements. This process, along with work on integration, will be a priority in the coming months.</p>
<h2 id="datshoppinglistdemo"><em>Dat Shopping List</em> Demo</h2>
<p>The <em>Dat Shopping List</em> application demonstrates how to use HyperDB to collaborate with other users on a single database. We built this to showcase multiwriter support and guide others on its use. <a href="https://codeforscience.org/">Code for Science &amp; Society</a>, the non-profit organization sponsoring Dat, contracted me (<a href="https://jimpick.com/">Jim Pick</a>) to demonstrate how HyperDB works, improve any API peculiarities, and surface any remaining bugs.</p>
<p>You can launch the demo in any web browser:</p>
<ul>
<li><a href="https://dat-shopping-list.glitch.me/">https://dat-shopping-list.glitch.me/</a></li>
</ul>
<p><img src="https://blog.datproject.org/content/images/2018/05/dat-shopping-list-basic.gif" alt="Demo: A Collaborative Shopping List Built On Dat"><br>
<small>A quick preview of what you can do with the app</small></p>
<p>The demo is a simple to-do list app, in the spirit of the <a href="http://todomvc.com/">TodoMVC</a> project.</p>
<h3 id="videowalkthrough">Video Walkthrough</h3>
<p>Here is a short (2.5 minute) walkthrough of the demo.</p>
<p><video src="https://dat-shopping-list-video-jimpick.hashbase.io/dat-shopping-list-1.mp4" controls></video></p>
<h3 id="mobilephonesupport">Mobile phone support</h3>
<p>Where the demo really shines is on your mobile phone. The demo is built using some <a href="https://developers.google.com/web/progressive-web-apps/">cutting-edge</a> web browser features, so you can use <em>Add to Home Screen</em> on your phone, and you can use it just like you would use any other app.</p>
<p><img src="https://blog.datproject.org/content/images/2018/05/dat-shopping-list-pwa.jpg" alt="Demo: A Collaborative Shopping List Built On Dat"></p>
<h3 id="offlinesupport">Offline support</h3>
<p>When you create a shopping list, under the covers, you are actually creating a multiwriter Dat archive. Each shopping list is a separate Dat archive, and each shopping list item is just a tiny little file inside a directory.</p>
<p><img src="https://blog.datproject.org/content/images/2018/05/dat-next-under-the-covers.png" alt="Demo: A Collaborative Shopping List Built On Dat"></p>
<p>The master copy is stored in the storage of your web browser, along with the secret key which ensures that you are the only one that can modify it.</p>
<p>Having the master copy inside your web browser is great, because that lets you make changes to the document, even when you aren't connected to the internet. So you can put your phone in airplane mode, and still use the app.</p>
<p><img src="https://blog.datproject.org/content/images/2018/05/online-offline.gif" alt="Demo: A Collaborative Shopping List Built On Dat"><br>
<small>Making changes anytime is okay, even when offline. Changes sync when going back online.</small></p>
<h3 id="synchronizingandkeys">Synchronizing and Keys</h3>
<p>Of course, a multiwriter demo would be boring if it only ran on one device. Dat has always had great support for replicating data across the internet.</p>
<p>Dat has been single writer up until now, and its identity model was essentially a single <em>key pair</em>, a set of two keys: one public key and one private key. With the new support for multiple writers, each writer will have its own key pair, and users exchange keys to grant access to new writers.</p>
<p>In this demo, users share their keys with the shopping list's owner to get authorized.</p>
<p><img src="https://blog.datproject.org/content/images/2018/05/authorization-1.png" alt="Demo: A Collaborative Shopping List Built On Dat"><br>
<small>We authorize new writers by copying and pasting their keys into the owner's shopping list.</small></p>
<p><img src="https://blog.datproject.org/content/images/2018/05/authorization-2.png" alt="Demo: A Collaborative Shopping List Built On Dat"><br>
<small>Once authorized, other users can update the shopping list.</small></p>
<p>A truly user-friendly application might hide the keys and key exchange details from the end users by incorporating some sort of identity-based account and login system. But, for this demo, we wanted to illustrate the core authorization mechanism that multiwriter Dat uses. Asking people to understand and do the key exchange manually is not very friendly, but it is flexible and powerful.</p>
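<p>To make the key exchange concrete, here is a minimal sketch of the idea in plain Node.js. It is a toy model, not the HyperDB API: <code>SharedList</code>, <code>authorize</code>, <code>applyUpdate</code>, and <code>signUpdate</code> are hypothetical names, and the choice of Ed25519 keys from Node's built-in <code>crypto</code> module is an assumption made for illustration.</p>

```javascript
// Toy model of multiwriter authorization. NOT the HyperDB API:
// SharedList, authorize, applyUpdate and signUpdate are hypothetical names.
const crypto = require('crypto')

// Each writer has its own key pair (one public key, one private key).
function createWriter (name) {
  const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519')
  // The hex-encoded public key is what a writer would copy and paste to the owner.
  const id = publicKey.export({ type: 'spki', format: 'der' }).toString('hex')
  return { name, id, publicKey, privateKey }
}

class SharedList {
  constructor (owner) {
    this.items = []
    // The owner is the first authorized writer.
    this.writers = new Map([[owner.id, owner.publicKey]])
  }

  // An already-authorized writer grants access to a pasted public key.
  // (In this toy, the granter is only looked up, not asked to sign the grant.)
  authorize (granter, newWriterId) {
    if (!this.writers.has(granter.id)) throw new Error('granter is not authorized')
    const publicKey = crypto.createPublicKey({
      key: Buffer.from(newWriterId, 'hex'),
      format: 'der',
      type: 'spki'
    })
    this.writers.set(newWriterId, publicKey)
  }

  // Accept an update only if it is signed by an authorized writer.
  applyUpdate (update) {
    const publicKey = this.writers.get(update.writerId)
    if (!publicKey) return false
    const ok = crypto.verify(null, Buffer.from(update.item), publicKey, update.signature)
    if (ok) this.items.push(update.item)
    return ok
  }
}

// A writer signs each change with its private key.
function signUpdate (writer, item) {
  const signature = crypto.sign(null, Buffer.from(item), writer.privateKey)
  return { writerId: writer.id, item, signature }
}

const owner = createWriter('owner')
const friend = createWriter('friend')
const list = new SharedList(owner)

const before = list.applyUpdate(signUpdate(friend, 'bread')) // friend is not yet authorized
list.authorize(owner, friend.id)
const after = list.applyUpdate(signUpdate(friend, 'bread')) // friend is now authorized
console.log(before, after, list.items)
```

<p>The private keys never leave the devices that created them in this sketch. Only the public key is pasted to the owner, which is why a manual copy and paste step is enough to grant access.</p>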
<h3 id="nodejsgateway">Node.js Gateway</h3>
<p>This demo works in any modern browser. It has been tested in Chrome, Firefox, Safari and Microsoft Edge, as well as Mobile Safari and Chrome on Android.</p>
<p>If the demo had been written for <a href="https://beakerbrowser.com/">Beaker Browser</a>, which speaks the Dat protocol natively to the internet, there would be no need for a gateway. But in order to allow the demo to work across many browsers, the app replicates the shopping list data across a websocket connection to the server from which it was loaded. The Node.js server will then talk to the rest of the peer-to-peer swarm on behalf of the client.</p>
<p>By necessity, the data is shared with the gateway, so if you want to share data privately, you will need to run your own private gateway. It is not as scary as it sounds though... all the code is on <a href="https://github.com/jimpick/dat-shopping-list">GitHub</a>, and you can <a href="https://glitch.com/edit/#!/remix/dat-shopping-list">remix</a> the demo on Glitch with a single click.</p>
<h3 id="podcastinterviewwithmathiasbuus">Podcast Interview with Mathias Buus</h3>
<p>If you would like to learn a bit more, Mathias was a recent guest on the <a href="https://dat-cast.hashbase.io/">DatCast podcast</a> and talked a bit about the origins of the Dat project and how multiwriter and HyperDB came to be.</p>
<h2 id="supportthedatproject">Support the Dat Project!!!</h2>
<p>Thank you to <a href="https://codeforscience.org/">Code for Science &amp; Society</a> for hiring me to build the demo. They are a non-profit and depend on contributions to continue running. <a href="https://donate.datproject.org/">Donations</a> are greatly appreciated.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Data sharing between institutions]]></title><description><![CDATA[Interested in beta testing secure data sharing software? Try out the new Dat install and data sharing workflow with a friend and let us know how it went.]]></description><link>https://blog.datproject.org/2018/04/24/data-sharing-at-institutions-and-beyond-with-dat/</link><guid isPermaLink="false">5ade414de47f8305fa96865e</guid><category><![CDATA[Science]]></category><dc:creator><![CDATA[Dat Project]]></dc:creator><pubDate>Tue, 24 Apr 2018 22:17:39 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/04/0003811-DSC01725-resize.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/04/0003811-DSC01725-resize.jpg" alt="Data sharing between institutions"><p>Data intensive research depends increasingly on the sharing and reuse of datasets. Unfortunately, collaboration can be stalled by technical roadblocks around sharing large datasets between institutions. Today, sharing files between institutions remains challenging without extensive technical infrastructure or external services.</p>
<p>As a part of the <a href="https://blog.datproject.org/tag/science/">Dat in the Lab</a> project — a project done in partnership with <a href="https://www.cdlib.org/">California Digital Library</a> and funded by the <a href="https://www.moore.org/">Gordon and Betty Moore Foundation</a>  — we have worked with University of California researchers to better understand barriers in their work. This is part of our participant-driven design process and our goal is to identify where we can remove blockers for people in data intensive fields with Dat-based workflows. What have we learned? Read on!</p>
<p>At the start of the project, one issue came up immediately: researchers still struggle to share large datasets between institutions. Why is this? Many commercial or open platforms exist for transferring files. But many of these tools require file transfer through a centralized server. Adding a third party makes the act of transferring a file more complex, more expensive, and less secure. Dat works by linking the two people directly, without needing a third-party server. However, as anyone who's worked in institutional computing knows, connecting two institutional networks is not always straightforward. As we learned, institutional network security does not always mix well with peer-to-peer protocols.</p>
<p>We want to make Dat great for researchers, but many researchers lack access to install software globally on their systems. With that in mind, we also created a new simplified install process and improved the feedback in the Dat command line tool to make it more intuitive. Our new installation method makes it easy to install Dat without <code>sudo</code> or <code>npm</code>, in just one step!</p>
<p>We've always aimed to make Dat great at data transfer. But sitting down with researchers and walking through each step showed us where we needed to improve. Dat in the Lab has provided new chances to try Dat out in new contexts, with folks at various levels of experience. From these experiences, we have made improvements that will make Dat easier for everyone to use.</p>
<h2 id="casestudyucdavistowashingtonstateuniversitydatatransfer">Case Study: UC Davis to Washington State University Data Transfer</h2>
<p>Here’s a short summary of a recent data sharing use case we facilitated. We've designed a workflow to enable researchers to share data across institutions, from easy installation to direct data transfer.</p>
<p>For Dat in the Lab, we helped a researcher from UC Davis securely share pre-publication genomic data with an external collaborator based in Washington State. At UC Davis, PhD student Ryan Peek is sequencing Sierra yellow-legged frogs (endangered species, Rana sierrae), and Meghan Parsley, a PhD student at Washington State University (WSU), wants to collaborate on Ryan’s data.</p>
<p>In the end, we successfully helped Ryan and Meghan to send the data from Ryan’s server at Davis to Meghan's laptop at WSU. Ryan’s genomics group at Davis had a server that we installed Dat onto using the new standalone installer method. Ryan used the <code>dat share</code> command (in the folder where his data was located) to start sharing his research data and sent the Dat link to Meghan.</p>
<p>Initially, we tried to transfer the data onto a server on the internal network at WSU. We immediately ran into connection problems, where Dat was unable to connect to external networks. To troubleshoot this, we ran <code>dat doctor</code> on both ends. Dat doctor tests general connectivity and provides a command to test peer-to-peer connectivity. The Dat doctor determined that the server at WSU was not able to connect directly to Davis, most likely due to institutional firewall settings. We are always trying to make the doctor better at spotting networking issues that may prevent Dat from working!</p>
<p>To get away from the institutional firewall, we instead transferred the data to a researcher's laptop. It worked! Ryan and Meghan were able to exchange data directly between their computers. This transfer did not require any external infrastructure or costly platforms. Additionally, Meghan was able to close her laptop and have the data transfer automatically resume when she was back online. Dat gives researchers the flexibility to make quick file transfers, with more powerful features underneath for users who need them.</p>
<h2 id="workflowdatfordatatransfer">Work Flow — Dat For Data Transfer</h2>
<p>Interested in beta testing Dat for secure data sharing? Try out the data sharing workflow with a friend and let us know how it went!<br>
Follow this workflow and share your experience at <a href="mailto:hi@datproject.org">hi@datproject.org</a>, especially if you work as a researcher. We want to know if you succeed, or get stuck, or have any thoughts or feedback. We're interested in hearing from you, so don't hesitate!</p>
<ol>
<li><a href="https://github.com/datproject/dat/#installation">Install the Dat command line tool</a> on two separate computers that you want to share data between (both can be yours, but it's more fun with a friend).</li>
<li>With Dat, one person shares a folder and the other downloads. On the sharing side, <code>cd</code> into the folder of data you'd like to share and run the command <code>dat share</code>. It will print out a long link that starts with <code>dat://...</code>. If you want to cancel the share at any time, press CTRL+C. Your data is also private: nobody can discover or access it unless you give them the Dat link.</li>
<li>On the downloading side, run the command <code>dat clone dat://…</code> with the link you received. This will create a new folder where the data will be downloaded.</li>
<li>During the transfer, you should see connection statistics and progress bars on both computers.</li>
<li>If the transfer fails or never starts, send a screenshot of the output of the command line to <a href="mailto:hi@datproject.org">hi@datproject.org</a> so we can understand where users are getting stuck.</li>
<li>If the transfer succeeds, you successfully distributed your data over a secure peer-to-peer network!</li>
</ol>
<p>Ok — how’d it go? We'd love to hear what you think, email and tell us about it. We're also interested to chat about potential use cases for Dat in your research workflow!</p>
<p>Want to play around some more? Try some more advanced topics over at <a href="https://try-dat.com">try-dat.com</a>. (<em>Note that this tutorial starts with the <code>npm</code> installation; you can use either the <code>npm</code> install or the <a href="https://github.com/datproject/dat/#installation">one line install</a>.</em>)</p>
</div>]]></content:encoded></item><item><title><![CDATA[Practical Decentralization of Scholarly Data & Resources]]></title><description><![CDATA[It’s time for scholars to ask whether today’s data preservation technologies align with open scholarship’s values of access, preservation, privacy, and transparency. ]]></description><link>https://blog.datproject.org/2018/04/19/practical-decentralization/</link><guid isPermaLink="false">5ad92433e47f8305fa968658</guid><dc:creator><![CDATA[Danielle Robinson]]></dc:creator><pubDate>Thu, 19 Apr 2018 23:55:19 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/04/Screen-Shot-2018-04-19-at-4.27.15-PM-2.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><hr>
<img src="https://blog.datproject.org/content/images/2018/04/Screen-Shot-2018-04-19-at-4.27.15-PM-2.png" alt="Practical Decentralization of Scholarly Data & Resources"><p>The following post is based on my talk at the library and technology conference <a href="https://onlinenorthwest.org/">Online Northwest</a> in March 2018, titled “A vision for decentralized data preservation across a network of libraries and trusted institutions.” Photo credit to <a href="https://twitter.com/saraheseymore">Sarah Seymore</a>. I'll update with the link to the recording when it's available.</p>
<hr>
<p>Technology is imbued with the values and biases of its creators (see the work of <a href="https://safiyaunoble.com/">Safiya Noble</a> for more). Today’s online data storage and preservation systems are no exception. They are built on traditional web infrastructure, which was designed with the values of hierarchical and commercial organizations. It’s time for scholars to ask whether today’s data preservation technologies align with open scholarship’s values of access, preservation, privacy, and transparency. Alternative communication tools, such as email and RSS, were built for interoperability and portability but are now largely relegated to niche uses. Similar approaches built with decentralized technology have recently reached a new level of maturity and publicity.</p>
<blockquote>
<p>It’s time for scholars to ask whether today’s data preservation technologies align with open scholarship’s values of access, preservation, privacy, and transparency.</p>
</blockquote>
<p>Decentralized tools offer a more robust, open alternative for data management. At a fundamental level, decentralized systems distribute data across a network of linked participants. Beyond scholarly use cases, decentralized models are also changing the way the web is built as artists, activists, and technologists use them to rethink the web. This technology offers new models for community managed information sharing on the web. As decentralization remakes the legacy web, it presents scholarship with an opportunity to rethink who owns scholarly data and the pathways to access that information.</p>
<p>Today, decentralized approaches like the Dat Project offer foundational tools for integrating isolated data silos, operating at a lower level of web infrastructure to link information in existing systems. (At the end of this post is a plain language <a href="https://datproject.org/">Dat Project</a> explainer.) These modern tools present opportunities for scholars, librarians, and technologists to redesign long-term data preservation in a way that formalizes the shared values of the community within the technology itself.</p>
<h2 id="theinternetisbrokenandweareusingittoaccessanddistributeallofhumanknowledge">The Internet is broken and we are using it to access and distribute all of human knowledge.</h2>
<p>Today’s web is dominated by centralized, non-interoperable entities that sustain their businesses by enclosing and selling data we provide to them. The model of controlling content via an online platform has been implemented in every industry, from social media to scholarly publishing. We live in an age where data are increasingly collected, analyzed, and repackaged for sale. Across domains, data live online (I mean data in the most inclusive terms): the work of a writer, government data, newspaper archives, your family photos, scientific data, film archives, artist’s creations. These data live on the web with varying degrees of strategy, management, and plans for long-term upkeep.</p>
<p>The lines drawn around which data have political value, research value, or business value constantly shift. Scholarly data - the majority of which are presented, accessed, and stored online - are not an exception, and can move quickly from niche to politically charged (see <a href="http://www.ppehlab.org/datarefuge/">DataRescue</a>). Although data are valuable, data management and stewardship practices are extremely inconsistent and financially burdensome. Link rot (when links break) and content drift (when the information at a link changes) plague fields from <a href="https://www.cambridge.org/core/journals/legal-information-management/article/perma-scoping-and-addressing-the-problem-of-link-and-reference-rot-in-legal-citations/15A59548BF9882B06D3064DA7E290859">legal judgements and scholarship</a> to <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115253">biomedical research</a>. Together link rot and content drift create “reference rot” — you can read more about the impact of reference rot on scholarship across scholarly disciplines <a href="https://www.nature.com/news/the-trouble-with-reference-rot-1.17465">here</a>, <a href="https://ejournals.bc.edu/ojs/index.php/ital/article/view/9598">here</a>, and <a href="http://blogs.lse.ac.uk/impactofsocialsciences/2015/02/05/reference-rot-in-web-based-scholarly-communication/">here</a>.</p>
<p>In this landscape, librarians, technologists, and scholars are trying to manage systems that will sustain the preservation of humanity’s growing knowledge base forever.  The web is not designed for long term preservation of information. As <a href="https://stateimpact.npr.org/pennsylvania/2017/01/19/researchers-rush-to-preserve-environmental-data-they-believe-to-be-threatened-by-trump/">Laurie Allen said in 2017</a>: “the internet is a terribly unstable way to keep information available.” As decentralized models to connect people to information are developed, these models present a unique opportunity to rethink how scholarly data are stored and accessed online.</p>
<h2 id="anetworkapproach">A network approach</h2>
<p>Centralized data storage systems can only preserve what they hold in their servers. These models require custody to provide access. Data custody becomes increasingly expensive and difficult to manage as data volumes increase. Stephen Abrams asks the question, <a href="https://figshare.com/articles/_/5844369">“can we replace custody with easy access?</a>” In other words, is knowing where data are, and trusting the preservation standards of that location, equivalent to (or better than) custody? Can we reduce the burden on institutions to own everything with a mandate to know where data are and how many verified copies exist?</p>
<p>The idea of “preservation in place” where libraries bring “preservation services to the content” is <a href="https://www.google.com/url?q=http://www.cdlib.org/services/uc3/docs/Abrams-Cruse-Kunze-Preservation-is-not-a-place-final.pdf&amp;sa=D&amp;ust=1524179362754000&amp;usg=AFQjCNEQgSxzZZfwkxpKrNgzvpEl2Kc7eQ">not new</a>. In a decentralized model, custody is not required for access. By bringing preservation services to content, we replace custody with access and preservation. Data then live in a network of linked institutions that model a commons of trust.</p>
<p>To create a functional system wherein information is shared across silos between trusted institutions is a utopian idea. What would such a network require beyond technology? The most critical factor is trust. Trust in each participating entity’s standards and processes. Trust between institutions. Trust from the community that a commons can be sustained without tragedy. Today's decentralized models make preservation in place technically feasible, and interoperable with existing data preservation silos. But perhaps the cultural part of modeling a new system will be more challenging than writing the software? ;)</p>
<h2 id="joinus">Join us</h2>
<p>Scholarship today is another form of online content creation. I resisted the idea for years. But the parallels are clear. Scholarly work is used by for-profit entities to drive clicks, bring advertisers and subscribers, and sustain their businesses. We give up custody of — and sometimes rights to — our work. We then must pay for access to our own data and publications, or lose access to those resources. I believe this system is unsustainable because it is not based on the community’s values. Reducing reliance on centralized systems will help to return the control of scholarly assets to the creators and trusted institutions that value access and scholarship. Data preservation and access are fundamentally about trust. Who do you trust to steward human knowledge? I trust libraries, scholars, and public interest technologists over entities with clear business interests.</p>
<blockquote>
<p>What’s important to the scholarly community? Are those values reflected in our technology?</p>
</blockquote>
<p>Today’s online scholarly infrastructure values systems of centralized control, people with reliable internet connections, and people/institutions with money to pay for access (and when there’s no money, it values people/institutions with the technical capacity and time to do the work themselves). The web is being reimagined today as a network wherein data are freely shared between linked users. For scholars and librarians, it’s a chance to step back and assess what assumptions and values are baked into today’s open scholarly infrastructure. What’s important to the scholarly community? Are those values reflected in our technology? Let’s reexamine how scholarship and data live on the web and move into a future where our values are reflected in our technology choices.</p>
<p>Save the date for our next community call on May 31st at 11am PST. Join our <a href="https://datproject.us16.list-manage.com/subscribe/post?u=993df3c1e35c9b224b64ccf72&amp;id=128a796b8e">mailing list</a> for a reminder. Follow us on Twitter: @daniellecrobins, @dat_project, and @codeforsociety.</p>
<hr>
<p>Thanks to the <a href="https://onlinenorthwest.org/about/">2018 Online Northwest Program Planning Committee</a> for inviting me to speak and organizing such a fantastic conference! And extra thanks to Joe Hand, Karissa McKelvey, John Chodacki, Robin Champieux, and Stephen Abrams for great discussion and comments on drafts of this work.</p>
<hr>
<h2 id="whatsdatwhyareweworkingwiththistechnology">What’s Dat and why are we working with this technology?</h2>
<p>Dat is an open source, nonprofit-backed peer-to-peer file sharing protocol originally developed to distribute large datasets. It’s not blockchain-based; instead, it uses an append-only log to track changes and a private key to allow an author to make changes.</p>
<p>When a folder is tracked with Dat, it creates a unique persistent identifier for that package of data (whatever is in the folder). This unique identifier is not based on location or the folder’s content. In this system, a folder of data can change location or contain dynamic content while keeping the same identifier. Dat then tracks changes to the contents of the folder with a transparent change log. Any reader can view the change log, view earlier versions of the dataset, or keep the folder synced to always have the latest version.</p>
<p>The Dat identifier can be used to track that package of data across a network. Using the identifier, anyone can see how many verified copies exist, download copies, and re-share it from another computer. A Dat package can contain any file type and tracking with Dat does not change the contents of the package. It’s a lightweight and flexible system that prioritizes user control of data sharing across a decentralized network. For more on how Dat works, check out the <a href="https://github.com/datproject/docs/blob/master/papers/dat-paper.pdf">whitepaper</a>.</p>
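<p>The combination of an append-only log, a private key, and an identifier that does not depend on content can be sketched in a few lines of Node.js. This is a toy model under stated assumptions, not Dat's actual data structures: <code>createArchive</code>, <code>append</code>, and <code>checkout</code> are hypothetical names, the file names are made up, and deriving the identifier from an Ed25519 public key is an assumption of this example.</p>

```javascript
// Toy append-only change log. NOT Dat's actual data structures:
// createArchive, append and checkout are hypothetical names.
const crypto = require('crypto')

function createArchive () {
  const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519')
  // The identifier comes from the public key, not from the content or the
  // location, so it stays the same while the files change.
  const id = publicKey.export({ type: 'spki', format: 'der' }).toString('hex')
  return { id, publicKey, privateKey, log: [] }
}

// Only the holder of the private key can add a signed entry.
// Existing entries are never edited or removed.
function append (archive, path, content) {
  const entry = { seq: archive.log.length, path, content }
  const signature = crypto.sign(null, Buffer.from(JSON.stringify(entry)), archive.privateKey)
  archive.log.push({ entry, signature })
}

// Any reader can check every entry against the public key and rebuild
// the folder as it was after the first `version` changes.
function checkout (log, publicKey, version) {
  const files = {}
  for (const { entry, signature } of log.slice(0, version)) {
    const valid = crypto.verify(null, Buffer.from(JSON.stringify(entry)), publicKey, signature)
    if (!valid) throw new Error('entry ' + entry.seq + ' failed verification')
    files[entry.path] = entry.content
  }
  return files
}

const archive = createArchive()
append(archive, 'frogs.csv', 'v1 data')
append(archive, 'frogs.csv', 'v2 data')
append(archive, 'README.md', 'notes')

console.log(checkout(archive.log, archive.publicKey, 1)) // the folder after the first change
console.log(checkout(archive.log, archive.publicKey, 3)) // the latest version
```

<p>Because old entries are kept rather than overwritten, checking out an earlier version is just a matter of replaying fewer entries, and a reader who holds only the public key can still detect a log that has been altered.</p>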
</div>]]></content:encoded></item><item><title><![CDATA[Dat Project Updates: Focusing Our Goals & Community Structure]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Over the last few years, Dat Project has grown from a tool for data transfer to a wider community building peer-to-peer applications. We are continually impressed and excited about the work being done in our community. As the community has grown its needs have changed; to better support and sustain</p></div>]]></description><link>https://blog.datproject.org/2018/03/22/dat-governance-updates/</link><guid isPermaLink="false">5aa19bf14653f455ab95dc49</guid><category><![CDATA[Announcement]]></category><category><![CDATA[Community]]></category><category><![CDATA[Business]]></category><dc:creator><![CDATA[Dat Project]]></dc:creator><pubDate>Thu, 22 Mar 2018 14:00:00 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>Over the last few years, Dat Project has grown from a tool for data transfer to a wider community building peer-to-peer applications. We are continually impressed and excited about the work being done in our community. As the community has grown its needs have changed; to better support and sustain the Dat community, we'll shift our governance and priorities in these areas:</p>
<ul>
<li>Code for Science &amp; Society (CSS), a nonprofit, and Dat Project, an open source project, will have separate governance structures and leadership.</li>
<li>Dat will be led and governed with an open source structure through community working groups.</li>
<li>The Dat Project will focus on building and maintaining a small core of software that:
<ul>
<li>Effortlessly synchronizes data between many computers.</li>
<li>Makes it really easy to build decentralized applications.</li>
</ul>
</li>
</ul>
<p>These changes are motivated by seeing what has been working in the Dat community and being open to what needs to be improved for the health of the project. We really like people experimenting with different ways to use Dat and the commitment to creating value-driven decentralized software. We do not like how confusing it can be to get into the Dat ecosystem and the ambiguity over what is supported. We hope these changes lower these hurdles and continue to foster a growing Dat community.</p>
<h2 id="cssdatprojectorganizationalstructure">CSS &amp; Dat Project Organizational Structure</h2>
<p>The biggest structural change is that Dat and Code for Science &amp; Society (the nonprofit that supports Dat) will have separate governance frameworks. The needs, mission, and responsibilities of the nonprofit are different from those of Dat. Dat is an open source project that needs to sustain a community and provide leadership paths to outside contributors. Our structure has served us so far, but to maintain transparency and focus for both organizations we see the value in separating the governance and leadership.</p>
<h4 id="cssinvolvementindat">CSS Involvement in Dat:</h4>
<ul>
<li>Code for Science &amp; Society will continue to fiscally sponsor Dat Project.</li>
<li>CSS will also sponsor related projects that use Dat, such as <a href="http://sciencefair-app.com/">ScienceFair</a>.</li>
<li>CSS will continue to build tools with Dat, such as <a href="https://datbase.org">DatBase</a> and software for <a href="https://blog.datproject.org/tag/science/">Dat in the Lab</a>, focused on advancing the CSS mission.</li>
</ul>
<h2 id="datopensourcegovernance">Dat Open Source Governance</h2>
<p>We are prioritizing leadership and governance structures around the Dat Project, community governance, and underlying protocol specification in the coming months.</p>
<ul>
<li>To support the strategy and vision for Dat Project, we will create an open source governance team, led by core Dat Project members and long-time outside contributors. This group will also be involved in financial and legal decisions regarding the Dat Project.</li>
<li>To advance the protocol development and third-party implementations, we have convened a <a href="http://github.com/datprotocol/working-group">Dat Protocol Working Group</a>. This group will document all aspects of the <a href="https://www.datprotocol.com">Dat specification</a> and make <a href="https://github.com/datprotocol/DEPs">protocol decisions</a> moving forward.</li>
<li>To uphold community values and foster a welcoming community, we will create a working group of community members to uphold the Code of Conduct.</li>
</ul>
<p>In creating these leadership teams and working groups, we aim to ensure a wide variety of stakeholders are involved in strategic and technical decisions. Additionally, we want to provide a pathway to leadership for people from a variety of organizations and with various backgrounds and types of professional expertise.</p>
<h2 id="datprojectcorefocus">Dat Project Core Focus</h2>
<p>We set out to improve access to public data and created a new protocol along the way. In our work on developing Dat, we found an under-served need. Decentralized software has the potential to return control of digital information to the people. Today, building peer-to-peer applications presents both technical and ethical challenges, but Dat is slowly changing that. Through projects such as <a href="https://beakerbrowser.com">Beaker Browser</a> and Dat-based collaborative tools (e.g. <a href="https://medium.com/@pvh/pixelpusher-real-time-peer-to-peer-collaboration-with-react-7c7bc8ecbf74">Pixel Pusher</a>), our community has demonstrated the promise of a new model for digital tools.</p>
<p>To encourage people to experiment and innovate with decentralized technology, we aim to make Dat foundational software for peer-to-peer applications – one that is backed by a mission-driven nonprofit. To realize this future, we want to make sure Dat is really good at the core underlying needs of peer-to-peer applications. We hope that with our small but critical focus, we can create a strong building block for the Dat ecosystem.</p>
<hr>
<h3 id="asalwaysfeedbackwelcome">As always, feedback welcome!</h3>
<p>There has been great discussion and feedback from our community around these issues, and we want to thank everyone! Let us know if you have questions on <a href="https://twitter.com/dat_project">Twitter</a> or via email (<a href="mailto:hi@datproject.org">hi@datproject.org</a>).</p>
</div>]]></content:encoded></item><item><title><![CDATA[Apply to work with us as a Ford-Mozilla Open Web Fellow!]]></title><description><![CDATA[<div class="kg-card-markdown"><p>We are thrilled for Code for Science &amp; Society to be a <a href="https://medium.com/read-write-participate/welcoming-11-new-partners-in-the-quest-for-internet-health-843c8d1b2bf9">host organization</a> for the 2018 Ford-Mozilla Open Web Fellowship. Our Open Web Fellow will work on using Dat and peer-to-peer technologies for cultivating a healthier internet for the public good.</p>
<p><strong>What does this mean?</strong> You can <a href="https://foundation.mozilla.org/fellowships/apply/">apply to</a></p></div>]]></description><link>https://blog.datproject.org/2018/03/21/css-apply-open-web-fellow/</link><guid isPermaLink="false">5ab01278857e5731dd29d371</guid><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Danielle Robinson]]></dc:creator><pubDate>Wed, 21 Mar 2018 14:00:00 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/04/mozfellows-1.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/04/mozfellows-1.png" alt="Apply to work with us as a Ford-Mozilla Open Web Fellow!"><p>We are thrilled for Code for Science &amp; Society to be a <a href="https://medium.com/read-write-participate/welcoming-11-new-partners-in-the-quest-for-internet-health-843c8d1b2bf9">host organization</a> for the 2018 Ford-Mozilla Open Web Fellowship. Our Open Web Fellow will work on using Dat and peer-to-peer technologies for cultivating a healthier internet for the public good.</p>
<p><strong>What does this mean?</strong> You can <a href="https://foundation.mozilla.org/fellowships/apply/">apply to work with us</a>, or one of the other great host organizations, for 10 months starting in September 2018! You'll get paid! And health insurance and childcare supplements! You'll go to MozFest! You'll meet so many people that your world will expand.</p>
<p>At <a href="https://codeforscience.org">Code for Science &amp; Society</a> (CSS) and the Dat Project, we believe that community-driven, decentralized technologies are the future of the web. Today, the rise of centralized services threatens freedom and privacy. Decentralized, open alternatives exist, but are not yet widely adopted. Dat is an open source peer-to-peer protocol. It’s the basis of peer-to-peer browsers (<a href="https://beakerbrowser.com/">Beaker Browser</a>), social networks, and data sharing tools. Backed by CSS, Dat takes a user-centered approach to developing projects. Hackers, activists, and champions of the open web have been drawn to the Dat community where they are creating a new ecosystem of peer-to-peer applications.</p>
<h2 id="keydetails">Key details</h2>
<p><strong>Application link:</strong> <a href="https://foundation.mozilla.org/fellowships/apply/">https://foundation.mozilla.org/fellowships/apply/</a></p>
<p><strong>Application dates:</strong> The fellowship application opens today(!) and closes on April 20th (Friday at 5pm ET).</p>
<p><strong>What are we looking for in a fellow?</strong> We are looking for someone who will make an impact on the peer-to-peer web over 10 months. We are a small, flexible organization and we are looking for a good fit. The fellow has the opportunity to select a focus area of their choice related to our work.</p>
<p><strong>Does the fellow have to work with Dat?</strong> The Dat Project is our flagship sponsored project, and much of our community and resources center around it. We are looking for a fellow who will make use of our existing strengths, which will probably involve working with Dat in some way. You <em>do not</em> need to be an experienced Dat user or developer to be a good fit for this fellowship.</p>
<p><strong>Location, citizenship:</strong> A fellow can be based anywhere and will work remotely! <a href="https://twitter.com/daniellecrobins">Danielle</a> and <a href="https://twitter.com/joeahand">Joe</a> will mentor the fellow. We are both located in Portland, OR, USA. The Dat team is distributed across the world. If you want to come work with us in Portland for some (or all) of your fellowship, that works too! Just be aware that we don't have an office; we work out of our homes and coffee shops. The coffee is good here!</p>
<p><strong>Application logistics:</strong> There are a few different fellowship options; select &quot;Open Web&quot; to apply to work with us.</p>
<p><strong>Questions?:</strong> Reach out to us (<a href="mailto:hi@codeforscience.org">hi@codeforscience.org</a>, <a href="http://twitter.com/codeforsociety">@codeforsociety</a>) or contact Mozilla (<a href="mailto:fellowships@mozilla.org">fellowships@mozilla.org</a>) with any questions. Also keep an eye out for the Mozilla AMAs on the application.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Code for Science & Society - Community Call wrap up!]]></title><description><![CDATA[Last week we brought together people from across industries including academia, publishing, and technology for our first community call.]]></description><link>https://blog.datproject.org/2018/03/05/css-community-call-03-2018/</link><guid isPermaLink="false">5a9da5264653f455ab95dc35</guid><category><![CDATA[Community]]></category><dc:creator><![CDATA[Code for Science & Society]]></dc:creator><pubDate>Mon, 05 Mar 2018 22:21:20 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1442504028989-ab58b5f29a4a?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=53555c8bcfa230f78e9e8ae6d64e4f78" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://images.unsplash.com/photo-1442504028989-ab58b5f29a4a?ixlib=rb-0.3.5&q=80&fm=jpg&crop=entropy&cs=tinysrgb&w=1080&fit=max&ixid=eyJhcHBfaWQiOjExNzczfQ&s=53555c8bcfa230f78e9e8ae6d64e4f78" alt="Code for Science & Society - Community Call wrap up!"><p>Last week we brought together people from across industries including academia, publishing, and technology for our first community call. We will be hosting these calls every quarter to spread ideas, make connections, and learn about each others projects. As a nonprofit, we work to improve access to knowledge across several domains and this call is a perfect venue for facilitating that.</p>
<p>Did you miss it? You can watch <a href="https://air.mozilla.org/code-for-science-society-community-call/">the recording</a> and view <a href="https://public.etherpad-mozilla.org/p/CodeforScienceandSociety-Community-Call-2018-03-01">the notes</a>! Follow <a href="https://twitter.com/dat_project?lang=en">Dat Project on Twitter</a> or <a href="http://eepurl.com/dj7pnj">sign up for our email list</a> so you can catch the next one.</p>
<p><strong>Why host a call?</strong> We wanted to add another channel for regular project updates and a chance to make our online connections a little more IRL. We have a unique community! We can pull speakers from friendly and interesting projects across multiple domains who may not cross paths but face similar challenges. It's a chance to bring the global community together to interact between conferences and meetups.</p>
<p>We currently support three projects via fiscal sponsorship and are growing an awesome community of projects that span open data, open research, science, the peer-to-peer web, and technology for the public good. Central to our mission is bringing people together from different fields to share ideas.</p>
<p>During the call, we hosted three speakers, updates from our sponsored projects, and non-verbal updates from the community:</p>
<ul>
<li>
<p>Karissa McKelvey (one of our fantastic <a href="https://blog.datproject.org/2018/02/13/css-board/">new board members</a>!!!) and Stephen Whitmore spoke about <a href="https://www.digital-democracy.org/">Digital Democracy</a>. Digital Democracy’s mission is to empower marginalized communities to use technology to defend their rights. In simple language, they fight for indigenous self-determination through tech &amp; local partnerships. They build tools that work in remote, rural locations and help people defend their land ownership and other rights.</p>
</li>
<li>
<p>Peter van Hardenberg did a demo of <a href="https://medium.com/@pvh/pixelpusher-real-time-peer-to-peer-collaboration-with-react-7c7bc8ecbf74">PixelPusher</a>, an experiment to really understand the limitations and needs of peer-to-peer applications and data sync. You really need to check out his demo! It was pretty cool to see in real time how collaborative conflict resolution can happen over a peer-to-peer network.</p>
</li>
<li>
<p>Tara Vancil covered <a href="http://beakerbrowser.com/">Beaker Browser</a>, a project near and dear to our hearts. If you are following Dat and haven't seen Beaker, you are in for a treat! With Beaker you can browse websites over Dat and so much more! Tara gave some updates and shared a vision of where they are headed.</p>
</li>
<li>
<p>Danielle and Joe did the <a href="https://blog.datproject.org/2018/03/05/css-community-call-03-2018/Datproject.org">Dat</a> update, calling out recent work on <a href="https://blog.datproject.org/tag/science/">Dat in the Lab</a>. Stay tuned for an update to the CLI to address data transfer issues for researchers. The new <a href="http://github.com/datprotocol/">Dat Protocol Working Group</a> is working to formalize the existing Dat specification and create processes for updating the protocol. This group meets every other Wednesday in the #datprotocol IRC channel - all welcome!</p>
</li>
<li>
<p>Nokome Bentley delivered the <a href="https://stenci.la">Stencila</a> update. Stencila aims to make reproducible research more accessible through familiar word processor and spreadsheet interfaces. Check out the <a href="http://builds.stenci.la/stencila/">most recent builds</a>. Stencila is also looking to develop function libraries, to make it easier for users to add custom functions and to write domain-specific function libraries that can be used within Stencila. Learn more at the Stencila <a href="https://github.com/stencila/libtemplate">work in progress libtemplate</a>. And finally, on the interoperability front, check out Stencila's ongoing <a href="https://github.com/stencila/convert">work on converters</a>. Initial working versions of converters from Jupyter Notebooks and RMarkdown to Stencila Articles (so you can port your markdown notebook and code into a JATS-based article format to write up a publication) are just one of the exciting things discussed!</p>
</li>
<li>
<p>ScienceFair's Richard Smith delivered an async update. He's partnered with <a href="http://worldbrain.io">WorldBrain</a> to work on shared distributed knowledge infrastructure issues. In the process, he's developed a blazing fast, Dat-compatible full-text search index with WorldBrain in pure JS that scales linearly to millions of documents (expect release in the next month - boom! 🔥🔥🔥). Major work on ScienceFair datasources is underway, including all of <a href="https://www.ncbi.nlm.nih.gov/pmc/">PMC</a>. This would mean that the free full-text archive of biomedical and life sciences journal literature maintained by the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM) would be searchable, annotatable, and sharable. Stay tuned!</p>
</li>
</ul>
<p>A few events were announced on the call:</p>
<ul>
<li>Danielle Robinson will keynote <a href="https://onlinenorthwest.org/">Online Northwest</a>, a conference on libraries, technology, and culture - March 30th in PDX</li>
<li>Tara Vancil (Beaker) and Mathias Buus (Dat) will speak at SXSW - <a href="https://schedule.sxsw.com/2018/events/PP73084">Democratizing Data Science with Offline First</a> - Mar 10, 2018</li>
<li>Paul Frazee (Beaker) will be speaking in London at a <a href="https://ti.to/we-love-tech/we-love-peer-to-peer-web">Tech We Love event</a> - Mar 21, 2018</li>
<li>May 10-11, 2018 - <a href="https://elifesciences.org/about/innovation">eLife Innovation</a> will host a <a href="https://elifesciences.org/events/c40798c3/elife-innovation-sprint-2018">sprint in Cambridge</a> to coincide with the <a href="https://foundation.mozilla.org/opportunity/global-sprint/">Mozilla Global Sprint</a>. If you join the Mozilla sprint to build something with Dat and/or Beaker, be sure to tell us about it!</li>
</ul>
<p>Thanks to all our speakers, attendees, and extra special thanks to Aurelia Moser and Steph Wright (and the Mozilla Science family) who helped us get organized and live stream the call.</p>
<ul>
<li>Recording: <a href="https://air.mozilla.org/code-for-science-society-community-call/">https://air.mozilla.org/code-for-science-society-community-call/</a></li>
<li>Notes: <a href="https://public.etherpad-mozilla.org/p/CodeforScienceandSociety-Community-Call-2018-03-01">https://public.etherpad-mozilla.org/p/CodeforScienceandSociety-Community-Call-2018-03-01</a></li>
</ul>
<p>Follow <a href="https://twitter.com/dat_project?lang=en">Dat Project on Twitter</a> or <a href="http://eepurl.com/dj7pnj">sign up for our email list</a> to get notified for the next community call.</p>
<p>Have a project that you want to share on the next call? Let us know! Email <a href="mailto:hi@codeforscience.org">hi@codeforscience.org</a> with information about your project and why our community needs to learn about it.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Quarterly Community Call March 1]]></title><description><![CDATA[Save the date for the first ever Code for Science & Society Community call! Join us March 1, 2018:  11am Pacific / 2pm Eastern / 7pm UK / 8pm Berlin / March 2 8am NZ!]]></description><link>https://blog.datproject.org/2018/02/16/community-call-march-1/</link><guid isPermaLink="false">5a8725ed8b05bb1208ee4ee6</guid><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Code for Science & Society]]></dc:creator><pubDate>Fri, 16 Feb 2018 18:57:29 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/02/IMG_0829.JPG" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/02/IMG_0829.JPG" alt="Quarterly Community Call March 1"><p>Save the date for the first ever Code for Science &amp; Society Community call!<br>
Join us March 1, 2018:  11am Pacific / 2pm Eastern / 7pm UK / 8pm Berlin / March 2 8am NZ!</p>
<p>Featuring updates from the <a href="http://datproject.org">Dat project</a>, <a href="http://stenci.la">Stencila</a>, and <a href="http://sciencefair-app.com">ScienceFair</a>, as well as speakers from the community. <a href="https://twitter.com/okdistribute?lang=en">Karissa McKelvey</a> will talk about Digital Democracy's <a href="https://www.digital-democracy.org/mapeo/">Mapeo</a>, an offline-first peer-to-peer desktop app for Open Street Map. It uses underlying pieces of Dat for its data and replication models! <a href="https://twitter.com/pvh">Peter van Hardenberg</a> will talk about modeling merge conflicts with a pixel art editor. Team <a href="https://beakerbrowser.com/">Beaker Browser</a> will update us on their work.</p>
<p>Our agenda is evolving here: <a href="https://public.etherpad-mozilla.org/p/CodeforScienceandSociety-Community-Call-2018-03-01">https://public.etherpad-mozilla.org/p/CodeforScienceandSociety-Community-Call-2018-03-01</a></p>
<p>Do you want an <a href="http://eepurl.com/dj7pnj">email reminder</a>?</p>
<p>Our goal with the community call is to add one more place for people to connect with our community of projects. We will feature CSS sponsored projects and speakers from the community talking about their work. It's a chance to bring the global community together to interact between conferences and meetups. It'll be fun.</p>
<p>We are still confirming the logistics, but the call will be streamed! We will update this post with information on how to tune in.</p>
<p>Do you want to speak on a future call? Are you doing something in the public interest technology space? Approaching a technical problem? Or did you make some weird/fun art we should check out? Want to share with our community? Email Danielle and Joe <a href="mailto:hi@codeforscience.org">hi@codeforscience.org</a>!</p>
</div>]]></content:encoded></item><item><title><![CDATA[Code for Science & Society Expands Board]]></title><description><![CDATA[We held our first board meeting in San Francisco last week. Our board oversees operations and will support us as we develop sustainable strategic vision for the nonprofit and the Dat project.]]></description><link>https://blog.datproject.org/2018/02/13/css-board/</link><guid isPermaLink="false">5a82174b8b05bb1208ee4eda</guid><category><![CDATA[Announcement]]></category><dc:creator><![CDATA[Code for Science & Society]]></dc:creator><pubDate>Tue, 13 Feb 2018 18:52:00 GMT</pubDate><media:content url="https://blog.datproject.org/content/images/2018/02/IMG_0869_crop.JPG" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.datproject.org/content/images/2018/02/IMG_0869_crop.JPG" alt="Code for Science & Society Expands Board"><p>We held our first board meeting in San Francisco last week where we welcomed two new board members and a board advisor. Our board oversees operations and will support us as we develop sustainable strategic vision for Code for Science &amp; Society (CSS) and the Dat Project.</p>
<p>The role of CSS, a nonprofit, has been two-fold: it has housed Dat leadership and acted as a fiscal sponsor for other projects (learn more at <a href="https://codeforscience.org/">codeforscience.org</a>). As we formalize the structure of CSS, we will ensure Dat governance is in line with open source best practices and the community's needs. CSS will continue to employ people involved in Dat Project, but new community-rooted processes will govern Dat and bring more transparency to the project. We will communicate more on the process of establishing a new governance model over the coming months. In the meantime, we’d like to introduce our board:</p>
<img src="https://blog.datproject.org/content/images/2018/02/kaitlin.jpg" width="200" height="auto" alt="Code for Science & Society Expands Board">
<p>Kaitlin Thaney is the Endowment Director of the Wikimedia Foundation and is our new  🎉 Board Chair 🎉!!! Her work focuses on long-term sustainability and support for the free and open knowledge movement. Prior to joining Wikimedia, she oversaw the Mozilla Foundation's network of community programs, a $10 million portfolio crossing science, education, Internet of Things, advocacy, and gender-based programming. She also worked to build programs such as the Mozilla Science Lab, Digital Science, early Open Access and Open Data work with Creative Commons, and Datakind UK. Kaitlin was formerly on the Dat project advisory board. Kaitlin was elected to the position of Board Chairperson at our February meeting - we are thrilled to have her at the helm of our board.</p>
<img src="https://blog.datproject.org/content/images/2018/02/karissa.jpg" width="200" height="auto" alt="Code for Science & Society Expands Board">
<p>Karissa McKelvey is an open source programmer who was a core Dat team member for four years and is now a member of the <a href="http://www.digital-democracy.org/">Digital Democracy</a> team. You may know her as <a href="https://twitter.com/okdistribute">@okdistribute</a> online, and maybe you saw her <a href="https://blog.datproject.org/2017/09/21/dat-commons/">Full Stack Fest Keynote</a>. She managed the Dat desktop application and DatBase public registry projects. She is an innovative software developer as well as an accomplished writer, public speaker, and activist who works to support an equitable web. Formerly a research scientist at Indiana University, her work studying online political communication resulted in multiple peer-reviewed papers and press in outlets such as NPR and the Wall Street Journal. She has led teams to success on diverse projects throughout her career in academia, non-profits, and industry. Her deep understanding of the technology and experience working on open web issues will bring valuable perspective to the board.</p>
<img src="https://blog.datproject.org/content/images/2018/02/kristen.jpg" width="200" height="auto" alt="Code for Science & Society Expands Board">
<p>Kristen Ratan is co-Founder and Executive Director of the Collaborative Knowledge Foundation, a nonprofit with a mission to evolve how knowledge is created, produced, and shared. Kristen has a 20-year history working to accelerate advances in science and research communication, most recently as Publisher at the Public Library of Science (PLOS).  Prior to that Kristen held leadership positions at HighWire Press, Atypon and BIOSIS.  Kristen is on the Board of Directors of the American Institute of Physics Publishing and the nonprofit Community Resources for Science. She also serves on numerous advisory boards and industry committees.</p>
<img src="https://blog.datproject.org/content/images/2018/02/waldo.jpg" width="200" height="auto" alt="Code for Science & Society Expands Board">
<p>Waldo Jaquith is with 18F’s State and Local Acquisitions practice, working with governments across the U.S. to help them better acquire software to serve the needs of the public. Previously, he was the Senior Advisor to the Sunlight Foundation, Director of the Knight-funded U.S. Open Data, ran the Knight-funded State Decoded project, and worked for the White House Office of Science and Technology. He serves as an advisor to and on the board of various organizations that sit at the intersection of technology and government.</p>
<img src="https://blog.datproject.org/content/images/2018/02/Josh-Greenberg.jpg" width="200" height="auto" alt="Code for Science & Society Expands Board">
<p>Joshua M. Greenberg, PhD is joining as a board advisor. He is director of the Alfred P. Sloan Foundation's Digital Information Technology program. Dr. Greenberg received his Bachelor of Arts in History of Science, Medicine and Technology from the Johns Hopkins University, and both Masters and Doctoral degrees from Cornell University's Department of Science &amp; Technology Studies. Before Sloan, he was Director of Digital Strategy and Scholarship at the New York Public Library, where he created the NYPL Labs team and launched a number of projects focused on deepening engagement through access to digital collections.</p>
<p>Join us in welcoming our new board members! We had a great time in San Francisco. We spent three straight days with our friends at the Internet Archive, got the chance to hang out with the <a href="https://blog.datproject.org/2018/02/13/css-board/Beakerbrowser.com">Beaker Browser</a> team, and linked up with many others.</p>
<img src="https://blog.datproject.org/content/images/2018/02/IMG_0835-rotated-1.JPG" width="500" height="auto" alt="Code for Science & Society Expands Board"> 
<p><em>Dat Protocol Working Group members at the Internet Archive: Tara, Joe, Bryan, Paul, and Danielle</em></p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Most of the Dat protocol working group at the <a href="https://twitter.com/internetarchive?ref_src=twsrc%5Etfw">@internetarchive</a>! Missing <a href="https://twitter.com/okdistribute?ref_src=twsrc%5Etfw">@okdistribute</a> and <a href="https://twitter.com/mafintosh?ref_src=twsrc%5Etfw">@mafintosh</a> <a href="https://t.co/JaAGqnlvK3">pic.twitter.com/JaAGqnlvK3</a></p>&mdash; Tara Vancil (@taravancil) <a href="https://twitter.com/taravancil/status/958868029172940800?ref_src=twsrc%5Etfw">February 1, 2018</a></blockquote>
<p>Stay tuned for more updates. As always, you can reach out to Danielle and Joe at <a href="mailto:community@datproject.org">community@datproject.org</a> anytime.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Is Open Science ready for software containers?]]></title><description><![CDATA[As part of Dat in the Lab we are working with different campuses in the University of California network. One of our goals is to publish researcher's data, code, and executable Linux container all as files in a version controlled Dat repository. ]]></description><link>https://blog.datproject.org/2018/01/26/challenges-of-decentralized-hpc-containerization/</link><guid isPermaLink="false">5a6a28c38b05bb1208ee4ec9</guid><category><![CDATA[Science]]></category><dc:creator><![CDATA[Dat Project]]></dc:creator><pubDate>Fri, 26 Jan 2018 19:04:47 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1476778642660-a2c55571cf0d?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=3c8e2583e5bb2b6e1986b198308d96a7" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://images.unsplash.com/photo-1476778642660-a2c55571cf0d?ixlib=rb-0.3.5&q=80&fm=jpg&crop=entropy&cs=tinysrgb&w=1080&fit=max&ixid=eyJhcHBfaWQiOjExNzczfQ&s=3c8e2583e5bb2b6e1986b198308d96a7" alt="Is Open Science ready for software containers?"><p>As part of the Dat in the Lab project we are working with different campuses in the University of California network. One of our goals is to publish researcher's data, code, and executable Linux container all as files in a version controlled Dat repository. For this to be useful, a person should be able to execute these Linux environments (aka containers) anywhere.</p>
<p>If you own your computer, or rent a cloud server, you can use <code>sudo</code> or root access to run containers. Most container systems like Docker expect you to have superuser privileges. In many university computing clusters (aka High-performance computing - &quot;HPC&quot;), hundreds of researchers share a single Linux environment and do not have the ability to install global software packages or execute commands that require <code>sudo</code>. Even workstations and laptops that are owned by an institution often do not allow users these privileges.</p>
<p>We are working with researchers who want to run a bioinformatics pipeline that was developed at UCLA on the cluster located at University of California, Merced. Both campuses have shared computing clusters running CentOS 7 Linux. However, the clusters are different. The UCLA cluster has thousands of software packages installed. These have been requested over time by UCLA researchers. These packages are not a standard part of the CentOS ecosystem and are not available to easily install at UC Merced.</p>
<p>The pipeline we are working with requires specific versions of Python, Bash, R and GCC to be installed and compiled. The UC Merced cluster does not have these versions available. The traditional way researchers have dealt with these code portability issues is by asking their cluster sysadmins to install all of the packages they need. Sometimes the sysadmins say no, as upgrading to (for example) Python 3 globally would break all Python 2 applications. Sometimes sysadmins say yes, but it takes a while as this is tedious work.</p>
<p>To address global upgrade breakages, UCLA uses the <a href="http://modules.sourceforge.net/">Environment Modules</a> system, which lets users load specific versions of packages into their environment. For example, when you first log in to a compute node shell there will be no version of R loaded. But you can run a command like <code>module load R/3.4.2</code> to get a specific version loaded into your path for that specific shell session. The UCLA pipeline we are working with has a series of <code>module load</code> directives executed at the beginning of the pipeline to ensure the correct environment is set up.</p>
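<p>As a rough sketch of what that setup step can look like, the Bash function below loads a list of modules and stops at the first one that is missing. The helper name <code>load_modules</code> is made up for illustration and is not taken from the UCLA pipeline; only the <code>module load</code> command itself comes from Environment Modules.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Hypothetical helper: load every module named on the command line,
# stopping at the first one that cannot be loaded.
load_modules() {
  # Environment Modules provides `module` as a shell function or alias;
  # give up early on machines where it is not available.
  if [ -z "$(command -v module)" ]; then
    return 1
  fi
  local name
  for name in "$@"; do
    module load "$name" || return 1
  done
}
</code></pre>
<p>A pipeline script could then start with something like <code>load_modules R/3.4.2</code>, adding whichever other module names the target cluster actually provides.</p>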
<p>However, moving the pipeline to another cluster requires moving all of the modules as well. Some modules are single binary executables, easy to move to any other server with the same architecture, but others (such as R) are a complicated dependency graph including things like a certain version of an Intel Fortran compiler needed to compile R packages at runtime.</p>
<p>Moving the pipeline code is easy, but moving the underlying machinery (the specifically compiled and installed software dependencies and overall supporting Linux environment) can be very complicated. The worst part is, even if you seem to get the pipeline to work, there is no way to verify that you installed and compiled everything <em>exactly</em> the same on the new machine as it was installed on the old machine. You can only see if your program works, but there may be subtle bugs lurking that may go undetected due to an environmental difference (e.g. a slightly different C++ compiler being used which throws off the Fortran calculations in an obscure old version of an R package).</p>
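<p>To make the verification problem concrete, here is a minimal sketch of one partial check: hashing every file under an install directory and comparing the result between two machines. The function name <code>tree_digest</code> is our own invention for this example, and it assumes the GNU versions of <code>find</code>, <code>sort</code>, <code>xargs</code> and <code>sha256sum</code> that ship with CentOS 7. It only compares file contents and paths, so it says nothing about permissions, environment variables, or which compiler produced the files.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
# Hypothetical helper: print one SHA-256 digest that summarises
# the path and contents of every regular file under a directory.
tree_digest() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    # Hash each file, with paths sorted so the listing order is stable,
    # then hash the listing itself to get a single value to compare.
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1
  )
}
</code></pre>
<p>If the digest printed on the old cluster matches the one printed on the new cluster, the two directories hold the same files byte for byte. If it differs, something was installed or compiled differently, though the digest alone will not tell you what.</p>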
<p>Our initial idea was to simply copy <em>all</em> of the modules from UCLA onto UC Merced, but this proved impossible. Running <code>du -sh</code> would have told us how big the total installed size of all modules on the UCLA cluster was, but after many hours the command never finished. We needed another approach.</p>
<h3 id="containers">Containers</h3>
<p>In theory, containers are a better way to address this problem. Instead of having everyone on a machine share a single Linux environment, containers allow for many environments to execute at the same time. The entire container can be version controlled for reproducibility, giving you cryptographically secure confidence that the code will run the same everywhere. Before we talk about containers, let's run over the basics of Linux.</p>
<p>Conceptually, Linux is made up of the kernel and user-space. The kernel interfaces with the physical hardware of the computer and presents an API (called system calls or syscalls) to interact with the hardware. Virtual machines (VMs) emulate everything from the hardware up, including the kernel, which provides a complete emulation of an environment but comes at the cost of performance, as your application is using e.g. virtual RAM which has to get mapped to physical RAM through multiple layers of abstraction. For a bioinformatics pipeline that might be designed to use 1TB of RAM, this will reduce performance and defeat the purpose of buying specialized hardware.</p>
<p>Containers differ from virtualization in that all containers running on one machine share the same kernel, which means they are not emulating any hardware or emulating the kernel. A program like VirtualBox or Xen does not qualify as a container; it is classified as a VM because it emulates the whole machine. There are many ways to implement a container system depending on what your requirements are. A container system is any program that will use a set of syscalls to run a program in a Linux file system. Which syscalls are used determines the security tradeoffs and kernel compatibility. You may have heard the phrase &quot;everything in Linux is a file&quot;. This means that you can simply take a filesystem, throw it in a ZIP archive, and you have a container. How you execute the files as processes depends on which syscalls you use, but a container is essentially just a folder full of files and a way to run a process in that folder.</p>
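<p>If a container is just a folder full of files, then one way to check that two copies are <em>exactly</em> the same is to fingerprint the whole folder. Here is a minimal Python sketch of that idea. The function name <code>tree_digest</code> and the manifest format are our own assumptions for illustration; this is not how Dat or any particular container tool computes its hashes.</p>

```python
import hashlib
import os


def tree_digest(root):
    """Return one SHA-256 hex digest covering every regular file under root."""
    outer = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files to keep the sketch simple.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                # Reads the whole file into memory; fine for a sketch.
                file_hash = hashlib.sha256(f.read()).hexdigest()
            # Mix both the relative path and the content hash into the result.
            outer.update(f"{rel}\0{file_hash}\n".encode())
    return outer.hexdigest()
```

<p>Two folders with identical relative paths and file contents produce the same digest, and changing a single byte in any file changes it. Note that this sketch ignores permissions, ownership, and symlinks, all of which a real container format would need to capture.</p>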
<p>By renting a cloud machine, one can pick a distribution with the latest Linux kernel and use <code>sudo</code> without issue, thereby gaining access to every syscall, including the new shiny ones that modern container systems rely on. However, researchers using shared university servers don't have control over the kernel and probably don't have <code>sudo</code> access.</p>
<p>To return to our goals: We want to publish researchers' data, code, and executable Linux container all as files in a version controlled Dat repository. We want anyone to be able to execute these Linux containers anywhere. To do this, it needs to be easy for researchers to set up and maintain containers on university machines without special sysadmin permissions.</p>
<h3 id="possiblesolutions">Possible solutions</h3>
<p>All of that stuff above leads to the real challenge of containers for reproducible science: making containers run everywhere, including restricted shared computing clusters. With that in mind let's look at different approaches to running containers in Linux.</p>
<h4 id="chroot">chroot</h4>
<p>The <code>chroot</code> syscall is sort of the proto-container. It's been around in Linux forever. In Linux, your filesystem starts at <code>/</code>. Usually a user puts their stuff somewhere like <code>/home/alex</code>. If you unzipped an Ubuntu distribution at <code>/home/alex/ubuntu</code>, you can use <code>chroot</code> to spawn a new process that is running relative to <code>/home/alex/ubuntu</code> but thinks it's at <code>/</code>. In its most basic form, this lets you 'nest' filesystems and have different processes sharing the same kernel, but using different Linux environments. Using our definition above, this qualifies as a container. The downside is that there is no built-in security preventing things in the chroot from achieving root privileges, and you have to be root to create a chroot.</p>
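<p>To make the &quot;folder of files plus a way to run a process in it&quot; idea concrete, here is a rough Python sketch of a chroot-style launcher. The helper name <code>run_in_chroot</code> and the exit code 127 for failures are our own choices for illustration; it must be run as root to succeed, and it provides none of the isolation that real container systems add on top.</p>

```python
import os


def run_in_chroot(new_root, argv):
    """Run argv with new_root as its "/" and return the child's exit code.

    Returns 127 if the chroot or exec step fails, e.g. when the caller
    is not root or the program does not exist inside new_root.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: change the root, then replace ourselves with argv.
        try:
            os.chroot(new_root)  # needs root privileges
            os.chdir("/")        # move inside the new root
            os.execv(argv[0], argv)
        except OSError:
            os._exit(127)
    # Parent process: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

<p>With an Ubuntu filesystem unpacked at <code>/home/alex/ubuntu</code>, calling <code>run_in_chroot(&quot;/home/alex/ubuntu&quot;, [&quot;/bin/sh&quot;])</code> as root would start a shell that sees that folder as <code>/</code>. As an unprivileged user the <code>chroot</code> call is refused, which is exactly the limitation described above.</p>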
<h4 id="rootdaemondocker">Root Daemon (Docker)</h4>
<p>Docker runs a daemon process as root that, if you also have root, you can ask to download and boot containers for you. This means that with Docker, containers are executed as the root user. Also, by default, only other root users can talk to the Docker daemon at all. You can change the security so that users can run Docker without sudo, but this is not recommended by Docker as it means any user can execute root commands (i.e. Docker daemon access basically <a href="https://www.projectatomic.io/blog/2015/08/why-we-dont-let-non-root-users-run-docker-in-centos-fedora-or-rhel/">means root access</a>).</p>
<p>The advantage of this approach is that it's simple. The Docker daemon manages the lifecycle of all containers running on a system. You use the Docker protocol to talk to the daemon, and the daemon translates that to whatever syscalls are available to it on the kernel it's running on. If you move to a different kernel, the Docker protocol stays the same (in theory; in practice this can break their protocol a lot, which defeats the purpose of Docker in our opinion, but that's a different blog post). The downside is that the containers don't run as the user, they run as root. University sysadmins don't like this.</p>
<h4 id="rootlesscontainersrunccharliecloud">Rootless containers (runC, Charliecloud)</h4>
<p>Since Linux kernel 3.8 you have access to the <code>CLONE_NEWUSER</code> flag (passed to the <code>clone</code> and <code>unshare</code> syscalls), which lets you create a user namespace as an unprivileged user. There's more info in <a href="https://www.cyphar.com/blog/post/20160627-rootless-containers-with-runc">this excellent blog post</a> but the tl;dr is that you can, without <code>sudo</code>, create a process tree that maps your user on the actual system to a virtual user (like root) inside the new process namespace, meaning you can run processes as a &quot;virtual&quot; root without needing root yourself. Unfortunately in CentOS/RHEL user namespaces are <a href="https://rhelblog.redhat.com/2015/07/07/whats-next-for-containers-user-namespaces/">still not enabled</a> by default due to security concerns.</p>
<p>The <a href="https://hpc.github.io/charliecloud/">Charliecloud</a> project from Los Alamos has developed a minimal (~900 lines of C) chroot-style container system that is based on having user namespaces enabled.</p>
<p>CentOS appears to be the only distribution that doesn't have user namespaces enabled. CentOS also happens to be popular in universities. In the long term, rootless containers seem like the best security model for shared environments. For now, to support CentOS as well as older servers/kernels, other solutions are still needed.</p>
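<p>Before asking a sysadmin for anything, it can help to check whether a machine is likely to allow rootless containers at all. The Python sketch below reads <code>/proc/sys/user/max_user_namespaces</code>, a file that newer kernels expose; the helper names are hypothetical. A missing file or a value of 0 suggests user namespaces are unavailable, while a positive value is only a hint, since distributions may apply additional restrictions.</p>

```python
from pathlib import Path


def userns_limit(proc_file="/proc/sys/user/max_user_namespaces"):
    """Return the user namespace limit, or None if it cannot be read."""
    try:
        return int(Path(proc_file).read_text().strip())
    except (OSError, ValueError):
        # File missing (older kernel), unreadable, or not a number.
        return None


def rootless_containers_possible(limit):
    """Treat a missing or zero limit as 'user namespaces not available'."""
    return limit is not None and limit > 0


if __name__ == "__main__":
    limit = userns_limit()
    print("max_user_namespaces:", limit)
    print("rootless containers look possible:", rootless_containers_possible(limit))
```

<p>This only inspects a kernel setting. The definitive test is still to try creating a user namespace with a tool such as runC or Charliecloud.</p>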
<h4 id="setuidsingularitybubblewrap">SetUID (Singularity, Bubblewrap)</h4>
<p>If you don't have a way to do rootless containers, but you don't want to give your researchers root access, then you have to find a middle ground. <a href="http://singularity.lbl.gov/">Singularity</a> is one such approach that works here. To install Singularity, you need to be root, so that you can set the <code>setuid</code> bit on the <code>singularity</code> binary, 'marking' the executable to run with root privileges. This means that <code>singularity</code> can do stuff as root, but the user's program cannot. As long as Singularity doesn't have any security exploits it's a nice tradeoff. Once you have Singularity installed, you can run containers without superuser privileges, and the container processes are owned by your user, not running as root.</p>
<p>Singularity will use user namespaces if they are available instead of <code>setuid</code>, so it works similarly to Docker in that it tries to provide a portable container execution context that works on a variety of new and old Linux distributions, but differs in that it doesn't require a central root daemon process, which makes it more popular with university sysadmins.</p>
<p><a href="https://github.com/projectatomic/bubblewrap">Bubblewrap</a> is another similar approach that specifically tries to be a minimal setuid-based program that can spawn a container process and de-escalate all other privileges. The goal of bubblewrap is to keep the surface area of the program minimal so that it can be audited more effectively, which is a great engineering principle that is (in my opinion) often overlooked. OpenSSL vs. Libsodium comes to mind.</p>
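<p>You can see which mechanism an installed binary relies on by looking at its file mode. The following Python sketch checks whether a file is owned by root and has the setuid bit set; the function names are ours, and the path in the usage note is a placeholder, not a claim about where Singularity installs its helpers.</p>

```python
import os
import stat


def has_setuid_bit(mode):
    """Return True if a file mode (as returned by os.stat) has setuid set."""
    return bool(mode & stat.S_ISUID)


def is_setuid_root(path):
    """Return True if path is owned by root and marked setuid."""
    st = os.stat(path)
    return st.st_uid == 0 and has_setuid_bit(st.st_mode)
```

<p>For example, <code>is_setuid_root(&quot;/path/to/some/binary&quot;)</code> returns <code>True</code> for a root-owned file with mode <code>4755</code> and <code>False</code> for an ordinary executable with mode <code>755</code>.</p>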
<h3 id="conclusion">Conclusion</h3>
<p>It's worth mentioning <a href="https://opensource.com/business/14/7/docker-security-selinux">the article that coined the phrase &quot;containers don't contain&quot;</a> when discussing container security. In the university use case we described, we aren't looking for perfect security, but rather security that fulfills the requirements of university sysadmins.</p>
<p>So, which of the above approaches are we using at UC Merced? We've put in a request for the sysadmins there to install Singularity, which seems like the best approach since we can't run rootless containers on their CentOS. If they weren't running on CentOS we would probably just use rootless containers, as we could use them without asking the sysadmins to do anything. We'll do another blog post later in the year with more details on exactly what we're building with all this container stuff, but we wanted to take the opportunity here to talk about the scope of the challenge of deploying containers in the open science ecosystem.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Dat Privacy Models: Creating Communities of Trust]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Privacy on the web is an elusive thing. We often think our data is private only to later learn the company we trusted is using it for something we find unsavory (or it gets hacked and released en masse). In our view, this is one of the central shortcomings of</p></div>]]></description><link>https://blog.datproject.org/2018/01/16/dat-privacy-models/</link><guid isPermaLink="false">5a2eb862fcaf6807729887fc</guid><category><![CDATA[Decentralization]]></category><category><![CDATA[Business]]></category><dc:creator><![CDATA[Joe Hand]]></dc:creator><pubDate>Tue, 16 Jan 2018 18:21:14 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1493305344584-c928c32c2af1?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;s=bc3ca3daafa13895060eaadc5f27e1c1" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://images.unsplash.com/photo-1493305344584-c928c32c2af1?ixlib=rb-0.3.5&q=80&fm=jpg&crop=entropy&cs=tinysrgb&w=1080&fit=max&s=bc3ca3daafa13895060eaadc5f27e1c1" alt="Dat Privacy Models: Creating Communities of Trust"><p>Privacy on the web is an elusive thing. We often think our data is private only to later learn the company we trusted is using it for something we find unsavory (or it gets hacked and released en masse). In our view, this is one of the central shortcomings of the current system — centralized services are the <em>only</em> option for communities. This forces us to trust these services with our data, even if we do not want to or do not know how they are using it. For many, they view this as the cost of life online (which is now all but required for life offline). But it does not need to be.</p>
<p>In <a href="https://blog.datproject.org/2017/12/10/dont-ship/">a recent post</a>, we shared questions we ask before shipping software. Part of the post raised privacy concerns around hosting Wikipedia on Dat. Many people in our community asked: what does this mean for privacy on Dat in general? Should I be worried!? These are great questions! And thank you for keeping us accountable.</p>
<p>We apologize for raising these points and not addressing all of them (the last post was getting a bit long!). You can read more about <a href="https://blog.datproject.org/2016/12/12/reader-privacy-on-the-p2p-web/">reader privacy models</a> specific to hosting large public datasets (like Wikipedia); this post will address how Dat privacy currently works and what we can do better.</p>
<h2 id="trustsomeone">Trust Someone</h2>
<p>To live in this world we need to trust people (I will try to keep this grounded in reality and back to privacy soon!). When we get in a car, we trust other drivers to follow the rules. We trust that the planners and engineers did their job correctly. We trust the mechanic and car designers thought about our safety. If not, we trust our government to hold people accountable. We cannot engage in social life without trust.</p>
<p>The same is true for life online. There are two questions you have to answer when thinking about trust and privacy online:</p>
<ul>
<li>Who am I trusting?</li>
<li>What information am I trusting them with?</li>
</ul>
<p>For example, if you post on Facebook, you are trusting the company and employees of Facebook with your data. You are also trusting the people you have added as friends to not share your data. This feels strange because we have two very different groups to trust. On one hand, you have a community of friends where you have some measure of how much to trust them. On the other, you have to extend trust to Facebook without much to substantiate that trust (beyond our democratic institutions and social or economic pressures).</p>
<p><em>(There are systems that aim to minimize trust, such as Tor or blockchain systems. While these do allow for less trust, they also have other negative trade-offs. In order to decrease trust, you have to make the system more complex and slower. We seek to find a balance where users can decide on how much complexity they want for their security needs (you can use Dat over Tor!). For example, we want to trust the local public library without needing to connect via Tor or use a blockchain to check out a book.)</em></p>
<p>To us, the question is not: how can we remove all trust from the system? Trust is human. The question is: <strong>how can we create communities where we can choose who to trust and with what information?</strong> Right now we are forced to opt-in to trusting organizations to connect online. We strive to make that optional, more explicit, and more flexible.</p>
<h2 id="foundationofdatprivacy">Foundation of Dat Privacy</h2>
<p>The foundation of Dat privacy rests on private-by-default and end-to-end encryption for all metadata and content. Whenever you create and share information over Dat, only other users you <strong>explicitly share</strong> that information with can see it. There is no way other users can discover what you are sharing. This is as true for a &quot;public&quot; dat site as it is for a private dat you are sharing with one person.</p>
<p>The second point, which we raised in the Wikipedia post, is how and when your IP address gets shared. This is where things get confusing because it depends on who you are connecting to and how you are connecting (and also where we can most improve documentation and implementation). We'll dive into different examples of how these work in practice.</p>
<h2 id="datprivacyexamples">Dat Privacy Examples</h2>
<h3 id="privatepeertopeer">Private Peer-to-Peer</h3>
<p>The simplest example, and default state for Dat, is private peer-to-peer sharing. In this case, we can imagine Joe wants to share some food pictures with Danielle. They discover each other using DNS and connect directly. No one else can see their IP addresses or Joe's food pictures. All metadata and content is private and encrypted in transit.</p>
<p>Key points in this example:</p>
<ul>
<li>Private Dat sharing has world class privacy (encrypted, no third-parties)</li>
<li>Users sharing a common Dat trust the peers they connect to with their IP logs. Not all applications need a large pool of peers, as in our Wikipedia example.</li>
</ul>
<h3 id="communitypeertopeer">Community Peer-to-Peer</h3>
<p>Sharing data with a community of peers is a feature unique to Dat. It is impossible to mimic with how the web works today. This is somewhere in between BitTorrent (global peer-to-peer) and one-to-one connections (client-server or direct peer-to-peer). As the network grows, we have to start being a bit more careful about the privacy implications.</p>
<p>In this example, I create a feed to share on <a href="https://github.com/Rotonde/rotonde-client">Rotonde</a>, a decentralized social network. When I share that key with my friends, they can re-share my feed. Similarly to how we trust our friends with Facebook data, we may be trusting our friends to keep the information private. (However, we can start to add more complexity when privacy needs to be guaranteed, with trade-offs).</p>
<p>These communities can be organized socially (sharing with friends I know IRL) or more organically as we've seen with Rotonde, where connections of connections start sharing feeds. Thus, trust in this situation becomes a people problem, in other words — messy.</p>
<h3 id="publicdata">Public Data</h3>
<p>When considering publicly shared data, the chief concern is reader privacy (since we can assume that the data is okay to share widely). The two aspects of reader privacy are:</p>
<ul>
<li>Who can see what IP addresses are in the swarm?</li>
<li>Can users in the swarm see specifically what content other users are requesting?</li>
</ul>
<p>Concerns about sharing IP addresses may warrant its own post, as they become quite nuanced. IP addresses publicly identify your computer on a network. Whenever you connect to a website, they can view your IP address (and many analytics services store them). We want to minimize access to IP addresses and will continue to improve here. However, we need to be most careful in leaking IP addresses where there may be other surveillance or censorship concerns (such as in the Wikipedia example). Here again, we have to balance complexity and difficulty of use with common use cases — and maintain privacy in situations where it may be paramount.</p>
<p>The second question, of whether users can see what content is being requested, is similar to the difference between HTTPS and HTTP. Without HTTPS, ISPs (or other middlemen) can see exactly what page of wikipedia.org you request, whereas with HTTPS they only see a connection to wikipedia.org. Our <a href="https://blog.datproject.org/2016/12/12/reader-privacy-on-the-p2p-web/">reader privacy on the p2p web</a> post covers all the nuances, but we'll address some key points here.</p>
<h4 id="datwitholdwebsecurity">Dat with &quot;Old Web&quot; Security</h4>
<p>The first point is that Dat can <strong>always</strong> be at least as secure as the &quot;old web.&quot; That is, trusting one centralized organization with IP logs and information about what users are requesting.</p>
<p>Dat can be used over HTTP to act as a traditional web service (with the added benefit of having versioned websites and verifying content isn't corrupted between the source and the user). We can also have applications that connect to trusted peers directly, mimicking a client-server model but over the peer-to-peer network.</p>
<h4 id="peertopeer">Peer-to-Peer</h4>
<p>Public data with peer-to-peer distribution will be most useful in cases where we need to distribute bandwidth. As before, we can continue to increase complexity to add more privacy (see reader privacy post for more):</p>
<ul>
<li>Get data from a trusted set of peers, i.e. Web of Trust.</li>
<li>Make it more difficult for single users to &quot;win&quot; discovery, ensuring a more unique distribution of peers in each swarm.</li>
<li>Mask content requests by adding in noise, making it harder to identify requested content.</li>
</ul>
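<p>The last idea in the list, masking requests with noise, can be illustrated with a small Python sketch. This is purely hypothetical: the function <code>padded_request</code> and its parameters are invented for this example and are not part of the Dat protocol. The reader asks for the blocks it actually wants plus some randomly chosen decoys, so a peer observing the request cannot tell which blocks were of real interest.</p>

```python
import random


def padded_request(wanted, total_blocks, decoys, rng=None):
    """Return the wanted block indexes mixed with random decoy indexes.

    wanted       -- block indexes the reader actually needs
    total_blocks -- number of blocks in the (hypothetical) archive
    decoys       -- how many extra blocks to request as noise
    """
    rng = rng or random.Random()
    wanted_set = set(wanted)
    # Decoys are drawn only from blocks the reader did not ask for.
    candidates = [i for i in range(total_blocks) if i not in wanted_set]
    noise = rng.sample(candidates, min(decoys, len(candidates)))
    request = list(wanted_set) + noise
    rng.shuffle(request)  # avoid revealing the real blocks by position
    return request
```

<p>The trade-off is the one described throughout this post: more decoys make the real request harder to identify, but they cost extra bandwidth for both the reader and the peers serving the data.</p>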
<h3 id="protocoloptions">Protocol Options</h3>
<p>Beyond different peer-to-peer models, we can start using different protocols. Dat gives users control over how to discover &amp; connect to peers, giving more flexibility compared to the existing web. For example, you can run Dat in an offline network where all connections stay in the local network. This flexibility is what allows us to run Dat over HTTP connections.</p>
<p>Being protocol agnostic allows for more flexibility in security models. But with this customization we also have to make sure the security and privacy implications of different implementations are clear, an area we can really improve on.</p>
<h2 id="wherewecanimprove">Where We Can Improve</h2>
<p>Thinking through this post and the questions we got, we see a few areas we can definitely improve:</p>
<ul>
<li>Improve our DHT module so IP addresses are more secure</li>
<li>Provide option to opt-out of peer-to-peer for public data applications (i.e. use the web's traditional trust model)</li>
<li>Demonstrate how Dat over HTTP and other &quot;old web&quot; or trusted peer models can work.</li>
<li>Improve transparency! Make nice pictures that show how things work; this stuff is complicated.</li>
</ul>
<p>Most importantly, we can talk about privacy and security more! We can ask hard questions. This is an area where we feel the old web really falls short, and we are still building models for how this can work in the future. The balance will always be between complexity, speed, usability, and privacy. We want to be able to shift this balance as necessary for different applications but always make it clear what the trade-offs are.</p>
<p>Thanks to the community for bringing up the privacy questions. Let us know if you have more <a href="https://twitter.com/dat_project">on Twitter</a> or our <a href="http://chat.datproject.org/">chatroom</a>.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Updates on Organizational Changes: Looking Forward to the New Year]]></title><description><![CDATA[<div class="kg-card-markdown"><p>We announced organizational changes at Dat and Code for Science &amp; Society (CSS) <a href="https://blog.datproject.org/2017/12/20/organization-changes-dat-css/">last week</a>. Since that announcement, we have heard from many colleagues, community members, collaborators. We love your emails and tweets of support. Thank you to everyone for your kind words over the last week. We have started</p></div>]]></description><link>https://blog.datproject.org/2017/12/28/updates-new-year/</link><guid isPermaLink="false">5a453f321bd39535f0bf1c14</guid><category><![CDATA[Announcement]]></category><category><![CDATA[Community]]></category><dc:creator><![CDATA[Code for Science & Society]]></dc:creator><pubDate>Fri, 29 Dec 2017 00:12:25 GMT</pubDate><content:encoded><![CDATA[<div class="kg-card-markdown"><p>We announced organizational changes at Dat and Code for Science &amp; Society (CSS) <a href="https://blog.datproject.org/2017/12/20/organization-changes-dat-css/">last week</a>. Since that announcement, we have heard from many colleagues, community members, collaborators. We love your emails and tweets of support. Thank you to everyone for your kind words over the last week. We have started to outline our organizational priorities for the new year. And, importantly, Danielle and Joe took <a href="https://twitter.com/npscience/status/944314918328729600">Naomi’s advice</a> and set aside time for much-needed self-care and rest.</p>
<p>Though this situation presented many challenges, we continue to be inspired by support from the Dat and wider open source/science/scholarship/data communities. We can’t wait to build amazing things together in 2018! We are committed to transparently sharing throughout this process. Here’s what we have been working on this week:</p>
<ul>
<li>Drafting and fielding feedback on the new Dat <a href="https://github.com/datproject/Code-of-Conduct/blob/master/CODE_OF_CONDUCT.md">draft Code of Conduct</a>. We aim to showcase our values and community vision - are we hitting the mark? Please <a href="https://github.com/datproject/Code-of-Conduct/issues">open an issue</a> if something isn’t sounding right to you.</li>
<li>Identifying and communicating with candidate board members to build out the CSS board. (This sounds boring, but it’ll be quite exciting - stay tuned!)</li>
<li>Setting up individual @datproject.org email addresses! (Very professional. Enjoying the small wins.)</li>
<li>Working to expand funding opportunities for Dat and CSS. We’ll have an <a href="https://opencollective.com/">OpenCollective</a> page up in the new year! This will enable us to easily accept smaller donations and enable you to see how we are spending money. (In the meantime, if you’re looking to make a tax deductible end of year contribution, <a href="https://donate.datproject.org/">consider donating</a>!)</li>
<li>Discussing the development of a roadmap (v0.1) for organizations facing similar challenges. How can we help others navigate these issues long term? We’re discussing this idea with community stakeholders and look forward to opportunities for sharing our experiences through the last few weeks.</li>
</ul>
<p>As we move into 2018, we are confident that this incredible and supportive community will continue to grow amazing things.  As we push for long term sustainability for Dat, we want to share the public community response with our funders (who have invested much in Dat over the years). If you messaged us personally, we will reach out to ask permission to share your message. If you have additional feedback please email us at <a href="mailto:community@datproject.org">community@datproject.org</a> (and please tell us if you’re okay sharing your feedback).</p>
<p>If you're at CCC in Leipzig, say <a href="https://twitter.com/mafintosh/status/945992801170481152">hi to Mathias</a>! Happy New Year - we can’t wait to work together in 2018.</p>
<p>Thanks,<br>
Danielle + Joe<br>
(<a href="mailto:community@datproject.org">community@datproject.org</a>)</p>
</div>]]></content:encoded></item></channel></rss>