Session: Digging Into Privacy

Date: Thursday, July 30, 2020
Time: 3:00 - 3:45 pm (CDT) (UTC-05:00)
Track: Truck Sheep
Format: General Lecture Session

Who is this session for?

Anyone interested in privacy, but especially site managers, developers, and those that choose plugins or build sites.

Session description

Expectations and legal requirements around security and data privacy are evolving at a record pace. Let’s discuss the relevant tools and best practices for education organizations using WordPress in various ways. We’ll share how to find out what plugins might be collecting, how to minimize your data footprint, and what the WordPress community and project is actively working on for the future.

Presenter

Ronnie Burt

Chief Business Officer, Incsub

Ronnie leads the team behind the CampusPress, Edublogs, and WPMU DEV Hosting services, which power thousands of WordPress sites for higher ed institutions small and large. He got his start on the web doing web accessibility audits at The University of Texas nearly 20 years ago. A former educator, wannabe musician and math nerd, Ronnie lives in Austin, Texas.

Sessions

Session video

Session transcript

Presenter: These slides are linked to in my latest tweet @ronnieburt. Feel free to DM me there or find me in the WPCampus Slack community. I love to talk about this stuff.

What kind of stuff are we talking about? I will start with my background on privacy. I work for a company called Incsub. They have CampusPress, Edublogs, and WPMU DEV. When GDPR was rolling out, none of us knew what we had to do. We knew we had work to do and we had to solidify our privacy practices and be transparent about them to our customers. That's when I started taking online courses and researching information about security. I love to share it with schools and universities.

When it comes to privacy, there are so many laws. There's the big one in the world, the EU's GDPR, that we have all heard about. It has impacted the services we subscribe to. Also, for those of us in education, any privacy breach is headline-worthy, front-page news. You don't want to be a part of that story and be a part of a data breach. I put a line through Privacy Shield. It has been struck down by the highest court in the European Union. It was a framework under which 5,000 companies, including ours, submitted their privacy policies and practices and followed them so we could process data from Europe. The day after our company re-self-certified, it was struck down. Companies can still subscribe so the FTC can still hold us accountable.

One interesting caveat. You might think this is about you and your data and what it means for the consumer, but the reason it was struck down was the US and how it handled EU data. There's concern around how our government is able to access the data of European citizens. Companies can still process data from people in Europe, but we have to use something other than Privacy Shield. There is interesting research that suggests that if the data is housed in the US, it's more secure than in other countries.

I am in the Core Privacy Slack channel. This came from the cofounder of WordPress who is instrumental in the future of WordPress. I encourage you to read the full context of this quote, but the fear is that WordPress Core asserts that we can't reasonably accommodate every country and state. I think a lot of us were concerned at first, but I got to thinking maybe it's not as limiting as it sounds. Maybe we don't accommodate specific regulations, but we need to make sure WordPress Core and WordPress sites and plugins we are hosting are following privacy practices.

I am a big believer that compliance creates minimum standards. Everyone will meet the minimum standard, when the goal should be to be as privacy conscious as possible. WordPress is not going to handle privacy for us, but WordPress, all of us in the community, and the ones developing WordPress sites can implement and follow best privacy practices.

One of the terms that is thrown around a lot in privacy is privacy by design. It's a construct from a Canadian organization. They came up with 7 topics or guidelines in privacy by design. There is a lot of information about this. You are not going to bolt on privacy when we build a new site or plugin. We need to be privacy conscious from the beginning. You aren't going to assume a user has consent and it has to be transparent and open and user centric. This is high level and a lot of folks can talk about it better than me. It's a good place to bring our conversations back to. When it comes to WordPress or our individual sites, we need to come back to privacy by design.

Another term you will hear is a risk-based approach. This is the idea that what is good for your situation might not be good enough for mine and vice-versa. I will go through a list of areas of concern and it will be overwhelming. If we take the risk-based approach, this will be easier. As we work with customers and security and privacy teams, we see that especially the larger an organization is, the higher the risk everything is.

I was reading guidelines yesterday from a large state school in California. It started with how we would classify data and I looked at how WordPress was going to host it. We weren't hosting HIPAA data. The last bullet said any personal information. That bit at the end means that, in the WordPress database, when you create a WordPress user, they have an email attached to them. We are using single sign on with a school email, and we go in the third highest risk category.

They are being treated to the same standard as employee record database management and student transcripts. You are put in the same category for what security measures are in place because of the email address associated with the user. It does bring the conversation about what is personal information. If your state or country has a law, most of them will be specific around what's personal. In Arizona, it's first initial last name, or anything more than that. Email addresses, or social security or driver’s license numbers. Geolocation data, especially in the European Union. So, every visitor and every email that includes the IP address. Biometric data is not super relevant, but there is a plugin for thumbprints to login to WordPress. Browsing history, which is relevant to us. We want to know what pages they are visiting on our site and tying it to other identifiers to figure out where they were going, that's personal information.

Psychometric data, meaning any psychological data such as surveys, is counted. Inferences are considered personal data, like if you are using geolocation data to show different content on a page, or showing different content based on whether someone arrived from a social media campaign, etc. That's a part of the personal information record. People talk about what's personal all the time and it does get complicated.

For any WordPress site you are a part of or working on or hosting, it's a good idea to regularly do a data audit. It's no fun. I have a list of a few areas you want to consider and think about when doing this audit. This is a risk-based approach. For landing pages, you may want an audit that's more formal, and you want to keep the records. For a smaller subsection site, an ad hoc audit with evidence that you did it is fine. Either way, ask what information your host collects, like IP addresses. Is it integrating with marketing or CRM tools? Does your site use cookies? What do they include? Do you have contact forms or other ways to collect data on the sites, or notification emails coming from your site that might include personal data? Someone could submit a form and trigger a notification email. We don't use comment sections much, but those can send notification emails too. You likely do have a system that uses notifications. How is payment data stored? I don't have a formal example of what it would look like. A large organization might have an example for it.

You might want someone coming in to do it, so you might want to do an audit ahead of time. If it's a small, simple site, and we all have friends that ask us to look at their site or work on it, these are my go-to tools for what I want to know. One is builtwith.com. It tells you what services the site is integrated with. You can also see the cookies tab and different integrations in the Google Chrome inspector.

When we were getting ready for our GDPR stuff, we had a service scan our domain for cookies, and it came across hundreds. We have a blog that's 15 years old and has 10,000 posts over those years. Any time any author embedded something a cookie got added to that load of that blog post, so it became a good report for us to see what old content we could repurpose or delete. We wanted to declare each cookie and its purpose so that list gets long quick.
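As a rough illustration of the "declare each cookie and its purpose" step described above, here is a minimal Python sketch. It is not the scanning service mentioned in the talk. The cookie names, the `DECLARED_PURPOSES` table, and the `audit_cookies` helper are all hypothetical, and the `Set-Cookie` header strings are assumed to have been collected some other way (for example, copied from the browser inspector).

```python
from http.cookies import SimpleCookie

# Hypothetical inventory of cookies we have already documented.
DECLARED_PURPOSES = {
    "_ga": "analytics visitor identifier",
    "session_id": "keeps a signed-in user authenticated",
}


def audit_cookies(set_cookie_headers):
    """Return (cookie name, declared purpose or None) for each header."""
    report = []
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # parse one raw Set-Cookie header value
        for name in jar:
            report.append((name, DECLARED_PURPOSES.get(name)))
    return report


if __name__ == "__main__":
    headers = [
        "_ga=GA1.2.123.456; Path=/; Max-Age=63072000",
        "tracker_x=abc; Path=/",
    ]
    for name, purpose in audit_cookies(headers):
        print(name, "->", purpose or "UNDECLARED: review or remove")
```

Any cookie that comes back without a declared purpose is a candidate for the "repurpose or delete" review.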

Another big phrase: this slide is boring and gray, but it's important. If you take one thing from this presentation, it is data minimization. It takes doing the audit first, but if you know what your site is doing, think about what data your site is collecting and ask if you are using that data. If not, get rid of it. I have this debate a lot with our marketing teams that like to store long-term data so we can run nice reports, but we're not going to go through old data and fish through it. If we do, the data is old and stale anyway. We had some heatmapping tools that showed how folks were navigating around sites, and we had moved on without looking at the results. It was slowing down the site and we weren't going to use it.

For what you are going to keep, can you anonymize it? Do you need a full name or email address, or username attached? Can you delete it or hash it? Even though a hash is still personal data, it's better than clear text in the original form. Google Analytics has some settings to make sure you are anonymizing the traffic; we can talk about it if you need it. If you do have a purpose for the data, you will need to have visitors opt in.
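To make the hashing idea concrete, here is a minimal Python sketch under stated assumptions. It swaps in a keyed hash (HMAC-SHA256) instead of a plain hash, which the talk does not specify, and it truncates IP addresses to a network prefix as one possible way to coarsen them. The function names, the key, and the prefix lengths are illustrative choices, not settings from WordPress or Google Analytics. As the talk notes, the result is still personal data; it is only harder to read than clear text.

```python
import hashlib
import hmac
import ipaddress

# Assumption: a secret kept outside the database. This value is a placeholder.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize_email(email: str) -> str:
    """Replace an email with a keyed hash so records can still be matched."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def truncate_ip(ip: str) -> str:
    """Zero out the host part of an address (assumed /24 for IPv4, /48 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

For example, `truncate_ip("203.0.113.77")` gives `"203.0.113.0"`, and two spellings of the same address such as `" A@Example.edu "` and `"a@example.edu"` map to the same pseudonym.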

It's good to think about how long you will need data. We had old contact forms on our sites from contests years ago. That data was just sitting there, and we didn't need it lying around for someone to access down the road. There's not a set of rules; it depends on the risk and the situation, and on whether you have a purpose or need for it.
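A retention check like the one described above can be sketched in a few lines of Python. The one-year window, the `submitted_at` field, and the `split_by_retention` helper are assumptions for illustration. The talk is clear that there is no fixed rule, so the window should come from your own risk assessment.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy: keep form entries for one year. Adjust to your situation.
RETENTION = timedelta(days=365)


def split_by_retention(entries, now=None):
    """Split entries into (keep, purge) based on their submission time."""
    now = now or datetime.now(timezone.utc)
    keep, purge = [], []
    for entry in entries:
        if now - entry["submitted_at"] > RETENTION:
            purge.append(entry)
        else:
            keep.append(entry)
    return keep, purge
```

Running this on a schedule and reviewing the `purge` list before deleting is one way to keep old contest entries from piling up.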

You can't have a talk about privacy without talking about security. You could have great privacy practices, but the data can still get stolen with bad security practices. We are all well informed in this area, but we will talk about staying up to date. Always stay on the latest version. I know our customers don't always like to update in the middle of a crazy time of year, and they want to just do a security patch. WordPress Core will do security updates to old versions, if you know where to look, as an example. We are also striving to be as up to date as possible. When you add more plugins, it becomes harder to juggle.

We are familiar with SSL and encryption in transit but not everyone is set up to encrypt the WordPress instance at rest, which is good practice. It's something we should all strive for in how we manage users through single sign on. We need multi-factor authentication. It depends on what your site needs. If a security plugin is needed, you don't need one just for the sake of having one, but it depends on what your site is doing. It's more normal to have a WAF in place. That's beyond traditional firewall IP blocking. It's monitoring traffic in real time and looking for signatures that are potentially malicious and blocking them.

There is a network WAF like Cloudflare, where DNS runs through that. If that traffic is okay, it's sent to your servers. You have them on the server side, but before it gets to WordPress and you have security plugins living in WordPress. It's definitely better than nothing.

You definitely need to have backups in place, which isn't new. I do want to point out two things though. This is less of an issue, but make sure the backups are encrypted and not stored on the same server as the WordPress site, so they are more helpful if something happens to the server. We might get backups through cPanel, but they aren't set up offsite when we do migrations and it puts them on the same server.

The good news for those that work in WordPress is that when you read the news and look through the data on schools and universities that are hacked, the site is not that common of a problem in the chain. The most common from what I have seen is phishing campaigns or other ways people have gotten into an employee's or students’ email. That's why we take annual security training to identify phishing attempts. What's relevant here from a WordPress perspective is contact forms. If you are sending a CSV file of data from your users' forms as an attachment, people are looking for attachments when they get access to these accounts. They are looking for CSV files that might have data, so that is where you will see attacks happen. If we don't send them as attachments and we send them as an expiring Dropbox link, that's safer.
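The idea behind an expiring link can be sketched without any particular service. The Python below is not Dropbox's API; it is a generic, self-hosted illustration in which a download token carries an expiry time and a signature, so a link found later in a compromised inbox no longer works. The secret, the token layout, and both function names are assumptions.

```python
import hashlib
import hmac
import time

# Assumption: a server-side secret. This value is a placeholder.
SECRET = b"replace-with-a-real-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_expiring_token(file_id: str, ttl_seconds: int, now=None) -> str:
    """Build 'file_id:expiry:signature' for use in a download link."""
    issued = now if now is not None else time.time()
    payload = f"{file_id}:{int(issued + ttl_seconds)}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, now=None):
    """Return the file id if the token is authentic and unexpired, else None."""
    try:
        file_id, expires, signature = token.rsplit(":", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{file_id}:{expires}")
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None  # tampered or forged
    current = now if now is not None else time.time()
    if current > expires_at:
        return None  # link has expired
    return file_id
```

The notification email would then contain only the link with the token, and the CSV itself would stay on the server behind the check.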

Another interesting story: about a year ago we got an email from a client that was an urgent help ticket. They were worried their site had been hacked. Our team went on alert and started digging. It was a WordPress multisite network and it had sites written in Arabic. It looked like it was just shoes or marketing apparel with good links. We thought it might be a student running a side business, but we got an email later saying the same thing, so it was not just a one-off thing.

We noticed some common threads in the content. We started working with the security departments of those companies. They were valid user accounts and had been validated. It wasn't directly a hack in that there was no hole or backdoor or a rogue plugin. We looked through Google and saw other universities that weren't our customers with the same thing. We reached out to them to tell them they should take it down. These were legit accounts that had been purchased on the dark web, and some enterprise thought it was a good idea because EDU domains are good for SEO. It wasn't a true hack, but it gets back to emails.

If you are collecting data in forms, we had a client moving to us that had a practice we should avoid. They had built their HR form for job applications using a form plugin. This form was asking for uploads of transcripts and asking for copies of sensitive data. Passports and stuff like that. It was going into the wp-content folder alongside the rest of the content. Is WordPress, which is designed for public content and information, the best place to house this data? I think if you have a private WordPress site or multisite with good security, fine. You have to think about what your forms are collecting and if WordPress is the best place for it.

You can see a story about FortressDB, which integrates with multiple form plugins to upload the data so it's not stored in the WordPress database. Your marketing department shouldn't have access to that data; your HR department should. The point is to make sure as few people have access as possible.

Just briefly another thing we see is getting consent. All you site designers are getting well versed in this. All data collection needs to be opt-in. You need to have people check a check box and you can't ask for consent for a hundred things at once. You need to make it friendly. You should ask for consent in real-time.

There are tools hidden under Settings > Privacy, where you will see the privacy language from the WordPress team. I don't recommend that a large school get their privacy policy from this link, but there is some information plugin providers can add that says what information they are collecting. It is hidden, and I can provide better examples for that. Encourage plugin providers to include that information if they aren't already. It's a big goal of mine to make it more obvious what plugin providers are collecting and their integrations.

Another new tool is data & deletion requests. It's about to work better under multisite. It's where someone can request the data WordPress has collected, and plugins can hook into this. You can get a copy of it and delete or anonymize it using this tool, so hopefully more people do that. This is a requirement in California law, and probably for all of us eventually: as an organization, we need to be able to provide what data we have on someone, and we need easy ways to do that.

There is a team working hard on a new consent API that plugin authors can hook into to have a standard way of collecting and storing a log of consent and not running analytics until someone has opted in. It was pitched as a promoted plugin, but I don't think the reception was as good as I'd hoped. The plugin is there, and usable and other plugin authors are taking it as a standard, so I would check that out.
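The pattern described above (keep a log of consent, and do not run analytics until someone has opted in) can be illustrated with a small Python sketch. This is not the WordPress consent API, whose functions are not shown in the talk. The `ConsentLog` class, the `"statistics"` purpose name, and `maybe_run_analytics` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentLog:
    """Append-only record of consent decisions (illustrative only)."""
    records: list = field(default_factory=list)

    def record(self, visitor_id: str, purpose: str, granted: bool) -> None:
        self.records.append({
            "visitor_id": visitor_id,
            "purpose": purpose,
            "granted": granted,
            "recorded_at": datetime.now(timezone.utc),
        })

    def has_consent(self, visitor_id: str, purpose: str) -> bool:
        # The most recent decision wins. No decision means no consent (opt-in).
        for rec in reversed(self.records):
            if rec["visitor_id"] == visitor_id and rec["purpose"] == purpose:
                return rec["granted"]
        return False


def maybe_run_analytics(log: ConsentLog, visitor_id: str) -> bool:
    """Return True only if the (hypothetical) analytics script may load."""
    return log.has_consent(visitor_id, "statistics")
```

Keeping every decision, rather than overwriting a single flag, means a later withdrawal is honored while the history of what was agreed to and when is preserved.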

You can also read about the privacy team and what they think WordPress Core could do. You may have noticed puns in the slides from my marketing team in our 2-3 times a week newsletter. Our design team did the graphics so I can't take credit for that either. I can stick around for any questions.

Room Host: Awesome. We don't have too much time so we will answer one and you can answer more on the website. One person asked what encrypted at rest means?

Presenter: It's encrypted on the database on the server, so if someone copied it, they couldn't open it. You can encrypt file uploads. The technicality might be over my head.

Room Host: We have time for one more question. How does one implement good privacy practices with Google Analytics or Facebook business pages? Do you have resources for user privacy?

Presenter: We did some work around Google Analytics. We will try to put something together and post it as an answer to this question.
