Week 1 notes completed

levdoescode
2022-12-17 09:14:29 -05:00
parent 1fc9ec4c91
commit 92d152491f

@ -0,0 +1,46 @@
# Where does data come from?
* New data
* Pre-existing data
  * Internal 'legacy' data
  * External data
## New data
* Adding as you go
* Bulk data entry
## Pre-existing data
We may need to perform
* Extraction
* Conversion
* Cleaning
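
As a rough illustration of the three steps above, here is a minimal sketch in Python. The legacy CSV content, the field names (`name`, `joined`, `city`) and the DD/MM/YYYY date format are all assumptions made up for the example, not part of the course material.

```python
import csv
import io

# Hypothetical legacy export: stray whitespace, inconsistent casing,
# a missing value, and a duplicated row.
LEGACY_CSV = """name,joined,city
 Alice ,17/12/2022,london
Bob,03/01/2021,
 Alice ,17/12/2022,london
"""

def extract(text):
    """Extraction: pull raw rows out of the legacy format."""
    return list(csv.DictReader(io.StringIO(text)))

def convert(row):
    """Conversion: map legacy fields to the target representation."""
    day, month, year = row["joined"].split("/")
    return {
        "name": row["name"],
        "joined": f"{year}-{month}-{day}",  # assumed DD/MM/YYYY -> ISO 8601
        "city": row["city"],
    }

def clean(rows):
    """Cleaning: trim whitespace, normalise casing, mark gaps, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        tidy = {
            "name": row["name"].strip(),
            "joined": row["joined"],
            "city": row["city"].strip().title() or None,  # empty -> None
        }
        key = (tidy["name"], tidy["joined"], tidy["city"])
        if key not in seen:
            seen.add(key)
            cleaned.append(tidy)
    return cleaned

records = clean([convert(r) for r in extract(LEGACY_CSV)])
for record in records:
    print(record)
```

Keeping the steps as separate functions makes it easier to swap one out, for example a different extractor for a different legacy format, without touching the others.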
## External sources
Positives
* No costs for data entry
* No costs for quality checks
* Delegate expertise
Negatives
* No control over data quality
* No control over data structure
* May be incomplete
* May be ambiguous
* Questions of trustworthiness
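
Because we do not control the quality or structure of external data, it can help to audit it before use. The sketch below is one possible way to flag incomplete and ambiguous records; the sample records, the required fields and the date heuristic are hypothetical choices for illustration only.

```python
# Hypothetical records from an external source; field names are assumptions.
EXTERNAL = [
    {"id": "1", "title": "Symphony No. 5", "date": "1808"},
    {"id": "2", "title": "", "date": "03/04/1910"},
    {"id": "3", "date": "1721"},
]

# Fields we (hypothetically) expect every record to carry.
REQUIRED = ("id", "title", "date")

def audit(record):
    """Return a list of problems found in one external record."""
    problems = []
    # Incomplete: a required field is absent or empty.
    for field in REQUIRED:
        if not record.get(field):
            problems.append(f"missing {field}")
    # Ambiguous: 03/04/1910 could mean 3 April or 4 March.
    parts = record.get("date", "").split("/")
    if len(parts) == 3 and all(p.isdigit() for p in parts[:2]):
        if int(parts[0]) <= 12 and int(parts[1]) <= 12:
            problems.append("ambiguous date order")
    return problems

report = {r["id"]: audit(r) for r in EXTERNAL}
for record_id, problems in report.items():
    print(record_id, problems or "ok")
```

A report like this does not settle whether the source is trustworthy, but it shows how much extraction and cleaning work the data would need.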
# What does your data look like?
An external source of information may be ambiguous or incomplete relative to our expectations.
Different interests will shape the content of the data we want to represent.
# Licenses, sharing and ethics
## Why would someone let me use their data?
* Drive sales (Commercial reasons)
* For the common good (ethical reasons)
* Contract requirements (contractual reasons, such as government contracts)
## Why not publish open data?
* Restrictions on source data (e.g. medical records)
* Control of use
* Value of the data (e.g. you're in the business of selling data)