Reading .csv files with and without headers using different delimiters in Mule 4
Flat files especially .csv files are very common for data loading and data transfers. We generally come across scenarios where we want to read a .csv file, which may or may not contain the header row. Also, there is a chance .csv files don't have ','(comma) as delimiter rather has ''|"(pipe) as a delimiter or any other character as a delimiter. This article will discuss various ways to read such files in Mule 4 and create a JSON output of it.
Mule Flow
This will be a simple flow, where we will be reading a file using a file connector. Then convert the read data to JSON using transform message. Finally log the JSON payload to console using Logger. Here, we are not doing any complex transformation, it will be more like a MIME type conversion from application/csv to application/json, for better understanding the structure of input data.
Steps:
.csv file having header row in the file content
Let us take below employeeData.csv file as an example. For simplicity, we are taking dummy data of only 5 employees.
id,first name,last name,email,gender,designation
1,Dom,Lissett,dlissett0@test.com,Male,Recruiter
2,Susan,Moogan,smoogan1@test.com,Female,Executive Secretary
3,Carl,Cotesford,lcotesford2@test.com,Male,Tech Architect
4,Debor,Sprakes,dsprakes3@test.com,Female,Account Coordinator
5,Priscella,Genny,pgenny5@test.com,Female,Environmental Tech
In this example, data in the first row signifies the headers separated by "," as a delimiter. Rest all other rows represent employee data.
INFO 2021-07-27 23:08:55,916 [[MuleRuntime].uber.03: [csv-read].csv-readFlow.CPU_INTENSIVE @1e42a374] [processor: csv-readFlow/processors/1; event: 895d1200-efb1-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
{
"id": "1",
"first name": "Dom",
"last name": "Lissett",
"email": "dlissett0@test.com",
"gender": "Male",
"designation": "Recruiter"
},
{
"id": "2",
"first name": "Susan",
"last name": "Moogan",
"email": "smoogan1@test.com",
"gender": "Female",
"designation": "Executive Secretary"
},
{
"id": "3",
"first name": "Carl",
"last name": "Cotesford",
"email": "lcotesford2@test.com",
"gender": "Male",
"designation": "Tech Architect"
},
{
"id": "4",
"first name": "Debor",
"last name": "Sprakes",
"email": "dsprakes3@test.com",
"gender": "Female",
"designation": "Account Coordinator"
},
{
"id": "5",
"first name": "Priscella",
"last name": "Genny",
"email": "pgenny5@test.com",
"gender": "Female",
"designation": "Environmental Tech"
}
]
Data in the first row became the keys for the JSON output.
.csv file without header row in the file content
Let us assume there is no header row in the content of employeeData.csv
1,Dom,Lissett,dlissett0@test.com,Male,Recruiter
2,Susan,Moogan,smoogan1@test.com,Female,Executive Secretary
3,Carl,Cotesford,lcotesford2@test.com,Male,Tech Architect
4,Debor,Sprakes,dsprakes3@test.com,Female,Account Coordinator
5,Priscella,Genny,pgenny5@test.com,Female,Environmental Tech
If we try to run with the same configurations, it will take the first row as keys and console output will look like.
Recommended by LinkedIn
INFO 2021-07-27 23:21:15,609 [[MuleRuntime].uber.03: [csv-read].csv-readFlow.CPU_INTENSIVE @1e42a374] [processor: csv-readFlow/processors/1; event: 4254cbd0-efb3-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
{
"1": "2",
"Dom": "Susan",
"Lissett": "Moogan",
"dlissett0@test.com": "smoogan1@test.com",
"Male": "Female",
"Recruiter": "Executive Secretary"
},
{
"1": "3",
"Dom": "Carl",
"Lissett": "Cotesford",
"dlissett0@test.com": "lcotesford2@test.com",
"Male": "Male",
"Recruiter": "Tech Architect"
},
{
"1": "4",
"Dom": "Debor",
"Lissett": "Sprakes",
"dlissett0@test.com": "dsprakes3@test.com",
"Male": "Female",
"Recruiter": "Account Coordinator"
},
{
"1": "5",
"Dom": "Priscella",
"Lissett": "Genny",
"dlissett0@test.com": "pgenny5@test.com",
"Male": "Female",
"Recruiter": "Environmental Tech"
}
]
By default, after setting MIME type to "application/json", mule considers the first row as the header row. In such cases, we don't want to consider the first row as the header row.
Now, by using the same file, we will be able to read data properly. But in this case, autogenerated values for keys like coloumn_0, column_1, etc. will be populated. Below is a sample log snippet.
INFO 2021-07-27 23:35:57,158 [[MuleRuntime].uber.05: [csv-read].csv-readFlow.CPU_INTENSIVE @6fbaa2fe] [processor: csv-readFlow/processors/1; event: 4fbf0590-efb5-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
{
"column_0": "1",
"column_1": "Dom",
"column_2": "Lissett",
"column_3": "dlissett0@test.com",
"column_4": "Male",
"column_5": "Recruiter"
},
{
"column_0": "2",
"column_1": "Susan",
"column_2": "Moogan",
"column_3": "smoogan1@test.com",
"column_4": "Female",
"column_5": "Executive Secretary"
},
{
"column_0": "3",
"column_1": "Carl",
"column_2": "Cotesford",
"column_3": "lcotesford2@test.com",
"column_4": "Male",
"column_5": "Tech Architect"
},
{
"column_0": "4",
"column_1": "Debor",
"column_2": "Sprakes",
"column_3": "dsprakes3@test.com",
"column_4": "Female",
"column_5": "Account Coordinator"
},
{
"column_0": "5",
"column_1": "Priscella",
"column_2": "Genny",
"column_3": "pgenny5@test.com",
"column_4": "Female",
"column_5": "Environmental Tech"
}
]
.csv file without header row and having a "|" as a delimiter
If we replace "," delimiter with "|" delimiter in employeeData.csv
1|Dom|Lissett|dlissett0@test.com|Male|Recruiter
2|Susan|Moogan|smoogan1@test.com|Female|Executive Secretary
3|Carl|Cotesford|lcotesford2@test.com|Male|Tech Architect
4|Debor|Sprakes|dsprakes3@test.com|Female|Account Coordinator
5|Priscella|Genny|pgenny5@test.com|Female|Environmental Tech
On running the same mule application with the above data, the output of the above employeeData.csv file will be.
INFO 2021-07-27 23:42:06,938 [[MuleRuntime].uber.05: [csv-read].csv-readFlow.CPU_INTENSIVE @6fbaa2fe] [processor: csv-readFlow/processors/1; event: 2c2f0f20-efb6-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
{
"column_0": "1|Dom|Lissett|dlissett0@test.com|Male|Recruiter"
},
{
"column_0": "2|Susan|Moogan|smoogan1@test.com|Female|Executive Secretary"
},
{
"column_0": "3|Carl|Cotesford|lcotesford2@test.com|Male|Tech Architect"
},
{
"column_0": "4|Debor|Sprakes|dsprakes3@test.com|Female|Account Coordinator"
},
{
"column_0": "5|Priscella|Genny|pgenny5@test.com|Female|Environmental Tech"
}
]
Now, we are getting only one key-value pair for each row. By default mule considers "," as delimiter for "application/csv" MIME type data. We need to explicitly define "|" as a delimiter in parameters under the MIME type.
Now, by using the same file, we will be able to read data properly as shown below.
INFO 2021-07-27 23:51:57,557 [[MuleRuntime].uber.06: [csv-read].csv-readFlow.CPU_INTENSIVE @6efd4175] [processor: csv-readFlow/processors/1; event: 8c31a710-efb7-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
{
"column_0": "1",
"column_1": "Dom",
"column_2": "Lissett",
"column_3": "dlissett0@test.com",
"column_4": "Male",
"column_5": "Recruiter"
},
{
"column_0": "2",
"column_1": "Susan",
"column_2": "Moogan",
"column_3": "smoogan1@test.com",
"column_4": "Female",
"column_5": "Executive Secretary"
},
{
"column_0": "3",
"column_1": "Carl",
"column_2": "Cotesford",
"column_3": "lcotesford2@test.com",
"column_4": "Male",
"column_5": "Tech Architect"
},
{
"column_0": "4",
"column_1": "Debor",
"column_2": "Sprakes",
"column_3": "dsprakes3@test.com",
"column_4": "Female",
"column_5": "Account Coordinator"
},
{
"column_0": "5",
"column_1": "Priscella",
"column_2": "Genny",
"column_3": "pgenny5@test.com",
"column_4": "Female",
"column_5": "Environmental Tech"
}
]
References:
Please share your valuable feedback 😊
Mulesoft Lead at HCL Technologies
1yHi Pranav Davar I have a requirement same as your last option .csv file without a header and "|" as a delimeter Tried in a same way and it didn't work for me. can you please help? This is my Input payload - 11281169|11091102|GULF COAST MARKET|2385 HIGHWAY 87|CRYSTAL BEACH|TX|77650| |4096842400|12040941614|5|0|0| |103|000980|S|04|0|D|0| | |0| |01|1000| |02|0|0|0|002|1001946|2023 TIER 1 (LF)|DPTX-W| |4416700000|0|0041|0|0|INDEPEND GROCERY|INDEPENDENT GROCERY|355|48|46526.88|0|LF| | | |GALVESTON| 11281611|11091102|BAYVUE GROCERY|1901 HWY 87|PORT BOLIVAR|TX|77650| |4096846220|17604685069|1|0|0| |112| |A|04|0|D|0| | |0| |01|1000| |01|HOU4287|0|0|002|1002433| |DPTX-W| |4416700000|0|0040|0|0|IBC|INDEPEND CONVENIENCE|355|48|0.00|0.00|SF| | | |GALVESTON| 11282565|11091102|DANNAY DONUTS|2124 HIGHWAY 87|CRYSTAL BEACH|TX|77650|KER|7136841778|32010714270|1|0|0| |214|000068|A|04|0|D|0| | |0| |01|1000| |01|0|0|0|002|1001231|REST/BARS/TAVERNS|DPTX-W| |4416700000|0|0040|0|0|ALL OTHER|QSR OTHER|355|48|0.00|0.00|SF| | | |GALVESTON|
SAP ABAP Specialist consultant | Mulesoft developer | Computer engineer
1yVery useful! 👏
Excellent article!
Software Engineer at Salesforce | Ex NTT Data | Ex Apisero
1yHI,if we want to define multiple delimiters then how do we do it like | and ;(semi-colon)etc
Solution Delivery Center Manager | Driving Operational Excellence & Client Satisfaction | Driving Operational Excellence | Strategic Planning & Execution | Process Improvement | Team Leader & Mentor
2yhow to remove the autogenerated keys "column_0, columns_1,..."?? I need the value only